Regex Lookahead and Lookbehind: Match a Prefix and Suffix String but Accept Only What Lies in Between

Regex, lookahead, lookbehind

Regex has a little-known pair of constructs, lookahead and lookbehind, which Java regex also supports, at least from Java 5 upwards. They are exactly what you need if the desired match lives between a known prefix and a known suffix String. If you only have a prefix, you can use the lookbehind construct on its own, and if you only have a suffix, you can use the lookahead construct on its own.

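The lookbehind is written as (?<=X) and the lookahead as (?=X), where X is the pattern that must come directly before (lookbehind) or directly after (lookahead) the match without becoming part of it.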
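To see the two constructs in isolation, here is a minimal sketch. The class name LookaroundBasics, the helper firstMatch and the sample input "id=42;" are made up for illustration and are not part of the original problem.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows that the text matched by a lookaround is not part of the result.
public class LookaroundBasics {

    // Returns the first match of the regex in the input, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Lookbehind: the digits must be preceded by "id=", which is not consumed.
        System.out.println(firstMatch("(?<=id=)[0-9]+", "id=42;")); // 42

        // Lookahead: the digits must be followed by ";", which is not consumed.
        System.out.println(firstMatch("[0-9]+(?=;)", "id=42;"));    // 42
    }
}
```

In both cases only the digits come back; the prefix and the suffix are checked but never returned.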

An example of the kind of domain problem we are trying to solve could be the following. I have a long String of parameter=value pairs separated by semicolons, e.g. "param1=val1;param2=val2;param3=val3 ..... paramN=valN", and I want to extract all the strings that represent val1, val2, val3 ... valN as a list. The Strings that I need to extract consist exclusively of digits (0-9).

The lookahead and lookbehind pattern that solves this would then look like this:

(?<=\bparam=\b)([0-9]+)(?=;)

Or, as a Java String literal in your program, as below, where paramText is a Java variable that has the value "param":

"(?<=\\b" + paramText + "=\\b)([0-9]+)(?=;)"
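A minimal sketch of how this pattern could be used from Java follows. The class name ParamValueExtractor, the method extract and the sample input are assumptions of mine, and Pattern.quote is an added precaution (not in the pattern above) in case paramText contains regex metacharacters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class ParamValueExtractor {

    // Collects every run of digits that directly follows "<paramText>=" and is directly followed by ";".
    public static List<String> extract(String paramText, String input) {
        // Pattern.quote is an assumption-level safeguard so paramText is treated literally.
        String regex = "(?<=\\b" + Pattern.quote(paramText) + "=\\b)([0-9]+)(?=;)";
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> values = new ArrayList<>();
        while (matcher.find()) {
            values.add(matcher.group(1)); // group 1 holds the digits
        }
        return values;
    }

    public static void main(String[] args) {
        String input = "param1=10;param2=20;param3=30";
        System.out.println(extract("param2", input)); // [20]
        System.out.println(extract("param3", input)); // [] because "30" is not followed by ';'
    }
}
```

Note that the last value is missed whenever the input does not end with a semicolon, because the lookahead insists on one.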

Now, to make the above fly, you need to add a couple more things. For one, if the String accidentally begins with a semicolon, we should still be able to get our match; and if the String does not end with a semicolon (which is most probably the case), you should still get your last match. Modified for these two requirements, our Java regex then looks like this (the digits now sit in capture group 2, since the first group holds the optional leading semicolon):

"(;|(?<=\\b" + paramText + "=\\b))([0-9]+)((?=;)|$)"
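Here is a rough sketch of the modified pattern in use. Again, the class name TolerantParamValueExtractor, the method extract and the sample inputs are made up for illustration, and Pattern.quote is my own addition to keep paramText literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class TolerantParamValueExtractor {

    // Accepts digits that follow either a ';' or "<paramText>=",
    // and that are followed by either a ';' or the end of the input.
    public static List<String> extract(String paramText, String input) {
        String regex = "(;|(?<=\\b" + Pattern.quote(paramText) + "=\\b))([0-9]+)((?=;)|$)";
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> values = new ArrayList<>();
        while (matcher.find()) {
            values.add(matcher.group(2)); // group 1 is the optional leading ';', group 2 the digits
        }
        return values;
    }

    public static void main(String[] args) {
        // The last value is found even without a trailing semicolon.
        System.out.println(extract("param3", "param1=10;param2=20;param3=30")); // [30]
        // A value that starts right after a semicolon is accepted as well.
        System.out.println(extract("param", ";5"));                              // [5]
        // Several values after one parameter name are all collected.
        System.out.println(extract("param", "param=1;2;3"));                     // [1, 2, 3]
    }
}
```

The third call is only my guess at how such a pattern lends itself to a list of values, as in an IN clause; the leading-semicolon alternative is what lets the second and third values match.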

I used this technique for a problem I had to solve in ADF: creating a view criteria that was equivalent to an SQL IN clause. But that is another article, which you can read about here.





