Oracle SQL : Matching Prefix and Suffix string but only accepting whats in between

Printer-friendly versionPDF version
regex replace

A while ago I wrote an article about regex lookahead and lookbehind here. I needed the same thing now in SQL, specifically Oracle flavor. Turns out that Oracle supports the POSIX BRE syntax and POSIX ERE syntax plus PERL influenced extensions, but no look ahead look behind equivalent regex syntax.

So the problem we have to solve is the same. I have a String that is ";" seperated key value pairs that might look like below:

;key1=val1;key2=val2; ... keyN=valN;

Now the probem we want to solve is to extract the value part only and remove the prefix part "keyN". The key part to match is known and the value part for me is digit based.

So now lets take our first crack at this problem. One solution might look like below :

With keyValuePairsTable( keyValuePairs ) as ( select ';key1=123;key2=456;key3=789;' from dual ) 
select REGEXP_REPLACE(keyValuePairs,'(.)*(key1=)+([0-9]+)(;)*(.)*', '\3') val1, REGEXP_REPLACE(keyValuePairs,'.*(key2=)+([0-9]+)(;)*.*', '\2') val2  from keyValuePairsTable;

VAL1 VAL2
---- ----
123  456  


So what did we do above, we divided our problem into 5 capturing groups, any string in beginning "(.)*", than our key1 part "(key1=)+" which has to occur atleast 1 time, than the digit value part "[0-9]+", than the semicolon 0 or more and than lastly the last group is again any 0 or more characters. Than if the match takes place, pick out the 3rd matching group that we want.

So that seems like it or is it ? Well not exactly, after you have your initial solution you need to check those pesty boundary conditions and non-condition, if thats even a word. So see what happens below when I change the test string to not have key1 in it, the first test:

With keyValuePairsTable( keyValuePairs ) as ( select ';key2=456;key3=789;' from dual ) 
select REGEXP_REPLACE(keyValuePairs,'(.)*(key1=)+([0-9]+)(;)*(.)*', '\3') val1, REGEXP_REPLACE(keyValuePairs,'.*(key2=)+([0-9]+)(;)*.*', '\2') val2  from keyValuePairsTable;

VAL1                VAL2
------------------- ----
;key2=456;key3=789; 456 

 

Final Solution

Not exactly what we wanted did we. When a certain matching group is missing completely, in our case key1 does not match and our first group swallows up the whole string matching the whole String. Time to modify our solution as below and run a battary of tests that show that this is indeed now our final solution:

With keyValuePairsTable( keyValuePairs ) as ( select ';key1=123;key2=456;key3=789;' from dual ) 
select REGEXP_SUBSTR(keyValuePairs,'(.)*(key1=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val1, REGEXP_SUBSTR(keyValuePairs,'(.)*(key2=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val2  from keyValuePairsTable;

VAL1 VAL2
---- ----
123  456  

-- Semicolons missing at various location
With keyValuePairsTable( keyValuePairs ) as ( select 'key1=123;key2=456;key3=789' from dual ) 
select REGEXP_SUBSTR(keyValuePairs,'(.)*(key1=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val1, REGEXP_SUBSTR(keyValuePairs,'(.)*(key2=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val2  from keyValuePairsTable;

VAL1 VAL2
---- ----
123  456  

With keyValuePairsTable( keyValuePairs ) as ( select ';key1=123;key2=456;key3=789' from dual ) 
select REGEXP_SUBSTR(keyValuePairs,'(.)*(key1=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val1, REGEXP_SUBSTR(keyValuePairs,'(.)*(key2=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val2  from keyValuePairsTable;

VAL1 VAL2
---- ----
123  456  

With keyValuePairsTable( keyValuePairs ) as ( select 'key1=123;key2=456;key3=789;' from dual ) 
select REGEXP_SUBSTR(keyValuePairs,'(.)*(key1=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val1, REGEXP_SUBSTR(keyValuePairs,'(.)*(key2=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val2  from keyValuePairsTable;
VAL1 VAL2
---- ----
123  456  

-- Key1 and value missing
With keyValuePairsTable( keyValuePairs ) as ( select 'key2=456;key3=789' from dual ) 
select REGEXP_SUBSTR(keyValuePairs,'(.)*(key1=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val1, REGEXP_SUBSTR(keyValuePairs,'(.)*(key2=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val2  from keyValuePairsTable;

VAL1  VAL2
---- ----
null     456  

-- Malformed key value pair
With keyValuePairsTable( keyValuePairs ) as ( select '=123;key2=456;key3=789;' from dual ) 
select REGEXP_SUBSTR(keyValuePairs,'(.)*(key1=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val1, REGEXP_SUBSTR(keyValuePairs,'(.)*(key2=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val2  from keyValuePairsTable;

VAL1  VAL2
---- ----
null     456  

With keyValuePairsTable( keyValuePairs ) as ( select 'key1123;key3=789' from dual ) 
select REGEXP_SUBSTR(keyValuePairs,'(.)*(key1=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val1, REGEXP_SUBSTR(keyValuePairs,'(.)*(key2=)+([0-9]+)(;)*(.)*', 1,1,'c','3') val2  from keyValuePairsTable;

VAL1  VAL2
---- ----
null     null

Now thats that.

Top level category:

Comments

Hi, I am very impressed about your blog. It is really helped me a lot. Your article about the topic Matching Prefix and Suffix string but only accepting whats in between is really a nice article and You are explained really well. I am doing some web developments and one of my latest web is of kochi airport taxi. I think this seems something complex but when reading each lines carefully I could understand each of them and I think it was good codes. I will make a try. Thank you so much for sharing this article. Regards, Thomas

Add new comment