REGEXP_MATCHES() strange behavior with '^' and '$' pattern - Mailing list pgsql-hackers

From Jeevan Chalke
Subject REGEXP_MATCHES() strange behavior with '^' and '$' pattern
Date
Msg-id CAM2+6=U6_WxwoDn=UOL7PdadRyAZYV6QLox-FJJwrkTEZS5RJg@mail.gmail.com
Responses Re: REGEXP_MATCHES() strange behavior with '^' and '$' pattern
List pgsql-hackers
Hi,

While playing with regular expressions I found some strange behavior in the
regexp_matches() function.

Consider the following SQL query and its output:

postgres=# select regexp_matches('1' || chr(10) || '2' || chr(10) || '3' || chr(10) || '4', '^', 'mg');
 regexp_matches
----------------
 {""}
 {""}
 {""}
 {""}
 {""}
 {""}
 {""}
(7 rows)

It is supposed to return 4 rows, not 7. Similar behavior is found with the
pattern '$'.

It seems that the start and end anchors are not being matched correctly.
Or rather, they are being matched twice.

To find the root cause, I put elog(INFO,..) calls into the
setup_regexp_matches() function, at the point where we copy matches into the
struct, and found the following values:

postgres=# select regexp_matches('1' || chr(10) || '2' || chr(10) || '3' || chr(10) || '4', '^', 'mg');
INFO:  start_search: 0  rm_so: 0  rm_eo: 0
INFO:  updated start_search: 1
INFO:  start_search: 1  rm_so: 2  rm_eo: 2
INFO:  updated start_search: 2
INFO:  start_search: 2  rm_so: 2  rm_eo: 2
INFO:  updated start_search: 3
INFO:  start_search: 3  rm_so: 4  rm_eo: 4
INFO:  updated start_search: 4
INFO:  start_search: 4  rm_so: 4  rm_eo: 4
INFO:  updated start_search: 5
INFO:  start_search: 5  rm_so: 6  rm_eo: 6
INFO:  updated start_search: 6
INFO:  start_search: 6  rm_so: 6  rm_eo: 6
INFO:  updated start_search: 7

Certainly, after the second pass, the updated start_search should be 3, as the
last match was at offset 2 and of zero length (so = eo). Instead it is set to
2, so the same empty match at offset 2 is found again on the next pass.

I have modified that logic to look similar to that of the
replace_text_regexp() function, since regexp_replace() works well.

Attached is a patch with a test case. Please have a look and let me know if I
have assumed something wrong.

Thanks

-- 
Jeevan B Chalke
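
The start_search bookkeeping described in the message can be simulated outside PostgreSQL. The sketch below is not the setup_regexp_matches() code and not the attached patch. It uses Python's re module as a stand-in for the regex engine, and both advance rules are assumptions: the first is merely a rule consistent with the logged values, the second is the "end of match, plus one if the match was empty" rule that the message describes as the desired behavior. The function names are hypothetical.

```python
import re


def match_offsets_logged(text, pattern):
    """Collect match start offsets using an advance rule consistent with the
    INFO log: jump to the match end if that moves forward, otherwise step one
    character from the previous search start (assumed, not PostgreSQL code)."""
    rx = re.compile(pattern, re.MULTILINE)
    offsets = []
    start_search = 0
    while start_search <= len(text):
        m = rx.search(text, start_search)
        if m is None:
            break
        offsets.append(m.start())
        if start_search < m.end():
            start_search = m.end()
        else:
            start_search += 1
    return offsets


def match_offsets_fixed(text, pattern):
    """Collect match start offsets, always restarting from the end of the
    match and skipping one extra character after a zero-length match."""
    rx = re.compile(pattern, re.MULTILINE)
    offsets = []
    start_search = 0
    while start_search <= len(text):
        m = rx.search(text, start_search)
        if m is None:
            break
        offsets.append(m.start())
        start_search = m.end()
        if m.start() == m.end():
            start_search += 1  # empty match: move past it
    return offsets


if __name__ == "__main__":
    text = "1\n2\n3\n4"  # same string as '1' || chr(10) || '2' || ...
    print(match_offsets_logged(text, "^"))  # [0, 2, 2, 4, 4, 6, 6]
    print(match_offsets_fixed(text, "^"))   # [0, 2, 4, 6]
```

With the first rule, an empty match found ahead of start_search (for example at offset 2 when searching from 1) leaves start_search on the match itself, so the next pass finds it again. That reproduces the 7 rows. The second rule yields the expected 4.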
