Re: REGEXP_MATCHES() strange behavior with '^' and '$' pattern - Mailing list pgsql-hackers

From Jeevan Chalke
Subject Re: REGEXP_MATCHES() strange behavior with '^' and '$' pattern
Msg-id CAM2+6=WtoQkTv_vyTNrDw-PC=XheFz8KK1Ng6udS+nK8JAfMCg@mail.gmail.com
In response to REGEXP_MATCHES() strange behavior with '^' and '$' pattern  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
Responses Re: REGEXP_MATCHES() strange behavior with '^' and '$' pattern
List pgsql-hackers
Oops, I forgot the patch.

It is attached now.


On Wed, Jul 31, 2013 at 6:03 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Hi,

While playing with regular expressions, I found some strange behavior in
the regexp_matches() function.

Consider the following SQL query and its output:

postgres=# select regexp_matches('1' || chr(10) || '2' || chr(10) || '3' || chr(10) || '4', '^', 'mg');
 regexp_matches
----------------
 {""}
 {""}
 {""}
 {""}
 {""}
 {""}
 {""}
(7 rows)


It is supposed to return 4 rows, not 7. The pattern '$' shows similar
behavior.

It seems that the start and end anchors are not being matched correctly;
or rather, some positions are being matched twice.

To find the root cause, I put elog(INFO, ...) calls into the
setup_regexp_matches() function, where we copy the matches into the struct,
and found the following values:


postgres=# select regexp_matches('1' || chr(10) || '2' || chr(10) || '3' || chr(10) || '4', '^', 'mg');
INFO:  start_search: 0  rm_so: 0  rm_eo: 0
INFO:  updated start_search: 1
INFO:  start_search: 1  rm_so: 2  rm_eo: 2
INFO:  updated start_search: 2
INFO:  start_search: 2  rm_so: 2  rm_eo: 2
INFO:  updated start_search: 3
INFO:  start_search: 3  rm_so: 4  rm_eo: 4
INFO:  updated start_search: 4
INFO:  start_search: 4  rm_so: 4  rm_eo: 4
INFO:  updated start_search: 5
INFO:  start_search: 5  rm_so: 6  rm_eo: 6
INFO:  updated start_search: 6
INFO:  start_search: 6  rm_so: 6  rm_eo: 6
INFO:  updated start_search: 7


Clearly, after the second pass, the updated start_search should be 3,
because the last match was at offset 2 and had zero length (rm_so = rm_eo).

I have modified that logic to look similar to that of the
replace_text_regexp() function, since regexp_replace() works well.
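
For readers without the patch at hand, the effect can be reproduced outside
the server. The following is a minimal Python sketch, not the PostgreSQL
source: find_all(), buggy_advance() and fixed_advance() are hypothetical
names, Python's re module stands in for the backend regex engine, and the
buggy rule is only inferred from the INFO output above. The fixed rule is one
possible correction and may differ from what the attached patch does.

```python
import re


def find_all(text, pattern, advance):
    """Collect (start, end) offsets of all matches, restarting the search
    at the position chosen by the advance() rule after each match."""
    regex = re.compile(pattern, re.MULTILINE)  # roughly the 'm' flag
    matches = []
    start_search = 0
    while start_search <= len(text):
        m = regex.search(text, start_search)
        if m is None:
            break
        matches.append((m.start(), m.end()))
        start_search = advance(start_search, m.start(), m.end())
    return matches


def buggy_advance(start_search, so, eo):
    # Assumed old rule: on a zero-length match, step one character from the
    # previous search position, which may still be at or before the match.
    return start_search + 1 if so == eo else eo


def fixed_advance(start_search, so, eo):
    # Assumed corrected rule: on a zero-length match, restart one character
    # past the match itself, so the same position cannot match again.
    return eo + 1 if so == eo else eo


text = "1\n2\n3\n4"
print(len(find_all(text, r"^", buggy_advance)))  # 7
print(len(find_all(text, r"^", fixed_advance)))  # 4
```

With the buggy rule the loop reports the matches at offsets 2, 4 and 6 twice
each, which gives the 7 rows seen above; restarting one character past a
zero-length match gives the expected 4.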

The patch, with a test case, is attached. Please have a look and let me
know if I have assumed something wrong.

Thanks

--
Jeevan B Chalke

