Thread: regexp_matches bug in 9.3.4 and 9.4.1

regexp_matches bug in 9.3.4 and 9.4.1

From
Jeff Certain
Date:
Repro:
drop table if exists regex_test;
create table regex_test(filter citext);
insert into regex_test(filter) values ('Filter Filter Filter');
select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;
select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from r=
egex_test;

Expected result:
Both select statements should return the same number of rows. Specifically,=
 in this case, I expect to get back 3 rows.

Actual result:
When referencing the filter column, only one row is ever returned. When sup=
plying the string that is the same as the data in the filter column, the co=
rrect number of rows are returned.

Result 1 (incorrect):
select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;
Filter Filter Filter               {Filter}

Result 2 (correct):
select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from r=
egex_test;
Filter Filter Filter               {Filter}
Filter Filter Filter               {Filter}
Filter Filter Filter               {Filter}

Versions
PostgreSQL 9.3.4, compiled by Visual C++ build 1600, 64-bit
PostgreSQL 9.4.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 2=
0120313 (Red Hat 4.4.7-11), 64-bit

Re: regexp_matches bug in 9.3.4 and 9.4.1

From
"David G. Johnston"
Date:
Please note...

On Mon, May 4, 2015 at 9:56 AM, Jeff Certain <gcertain@dynamicsignal.com>
wrote:

>  Repro:
>
> drop table if exists regex_test;
>
> create table regex_test(filter citext);
>
> insert into regex_test(filter) values ('Filter Filter Filter');
>
select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;
>

=E2=80=8B"filter" is of type "citext"=E2=80=8B

 select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from
> regex_test;
>
>
>

=E2=80=8B'Filter Filter Filter' is of type "text"=E2=80=8B

 Expected result:
>
> Both select statements should return the same number of rows.
> Specifically, in this case, I expect to get back 3 rows.
>
>
>

=E2=80=8BThe following non-citext version of your first query works just fi=
ne on 9.3

=E2=80=8Bselect filter, regexp_matches(filter, 'Filter', 'g') from (values =
('Filter
Filter Filter'::text)) regex_test (filter);

=E2=80=8Bregexp_matches is not mentioned on the following page in the docum=
entation:

=E2=80=8Bhttp://www.postgresql.org/docs/9.4/interactive/citext.html



>  Actual result:
>
> When referencing the filter column, only one row is ever returned. When
> supplying the string that is the same as the data in the filter column, t=
he
> correct number of rows are returned.
>
>
>
> Result 1 (incorrect):
>
> select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;
>
> Filter Filter Filter               {Filter}
>
>
>
> Result 2 (correct):
>
> select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from
> regex_test;
>
> Filter Filter Filter               {Filter}
>
> Filter Filter Filter               {Filter}
>
> Filter Filter Filter               {Filter}
>

=E2=80=8BThis fails on 9.3 if the citext type is used instead of text.

Seems the however the citext version of regexp_matches is invoked it
ignores the global flag. Haven't explored deeper.

David J.
=E2=80=8B

Re: regexp_matches bug in 9.3.4 and 9.4.1

From
"David G. Johnston"
Date:
Please don't top-post...

On Mon, May 4, 2015 at 3:41 PM, Jeff Certain <gcertain@dynamicsignal.com>
wrote:

>  Thanks for the quick response.
>
>
>
> From my end of things, I would expect either a) an exception; b) 3
> matches; or c) 0 matches. The current behavior (returning one match and
> ignoring the global flag) strikes me as being misleading and challenging =
to
> debug.
>
>
>
=E2=80=8BUnless someone tries to argue that this is not a bug the solution =
will be
"b) 3 matches".  I stopped where I did because I'm not a -hacker.

David J.
=E2=80=8B

Re: regexp_matches bug in 9.3.4 and 9.4.1

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Seems the however the citext version of regexp_matches is invoked it
> ignores the global flag. Haven't explored deeper.

Hm ...

regression=# create extension citext;
CREATE EXTENSION
regression=# \df regexp_matches
                               List of functions
   Schema   |      Name      | Result data type | Argument data types  |  Type
------------+----------------+------------------+----------------------+--------
 pg_catalog | regexp_matches | SETOF text[]     | text, text           | normal
 pg_catalog | regexp_matches | SETOF text[]     | text, text, text     | normal
 public     | regexp_matches | text[]           | citext, citext       | normal
 public     | regexp_matches | text[]           | citext, citext, text | normal
(4 rows)

Looks like somebody just fat-fingered the declaration of citext's
regexp_matches() functions: they should be marked proretset and
they ain't.  Sigh.  I wonder if any of the other ones are wrong.

            regards, tom lane

Re: regexp_matches bug in 9.3.4 and 9.4.1

From
Jeff Certain
Date:
VGhhbmtzIGZvciB0aGUgcXVpY2sgcmVzcG9uc2UuDQoNCkZyb20gbXkgZW5kIG9mIHRoaW5ncywg
SSB3b3VsZCBleHBlY3QgZWl0aGVyIGEpIGFuIGV4Y2VwdGlvbjsgYikgMyBtYXRjaGVzOyBvciBj
KSAwIG1hdGNoZXMuIFRoZSBjdXJyZW50IGJlaGF2aW9yIChyZXR1cm5pbmcgb25lIG1hdGNoIGFu
ZCBpZ25vcmluZyB0aGUgZ2xvYmFsIGZsYWcpIHN0cmlrZXMgbWUgYXMgYmVpbmcgbWlzbGVhZGlu
ZyBhbmQgY2hhbGxlbmdpbmcgdG8gZGVidWcuDQoNCkplZmYNCg0KDQpGcm9tOiBEYXZpZCBHLiBK
b2huc3RvbiBbbWFpbHRvOmRhdmlkLmcuam9obnN0b25AZ21haWwuY29tXQ0KU2VudDogTW9uZGF5
LCBNYXkgNCwgMjAxNSAzOjM3IFBNDQpUbzogSmVmZiBDZXJ0YWluDQpDYzogcGdzcWwtYnVnc0Bw
b3N0Z3Jlc3FsLm9yZw0KU3ViamVjdDogUmU6IFtCVUdTXSByZWdleHBfbWF0Y2hlcyBidWcgaW4g
OS4zLjQgYW5kIDkuNC4xDQoNClBsZWFzZSBub3RlLi4uDQoNCk9uIE1vbiwgTWF5IDQsIDIwMTUg
YXQgOTo1NiBBTSwgSmVmZiBDZXJ0YWluIDxnY2VydGFpbkBkeW5hbWljc2lnbmFsLmNvbTxtYWls
dG86Z2NlcnRhaW5AZHluYW1pY3NpZ25hbC5jb20+PiB3cm90ZToNClJlcHJvOg0KZHJvcCB0YWJs
ZSBpZiBleGlzdHMgcmVnZXhfdGVzdDsNCmNyZWF0ZSB0YWJsZSByZWdleF90ZXN0KGZpbHRlciBj
aXRleHQpOw0KaW5zZXJ0IGludG8gcmVnZXhfdGVzdChmaWx0ZXIpIHZhbHVlcyAoJ0ZpbHRlciBG
aWx0ZXIgRmlsdGVyJyk7DQpzZWxlY3QgZmlsdGVyLCByZWdleHBfbWF0Y2hlcyhmaWx0ZXIsICdG
aWx0ZXInLCAnZycpIGZyb20gcmVnZXhfdGVzdDsNCg0K4oCLImZpbHRlciIgaXMgb2YgdHlwZSAi
Y2l0ZXh0IuKAiw0KDQpzZWxlY3QgZmlsdGVyLCByZWdleHBfbWF0Y2hlcygnRmlsdGVyIEZpbHRl
ciBGaWx0ZXInLCAnRmlsdGVyJywgJ2cnKSBmcm9tIHJlZ2V4X3Rlc3Q7DQoNCg0K4oCLJ0ZpbHRl
ciBGaWx0ZXIgRmlsdGVyJyBpcyBvZiB0eXBlICJ0ZXh0IuKAiw0KDQpFeHBlY3RlZCByZXN1bHQ6
DQpCb3RoIHNlbGVjdCBzdGF0ZW1lbnRzIHNob3VsZCByZXR1cm4gdGhlIHNhbWUgbnVtYmVyIG9m
IHJvd3MuIFNwZWNpZmljYWxseSwgaW4gdGhpcyBjYXNlLCBJIGV4cGVjdCB0byBnZXQgYmFjayAz
IHJvd3MuDQoNCg0K4oCLVGhlIGZvbGxvd2luZyBub24tY2l0ZXh0IHZlcnNpb24gb2YgeW91ciBm
aXJzdCBxdWVyeSB3b3JrcyBqdXN0IGZpbmUgb24gOS4zDQoNCuKAi3NlbGVjdCBmaWx0ZXIsIHJl
Z2V4cF9tYXRjaGVzKGZpbHRlciwgJ0ZpbHRlcicsICdnJykgZnJvbSAodmFsdWVzICgnRmlsdGVy
IEZpbHRlciBGaWx0ZXInOjp0ZXh0KSkgcmVnZXhfdGVzdCAoZmlsdGVyKTsNCg0K4oCLcmVnZXhw
X21hdGNoZXMgaXMgbm90IG1lbnRpb25lZCBvbiB0aGUgZm9sbG93aW5nIHBhZ2UgaW4gdGhlIGRv
Y3VtZW50YXRpb246DQoNCuKAi2h0dHA6Ly93d3cucG9zdGdyZXNxbC5vcmcvZG9jcy85LjQvaW50
ZXJhY3RpdmUvY2l0ZXh0Lmh0bWwNCg0KDQpBY3R1YWwgcmVzdWx0Og0KV2hlbiByZWZlcmVuY2lu
ZyB0aGUgZmlsdGVyIGNvbHVtbiwgb25seSBvbmUgcm93IGlzIGV2ZXIgcmV0dXJuZWQuIFdoZW4g
c3VwcGx5aW5nIHRoZSBzdHJpbmcgdGhhdCBpcyB0aGUgc2FtZSBhcyB0aGUgZGF0YSBpbiB0aGUg
ZmlsdGVyIGNvbHVtbiwgdGhlIGNvcnJlY3QgbnVtYmVyIG9mIHJvd3MgYXJlIHJldHVybmVkLg0K
DQpSZXN1bHQgMSAoaW5jb3JyZWN0KToNCnNlbGVjdCBmaWx0ZXIsIHJlZ2V4cF9tYXRjaGVzKGZp
bHRlciwgJ0ZpbHRlcicsICdnJykgZnJvbSByZWdleF90ZXN0Ow0KRmlsdGVyIEZpbHRlciBGaWx0
ZXIgICAgICAgICAgICAgICB7RmlsdGVyfQ0KDQpSZXN1bHQgMiAoY29ycmVjdCk6DQpzZWxlY3Qg
ZmlsdGVyLCByZWdleHBfbWF0Y2hlcygnRmlsdGVyIEZpbHRlciBGaWx0ZXInLCAnRmlsdGVyJywg
J2cnKSBmcm9tIHJlZ2V4X3Rlc3Q7DQpGaWx0ZXIgRmlsdGVyIEZpbHRlciAgICAgICAgICAgICAg
IHtGaWx0ZXJ9DQpGaWx0ZXIgRmlsdGVyIEZpbHRlciAgICAgICAgICAgICAgIHtGaWx0ZXJ9DQpG
aWx0ZXIgRmlsdGVyIEZpbHRlciAgICAgICAgICAgICAgIHtGaWx0ZXJ9DQoNCuKAi1RoaXMgZmFp
bHMgb24gOS4zIGlmIHRoZSBjaXRleHQgdHlwZSBpcyB1c2VkIGluc3RlYWQgb2YgdGV4dC4NCg0K
U2VlbXMgdGhlIGhvd2V2ZXIgdGhlIGNpdGV4dCB2ZXJzaW9uIG9mIHJlZ2V4cF9tYXRjaGVzIGlz
IGludm9rZWQgaXQgaWdub3JlcyB0aGUgZ2xvYmFsIGZsYWcuIEhhdmVuJ3QgZXhwbG9yZWQgZGVl
cGVyLg0KDQpEYXZpZCBKLg0K4oCLDQoNCg==

Re: regexp_matches bug in 9.3.4 and 9.4.1

From
Tom Lane
Date:
Jeff Certain <gcertain@dynamicsignal.com> writes:
> create table regex_test(filter citext);
> insert into regex_test(filter) values ('Filter Filter Filter');
> select filter, regexp_matches(filter, 'Filter', 'g') from regex_test;
> select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from regex_test;

> Expected result:
> Both select statements should return the same number of rows. Specifically, in this case, I expect to get back 3
rows.

> Actual result:
> When referencing the filter column, only one row is ever returned. When supplying the string that is the same as the
datain the filter column, the correct number of rows are returned. 

Fixed, although after discussion we concluded that we should not change
the default behavior in the back branches.  See commit log message:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b1ec45994e5108d734c45876c25593823fcf8644

Thanks for the report!

            regards, tom lane