Thread: regexp_matches bug in 9.3.4 and 9.4.1
Repro: drop table if exists regex_test; create table regex_test(filter citext); insert into regex_test(filter) values ('Filter Filter Filter'); select filter, regexp_matches(filter, 'Filter', 'g') from regex_test; select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from r= egex_test; Expected result: Both select statements should return the same number of rows. Specifically,= in this case, I expect to get back 3 rows. Actual result: When referencing the filter column, only one row is ever returned. When sup= plying the string that is the same as the data in the filter column, the co= rrect number of rows are returned. Result 1 (incorrect): select filter, regexp_matches(filter, 'Filter', 'g') from regex_test; Filter Filter Filter {Filter} Result 2 (correct): select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from r= egex_test; Filter Filter Filter {Filter} Filter Filter Filter {Filter} Filter Filter Filter {Filter} Versions PostgreSQL 9.3.4, compiled by Visual C++ build 1600, 64-bit PostgreSQL 9.4.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 2= 0120313 (Red Hat 4.4.7-11), 64-bit
Please note... On Mon, May 4, 2015 at 9:56 AM, Jeff Certain <gcertain@dynamicsignal.com> wrote: > Repro: > > drop table if exists regex_test; > > create table regex_test(filter citext); > > insert into regex_test(filter) values ('Filter Filter Filter'); > select filter, regexp_matches(filter, 'Filter', 'g') from regex_test; > =E2=80=8B"filter" is of type "citext"=E2=80=8B select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from > regex_test; > > > =E2=80=8B'Filter Filter Filter' is of type "text"=E2=80=8B Expected result: > > Both select statements should return the same number of rows. > Specifically, in this case, I expect to get back 3 rows. > > > =E2=80=8BThe following non-citext version of your first query works just fi= ne on 9.3 =E2=80=8Bselect filter, regexp_matches(filter, 'Filter', 'g') from (values = ('Filter Filter Filter'::text)) regex_test (filter); =E2=80=8Bregexp_matches is not mentioned on the following page in the docum= entation: =E2=80=8Bhttp://www.postgresql.org/docs/9.4/interactive/citext.html > Actual result: > > When referencing the filter column, only one row is ever returned. When > supplying the string that is the same as the data in the filter column, t= he > correct number of rows are returned. > > > > Result 1 (incorrect): > > select filter, regexp_matches(filter, 'Filter', 'g') from regex_test; > > Filter Filter Filter {Filter} > > > > Result 2 (correct): > > select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from > regex_test; > > Filter Filter Filter {Filter} > > Filter Filter Filter {Filter} > > Filter Filter Filter {Filter} > =E2=80=8BThis fails on 9.3 if the citext type is used instead of text. Seems the however the citext version of regexp_matches is invoked it ignores the global flag. Haven't explored deeper. David J. =E2=80=8B
Please don't top-post... On Mon, May 4, 2015 at 3:41 PM, Jeff Certain <gcertain@dynamicsignal.com> wrote: > Thanks for the quick response. > > > > From my end of things, I would expect either a) an exception; b) 3 > matches; or c) 0 matches. The current behavior (returning one match and > ignoring the global flag) strikes me as being misleading and challenging = to > debug. > > > =E2=80=8BUnless someone tries to argue that this is not a bug the solution = will be "b) 3 matches". I stopped where I did because I'm not a -hacker. David J. =E2=80=8B
"David G. Johnston" <david.g.johnston@gmail.com> writes: > Seems the however the citext version of regexp_matches is invoked it > ignores the global flag. Haven't explored deeper. Hm ... regression=# create extension citext; CREATE EXTENSION regression=# \df regexp_matches List of functions Schema | Name | Result data type | Argument data types | Type ------------+----------------+------------------+----------------------+-------- pg_catalog | regexp_matches | SETOF text[] | text, text | normal pg_catalog | regexp_matches | SETOF text[] | text, text, text | normal public | regexp_matches | text[] | citext, citext | normal public | regexp_matches | text[] | citext, citext, text | normal (4 rows) Looks like somebody just fat-fingered the declaration of citext's regexp_matches() functions: they should be marked proretset and they ain't. Sigh. I wonder if any of the other ones are wrong. regards, tom lane
VGhhbmtzIGZvciB0aGUgcXVpY2sgcmVzcG9uc2UuDQoNCkZyb20gbXkgZW5kIG9mIHRoaW5ncywg SSB3b3VsZCBleHBlY3QgZWl0aGVyIGEpIGFuIGV4Y2VwdGlvbjsgYikgMyBtYXRjaGVzOyBvciBj KSAwIG1hdGNoZXMuIFRoZSBjdXJyZW50IGJlaGF2aW9yIChyZXR1cm5pbmcgb25lIG1hdGNoIGFu ZCBpZ25vcmluZyB0aGUgZ2xvYmFsIGZsYWcpIHN0cmlrZXMgbWUgYXMgYmVpbmcgbWlzbGVhZGlu ZyBhbmQgY2hhbGxlbmdpbmcgdG8gZGVidWcuDQoNCkplZmYNCg0KDQpGcm9tOiBEYXZpZCBHLiBK b2huc3RvbiBbbWFpbHRvOmRhdmlkLmcuam9obnN0b25AZ21haWwuY29tXQ0KU2VudDogTW9uZGF5 LCBNYXkgNCwgMjAxNSAzOjM3IFBNDQpUbzogSmVmZiBDZXJ0YWluDQpDYzogcGdzcWwtYnVnc0Bw b3N0Z3Jlc3FsLm9yZw0KU3ViamVjdDogUmU6IFtCVUdTXSByZWdleHBfbWF0Y2hlcyBidWcgaW4g OS4zLjQgYW5kIDkuNC4xDQoNClBsZWFzZSBub3RlLi4uDQoNCk9uIE1vbiwgTWF5IDQsIDIwMTUg YXQgOTo1NiBBTSwgSmVmZiBDZXJ0YWluIDxnY2VydGFpbkBkeW5hbWljc2lnbmFsLmNvbTxtYWls dG86Z2NlcnRhaW5AZHluYW1pY3NpZ25hbC5jb20+PiB3cm90ZToNClJlcHJvOg0KZHJvcCB0YWJs ZSBpZiBleGlzdHMgcmVnZXhfdGVzdDsNCmNyZWF0ZSB0YWJsZSByZWdleF90ZXN0KGZpbHRlciBj aXRleHQpOw0KaW5zZXJ0IGludG8gcmVnZXhfdGVzdChmaWx0ZXIpIHZhbHVlcyAoJ0ZpbHRlciBG aWx0ZXIgRmlsdGVyJyk7DQpzZWxlY3QgZmlsdGVyLCByZWdleHBfbWF0Y2hlcyhmaWx0ZXIsICdG aWx0ZXInLCAnZycpIGZyb20gcmVnZXhfdGVzdDsNCg0K4oCLImZpbHRlciIgaXMgb2YgdHlwZSAi Y2l0ZXh0IuKAiw0KDQpzZWxlY3QgZmlsdGVyLCByZWdleHBfbWF0Y2hlcygnRmlsdGVyIEZpbHRl ciBGaWx0ZXInLCAnRmlsdGVyJywgJ2cnKSBmcm9tIHJlZ2V4X3Rlc3Q7DQoNCg0K4oCLJ0ZpbHRl ciBGaWx0ZXIgRmlsdGVyJyBpcyBvZiB0eXBlICJ0ZXh0IuKAiw0KDQpFeHBlY3RlZCByZXN1bHQ6 DQpCb3RoIHNlbGVjdCBzdGF0ZW1lbnRzIHNob3VsZCByZXR1cm4gdGhlIHNhbWUgbnVtYmVyIG9m IHJvd3MuIFNwZWNpZmljYWxseSwgaW4gdGhpcyBjYXNlLCBJIGV4cGVjdCB0byBnZXQgYmFjayAz IHJvd3MuDQoNCg0K4oCLVGhlIGZvbGxvd2luZyBub24tY2l0ZXh0IHZlcnNpb24gb2YgeW91ciBm aXJzdCBxdWVyeSB3b3JrcyBqdXN0IGZpbmUgb24gOS4zDQoNCuKAi3NlbGVjdCBmaWx0ZXIsIHJl Z2V4cF9tYXRjaGVzKGZpbHRlciwgJ0ZpbHRlcicsICdnJykgZnJvbSAodmFsdWVzICgnRmlsdGVy IEZpbHRlciBGaWx0ZXInOjp0ZXh0KSkgcmVnZXhfdGVzdCAoZmlsdGVyKTsNCg0K4oCLcmVnZXhw X21hdGNoZXMgaXMgbm90IG1lbnRpb25lZCBvbiB0aGUgZm9sbG93aW5nIHBhZ2UgaW4gdGhlIGRv Y3VtZW50YXRpb246DQoNCuKAi2h0dHA6Ly93d3cucG9zdGdyZXNxbC5vcmcvZG9jcy85LjQvaW50 ZXJhY3RpdmUvY2l0ZXh0Lmh0bWwNCg0KDQpBY3R1YWwgcmVzdWx0Og0KV2hlbiByZWZlcmVuY2lu ZyB0aGUgZmlsdGVyIGNvbHVtbiwgb25seSBvbmUgcm93IGlzIGV2ZXIgcmV0dXJuZWQuIFdoZW4g c3VwcGx5aW5nIHRoZSBzdHJpbmcgdGhhdCBpcyB0aGUgc2FtZSBhcyB0aGUgZGF0YSBpbiB0aGUg ZmlsdGVyIGNvbHVtbiwgdGhlIGNvcnJlY3QgbnVtYmVyIG9mIHJvd3MgYXJlIHJldHVybmVkLg0K DQpSZXN1bHQgMSAoaW5jb3JyZWN0KToNCnNlbGVjdCBmaWx0ZXIsIHJlZ2V4cF9tYXRjaGVzKGZp bHRlciwgJ0ZpbHRlcicsICdnJykgZnJvbSByZWdleF90ZXN0Ow0KRmlsdGVyIEZpbHRlciBGaWx0 ZXIgICAgICAgICAgICAgICB7RmlsdGVyfQ0KDQpSZXN1bHQgMiAoY29ycmVjdCk6DQpzZWxlY3Qg ZmlsdGVyLCByZWdleHBfbWF0Y2hlcygnRmlsdGVyIEZpbHRlciBGaWx0ZXInLCAnRmlsdGVyJywg J2cnKSBmcm9tIHJlZ2V4X3Rlc3Q7DQpGaWx0ZXIgRmlsdGVyIEZpbHRlciAgICAgICAgICAgICAg IHtGaWx0ZXJ9DQpGaWx0ZXIgRmlsdGVyIEZpbHRlciAgICAgICAgICAgICAgIHtGaWx0ZXJ9DQpG aWx0ZXIgRmlsdGVyIEZpbHRlciAgICAgICAgICAgICAgIHtGaWx0ZXJ9DQoNCuKAi1RoaXMgZmFp bHMgb24gOS4zIGlmIHRoZSBjaXRleHQgdHlwZSBpcyB1c2VkIGluc3RlYWQgb2YgdGV4dC4NCg0K U2VlbXMgdGhlIGhvd2V2ZXIgdGhlIGNpdGV4dCB2ZXJzaW9uIG9mIHJlZ2V4cF9tYXRjaGVzIGlz IGludm9rZWQgaXQgaWdub3JlcyB0aGUgZ2xvYmFsIGZsYWcuIEhhdmVuJ3QgZXhwbG9yZWQgZGVl cGVyLg0KDQpEYXZpZCBKLg0K4oCLDQoNCg==
Jeff Certain <gcertain@dynamicsignal.com> writes: > create table regex_test(filter citext); > insert into regex_test(filter) values ('Filter Filter Filter'); > select filter, regexp_matches(filter, 'Filter', 'g') from regex_test; > select filter, regexp_matches('Filter Filter Filter', 'Filter', 'g') from regex_test; > Expected result: > Both select statements should return the same number of rows. Specifically, in this case, I expect to get back 3 rows. > Actual result: > When referencing the filter column, only one row is ever returned. When supplying the string that is the same as the datain the filter column, the correct number of rows are returned. Fixed, although after discussion we concluded that we should not change the default behavior in the back branches. See commit log message: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b1ec45994e5108d734c45876c25593823fcf8644 Thanks for the report! regards, tom lane