pg_regprefix() doesn't handle lookahead constraints correctly - Mailing list pgsql-bugs

From Tom Lane
Subject pg_regprefix() doesn't handle lookahead constraints correctly
Date
Msg-id 1218.1445273095@sss.pgh.pa.us
Whole thread Raw
List pgsql-bugs
Whilst poking around the topic of lookbehind constraints, I realized
that pg_regprefix(), which is code we added to extract any fixed prefix
that a regex might have, is already broken for lookahead constraints.
Observe:

regression=# create table pp (f1 text unique);
CREATE TABLE
regression=# explain select * from pp where f1 ~ '^abcde(f|(?=\d))';
                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on pp  (cost=4.22..14.37 rows=7 width=32)
   Filter: (f1 ~ '^abcde(f|(?=\d))'::text)
   ->  Bitmap Index Scan on pp_f1_key  (cost=0.00..4.22 rows=7 width=0)
         Index Cond: ((f1 >= 'abcdef'::text) AND (f1 < 'abcdeg'::text))
(4 rows)

It thinks that "f" is part of the fixed prefix of the regex, whereas
in point of fact the regex will match strings that have a digit there.

The reason is that pg_regprefix thinks it can ignore LACON arcs
altogether, whereas really what it had better do is treat them as
a reason to give up searching.

In principle we could traverse across the LACON arc (which consumes no
input) and look to see if the following state has a single outarc color.
But that would greatly complicate the code and I doubt it corresponds to
any case that's of real-world interest.  That is, it'd allow us to
recognize that, say, '^abcde(?=\w)f' has a fixed prefix of 'abcdef' not
just 'abcde', but who writes regexes like that?  Most likely, a lookahead
constraint is at the end of the regex anyway.  (Similarly, it won't be
very interesting to optimize for lookbehind constraints here, because
nobody would write such a thing in a fixed-prefix regex.)

This is such a corner case that it's not too surprising we've had no
field reports about it.  But still, a bug is a bug.

            regards, tom lane

pgsql-bugs by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: BUG #13687: bug in row_to_json function with '-infinity' timestamp
Next
From: Tom Lane
Date:
Subject: Re: BUG #13687: bug in row_to_json function with '-infinity' timestamp