pgsql: Rethink regexp engine's backref-related compilation state. - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Rethink regexp engine's backref-related compilation state.
Msg-id E1mClAF-0004UU-9x@gemulon.postgresql.org
List pgsql-committers
Rethink regexp engine's backref-related compilation state.

I had committer's remorse almost immediately after pushing cb76fbd7e,
upon finding that removing capturing subexpressions' subREs from the
data structure broke my proposed patch for REG_NOSUB optimization.
Revert that data structure change.  Instead, address the concern
about not changing capturing subREs' endpoints by not changing the
endpoints.  We don't need to, because the point of that bit was just
to ensure that the atom has endpoints distinct from the outer state
pair that we're stringing the branch between.  We already made
suitable states in the parenthesized-subexpression case, so the
additional ones were just useless overhead.  This seems more
understandable than Spencer's original coding, and it ought to be
a shade faster too by saving a few state creations and arc changes.
(I actually see a couple percent improvement on Jacobson's web
corpus, though that's barely above the noise floor so I wouldn't
put much stock in that result.)

Also, fix the logic added by ea1268f63 to ensure that the subRE
recorded in v->subs[subno] is exactly the one with capno == subno.
Spencer's original coding recorded the child subRE of the capture
node, which is okay so far as having the right endpoint states is
concerned, but as of cb76fbd7e the capturing subRE itself always
has those endpoints too.  I think the inconsistency is confusing
for the REG_NOSUB optimization.
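
The invariant described above can be illustrated with a minimal sketch.
The struct and function names here (subre_sketch, vars_sketch,
record_capture, subs_consistent) are hypothetical stand-ins invented for
illustration; they are not the actual definitions in regcomp.c, and the
real data structures carry many more fields.  The only point shown is
that the slot for capture number N holds the node whose capno is N,
rather than that node's child.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBS 10

/* Hypothetical, simplified stand-in for the regex engine's subRE node. */
struct subre_sketch
{
    int         capno;              /* capture number, 0 if not a capture */
    struct subre_sketch *child;     /* first child subRE, or NULL */
};

/* Hypothetical stand-in for the compilation state that owns subs[]. */
struct vars_sketch
{
    struct subre_sketch *subs[MAX_SUBS];
};

/* Record the capturing node itself, so subs[subno]->capno == subno. */
static void
record_capture(struct vars_sketch *v, struct subre_sketch *capture_node)
{
    int         subno = capture_node->capno;

    assert(subno > 0 && subno < MAX_SUBS);
    v->subs[subno] = capture_node;  /* not capture_node->child */
}

/* Check the invariant for every filled slot; returns 1 if it holds. */
static int
subs_consistent(const struct vars_sketch *v)
{
    for (int i = 1; i < MAX_SUBS; i++)
    {
        if (v->subs[i] != NULL && v->subs[i]->capno != i)
            return 0;
    }
    return 1;
}
```

In this toy model, recording the child instead of the capture node
would leave a slot whose capno does not match its index, which is the
kind of inconsistency a later pass keyed on capno would trip over.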

As before, backpatch to v14.

Discussion: https://postgr.es/m/0203588E-E609-43AF-9F4F-902854231EE7@enterprisedb.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/00116dee5ad4c1964777c91e687bb98b1d9f7ea0

Modified Files
--------------
src/backend/regex/regcomp.c | 84 +++++++++++++++++++++++++--------------------
1 file changed, 47 insertions(+), 37 deletions(-)

