Re: Another regexp performance improvement: skip useless paren-captures - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Another regexp performance improvement: skip useless paren-captures
Date
Msg-id 3591269.1628547816@sss.pgh.pa.us
Whole thread Raw
In response to Re: Another regexp performance improvement: skip useless paren-captures  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> Hmmm ... yeah, I see it too.  This points up something I'd wondered
> about before, which is whether the code that "cancels everything"
> after detecting {0} is really OK.  It throws away the outer subre
> *and children* without worrying about what might be inside, and
> here we see that that's not good enough --- there's still a v->subs
> pointer to the first capturing paren set, which we just deleted,
> so that the \1 later on messes up.  I'm not sure why the back
> branches are managing not to crash, but that might just be a memory
> management artifact.

... yeah, it is.  For me, this variant hits the assertion in all
branches:

regression=# select regexp_split_to_array('', '((.)){0}(\2){0}');
server closed the connection unexpectedly

So that's a pre-existing (and very long-standing) bug.  I'm not
sure if it has any serious impact in non-assert builds though.
Failure to clean out some disconnected arcs probably has no
real effect on the regex's behavior later.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Michael Meskes
Date:
Subject: Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE
Next
From: Andres Freund
Date:
Subject: Re: Autovacuum on partitioned table (autoanalyze)