Re: Another regexp performance improvement: skip useless paren-captures - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Another regexp performance improvement: skip useless paren-captures
Msg-id 3581981.1628544376@sss.pgh.pa.us
In response to Re: Another regexp performance improvement: skip useless paren-captures  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: Another regexp performance improvement: skip useless paren-captures  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Mark Dilger <mark.dilger@enterprisedb.com> writes:
> I can still trigger the old bug for which we thought we'd pushed a fix.  The test case below crashes on master
> (e12694523e7e4482a052236f12d3d8b58be9a22c), and also on the fixed version "Make regexp engine's backref-related
> compilation state more bulletproof." (cb76fbd7ec87e44b3c53165d68dc2747f7e26a9a).

> Can you test if it crashes for you, too?  I'm not sure I see why this one fails when millions of others pass.

> The backtrace is still complaining about regc_nfa.c:1265:

> +select regexp_split_to_array('', '(?:((?:q+))){0}(\1){0,0}?*[^]');
> +server closed the connection unexpectedly

Hmmm ... yeah, I see it too.  This points up something I'd wondered
about before, which is whether the code that "cancels everything"
after detecting {0} is really OK.  It throws away the outer subre
*and children* without worrying about what might be inside, and
here we see that that's not good enough --- there's still a v->subs
pointer to the first capturing paren set, which we just deleted,
so that the \1 later on messes up.  I'm not sure why the back
branches are managing not to crash, but that might just be a memory
management artifact.
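
To make the hazard concrete, here is a minimal sketch of the general pattern, not the actual regcomp.c code and not necessarily the fix that was applied. All names (Node, Compiler, new_node, free_subtree, backref_is_valid) are hypothetical stand-ins; the only idea borrowed from the discussion above is a side table, like v->subs, that maps capture numbers to tree nodes. The sketch assumes one way to stay safe: when a subtree is discarded, clear any table entry that points into it, so a later backref lookup sees "no such capture" rather than freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CAPTURES 10

/* Hypothetical, simplified stand-in for a regex subexpression tree node. */
typedef struct Node
{
    int          capno;     /* capture number, or 0 if non-capturing */
    struct Node *child;
    struct Node *sibling;
} Node;

/* Hypothetical compile state: a side table from capture number to node. */
typedef struct Compiler
{
    Node *subs[MAX_CAPTURES];
} Compiler;

/* Allocate a node and, if it captures, register it in the side table. */
static Node *
new_node(Compiler *c, int capno, Node *child)
{
    Node *n = calloc(1, sizeof(Node));

    if (n == NULL)
        abort();
    n->capno = capno;
    n->child = child;
    if (capno > 0)
    {
        assert(capno < MAX_CAPTURES);
        c->subs[capno] = n;
    }
    return n;
}

/*
 * Free a whole subtree.  Before freeing each node, clear the side-table
 * entry that points at it; skipping this step is what leaves a dangling
 * pointer behind for a later backref to trip over.
 */
static void
free_subtree(Compiler *c, Node *n)
{
    if (n == NULL)
        return;
    free_subtree(c, n->child);
    free_subtree(c, n->sibling);
    if (n->capno > 0 && c->subs[n->capno] == n)
        c->subs[n->capno] = NULL;
    free(n);
}

/* A backref is usable only if its capture group still exists. */
static int
backref_is_valid(const Compiler *c, int capno)
{
    return capno > 0 && capno < MAX_CAPTURES && c->subs[capno] != NULL;
}
```

In this toy model, discarding an outer non-capturing node that wraps capture group 1 leaves subs[1] set to NULL, so a later check for \1 fails cleanly instead of dereferencing freed memory.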

            regards, tom lane


