Re: WIP: lookbehind constraints for our regexp engine - Mailing list pgsql-hackers

From Thom Brown
Subject Re: WIP: lookbehind constraints for our regexp engine
Date
Msg-id CAA-aLv5Etwt9t1yQ7AJDewfkht1_+_hyCBxbWrZ--B5AkuuU6g@mail.gmail.com
Whole thread Raw
In response to WIP: lookbehind constraints for our regexp engine  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
<p dir="ltr">On Oct 17, 2015 12:53 AM, "Tom Lane" <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>
wrote:<br/> ><br /> > I've occasionally heard complaints that our regex engine only has<br /> > lookahead
constraintsnot lookbehind constraints, while Perl's for example<br /> > does have those.  While I was fooling around
withthe other issues in that<br /> > code, I learned enough to realize that it would not be that hard to<br /> >
implementsuch things, and the attached proof-of-concept patch does so<br /> > (using Perl-compatible syntax, in case
you'rewondering).<br /> ><br /> > However, I don't feel like this is done to the point of being committable,<br
/>> because its performance leaves something to be desired in a lot of cases.<br /> > The difficulty is that
becausethe engine can only match in a forward<br /> > direction, in the worst case you have to check a lookbehind
constraintby<br /> > trying to match starting from the string start to see if a match exists<br /> > ending at
thecurrent-location where you're testing the constraint.  That<br /> > makes it, worst case, O(N^2) to search a
stringof length N.  Now, you can<br /> > improve on that greatly if you can determine that possible matches don't<br
/>> extend all the way back to the string start.  In the attached patch I use<br /> > the "cold start pointer"
returnedby shortest() to decide that characters<br /> > before the coldstart point no longer have to be rechecked. 
Thathelps<br /> > some, but it's not good enough because there are too many cases where the<br /> > engine
doesn'tmove the coldstart point forward very aggressively.<br /> ><br /> > It might be that that behavior itself
canbe improved on, which would<br /> > be nice because there are also non-lookbehind-related scenarios where<br />
>smarter coldstart detection would help performance.  But I have no very<br /> > good ideas about how to do
it.<br/> ><br /> > Another idea I've been toying with is to add logic that tries to determine<br /> > the
maximumpossible match length for a regex.  If we can determine that<br /> > for the lookbehind sub-regex, then we'd
knowwe have to back up at most<br /> > that far before applying the match check.  This seems promising because<br />
>a lot of other engines don't even support variable-length lookbehind<br /> > patterns at all, and so we'd be
assuredof good performance in all the<br /> > cases that competitors can handle at all.<br /> ><br /> >
Anyway,I'm not planning to do much more work on this right now, but<br /> > I thought I'd throw it out there just to
letpeople know that this seems<br /> > within reach.  I'm curious to know how many people care, and how much.<p
dir="ltr">+1<pdir="ltr">I'm one of those who wanted lookbehind, and it would give us complete lookaround so very much
welcome.<pdir="ltr">Thom 

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Parallel Seq Scan
Next
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.851401618”: Permission denied”