Re: Death by regexp_replace - Mailing list pgsql-hackers

From Benedikt Grundmann
Subject Re: Death by regexp_replace
Date
Msg-id CADbMkNOn=qSqRovK8k-URzPW18JsLEV-c_WdNN2t0x4enJeCEw@mail.gmail.com
Whole thread Raw
In response to Re: Death by regexp_replace  (Benedikt Grundmann <bgrundmann@janestreet.com>)
Responses Re: Death by regexp_replace  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Jan 15, 2016 at 4:39 PM, Benedikt Grundmann <bgrundmann@janestreet.com> wrote:

On Fri, Jan 15, 2016 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kevin Grittner <kgrittn@gmail.com> writes:
> On Fri, Jan 15, 2016 at 9:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (FWIW, I think you probably wanted ,+ not ,* in the regex, else there's
>> practically no constraint there, leading to having to consider O(N^2)
>> or more possibilities.)

> On master (commit cf7dfbf2) it responds to pg_cancel_backend(),
> but it seems to be in an endless loop until you do that.

A bit of further experimentation suggests the runtime growth is actually
more like O(2^N).  It will terminate in a reasonable amount of time if the
input string is about half as long as the given example.

The problem is that so far as the DFA engine is concerned, the pattern
substring '(,*\1)+' can match almost anything at all, because it's
equivalent to '(,*[^,]+)+' which is easily seen to match any string
whatever that's got at least one non-comma.  So, for each possible match
to the substring '([^,]+)', of which there are lots, it has to consider
every possible way of breaking up all the rest of the string into one or
more substrings.  The vast majority of those ways will fail when the
backref match is checked, but there's no way to realize it before that.

To be clear I'm perfectly happy with that query taking forever (I didn't write it ;-)).  The only thing I was unhappy about was that pg_cancel/terminate_backend didn't work.  If that is fixed great.  
 
                        regards, tom lane


Hmm I just wanted to get the rpm for the latest 9.2 release for centos6 but it looks like you haven't released at least the link on this page for 9.2


says 7 in the filename which is certainly not 14 ;-)


Is that expected? 

Thanks,

Bene

pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: dealing with extension dependencies that aren't quite 'e'
Next
From: Petr Jelinek
Date:
Subject: Re: pglogical - logical replication contrib module