Home > mailing lists

Re: uninterruptable regexp_replace in 9.2 and 9.3 - Mailing list pgsql-bugs

From	Tom Lane
Subject	Re: uninterruptable regexp_replace in 9.2 and 9.3
Date	March 2, 2014 07:39:05
Msg-id	17158.1393735118@sss.pgh.pa.us Whole thread Raw
In response to	Re: uninterruptable regexp_replace in 9.2 and 9.3 (Pedro Gimeno <pgsql-004@personal.formauri.es>)
Responses	Re: uninterruptable regexp_replace in 9.2 and 9.3
List	pgsql-bugs

Tree view

Pedro Gimeno <pgsql-004@personal.formauri.es> writes:
> I'm adding this note in case it helps anyone get a bigger picture as to
> what other implementations do about this problem.

I spent some time looking at pcre to try to understand what it was doing
backtracking-wise, and eventually realized that it's not actually trying
very hard.  Consider this problem:

# select regexp_matches('123456789z', '(([0-9]+|9z)+)');
 regexp_matches
-----------------
 {123456789z,9z}
(1 row)

To obtain the longest possible match, it's necessary to decide that the
first iteration of the + operator matches '12345678', leaving '9z' to be
matched by the second iteration.  Postgres and Tcl get this right.
Perl and pcre, not so much: they report the match as '123456789'.

$ perl -e "if ('123456789z' =~ m/(([0-9]+|9z)+)/) {print \"\$1\\n\";}"
123456789

In fact, Perl doesn't even get this simplified case right, though no
backtracking is required:

$ perl -e "if ('123456789z' =~ m/(([0-9]|9z)+)/) {print \"\$1\\n\";}"
123456789

It seems to just fail to notice that on the 9th iteration, a longer match
is available from the second OR-alternative than the first.  (No doubt
this is documented behavior somewhere, but it sure flies in the face of
what I'd consider to be expected regex behavior.)

The performance problem we're looking at comes directly from the
backtracking that's done to ensure that we detect a match in case the
pattern has this sort of pathological behavior.  The test case doesn't
actually need any such backtracking, because there aren't multiple ways
to match any particular substring; but I'm not sure if there's any easy
way to recognize that.

            regards, tom lane

pgsql-bugs by date:

From: Michael Paquier
Date: 02 March 2014, 06:15:44
Subject: Re: BUG #9398: DELETE refering to a materialized view produces "cannot lock rows in materialized view" error

From: Haribabu Kommi
Date: 03 March 2014, 02:25:57
Subject: Re: BUG #9370: PostgreSQL Service Unexpectedly closing in SERVER Computer

Re: uninterruptable regexp_replace in 9.2 and 9.3 - Mailing list pgsql-bugs

Previous

Next