Re: [E] Regexp_replace bug / does not terminate on long strings - Mailing list pgsql-general

From Mark Dilger
Subject Re: [E] Regexp_replace bug / does not terminate on long strings
Date
Msg-id D84B8669-286F-4F90-89D6-6566D93C2C08@enterprisedb.com
Whole thread Raw
In response to Re: [E] Re: Regexp_replace bug / does not terminate on long strings  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [E] Regexp_replace bug / does not terminate on long strings  (Miles Elam <miles.elam@productops.com>)
List pgsql-general

> On Aug 20, 2021, at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "a*" is easy.  "(a*)\1" is less easy --- if you let the a* consume the
> whole string, you will not get a match, even though one is possible.
> In general, backrefs create a mess in what would otherwise be a pretty
> straightforward concept :-(.

The following queries take radically different time to run:

\timing
select regexp_replace(
  repeat('someone,one,one,one,one,one,one,', 60),
  '(?<=^|,)([^,]+)(?:,\1)+(?=$|,)',
  '\1', -- replacement
  'g'  -- apply globally (all matches)
);

Time: 16476.529 ms (00:16.477)

select regexp_replace(
  repeat('someone,one,one,one,one,one,one,', 60),
  '(?<=^|,)([^,]+)(?:,\1){5}(?=$|,)',
  '\1', -- replacement
  'g'  -- apply globally (all matches)
);

Time: 1.452 ms

The only difference in the patterns is the + vs. the {5}.  It looks to me like the first pattern should greedily match
five",one" matches and be forced to stop since ",someone" doesn't match, and the second pattern should grab the five
",one"matches it was told to grab and not try to grab the ",someone", but other than that, they should be performing
thesame work.  I don't see why the performance should be so different. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: [E] Re: Regexp_replace bug / does not terminate on long strings
Next
From: Miles Elam
Date:
Subject: Re: [E] Regexp_replace bug / does not terminate on long strings