Re: [E] Re: Regexp_replace bug / does not terminate on long strings - Mailing list pgsql-general

From	Markhof, Ingolf
Subject	Re: [E] Re: Regexp_replace bug / does not terminate on long strings
Date	August 23, 2021 09:01:09
Msg-id	CALZg0g44=af4xoXcrWqnje3=sGK8f2P1mNTR9OiFBb1msYgiCg@mail.gmail.com
In response to	Re: [E] Re: Regexp_replace bug / does not terminate on long strings (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-general

Right. Considering a longer sequence of a's, "(a*)\1" allows a wide variety of matches. But in fact, this is not what I was trying to use. I was looking more at "(a)\1*", which should match exactly what "a+" matches. As matching is greedy, shouldn't "(a)\1*" consume all a's in a sequence in one go, just like "a+" does?
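
To make the claimed equivalence concrete, here is a minimal sketch using Python's re module. This is an assumption for illustration only: Python's engine is a different implementation from PostgreSQL's, so it shows what the two patterns match, not how expensive either is inside regexp_replace. The variable names are made up for this example.

```python
import re

# A run of 50 a's followed by a character that ends the run.
text = "a" * 50 + "b"

# (a)\1* captures a single "a", then repeats the backreference greedily.
backref_match = re.match(r"(a)\1*", text)

# a+ repeats the literal "a" greedily.
plus_match = re.match(r"a+", text)

# Both patterns cover the same part of the string; only the first one
# has a capture group, and that group holds just one "a".
print(backref_match.span(), plus_match.span())
print(repr(backref_match.group(1)))
```

The matched text is the same in both cases. Whether an engine can evaluate the backreference form as cheaply as "a+" depends on the implementation.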

Regards,

Ingolf

On Fri, Aug 20, 2021 at 6:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Markhof, Ingolf" <ingolf.markhof@de.verizon.com> writes:
> thank you very much for your reply. Actually, I was assuming all these
> regular expressions are based on the same core implementation.

They are not. There are at least three fundamentally different
implementation technologies (DFA, NFA, hybrid). Friedl's "Mastering
Regular Expressions" cites multiple different programs using each
of those, every one of which behaves a bit differently when you start
poking at corner cases. And that's just in the open-source world;
I don't know what Oracle is using, but I bet it ain't open source.

> I am also surprised that you say the (\1)+ subpattern is computationally
> expensive. Regular expressions are greedy by default. I.e. in case of a*
> matching against a string of 1000 a's, the system will not try a, aa, aaa,
> ... and so on, right? Instead, it will consume all the a's in one go.

"a*" is easy. "(a*)\1" is less easy --- if you let the a* consume the
whole string, you will not get a match, even though one is possible.
In general, backrefs create a mess in what would otherwise be a pretty
straightforward concept :-(.
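
A minimal sketch of why "(a*)\1" is less easy, written in Python as an assumption for illustration. The helper below is hypothetical and only models a naive "take everything, then back off one character at a time" strategy; it is not a description of how PostgreSQL's regex engine works internally.

```python
import re

def greedy_backoff(text):
    # Model matching (a*)\1 at position 0: let the group take the whole
    # leading run of a's, then shrink it until the backreference fits.
    # Returns (group_length, candidates_tried).
    run = len(text) - len(text.lstrip("a"))
    tried = 0
    for k in range(run, -1, -1):
        tried += 1
        # The backreference must repeat text[:k] starting at offset k.
        if text.startswith(text[:k], k):
            return k, tried

# With five a's the group cannot keep all five; it has to settle for two.
m = re.match(r"(a*)\1", "aaaaa")
print(m.group(1), m.span())

# The model agrees, and shows how the number of candidates grows with
# the length of the run.
print(greedy_backoff("aaaaa"))
print(greedy_backoff("a" * 1000))
```

For a run of n a's this model tries roughly n/2 group lengths at a single starting position, whereas plain "a*" needs only one.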

regards, tom lane
