Re: Our regex vs. POSIX on "longest match" - Mailing list pgsql-hackers

From Brendan Jurd
Subject Re: Our regex vs. POSIX on "longest match"
Msg-id CADxJZo1fbE9FA+pW89dNqqiPpLstSxYKug9TLQcS_q+J7wF+_A@mail.gmail.com
In response to Re: Our regex vs. POSIX on "longest match"  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 5 March 2012 17:23, Robert Haas <robertmhaas@gmail.com> wrote:
> This is different from what Perl does, but I think Perl's behavior
> here is batty: given a+|a+b+ and the string aaabbb, it picks the first
> branch and matches only aaa.

Yeah, this is sometimes referred to as "ordered alternation": the
branches of the alternation are prioritised in the same order in
which they are written.  It is fairly commonplace among regex
implementations.

> apparently, it selects the syntactically first
> branch that can match, regardless of the length of the match, which
> strikes me as nearly pure evil.

As long as it's documented that alternation prioritises in this way, I
don't feel upset about it.  At least it still provides you with a
sensible way to get whatever you want from your RE: if you want a
shorter alternative to be preferred, put it first.  Ordered
alternation also gives you a way to specify which of several
same-length alternatives you would prefer to be matched, which can
come in handy.  It also means you can list less-complex alternatives
before more-complex ones, which can have performance advantages.
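
To make the ordering point concrete, here is a small sketch using
Python's re module, which (like Perl) tries alternatives left to
right.  The group names "digit" and "any" are made up for
illustration; nothing here is specific to our regex engine.

```python
import re

text = "aaabbb"

# Ordered alternation: the first branch that can match wins,
# even though the second branch would match more of the string.
short_first = re.match(r"a+|a+b+", text).group()

# Putting the longer alternative first makes it the preferred one.
long_first = re.match(r"a+b+|a+", text).group()

# With same-length alternatives, the order decides which branch is taken.
# Both branches can match the single character "7"; the first one wins.
preferred = re.match(r"(?P<digit>\d)|(?P<any>.)", "7").lastgroup

print(short_first)  # aaa
print(long_first)   # aaabbb
print(preferred)    # digit
```

So the same pair of branches gives either result, depending only on
the order you write them in.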

I do agree with you that if you *don't* do ordered alternation, then
it is right to treat alternation as greedy by default.
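
For contrast, a rough sketch of what "longest branch wins" looks like
when written out by hand.  The helper longest_alternative is
hypothetical, and it only compares top-level branches anchored at the
start of the string; it is an illustration of the idea, not how a
POSIX engine (or ours) actually implements it.

```python
import re

def longest_alternative(branches, text):
    """Try every branch at the start of text and keep the longest match."""
    best = None
    for branch in branches:
        m = re.match(branch, text)
        if m and (best is None or len(m.group()) > len(best)):
            best = m.group()
    return best

# The result no longer depends on the order of the branches.
result = longest_alternative(["a+", "a+b+"], "aaabbb")
reversed_result = longest_alternative(["a+b+", "a+"], "aaabbb")

print(result)           # aaabbb
print(reversed_result)  # aaabbb
```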

Cheers,
BJ

