Re: BUG #13538: REGEX non-greedy is working incorrectly (and also greedy matches fail if non-greedy is present) - Mailing list pgsql-bugs

From David G. Johnston
Subject Re: BUG #13538: REGEX non-greedy is working incorrectly (and also greedy matches fail if non-greedy is present)
Date
Msg-id CAKFQuwbn0nYSQL99rn=WSsfKYrSra5cd3GiQ3iH_rnHHGic1_g@mail.gmail.com
In response to Re: BUG #13538: REGEX non-greedy is working incorrectly (and also greedy matches fail if non-greedy is present)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #13538: REGEX non-greedy is working incorrectly (and also greedy matches fail if non-greedy is present)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Tue, Aug 4, 2015 at 8:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I wrote:
> > As David says, these examples appear to be following what's stated in
> > http://www.postgresql.org/docs/9.4/static/functions-matching.html#POSIX-MATCHING-RULES
> > The Spencer regex engine we use has a notion of greediness or
> > non-greediness of the entire regex, and further that that takes precedence
> > for determining the overall match length over greediness of individual
> > subexpressions.  That behavior might be inconvenient for this particular
> > use-case, but that doesn't make it a bug.
>
> BTW, perhaps it would be worth adding an example to that section that
> shows how to control this behavior.  The trick is obvious once you've seen
> it, but not so much otherwise: you add something to the start of the regex
> that establishes the overall greediness you want, but can never actually
> match any characters.  "\0*" or "\0*?" will work fine in Postgres
> use-cases since there can never be a NUL character in the data.
>
> regression=# select regexp_matches('abc01234xyz', '(.*)(\d+)(.*)');
>  regexp_matches
> -----------------
>  {abc0123,4,xyz}
> (1 row)
>
> regression=# select regexp_matches('abc01234xyz', '(.*?)(\d+)(.*)');
>  regexp_matches
> ----------------
>  {abc,0,""}
> (1 row)
>
> regression=# select regexp_matches('abc01234xyz', '\0*(.*?)(\d+)(.*)');
>  regexp_matches
> -----------------
>  {abc,01234,xyz}
> (1 row)
+1

David J.
