Re: regexp_replace not respecting greediness - Mailing list pgsql-bugs

From Tom Lane
Subject Re: regexp_replace not respecting greediness
Date
Msg-id 2134893.1758297713@sss.pgh.pa.us
In response to regexp_replace not respecting greediness  (Simon Ellmann <simon.ellmann@tum.de>)
List pgsql-bugs
Simon Ellmann <simon.ellmann@tum.de> writes:
> With the following regular expression, the second .* seems to match
> non-greedily although (if I am correct) it should match greedily:
> postgres=# SELECT REGEXP_REPLACE('jane.smith@example.com', '.*?@.*', 'ab');

This is correct according to the rules given at

https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES

specifically that "A branch — that is, an RE that has no top-level |
operator — has the same greediness as the first quantified atom in it
that has a greediness attribute."  Because of that, the RE as a whole
is non-greedy and will match the shortest, not the longest, amount of text
overall.  The discussion in that manual section shows what to do
when you don't like the results.
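
As a rough illustration of that rule, here is a minimal sketch using the
query from the report. The two workarounds are one reading of the advice in
the manual section cited above, not the only possible fixes: the first
anchors the RE so that even the shortest overall match has to reach the end
of the string, and the second wraps the RE in a {1,1} quantifier, which the
manual describes as a way to force overall greediness.

```sql
-- Reported case: the first quantified atom (.*?) is non-greedy, so the
-- whole RE prefers the shortest match and the trailing .* matches nothing.
SELECT regexp_replace('jane.smith@example.com', '.*?@.*', 'ab');
-- abexample.com

-- Workaround 1: anchor the end of the RE. The overall match is still the
-- shortest possible one, but it now has to extend to the end of the input.
SELECT regexp_replace('jane.smith@example.com', '.*?@.*$', 'ab');
-- ab

-- Workaround 2: wrap the RE in a non-capturing group quantified with {1,1},
-- which makes the RE as a whole greedy while .*? stays non-greedy inside it.
SELECT regexp_replace('jane.smith@example.com', '(?:.*?@.*){1,1}', 'ab');
-- ab
```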

> Other database systems (e.g., DuckDB, Umbra) match the whole input:

If your complaint is "but it's not like Perl!", I suggest using
a plperl function to do your regexp work.
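
A minimal sketch of such a function follows, assuming the plperl extension
is available on the server. The function name perl_regexp_replace is
hypothetical, and the replacement is treated as a plain string (no
backreference support), so this is a starting point rather than a drop-in
substitute for regexp_replace.

```sql
-- Assumes PL/Perl is installed on the server.
CREATE EXTENSION IF NOT EXISTS plperl;

-- Hypothetical helper: replaces the first match using Perl's regex engine.
CREATE FUNCTION perl_regexp_replace(input text, pattern text, replacement text)
RETURNS text AS $$
    my ($input, $pattern, $replacement) = @_;
    # Perl decides greediness per quantifier, not per branch.
    $input =~ s/$pattern/$replacement/;
    return $input;
$$ LANGUAGE plperl STRICT IMMUTABLE;

-- With Perl semantics, .*? stops at the first @ and .* takes the rest.
SELECT perl_regexp_replace('jane.smith@example.com', '.*?@.*', 'ab');
-- ab
```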

            regards, tom lane


