Re: PostgreSQL 9.3.5 substring(text from pattern for escape) bug - Mailing list pgsql-bugs

From Robert Schreiber
Subject Re: PostgreSQL 9.3.5 substring(text from pattern for escape) bug
Date
Msg-id 3db2d381-c94d-c6b8-10a8-7cf0ddda6c09@charter.net
Whole thread Raw
In response to Re: PostgreSQL 9.3.5 substring(text from pattern for escape) bug  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses Re: PostgreSQL 9.3.5 substring(text from pattern for escape) bug  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Just to be clear here. It seems to me that I am right in that the leading/trailing Q should NOT be returned...

From Section 9.7.2 in the manual:

The substring function with three parameters, substring(string from pattern for escape-character), provides extraction of a substring that matches an SQL regular expression pattern. As with SIMILAR TO, the specified pattern must match the entire data string, or else the function fails and returns null. To indicate the part of the pattern that should be returned on success, the pattern must contain two occurrences of the escape character followed by a double quote ("). The text matching the portion of the pattern between these markers is returned.

In my mind I see this:

QMy Q NAmeQ  is interpreted as Q/My Q Name/Q rather than Q/My Q Name Q//.

bob



On 5/12/2019 12:27 AM, Andrew Gierth wrote:
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:>> I looked up the spec on this point. As far as I can see, we're not>> following it, but neither does the spec do what the OP wanted; in>> fact the result should have included the _leading_ Q as well as the>> trailing one.
Tom> Huh, interesting. So we should be translating the initialTom> substring to a non-greedy pattern. I believe Spencer's engine canTom> handle that by sticking (?:...){1,1}? around it.
Tom> Come to think of it, we probably need to be putting (?:...) aroundTom> the trailing substring as well. I suspect what we're doing todayTom> produces non-spec results if "|" appears in the trailing part.

Digging into it more:

SUBSTRING(x FROM 'expr' FOR 'escape') is from sql92/sql99 and is gone by
sql2008, replaced by SUBSTRING(x SIMILAR 'expr' ESCAPE 'escape'). sql99
defines the matching rule using different language, but with the same
actual effect (requiring shortest matches for the leading and trailing
strings).

Your suggested fix doesn't seem to work. If the leading/trailing
substrings do not have | or parens in then it seems to work to wrap them
in (?:(?:)??...), thanks to the rule that the first quantified atom in a
subexpression sets the whole subexpression's greediness, but handling |
or parens correctly seems harder.

Are there any other dbs that implement this feature that we can compare
against?


pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #15803: Autocomplete issues in new 11.3 and 9.6.13 psql client
Next
From: Tom Lane
Date:
Subject: Re: BUG #15802: Comparison of a function returning boolean value using relational (<, >, <= or >=) operator