Re: Notes about behaviour of SIMILAR TO operator - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Notes about behaviour of SIMILAR TO operator
Date
Msg-id 12635.1069430716@sss.pgh.pa.us
Whole thread Raw
In response to Re: Notes about behaviour of SIMILAR TO operator  (Adam Buraczewski <adamb@nor.pl>)
List pgsql-bugs
Adam Buraczewski <adamb@nor.pl> writes:
>     ('a' similar to 'a' escape null) is true (should be unknown!)

Yeah, you are right; this is because we are overloading a "null" second
parameter to mean "the ESCAPE part wasn't present", which in hindsight
wasn't such a hot idea.

> I think that either PostgreSQL should check for nulls in SIMILAR TO
> construct before calling similar_escape(), or there should be two
> versions of similar_escape() function: one getting only one argument
> (for SIMILAR TO without ESCAPE) and second, getting two arguments
> (a pattern and an escape char).  Which solution is better?

I think the latter is the only reasonable solution, but it will be
something that we cannot implement in the 7.4.* series, because adding
another function implies initdb.  I'd suggest submitting one patch that
fixes everything but the NULL problem, which we could back-patch into
7.4, and then a second patch that splits the function into two for 7.5.

>> As near as I can tell, the SQL spec requires special characters to be
>> escaped when they are inside a bracket construct.  So indeed the above
>> are invalid SQL regexes.

> How the function should behave when such an invalid pattern is passed
> as its argument?  Should it throw an error (this is what SQL spec
> says) or tolerate as much mistakes as possible, generating some
> warnings only?

I don't have a strong opinion --- could go with either behavior.  You
might want to take it up on the pgsql-sql list.

>> Good point.  Actually, do we want to force ARE mode, or something simpler?
>> Perhaps ERE or even BRE would be a better match to the SQL spec.

> I think that there is no difference which regexp dialect is choosen,
> only the speed matters.  Function translating SIMILAR TO patterns into
> POSIX regular expressions will be more or less the same.  What should
> I choose then?

I doubt there would be any speed difference.  The advantage of a dumber
RE flavor is that it would have fewer "extra" features that might be
unintentionally triggered by a translated pattern, leading to just the
sort of non-SQL-compliant behavior you are complaining of ...

> BTW, should I write some regression tests for SIMILAR TO?

Sure.  Look at some of the existing regression tests for examples.

> Should the changes be written for CVS HEAD only or 7.4/7.3 branches
> either?

I don't see that we'd bother applying it to 7.3, but 7.4 branch yes,
if you avoid any changes in the function's API for the 7.4 version.

            regards, tom lane

pgsql-bugs by date:

Previous
From: Adam Buraczewski
Date:
Subject: Re: Notes about behaviour of SIMILAR TO operator
Next
From: david@fetter.org (David Fetter)
Date:
Subject: PERFORM bug with FOUND?