Re: Notes about behaviour of SIMILAR TO operator - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Notes about behaviour of SIMILAR TO operator
Msg-id 7420.1069369047@sss.pgh.pa.us
In response to Notes about behaviour of SIMILAR TO operator  (Adam Buraczewski <adamb@nor.pl>)
Responses Re: Notes about behaviour of SIMILAR TO operator
List pgsql-bugs
Adam Buraczewski <adamb@nor.pl> writes:
> ... for example the pattern 'a|z' (which should match single 'a' or 'z'
> characters only, according to SQL spec) is converted into POSIX
> regular expression in the form of '^a|z$' which matches all strings
> beginning with 'a' ('abcdef' for example) and all strings ending with
> 'z' ('xyz' for example).  So the meaning of the pattern is changed,
> which is not good.

Hm, that's a mistake; it should probably translate to ^(a|z)$ instead.
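The precedence problem can be reproduced with Python's `re` module. It is used here only as a stand-in for PostgreSQL's regex engine, on the assumption that alternation binds more loosely than the anchors in both. Other details of the two flavors differ. `re.search` is used because, like POSIX regex matching, it is unanchored.

```python
import re

# Current translation: the anchors bind to the individual alternatives,
# so this means "starts with a" OR "ends with z".
broken = r"^a|z$"
# Suggested translation: group the alternation so both anchors apply to it.
fixed = r"^(a|z)$"

for s in ["a", "z", "abcdef", "xyz"]:
    print(f"{s!r:10} broken={bool(re.search(broken, s))} fixed={bool(re.search(fixed, s))}")
```

With `broken`, both 'abcdef' and 'xyz' match. With `fixed`, only 'a' and 'z' match.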

> The behaviour above is also caused by similar_escape(), which converts
> '[_]' to '^[.]$' and '[%]' to '^[.*]$', not noticing the simple fact
> that these characters are inside brackets.

As near as I can tell, the SQL spec requires special characters to be
escaped when they are inside a bracket construct.  So indeed the above
are invalid SQL regexes.
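For comparison, here is a minimal sketch of the bracket-aware translation Adam describes. `similar_to_regex` is a hypothetical helper, not PostgreSQL's similar_escape(). It ignores the escape character, named character classes, and POSIX subtleties such as a leading ']' inside brackets. It also accepts '[_]' and '[%]' as literals, whereas a strict implementation following the reading of the spec above would reject them. The sketch assumes '.' is not a metacharacter in SIMILAR TO and so escapes it.

```python
import re


def similar_to_regex(pattern: str) -> str:
    """Translate a SIMILAR TO pattern into an anchored regex (sketch only).

    Hypothetical helper. Outside a bracket expression, '_' becomes '.',
    '%' becomes '.*' and '.' is escaped. Inside brackets every character
    is copied unchanged, so '[_]' and '[%]' stay literal. Escape
    characters and a leading ']' inside brackets are NOT handled.
    """
    out = []
    in_bracket = False
    for ch in pattern:
        if in_bracket:
            out.append(ch)
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
            out.append(ch)
        elif ch == "_":
            out.append(".")
        elif ch == "%":
            out.append(".*")
        elif ch == ".":
            out.append(r"\.")
        else:
            out.append(ch)
    # Group the whole translation so a top-level '|' stays inside the anchors.
    return "^(" + "".join(out) + ")$"


if __name__ == "__main__":
    for pat, s in [("a|z", "abcdef"), ("[_]", "_"), ("[_]", "x"), ("a%", "abc")]:
        rx = similar_to_regex(pat)
        print(f"{pat!r} -> {rx!r}: {s!r} matches? {bool(re.search(rx, s))}")
```

Tracking a single `in_bracket` flag is enough for this simplified grammar. A real translator would also need to handle the ESCAPE character and the named-class syntax discussed next.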

> Talking about square brackets, it should be noticed that there is a
> slight difference between SIMILAR TO and POSIX way of describing named
> character classes.

Mmm, yeah, that looks like a mess.

> This at least could be avoided simply by prepending the regular expression
> returned by similar_escape() with the magic sequence '***:', which
> switches the regexp engine into ARE mode.

Good point.  Actually, do we want to force ARE mode, or something simpler?
Perhaps ERE or even BRE would be a better match to the SQL spec.
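In PostgreSQL's regex engine, a leading `***:` is a director meaning "the rest of the RE is an ARE", regardless of the flavor otherwise in effect. Below is a minimal sketch of Adam's proposal applied to an already-translated pattern string. `force_are` and `ARE_DIRECTOR` are hypothetical names. The sketch only builds the string, which is meant for PostgreSQL's `~` operator. It cannot be checked with Python's `re`, which has no directors, and it does not settle whether ARE, ERE or BRE is the better target.

```python
ARE_DIRECTOR = "***:"  # PostgreSQL director: "the rest of the RE is an ARE"


def force_are(regex: str) -> str:
    """Prefix a translated regex with the ***: director (hypothetical helper).

    A pattern that already begins with a director (***: or ***=) is
    returned unchanged so the prefix is never doubled.
    """
    if regex.startswith("***"):
        return regex
    return ARE_DIRECTOR + regex


if __name__ == "__main__":
    print(force_are("^(a|z)$"))
```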

> I think I am able to write such a patch in my spare time,

Go to it ...

            regards, tom lane
