SIMILAR TO incorrect with alternation - Mailing list pgsql-bugs

From Aaron Bingham
Subject SIMILAR TO incorrect with alternation
Date
Msg-id 458BD893.60803@cenix-bioscience.com
Whole thread Raw
Responses Re: SIMILAR TO incorrect with alternation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hello,

We just had a case where invalid data entered our database, causing
application failure.  A constraint was in place to prevent this and
appeared correct, however, I was able to verify that the expression was
evaluating to TRUE when I expected it to be FALSE.  As a work around we
have fallen back to the nonstandard POSIX regular expression support,
which works correctly.

Here is a simple test case which triggers the problematic behavior.  I
would expect the SQL regular expression and POSIX regular expression
below to give the same results.  However, they produce different
answers.  The difference seems to be caused by a naive implementation of
similar_escape(text, text) as also demonstrated below.

unidb=# select version();

version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5
20050117 (prerelease) (SUSE Linux)
(1 row)

unidb=# select 'ab' similar to 'a|b';
 ?column?
----------
 t
(1 row)

unidb=# select 'ab' ~ '^(a|b)$';
 ?column?
----------
 f
(1 row)

unidb=# select similar_escape('a|b', NULL);
 similar_escape
----------------
 ^a|b$
(1 row)

Thanks,

--
--------------------------------------------------------------------
Aaron Bingham
Senior Software Engineer
Cenix BioScience GmbH
Tatzberg 47                       phone: +49 (351) 4173-146
D-01307 Dresden, Germany          fax:   +49 (351) 4173-198
--------------------------------------------------------------------

pgsql-bugs by date:

Previous
From: "Robert Locke"
Date:
Subject: BUG #2863: vacuum failing, weird record in table...
Next
From: "goodsforyou"
Date:
Subject: BUG #2861: when backup UTF-8 database,pgdump add comments not encoded in UTF-8