Thread: BUG #5257: wrong results of SUBSTRING with SQL regular expressions

BUG #5257: wrong results of SUBSTRING with SQL regular expressions

From
"Roman Kononov"
Date:
The following bug has been logged online:

Bug reference:      5257
Logged by:          Roman Kononov
Email address:      kononov@ftml.net
PostgreSQL version: 8.4.2
Operating system:   GNU/Linux x86_64
Description:        wrong results of SUBSTRING with SQL regular expressions
Details:

test=# select substring('34' from '(2|3)#"4#"' for '#');
 substring
-----------
 3
(1 row)

test=# select substring('^' from '#"^#"' for '#');
 substring
-----------

(1 row)

test=# select substring('$' from '#"$#"' for '#');
 substring
-----------

(1 row)

These look wrong according to the PG documentation.

Re: BUG #5257: wrong results of SUBSTRING with SQL regular expressions

From
Tom Lane
Date:
"Roman Kononov" <kononov@ftml.net> writes:
> test=# select substring('34' from '(2|3)#"4#"' for '#');
>  substring
> -----------
>  3
> (1 row)

Hmm.  I guess we need to translate ( and ) to non-capturing parens.

> test=# select substring('^' from '#"^#"' for '#');
>  substring
> -----------
>
> (1 row)

> test=# select substring('$' from '#"$#"' for '#');
>  substring
> -----------
>
> (1 row)

These cases are already fixed in HEAD.
http://archives.postgresql.org/pgsql-committers/2009-10/msg00048.php

            regards, tom lane

Re: BUG #5257: wrong results of SUBSTRING with SQL regular expressions

From
Roman Kononov
Date:
I don't know what SQL2008 says about SUBSTRING, but SQL2003 says in
ISO/IEC 9075-2:2003 (E), 6.29 <string value function>, General Rules,
5), page 261:

d) If R [the regular expression] does not contain exactly two
occurrences of the two-character sequence consisting of E [the escape
character], each immediately followed by <double quote>, then an
exception condition is raised: data exception - invalid use of escape
character.

This means that the following is wrong:

test-std=# select substring('a' from 'a' for '#');
  substring
-----------
  a
(1 row)