Thread: BUG #5257: wrong results of SUBSTRING with SQL regular expressions
The following bug has been logged online: Bug reference: 5257 Logged by: Roman Kononov Email address: kononov@ftml.net PostgreSQL version: 8.4.2 Operating system: GNU/Linux x86_64 Description: wrong results of SUBSTRING with SQL regular expressions Details: test=# select substring('34' from '(2|3)#"4#"' for '#'); substring ----------- 3 (1 row) test=# select substring('^' from '#"^#"' for '#'); substring ----------- (1 row) test=# select substring('$' from '#"$#"' for '#'); substring ----------- (1 row) These look wrong according to the PG documentation.
"Roman Kononov" <kononov@ftml.net> writes: > test=# select substring('34' from '(2|3)#"4#"' for '#'); > substring > ----------- > 3 > (1 row) Hmm. I guess we need to translate ( and ) to non-capturing parens. > test=# select substring('^' from '#"^#"' for '#'); > substring > ----------- > > (1 row) > test=# select substring('$' from '#"$#"' for '#'); > substring > ----------- > > (1 row) These cases are already fixed in HEAD. http://archives.postgresql.org/pgsql-committers/2009-10/msg00048.php regards, tom lane
I don't know what SQL2008 says about SUBSTRING, but SQL2003 says in ISO/IEC 9075-2:2003 (E), 6.29 <string value function>, General Rules, 5), page 261: d) If R [the regular expression] does not contain exactly two occurrences of the two-character sequence consisting of E [the escape character], each immediately followed by <double quote>, then an exception condition is raised: data exception - invalid use of escape character. This means that the following is wrong: test-std=# select substring('a' from 'a' for '#'); substring ----------- a (1 row)