Thread: BUG #3645: regular expression back references seem broken
The following bug has been logged online:

Bug reference:      3645
Logged by:          Eric Haszlakiewicz
Email address:      erh+pgsql@swapsimple.com
PostgreSQL version: 8.2.5
Operating system:   NetBSD
Description:        regular expression back references seem broken
Details:

I was attempting to create a simple regular expression that uses back
references and I noticed some very odd behaviour. This regexp is supposed
to match a string where all the characters are the same:

^(.)\1*$

If I try it, it doesn't work. I would expect this to return false:

template1=# select 'xyz' ~ E'^(.)\\1*$';
 ?column?
----------
 t
(1 row)

But adding some extra parens does:

template1=# select 'xyz' ~ E'^(.)(\\1)*$';
 ?column?
----------
 f
(1 row)

As does changing the "." to an "x":

template1=# select 'xyz' ~ E'^(x)\\1*$';
 ?column?
----------
 f
(1 row)

As does forcing it to be an extended regular expression:

template1=# select 'xyz' ~ E'(?e)^(.)\\1*$';
 ?column?
----------
 f
(1 row)

The docs claim: "A single non-zero digit, not followed by another digit, is
always taken as a back reference." (The note at the end of 9.7.3.3)

It's relatively easy to work around the problem, but it certainly led to a
fair bit of head scratching while trying to debug some code. :)
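For comparison, here is a minimal sketch of the semantics the report expects, checked with Python's `re` module. This is only a cross-check under an assumption: Python uses a different regex engine from the one in PostgreSQL, so it shows what the pattern is meant to do, not how PostgreSQL behaves. The helper name `all_same_char` is hypothetical and not part of the report.

```python
import re

# Pattern from the report: capture one character, then allow zero or
# more repetitions of that same captured character up to end of string.
SAME_CHAR = re.compile(r'^(.)\1*$')

# Variant from the report with the back reference wrapped in parentheses.
SAME_CHAR_PARENS = re.compile(r'^(.)(\1)*$')


def all_same_char(s: str, pattern: re.Pattern = SAME_CHAR) -> bool:
    """Return True if the string is one character repeated one or more times."""
    return pattern.match(s) is not None


if __name__ == "__main__":
    for sample in ("xyz", "xxx", "x"):
        print(repr(sample), all_same_char(sample),
              all_same_char(sample, SAME_CHAR_PARENS))
```

In this engine both forms of the pattern agree: 'xyz' is rejected and 'xxx' is accepted, which is the behaviour the report expected from the first query.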
"Eric Haszlakiewicz" <erh+pgsql@swapsimple.com> writes:
> I would expect this to return false:

> template1=# select 'xyz' ~ E'^(.)\\1*$';
>  ?column?
> ----------
>  t
> (1 row)

Seems to be a bug in the Tcl regexp library we use. It's already
reported upstream:
https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894

regards, tom lane
Tom Lane wrote:
> "Eric Haszlakiewicz" <erh+pgsql@swapsimple.com> writes:
>> I would expect this to return false:
>
>> template1=# select 'xyz' ~ E'^(.)\\1*$';
>>  ?column?
>> ----------
>>  t
>> (1 row)
>
> Seems to be a bug in the Tcl regexp library we use. It's already
> reported upstream:
> https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894
>
> regards, tom lane

er.. it's been languishing there for over 2 years. That doesn't sound very
promising for getting it fixed. :(

eric
Added to TODO:

* Fix regular expression bug when using complex back-references

  http://archives.postgresql.org/pgsql-bugs/2007-10/msg00000.php

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com