Thread: BUG #3645: regular expression back references seem broken

BUG #3645: regular expression back references seem broken

From
"Eric Haszlakiewicz"
Date:
The following bug has been logged online:

Bug reference:      3645
Logged by:          Eric Haszlakiewicz
Email address:      erh+pgsql@swapsimple.com
PostgreSQL version: 8.2.5
Operating system:   NetBSD
Description:        regular expression back references seem broken
Details:

I was attempting to create a simple regular expression that uses back
references and I noticed some very odd behaviour.  This regexp is supposed
to match a string where all the characters are the same:

^(.)\1*$

If I try it, it doesn't work.  I would expect this to return false:

template1=# select 'xyz' ~ E'^(.)\\1*$';
 ?column?
----------
 t
(1 row)

But adding some extra parens does return false:
template1=# select 'xyz' ~ E'^(.)(\\1)*$';
 ?column?
----------
 f
(1 row)

As does changing the "." to an "x":

template1=# select 'xyz' ~ E'^(x)\\1*$';
 ?column?
----------
 f
(1 row)

As does forcing it to be an extended regular expression:

template1=# select 'xyz' ~ E'(?e)^(.)\\1*$';
 ?column?
----------
 f
(1 row)

The docs claim: "A single non-zero digit, not followed by another digit, is
always taken as a back reference."  (The note at the end of 9.7.3.3)

It's relatively easy to work around the problem, but it certainly led to a
fair bit of head scratching while trying to debug some code. :)
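
For comparison, here is a minimal sketch of the behaviour the report expects, written against Python's `re` module rather than PostgreSQL. Python uses a different regex engine, so this only illustrates the intended meaning of `^(.)\1*$`; it says nothing about PostgreSQL internals. The helper name `all_same` is made up for this example.

```python
import re

# Pattern from the report: capture one character, then allow only
# repetitions of that same captured character up to the end of the string.
SAME_CHAR = re.compile(r"^(.)\1*$")


def all_same(s: str) -> bool:
    """Return True when s (assumed to contain no newlines) is non-empty
    and consists of one character repeated."""
    return SAME_CHAR.search(s) is not None


for sample in ("xyz", "xxx", "x", "xxy"):
    print(sample, all_same(sample))
# xyz False
# xxx True
# x True
# xxy False
```

With these semantics, 'xyz' does not match, which is the result the report expected from PostgreSQL as well.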

Re: BUG #3645: regular expression back references seem broken

From
Tom Lane
Date:
"Eric Haszlakiewicz" <erh+pgsql@swapsimple.com> writes:
> I would expect this to return false:

> template1=# select 'xyz' ~ E'^(.)\\1*$';
>  ?column?
> ----------
>  t
> (1 row)

Seems to be a bug in the Tcl regexp library we use.  It's already
reported upstream:
https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894

            regards, tom lane

Re: BUG #3645: regular expression back references seem broken

From
Eric Haszlakiewicz
Date:
Tom Lane wrote:
> "Eric Haszlakiewicz" <erh+pgsql@swapsimple.com> writes:
>> I would expect this to return false:
>
>> template1=# select 'xyz' ~ E'^(.)\\1*$';
>>  ?column?
>> ----------
>>  t
>> (1 row)
>
> Seems to be a bug in the Tcl regexp library we use.  It's already
> reported upstream:
> https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894
>
>             regards, tom lane

er.. it's been languishing there for over 2 years.  That doesn't sound
very promising for getting it fixed. :(

eric

Re: BUG #3645: regular expression back references seem broken

From
Bruce Momjian
Date:
Added to TODO:

* Fix regular expression bug when using complex back-references

  http://archives.postgresql.org/pgsql-bugs/2007-10/msg00000.php



--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com
