Thread: Regex bug

Regex bug

From: David Fetter

Kind people,

Here's a symptom as reported by John Hansen aka applejack:

SELECT 'r'||'\000\125'||'hello' ~ '^.hello' AS "OMG";
 OMG
-----
 t
(1 row)

I have reproduced this behavior on both 7.4.3 and CVS tip.

This should be false, shouldn't it?

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

Re: Regex bug

From: Tom Lane

David Fetter <david@fetter.org> writes:
> Here's a symptom as reported by John Hansen aka applejack:

> SELECT 'r'||'\000\125'||'hello' ~ '^.hello' AS "OMG";

This is not a regex bug: it has to do with the fact that we don't
support embedded nulls in text values.  This may enlighten you
a bit as to what's happening:

regression=# select length ('\000\125');
 length
--------
      0
(1 row)
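
A minimal sketch of why the original query comes out true, assuming the literal reaches the text input routine as a NUL-terminated C string, so everything from the first zero byte onward is dropped. This is a Python simulation rather than PostgreSQL code, and `cstring_to_text` is a hypothetical stand-in for that conversion.

```python
import re


def cstring_to_text(raw: bytes) -> bytes:
    """Hypothetical stand-in: keep only the bytes before the first zero byte,
    which is what reading the input as a C string would amount to."""
    return raw.split(b"\x00", 1)[0]


# Octal escapes as in the SQL literal: \000 is a zero byte, \125 is 'U'.
literal = b"\000\125"

middle = cstring_to_text(literal)
value = cstring_to_text(b"r") + middle + cstring_to_text(b"hello")
matched = re.match(rb"^.hello", value) is not None

print(len(middle))  # 0
print(value)        # b'rhello'
print(matched)      # True
```

Under that assumption the middle operand is empty, the concatenation is 'rhello', and '^.hello' matches it because the single '.' consumes the 'r'.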


            regards, tom lane

Re: Regex bug

From: Tom Lane

David Fetter <david@fetter.org> writes:
> On Fri, Aug 06, 2004 at 01:32:32PM -0400, Tom Lane wrote:
>> regression=# select length ('\000\125');
>>  length
>> --------
>>       0
>> (1 row)

> Ah, right.  John was testing his unicode patch, so there must be some
> magick underneath that distinguishes characters from bytes :)

> Cheers,
> D (feeling a little sheepish. again.)

It occurs to me that a case could be made for having text_in throw an
error if it sees '\000'.  I cannot really see that there's any benefit
to the current behavior of (effectively) silently truncating the string.

Comments?
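
The proposal above could look roughly like this, as a Python simulation rather than actual backend code. `text_in_strict` and `try_input` are hypothetical names, and the error wording is invented for illustration; the point is only that input containing a zero byte is rejected instead of being silently shortened.

```python
def text_in_strict(raw: bytes) -> bytes:
    """Hypothetical strict input routine: refuse any input containing
    a zero byte rather than truncating at it."""
    if b"\x00" in raw:
        raise ValueError("embedded zero byte in text input")
    return raw


def try_input(raw: bytes):
    """Return the accepted value, or an error string if it was rejected."""
    try:
        return text_in_strict(raw)
    except ValueError as exc:
        return f"ERROR: {exc}"


ok = try_input(b"hello")
rejected = try_input(b"\000\125")  # zero byte followed by 'U'

print(ok)
print(rejected)
```

With a check like this, the literal '\000\125' would fail loudly at input time, so the surprising regex match from the original report could not occur unnoticed.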

            regards, tom lane

Re: Regex bug

From: David Fetter

On Fri, Aug 06, 2004 at 01:32:32PM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > Here's a symptom as reported by John Hansen aka applejack:
>
> > SELECT 'r'||'\000\125'||'hello' ~ '^.hello' AS "OMG";
>
> This is not a regex bug: it has to do with the fact that we don't
> support embedded nulls in text values.  This may enlighten you
> a bit as to what's happening:
>
> regression=# select length ('\000\125');
>  length
> --------
>       0
> (1 row)

Ah, right.  John was testing his unicode patch, so there must be some
magick underneath that distinguishes characters from bytes :)

Cheers,
D (feeling a little sheepish. again.)