Thread: Strange behavior

Strange behavior

From
Christophe Labouisse
Date:
[It may be a repost, I had a little mess in my mail last weeks, sorry]

I tried to make a simple select on a single table and I find some
strange results.

First of all I execute the following query :

=> select crit_url from crit where crit_url like 'films/%' order by crit_url;

crit_url
-------------------------------------
films/7-samurai.html

[...]

films/victor-pendant-qu-il.html
films/violent-cop.html
films/wild-things.html
(53 rows)

Now if it try this one :

select crit_url from crit where crit_url like '%films/%' order by crit_url;

crit_url
-------------------------------------
films/7-samurai.html

[...]

films/victor-pendant-qu-il.html
films/violent-cop.html
films/wild-things.html
films/y-aura-t-il-de-la-neige.html
films/you-ve-got-mail.html#ab
films/you-ve-got-mail.html#fvd
films/you-ve-got-mail.html#fvd
films/you-ve-got-mail.html#thb
(58 rows)

As you can notice there is 5 more rows in the second query that in the
first and, as far as my SQL understanding goes, they should have in
the first selection. Since there is an index (partly) on crit_url I
tried to remove it but it does change anything. Any idea ?

lumiere=> \d crit

Table    = crit
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| crit_id                          | int4 not null default nextval('c |     4 |
| films_id                         | int4 not null                    |     4 |
| crit_texte                       | text                             |   var |
| crit_date                        | date                             |     4 |
| crit_url                         | text                             |   var |
+----------------------------------+----------------------------------+-------+
Indices:  crit_crit_id_key
          crit_skey
lumiere=> \d crit_crit_id_key

Table    = crit_crit_id_key
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| crit_id                          | int4                             |     4 |
+----------------------------------+----------------------------------+-------+
lumiere=> \d crit_skey

Table    = crit_skey
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| films_id                         | int4                             |     4 |
| crit_url                         | text                             |   var |
+----------------------------------+----------------------------------+-------+

Christophe Labouisse : Cinéma, typographie, Unix
christophe.labouisse@dial.oleane.com http://www.multimania.com/gabuzo/
Le cinéma en Lumière : http://www.lumiere.org/

Re: [SQL] Strange behavior

From
Tom Lane
Date:
Christophe Labouisse <labouiss@cybercable.fr> writes:
> I tried to make a simple select on a single table and I find some
> strange results.
> select crit_url from crit where crit_url like 'films/%' order by crit_url;
> [ gives wrong results, whereas this gives right results: ]
> select crit_url from crit where crit_url like '%films/%' order by crit_url;

What locale and/or MULTIBYTE setting are you using?

There is a hack in the parser that tries to transform the first of these
into this:

select crit_url from crit where crit_url like 'films/%'
AND crit_url >= 'films/' AND crit_url <= 'films/\377'
order by crit_url;

in order to make it possible to use an index to restrict the scan.
(Since the parser doesn't know whether any indexes are available,
it does this whether there is an index or not.  But it doesn't do it
for patterns that don't require a specific match at the left.)

It occurs to me, however, that '\377' may not be a valid character
in all the character sets we support, and may not sort above all other
characters even if it is valid.  That could result in the <= clause
rejecting some entries that ought to match...
        regards, tom lane


Re: [SQL] Strange behavior

From
Christophe Labouisse
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> > select crit_url from crit where crit_url like 'films/%' order by crit_url;
> > [ gives wrong results, whereas this gives right results: ]
> > select crit_url from crit where crit_url like '%films/%' order by crit_url;
>
> What locale and/or MULTIBYTE setting are you using?

fr_FR

>
> There is a hack in the parser that tries to transform the first of these
> into this:
>
> select crit_url from crit where crit_url like 'films/%'
> AND crit_url >= 'films/' AND crit_url <= 'films/\377'
> order by crit_url;

Well I guess that's it since \377 in fr_FR is ÿ (ydierisis).

Christophe Labouisse : Cinéma, typographie, Unix
christophe.labouisse@dial.oleane.com http://www.multimania.com/gabuzo/
Le cinéma en Lumière : http://www.lumiere.org/

Re: [SQL] Strange behavior

From
Tom Lane
Date:
Christophe Labouisse <labouiss@cybercable.fr> writes:
>> There is a hack in the parser that tries to transform the first of these
>> into this:
>> 
>> select crit_url from crit where crit_url like 'films/%'
>> AND crit_url >= 'films/' AND crit_url <= 'films/\377'
>> order by crit_url;

> Well I guess that's it since \377 in fr_FR is � (ydierisis).

And I suppose that sorts before, or the same as, plain y?  Oops.

As a temporary measure you could rebuild with USE_LOCALE turned on;
the parser code is set up not to generate the <= clause if that's
defined.  But we need a better solution.
        regards, tom lane


Re: [SQL] Strange behavior

From
Christophe Labouisse
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:


> > Well I guess that's it since \377 in fr_FR is ÿ (ydierisis).
>
> And I suppose that sorts before, or the same as, plain y?  Oops.

Yep.

>
> As a temporary measure you could rebuild with USE_LOCALE turned on;
> the parser code is set up not to generate the <= clause if that's
> defined.  But we need a better solution.

Hmm that's strange I though I compiled already with locale
enabled. Here is the configure line from config.status :

# ./configure  --with-tcl --with-perl --with-odbc --with-CC=/opt/pgcc/bin/gcc --
with-CXX=/opt/pgcc/bin/g++ --enable-locale

That's should be good no ?

Anyway I'll check it again. Anyway I think that even without locale
this optimization can lead to incorrect result if you have a a field
starting with \377.

Christophe Labouisse : Cinéma, typographie, Unix
christophe.labouisse@dial.oleane.com http://www.multimania.com/gabuzo/
Le cinéma en Lumière : http://www.lumiere.org/

Re: [SQL] Strange behavior

From
Tom Lane
Date:
Christophe Labouisse <labouiss@cybercable.fr> writes:
>> As a temporary measure you could rebuild with USE_LOCALE turned on;
>> the parser code is set up not to generate the <= clause if that's
>> defined.  But we need a better solution.

> Hmm that's strange I though I compiled already with locale
> enabled.

Hmm.  According to the cvs logs, G�ran Thyni applied a patch on Feb 2
to disable the \377 thing when USE_LOCALE is defined.  How old did you
say your source was?  Look in backend/parser/gram.y, routine
makeIndexable(), to see if there is an #ifdef USE_LOCALE; also check
gram.c to make sure it has the same text for the routine (I'm wondering
if you might have an out-of-date gram.c...)

> Anyway I'll check it again. Anyway I think that even without locale
> this optimization can lead to incorrect result if you have a a field
> starting with \377.

I agree, the code is wrong even without funny sort ordering.  We'll
need to find another way.
        regards, tom lane


Re: [SQL] Strange behavior

From
Christophe Labouisse
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Christophe Labouisse <labouiss@cybercable.fr> writes:
> > Hmm that's strange I though I compiled already with locale
> > enabled.
>
> Hmm.  According to the cvs logs, Göran Thyni applied a patch on Feb 2
> to disable the \377 thing when USE_LOCALE is defined.  How old did you
> say your source was?  Look in backend/parser/gram.y, routine

Ok I use postgresl 6.4.2 so before that patch. So I need to upgrade to
6.5 beta or to change my queries if I want to be sure of the results.

Cheers

Christophe Labouisse : Cinéma, typographie, Unix
christophe.labouisse@dial.oleane.com http://www.multimania.com/gabuzo/
Le cinéma en Lumière : http://www.lumiere.org/

Re: [SQL] Strange behavior

From
Bruce Momjian
Date:
Can someone comment on this?


[Charset ISO-8859-1 unsupported, filtering to ASCII...]
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
>  
> > > Well I guess that's it since \377 in fr_FR is _ (ydierisis).
> > 
> > And I suppose that sorts before, or the same as, plain y?  Oops.
> 
> Yep.
> 
> > 
> > As a temporary measure you could rebuild with USE_LOCALE turned on;
> > the parser code is set up not to generate the <= clause if that's
> > defined.  But we need a better solution.
> 
> Hmm that's strange I though I compiled already with locale
> enabled. Here is the configure line from config.status :
> 
> # ./configure  --with-tcl --with-perl --with-odbc --with-CC=/opt/pgcc/bin/gcc --
> with-CXX=/opt/pgcc/bin/g++ --enable-locale
> 
> That's should be good no ?
> 
> Anyway I'll check it again. Anyway I think that even without locale
> this optimization can lead to incorrect result if you have a a field
> starting with \377.
> 

[Charset ISO-8859-1 unsupported, filtering to ASCII...]
> Christophe Labouisse : Cin_ma, typographie, Unix
> christophe.labouisse@dial.oleane.com http://www.multimania.com/gabuzo/
> Le cin_ma en Lumi_re : http://www.lumiere.org/


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Strange behavior

From
Bruce Momjian
Date:
> Christophe Labouisse <labouiss@cybercable.fr> writes:
> >> As a temporary measure you could rebuild with USE_LOCALE turned on;
> >> the parser code is set up not to generate the <= clause if that's
> >> defined.  But we need a better solution.
> 
> > Hmm that's strange I though I compiled already with locale
> > enabled.
> 
> Hmm.  According to the cvs logs, G�ran Thyni applied a patch on Feb 2
> to disable the \377 thing when USE_LOCALE is defined.  How old did you
> say your source was?  Look in backend/parser/gram.y, routine
> makeIndexable(), to see if there is an #ifdef USE_LOCALE; also check
> gram.c to make sure it has the same text for the routine (I'm wondering
> if you might have an out-of-date gram.c...)
> 
> > Anyway I'll check it again. Anyway I think that even without locale
> > this optimization can lead to incorrect result if you have a a field
> > starting with \377.
> 
> I agree, the code is wrong even without funny sort ordering.  We'll
> need to find another way.

OK, I agree, but I could never think of a better way.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026