Thread: Strange behavior
[It may be a repost, I had a little mess in my mail last weeks, sorry] I tried to make a simple select on a single table and I find some strange results. First of all I execute the following query : => select crit_url from crit where crit_url like 'films/%' order by crit_url; crit_url ------------------------------------- films/7-samurai.html [...] films/victor-pendant-qu-il.html films/violent-cop.html films/wild-things.html (53 rows) Now if it try this one : select crit_url from crit where crit_url like '%films/%' order by crit_url; crit_url ------------------------------------- films/7-samurai.html [...] films/victor-pendant-qu-il.html films/violent-cop.html films/wild-things.html films/y-aura-t-il-de-la-neige.html films/you-ve-got-mail.html#ab films/you-ve-got-mail.html#fvd films/you-ve-got-mail.html#fvd films/you-ve-got-mail.html#thb (58 rows) As you can notice there is 5 more rows in the second query that in the first and, as far as my SQL understanding goes, they should have in the first selection. Since there is an index (partly) on crit_url I tried to remove it but it does change anything. Any idea ? lumiere=> \d crit Table = crit +----------------------------------+----------------------------------+-------+ | Field | Type | Length| +----------------------------------+----------------------------------+-------+ | crit_id | int4 not null default nextval('c | 4 | | films_id | int4 not null | 4 | | crit_texte | text | var | | crit_date | date | 4 | | crit_url | text | var | +----------------------------------+----------------------------------+-------+ Indices: crit_crit_id_key crit_skey lumiere=> \d crit_crit_id_key Table = crit_crit_id_key +----------------------------------+----------------------------------+-------+ | Field | Type | Length| +----------------------------------+----------------------------------+-------+ | crit_id | int4 | 4 | +----------------------------------+----------------------------------+-------+ lumiere=> \d crit_skey Table = crit_skey +----------------------------------+----------------------------------+-------+ | Field | Type | Length| +----------------------------------+----------------------------------+-------+ | films_id | int4 | 4 | | crit_url | text | var | +----------------------------------+----------------------------------+-------+ Christophe Labouisse : Cinéma, typographie, Unix christophe.labouisse@dial.oleane.com http://www.multimania.com/gabuzo/ Le cinéma en Lumière : http://www.lumiere.org/
Christophe Labouisse <labouiss@cybercable.fr> writes: > I tried to make a simple select on a single table and I find some > strange results. > select crit_url from crit where crit_url like 'films/%' order by crit_url; > [ gives wrong results, whereas this gives right results: ] > select crit_url from crit where crit_url like '%films/%' order by crit_url; What locale and/or MULTIBYTE setting are you using? There is a hack in the parser that tries to transform the first of these into this: select crit_url from crit where crit_url like 'films/%' AND crit_url >= 'films/' AND crit_url <= 'films/\377' order by crit_url; in order to make it possible to use an index to restrict the scan. (Since the parser doesn't know whether any indexes are available, it does this whether there is an index or not. But it doesn't do it for patterns that don't require a specific match at the left.) It occurs to me, however, that '\377' may not be a valid character in all the character sets we support, and may not sort above all other characters even if it is valid. That could result in the <= clause rejecting some entries that ought to match... regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > > select crit_url from crit where crit_url like 'films/%' order by crit_url; > > [ gives wrong results, whereas this gives right results: ] > > select crit_url from crit where crit_url like '%films/%' order by crit_url; > > What locale and/or MULTIBYTE setting are you using? fr_FR > > There is a hack in the parser that tries to transform the first of these > into this: > > select crit_url from crit where crit_url like 'films/%' > AND crit_url >= 'films/' AND crit_url <= 'films/\377' > order by crit_url; Well I guess that's it since \377 in fr_FR is ÿ (ydierisis). Christophe Labouisse : Cinéma, typographie, Unix christophe.labouisse@dial.oleane.com http://www.multimania.com/gabuzo/ Le cinéma en Lumière : http://www.lumiere.org/
Christophe Labouisse <labouiss@cybercable.fr> writes: >> There is a hack in the parser that tries to transform the first of these >> into this: >> >> select crit_url from crit where crit_url like 'films/%' >> AND crit_url >= 'films/' AND crit_url <= 'films/\377' >> order by crit_url; > Well I guess that's it since \377 in fr_FR is � (ydierisis). And I suppose that sorts before, or the same as, plain y? Oops. As a temporary measure you could rebuild with USE_LOCALE turned on; the parser code is set up not to generate the <= clause if that's defined. But we need a better solution. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > > Well I guess that's it since \377 in fr_FR is ÿ (ydierisis). > > And I suppose that sorts before, or the same as, plain y? Oops. Yep. > > As a temporary measure you could rebuild with USE_LOCALE turned on; > the parser code is set up not to generate the <= clause if that's > defined. But we need a better solution. Hmm that's strange I though I compiled already with locale enabled. Here is the configure line from config.status : # ./configure --with-tcl --with-perl --with-odbc --with-CC=/opt/pgcc/bin/gcc -- with-CXX=/opt/pgcc/bin/g++ --enable-locale That's should be good no ? Anyway I'll check it again. Anyway I think that even without locale this optimization can lead to incorrect result if you have a a field starting with \377. Christophe Labouisse : Cinéma, typographie, Unix christophe.labouisse@dial.oleane.com http://www.multimania.com/gabuzo/ Le cinéma en Lumière : http://www.lumiere.org/
Christophe Labouisse <labouiss@cybercable.fr> writes: >> As a temporary measure you could rebuild with USE_LOCALE turned on; >> the parser code is set up not to generate the <= clause if that's >> defined. But we need a better solution. > Hmm that's strange I though I compiled already with locale > enabled. Hmm. According to the cvs logs, G�ran Thyni applied a patch on Feb 2 to disable the \377 thing when USE_LOCALE is defined. How old did you say your source was? Look in backend/parser/gram.y, routine makeIndexable(), to see if there is an #ifdef USE_LOCALE; also check gram.c to make sure it has the same text for the routine (I'm wondering if you might have an out-of-date gram.c...) > Anyway I'll check it again. Anyway I think that even without locale > this optimization can lead to incorrect result if you have a a field > starting with \377. I agree, the code is wrong even without funny sort ordering. We'll need to find another way. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Christophe Labouisse <labouiss@cybercable.fr> writes: > > Hmm that's strange I though I compiled already with locale > > enabled. > > Hmm. According to the cvs logs, Göran Thyni applied a patch on Feb 2 > to disable the \377 thing when USE_LOCALE is defined. How old did you > say your source was? Look in backend/parser/gram.y, routine Ok I use postgresl 6.4.2 so before that patch. So I need to upgrade to 6.5 beta or to change my queries if I want to be sure of the results. Cheers Christophe Labouisse : Cinéma, typographie, Unix christophe.labouisse@dial.oleane.com http://www.multimania.com/gabuzo/ Le cinéma en Lumière : http://www.lumiere.org/
Can someone comment on this? [Charset ISO-8859-1 unsupported, filtering to ASCII...] > Tom Lane <tgl@sss.pgh.pa.us> writes: > > > > > Well I guess that's it since \377 in fr_FR is _ (ydierisis). > > > > And I suppose that sorts before, or the same as, plain y? Oops. > > Yep. > > > > > As a temporary measure you could rebuild with USE_LOCALE turned on; > > the parser code is set up not to generate the <= clause if that's > > defined. But we need a better solution. > > Hmm that's strange I though I compiled already with locale > enabled. Here is the configure line from config.status : > > # ./configure --with-tcl --with-perl --with-odbc --with-CC=/opt/pgcc/bin/gcc -- > with-CXX=/opt/pgcc/bin/g++ --enable-locale > > That's should be good no ? > > Anyway I'll check it again. Anyway I think that even without locale > this optimization can lead to incorrect result if you have a a field > starting with \377. > [Charset ISO-8859-1 unsupported, filtering to ASCII...] > Christophe Labouisse : Cin_ma, typographie, Unix > christophe.labouisse@dial.oleane.com http://www.multimania.com/gabuzo/ > Le cin_ma en Lumi_re : http://www.lumiere.org/ -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Christophe Labouisse <labouiss@cybercable.fr> writes: > >> As a temporary measure you could rebuild with USE_LOCALE turned on; > >> the parser code is set up not to generate the <= clause if that's > >> defined. But we need a better solution. > > > Hmm that's strange I though I compiled already with locale > > enabled. > > Hmm. According to the cvs logs, G�ran Thyni applied a patch on Feb 2 > to disable the \377 thing when USE_LOCALE is defined. How old did you > say your source was? Look in backend/parser/gram.y, routine > makeIndexable(), to see if there is an #ifdef USE_LOCALE; also check > gram.c to make sure it has the same text for the routine (I'm wondering > if you might have an out-of-date gram.c...) > > > Anyway I'll check it again. Anyway I think that even without locale > > this optimization can lead to incorrect result if you have a a field > > starting with \377. > > I agree, the code is wrong even without funny sort ordering. We'll > need to find another way. OK, I agree, but I could never think of a better way. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026