Thread: rules regression test failed on mingw
Hi,

i'm seeing a fail in the rules regression, seems like it is not ordering the results right even when the regression has an explicit order by...

I'm on mingw32 5.1 on XP SP2, using msys 1.0.10 and gcc 3.4.2.

I've attached the regression.diffs; please let me know if I can provide more info.

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
"Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> i'm seeing a fail in the rules regression, seems like it is not
> ordering the results right even when the regression has an explicit
> order by...

What locale is this running in?

regards, tom lane
On Mon, Dec 15, 2008 at 8:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
>> i'm seeing a fail in the rules regression, seems like it is not
>> ordering the results right even when the regression has an explicit
>> order by...
>
> What locale is this running in?
>

Seems this is Spanish_Spain.1252 and the encoding WIN1252

-- 
Atentamente,
Jaime Casanova
"Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> On Mon, Dec 15, 2008 at 8:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What locale is this running in?

> Seems this is Spanish_Spain.1252 and the encoding WIN1252

What it looks like is that the locale is intentionally sorting h after k
(or more likely the rule is ch after ck). My Spanish is just about gone
... is that a sane behavior at all?

regards, tom lane
On Mon, Dec 15, 2008 at 10:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What it looks like is that the locale is intentionally sorting h after k
> (or more likely the rule is ch after ck). My Spanish is just about gone
> ... is that a sane behavior at all?
>

not at all... where can i check those rules?

-- 
Atentamente,
Jaime Casanova
"Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> On Mon, Dec 15, 2008 at 10:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What it looks like is that the locale is intentionally sorting h after k
>> (or more likely the rule is ch after ck). My Spanish is just about gone
>> ... is that a sane behavior at all?

> not at all... where can i check those rules?

Well, one thing you should try is

select 'wieck'::text < 'wiech'::text;
select 'wieck'::text > 'wiech'::text;

just to confirm whether the comparisons are actually working that way
or we've got some other issue. You could also try initdb'ing in other
locales to see if the behavior changes.

I have no idea how to poke into the internals of Windows' locale
definitions.

regards, tom lane
Tom Lane wrote:
> "Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> > On Mon, Dec 15, 2008 at 8:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> What locale is this running in?
>
> > Seems this is Spanish_Spain.1252 and the encoding WIN1252
>
> What it looks like is that the locale is intentionally sorting h after k
> (or more likely the rule is ch after ck). My Spanish is just about gone
> ... is that a sane behavior at all?

It was sane behavior a couple of decades ago -- dictionaries used to
sort like this ("ch" was considered an independent letter, and sorted
between c and d).

I'm not sure if RAE did actually revoke this behavior, or it's just that
we are now too used to the idea that it's obsolete. If the former, we
should be complaining to the glibc developers. If the latter, we should
complain to our school Spanish teachers ;-)

-- 
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Dec 15, 2008 at 10:26 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Tom Lane wrote:
>> "Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
>> > On Mon, Dec 15, 2008 at 8:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> What locale is this running in?
>>
>> > Seems this is Spanish_Spain.1252 and the encoding WIN1252
>>
>> What it looks like is that the locale is intentionally sorting h after k
>> (or more likely the rule is ch after ck). My Spanish is just about gone
>> ... is that a sane behavior at all?
>
> It was sane behavior a couple of decades ago -- dictionaries used to
> sort like this ("ch" was considered an independent letter, and sorted
> between c and d).

while 'ch' and 'll' are independent letters they sort as they were 'c'
and 'l'... that means that 'ch' should go before 'ck'

http://www.rae.es/rae/gestores/gespub000018.nsf/(voAnexos)/arch8100821B76809110C12571B80038BA4A/$File/CuestionesparaelFAQdeconsultas.htm#ap31

-- 
Atentamente,
Jaime Casanova
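For readers following along, the two orderings being compared here can be sketched in a few lines of Python. This is only an illustration of the rules as described in the messages above, not the collation code of glibc, Windows, or PostgreSQL. The function names `traditional_key` and `modern_key` are made up for this sketch, and accented letters and "ñ" are deliberately not handled.

```python
def traditional_key(word):
    """Sort key for the old dictionary rule: 'ch' and 'll' count as single
    letters that sort right after 'c' and 'l' respectively."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            # Digraph: same base letter, but ranked after the plain letter.
            key.append((pair[0], 1))
            i += 2
        else:
            key.append((word[i], 0))
            i += 1
    return key


def modern_key(word):
    """Sort key for the rule cited from the RAE: 'ch' and 'll' are ordered
    as the plain letter sequences c+h and l+l."""
    return [(ch, 0) for ch in word.lower()]


words = ["wiech", "wieck", "cuna", "chico", "dama"]
print(sorted(words, key=traditional_key))
# ['cuna', 'chico', 'dama', 'wieck', 'wiech']
print(sorted(words, key=modern_key))
# ['chico', 'cuna', 'dama', 'wiech', 'wieck']
```

Under the traditional key, 'wieck' sorts before 'wiech', which matches the ordering seen in the failing regression output; under the modern key the order is reversed.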
Jaime Casanova wrote:
> On Mon, Dec 15, 2008 at 10:26 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:

> > It was sane behavior a couple of decades ago -- dictionaries used to
> > sort like this ("ch" was considered an independent letter, and sorted
> > between c and d).
>
> while 'ch' and 'll' are independent letters they sort as they were 'c'
> and 'l'... that means that 'ch' should go before 'ck'
>
> http://www.rae.es/rae/gestores/gespub000018.nsf/(voAnexos)/arch8100821B76809110C12571B80038BA4A/$File/CuestionesparaelFAQdeconsultas.htm#ap31

Interesting. So they are both wrong, glibc and teachers. We can file a
bug with glibc but I'm not sure we can do a lot about the other "bug".

Thanks for the research.

-- 
Alvaro Herrera
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Jaime Casanova wrote:
>> while 'ch' and 'll' are independent letters they sort as they were 'c'
>> and 'l'... that means that 'ch' should go before 'ck'

> Interesting. So they are both wrong, glibc and teachers. We can file a
> bug with glibc but I'm not sure we can do a lot about the other "bug".
> Thanks for the research.

But I don't see this sorting behavior with glibc on Linux (Fedora 9 to
be exact, testing LC_COLLATE=es_ES.utf8). Does the mingw build actually
use glibc's strcoll() code, or is it somehow depending on Windows system
functionality? I'm also wondering if the behavior is somehow affected
by encoding ...

regards, tom lane
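One way to see what the platform's own strcoll() does, independently of PostgreSQL, is to call it through Python's standard `locale` module. The sketch below is a suggestion rather than something used in this thread: the helper name `compare_in_locale` is made up, the locale names in the loop are the ones mentioned in the thread plus "C", and whether any of them is installed on a given machine is an assumption (unavailable locales are reported and skipped).

```python
import locale


def compare_in_locale(locale_name, a, b):
    """Return strcoll(a, b) under the given LC_COLLATE locale,
    or None if that locale is not available on this system."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return locale.strcoll(a, b)
    finally:
        # Always restore the previous collation locale.
        locale.setlocale(locale.LC_COLLATE, saved)


# Locale names taken from the thread; availability is system-dependent.
for name in ("C", "es_ES.utf8", "es_EC.UTF8", "Spanish_Spain.1252"):
    result = compare_in_locale(name, "wieck", "wiech")
    if result is None:
        print(f"{name}: locale not available")
    elif result < 0:
        print(f"{name}: wieck < wiech")
    else:
        print(f"{name}: wieck >= wiech")
```

A negative result for 'wieck' versus 'wiech' in a given locale would indicate that the C library itself applies the "ch after ck" rule there, rather than anything in the database.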
I wrote:
> But I don't see this sorting behavior with glibc on Linux (Fedora 9 to
> be exact, testing LC_COLLATE=es_ES.utf8).

BTW, I *do* see wieck < wiech in es_ES locale on HPUX 10.20, released
~1996. So I think we have correctly identified the core issue, and the
only interesting question is why mingw isn't following a more up-to-date
sorting rule.

Is it worth installing a variant rules regression output file for this?
I'd rather not, since that file tends to change often.

regards, tom lane
On Mon, Dec 15, 2008 at 10:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Well, one thing you should try is
>
> select 'wieck'::text < 'wiech'::text;
> select 'wieck'::text > 'wiech'::text;
>

Administrador@casanova10 ~/pg.build/8.4dev
$ bin/psql -a -f test.sql postgres
select 'wieck'::text < 'wiech'::text;
 ?column?
----------
 t
(1 row)

select 'wiech'::text < 'wieck'::text;
 ?column?
----------
 f
(1 row)

> just to confirm whether the comparisons are actually working that way
> or we've got some other issue.

ok, confirmed...

> You could also try initdb'ing in other
> locales to see if the behavior changes.
>

Actually, using Spanish_Ecuador.1252 (which is the one I should have
used from the beginning anyway ;) gives correct results, maybe the other
behaviour is correct in spain... we have a lot of spanish languages ;)

Administrador@casanova10 ~/pg.build/8.4dev
$ bin/psql -a -f test.sql postgres
select 'wieck'::text < 'wiech'::text;
 ?column?
----------
 f
(1 row)

select 'wiech'::text < 'wieck'::text;
 ?column?
----------
 t
(1 row)

-- 
Atentamente,
Jaime Casanova
On Mon, Dec 15, 2008 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> But I don't see this sorting behavior with glibc on Linux (Fedora 9 to
>> be exact, testing LC_COLLATE=es_ES.utf8).
>

doh! i'm seeing this again in HEAD (and in 8.3.5) when executing make
installcheck on openSuse 11

when initdb'ing i get this, that i think is right 'cause i was using
--locale=es_EC.UTF8:

The database cluster will be initialized with locale es_EC.UTF8.
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to "spanish".

then i can confirm that in psql:

postgres=# show LC_COLLATE;
 lc_collate
------------
 es_EC.UTF8
(1 row)

nevertheless i get (and of course failed regression tests):

postgres=# select 'wieck'::text < 'wiech'::text;
 ?column?
----------
 t
(1 row)

postgres=# select 'wieck'::text > 'wiech'::text;
 ?column?
----------
 f
(1 row)

even worse, seems like the ordering is case insensitive in both 8.3.5
and HEAD, is this intended?

regression=# select 'S1' union all select 's1'
regression-# union all
regression-# select 'S2' union all select 's2'
regression-# order by 1;
 ?column?
----------
 s1
 S1
 s2
 S2
(4 rows)

-- 
Atentamente,
Jaime Casanova