Thread: rules regression test failed on mingw

rules regression test failed on mingw

From
"Jaime Casanova"
Date:
Hi,

I'm seeing a failure in the rules regression test. It seems the results
are not being ordered correctly even though the test has an explicit
ORDER BY...

I'm on mingw32 5.1 on XP SP2, using msys 1.0.10 and gcc 3.4.2.

I've attached the regression.diffs.
Please let me know if I can provide more info.

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

Attachment

Re: rules regression test failed on mingw

From
Tom Lane
Date:
"Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> I'm seeing a failure in the rules regression test. It seems the results
> are not being ordered correctly even though the test has an explicit
> ORDER BY...

What locale is this running in?
        regards, tom lane


Re: rules regression test failed on mingw

From
"Jaime Casanova"
Date:
On Mon, Dec 15, 2008 at 8:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
>> I'm seeing a failure in the rules regression test. It seems the results
>> are not being ordered correctly even though the test has an explicit
>> ORDER BY...
>
> What locale is this running in?
>

Seems this is Spanish_Spain.1252 and the encoding WIN1252




Re: rules regression test failed on mingw

From
Tom Lane
Date:
"Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> On Mon, Dec 15, 2008 at 8:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What locale is this running in?

> Seems this is Spanish_Spain.1252 and the encoding WIN1252

What it looks like is that the locale is intentionally sorting h after k
(or more likely the rule is ch after ck).  My Spanish is just about gone
... is that a sane behavior at all?
        regards, tom lane


Re: rules regression test failed on mingw

From
"Jaime Casanova"
Date:
On Mon, Dec 15, 2008 at 10:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What it looks like is that the locale is intentionally sorting h after k
> (or more likely the rule is ch after ck).  My Spanish is just about gone
> ... is that a sane behavior at all?
>

Not at all... where can I check those rules?




Re: rules regression test failed on mingw

From
Tom Lane
Date:
"Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> On Mon, Dec 15, 2008 at 10:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What it looks like is that the locale is intentionally sorting h after k
>> (or more likely the rule is ch after ck).  My Spanish is just about gone
>> ... is that a sane behavior at all?

> Not at all... where can I check those rules?

Well, one thing you should try is

        select 'wieck'::text < 'wiech'::text;
        select 'wieck'::text > 'wiech'::text;

just to confirm whether the comparisons are actually working that way
or we've got some other issue.  You could also try initdb'ing in other
locales to see if the behavior changes.

I have no idea how to poke into the internals of Windows' locale
definitions.
        regards, tom lane
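
The same check can be made outside the database. Below is a minimal sketch in Python, on the assumption (suggested later in the thread) that the comparison ultimately goes through the platform's strcoll(). The helper name compare_in_locale and the list of locale names are illustrative only; locale names differ between platforms, and a locale that is not installed is reported as unavailable.

```python
import locale


def compare_in_locale(locale_name, a, b):
    """Return strcoll(a, b) under LC_COLLATE=locale_name.

    Negative means a sorts before b, positive means a sorts after b.
    Returns None if the locale cannot be selected on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return locale.strcoll(a, b)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore previous setting


# Example locale names only; adjust to what the platform actually provides.
for name in ("C", "es_ES.utf8", "es_EC.utf8", "Spanish_Spain.1252"):
    result = compare_in_locale(name, "wiech", "wieck")
    if result is None:
        print(f"{name}: locale not available")
    elif result < 0:
        print(f"{name}: wiech < wieck")
    else:
        print(f"{name}: wiech >= wieck")
```

In the C locale the comparison is by character code, so 'wiech' sorts before 'wieck' there; the other locales may or may not agree, which is exactly what the test is meant to reveal.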


Re: rules regression test failed on mingw

From
Alvaro Herrera
Date:
Tom Lane wrote:
> "Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
> > On Mon, Dec 15, 2008 at 8:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> What locale is this running in?
> 
> > Seems this is Spanish_Spain.1252 and the encoding WIN1252
> 
> What it looks like is that the locale is intentionally sorting h after k
> (or more likely the rule is ch after ck).  My Spanish is just about gone
> ... is that a sane behavior at all?

It was sane behavior a couple of decades ago -- dictionaries used to
sort like this ("ch" was considered an independent letter, and sorted
between c and d).  I'm not sure if RAE did actually revoke this
behavior, or it's just that we are now too used to the idea that it's
obsolete.  If the former, we should be complaining to the glibc
developers.  If the latter, we should complain to our school Spanish
teachers ;-)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
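
To make the old dictionary rule concrete, here is a minimal sketch in Python. It is not how any C library implements collation; the function names traditional_key and modern_key are made up for illustration, and the input is assumed to be lowercase a-z only. The sketch treats "ch" and "ll" as single letters placed right after "c" and "l", and compares that with plain letter-by-letter ordering.

```python
def traditional_key(word):
    """Sort key for the old rule: 'ch' and 'll' count as single letters
    that sort after 'c' and 'l' but before 'd' and 'm'."""
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):
            # (letter, 1) sorts after (letter, 0) and before the next letter.
            key.append((pair[0], 1))
            i += 2
        else:
            key.append((word[i], 0))
            i += 1
    return key


def modern_key(word):
    """Sort key for plain letter-by-letter ordering."""
    return [(ch, 0) for ch in word]


words = ["wiech", "wieck", "wied"]
print(sorted(words, key=traditional_key))  # ['wieck', 'wiech', 'wied']
print(sorted(words, key=modern_key))       # ['wiech', 'wieck', 'wied']
```

Under the old rule 'wieck' sorts before 'wiech', which matches the ordering seen in the failing regression output; under letter-by-letter ordering it is the other way round.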


Re: rules regression test failed on mingw

From
"Jaime Casanova"
Date:
On Mon, Dec 15, 2008 at 10:26 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Tom Lane wrote:
>> "Jaime Casanova" <jcasanov@systemguards.com.ec> writes:
>> > On Mon, Dec 15, 2008 at 8:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> What locale is this running in?
>>
>> > Seems this is Spanish_Spain.1252 and the encoding WIN1252
>>
>> What it looks like is that the locale is intentionally sorting h after k
>> (or more likely the rule is ch after ck).  My Spanish is just about gone
>> ... is that a sane behavior at all?
>
> It was sane behavior a couple of decades ago -- dictionaries used to
> sort like this ("ch" was considered an independent letter, and sorted
> between c and d).

While 'ch' and 'll' are independent letters, they sort as if they were
'c' and 'l'... that means that 'ch' should go before 'ck'.


http://www.rae.es/rae/gestores/gespub000018.nsf/(voAnexos)/arch8100821B76809110C12571B80038BA4A/$File/CuestionesparaelFAQdeconsultas.htm#ap31

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


Re: rules regression test failed on mingw

From
Alvaro Herrera
Date:
Jaime Casanova wrote:
> On Mon, Dec 15, 2008 at 10:26 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:

> > It was sane behavior a couple of decades ago -- dictionaries used to
> > sort like this ("ch" was considered an independent letter, and sorted
> > between c and d).
> 
> While 'ch' and 'll' are independent letters, they sort as if they were
> 'c' and 'l'... that means that 'ch' should go before 'ck'.
> 
> http://www.rae.es/rae/gestores/gespub000018.nsf/(voAnexos)/arch8100821B76809110C12571B80038BA4A/$File/CuestionesparaelFAQdeconsultas.htm#ap31

Interesting.  So they are both wrong, glibc and teachers.  We can file a
bug with glibc but I'm not sure we can do a lot about the other "bug".
Thanks for the research.



Re: rules regression test failed on mingw

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Jaime Casanova wrote:
>> While 'ch' and 'll' are independent letters, they sort as if they were
>> 'c' and 'l'... that means that 'ch' should go before 'ck'.

> Interesting.  So they are both wrong, glibc and teachers.  We can file a
> bug with glibc but I'm not sure we can do a lot about the other "bug".
> Thanks for the research.

But I don't see this sorting behavior with glibc on Linux (Fedora 9 to
be exact, testing LC_COLLATE=es_ES.utf8).  Does the mingw build actually
use glibc's strcoll() code, or is it somehow depending on Windows system
functionality?

I'm also wondering if the behavior is somehow affected by encoding ...
        regards, tom lane


Re: rules regression test failed on mingw

From
Tom Lane
Date:
I wrote:
> But I don't see this sorting behavior with glibc on Linux (Fedora 9 to
> be exact, testing LC_COLLATE=es_ES.utf8).

BTW, I *do* see wieck < wiech in es_ES locale on HPUX 10.20, released
~1996.  So I think we have correctly identified the core issue, and the
only interesting question is why mingw isn't following a more up-to-date
sorting rule.

Is it worth installing a variant rules regression output file for this?
I'd rather not, since that file tends to change often.
        regards, tom lane


Re: rules regression test failed on mingw

From
"Jaime Casanova"
Date:
On Mon, Dec 15, 2008 at 10:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Well, one thing you should try is
>
>        select 'wieck'::text < 'wiech'::text;
>        select 'wieck'::text > 'wiech'::text;
>

Administrador@casanova10 ~/pg.build/8.4dev
$ bin/psql -a -f test.sql postgres
select 'wieck'::text < 'wiech'::text;
 ?column?
----------
 t
(1 row)

select 'wiech'::text < 'wieck'::text;
 ?column?
----------
 f
(1 row)


> just to confirm whether the comparisons are actually working that way
> or we've got some other issue.

ok, confirmed...

> You could also try initdb'ing in other
> locales to see if the behavior changes.
>

Actually, using Spanish_Ecuador.1252 (which is the one I should have used
from the beginning anyway ;) gives correct results. Maybe the other
behaviour is correct in Spain... we have a lot of Spanish languages ;)

Administrador@casanova10 ~/pg.build/8.4dev
$ bin/psql -a -f test.sql postgres
select 'wieck'::text < 'wiech'::text;
 ?column?
----------
 f
(1 row)

select 'wiech'::text < 'wieck'::text;
 ?column?
----------
 t
(1 row)





Re: rules regression test failed on mingw

From
"Jaime Casanova"
Date:
On Mon, Dec 15, 2008 at 11:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> But I don't see this sorting behavior with glibc on Linux (Fedora 9 to
>> be exact, testing LC_COLLATE=es_ES.utf8).
>

Doh! I'm seeing this again in HEAD (and in 8.3.5) when running make
installcheck on openSUSE 11.

When initdb'ing I get this, which I think is right because I was using
--locale=es_EC.UTF8:

The database cluster will be initialized with locale es_EC.UTF8.
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to "spanish".

Then I can confirm that in psql:

postgres=# show LC_COLLATE;
 lc_collate
------------
 es_EC.UTF8
(1 row)

Nevertheless I get this (and, of course, failed regression tests):

postgres=# select 'wieck'::text < 'wiech'::text;
 ?column?
----------
 t
(1 row)

postgres=# select 'wieck'::text > 'wiech'::text;
 ?column?
----------
 f
(1 row)

Even worse, it seems the ordering is case-insensitive in both 8.3.5
and HEAD. Is this intended?

regression=# select 'S1' union all select 's1'
regression-# union all
regression-# select 'S2' union all select 's2'
regression-# order by 1;
 ?column?
----------
 s1
 S1
 s2
 S2
(4 rows)
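
One possible reading of this output, not confirmed anywhere in this thread: many locale collations compare strings in several passes, looking at the letters first and using case only to break ties. That would not be truly case-insensitive, since 's1' and 'S1' still compare as different. The Python sketch below only imitates such a scheme; the name two_level_key is made up, and whether es_EC.UTF8 on openSUSE really behaves this way is an assumption.

```python
def two_level_key(s):
    """Imitate a two-pass comparison.

    Pass 1: compare the strings ignoring case.
    Pass 2: only if pass 1 ties, put lowercase before uppercase.
    """
    return (s.casefold(), [c.isupper() for c in s])


values = ["S1", "s1", "S2", "s2"]
print(sorted(values, key=two_level_key))  # ['s1', 'S1', 's2', 'S2']
print(sorted(values))                     # ['S1', 'S2', 's1', 's2']
```

The first line reproduces the order shown above; the second is the plain character-code order that a C-locale database would give.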


Attachment