Re: foreign_data test fails with non-C locale - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: foreign_data test fails with non-C locale
Msg-id 200901111254.03722.peter_e@gmx.net
In response to Re: foreign_data test fails with non-C locale  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: foreign_data test fails with non-C locale  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: foreign_data test fails with non-C locale  (Devrim GÜNDÜZ <devrim@gunduz.org>)
Re: foreign_data test fails with non-C locale  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
List pgsql-hackers
On Friday 09 January 2009 18:24:55 Tom Lane wrote:
> I don't think we are prepared to buy into a general policy that the
> regression tests should pass in *any* locale; maintaining a large
> number of variant expected-files isn't very practical.  However, the
> de facto policy is that we try to keep them passing in locales that
> are used by any of the regular developers.  I think it would be useful
> to have buildfarm members testing in a few common locales.

This called for an extensive test ... :-)

My glibc installation supplies 668 locales (locale -a), which appear to
represent about 225 distinct language/country combinations.  (The rest are
encoding variants.)
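
A minimal sketch of how the grouping above could be reproduced, assuming glibc-style names of the form language_COUNTRY.encoding@modifier. The helper name group_locales and the sample list are hypothetical; real input would be the lines printed by locale -a. Treating @modifier variants as the same combination is also an assumption, as is counting aliases such as "C" as a group of their own.

```python
from collections import defaultdict

def group_locales(names):
    """Group locale names by their language_COUNTRY part.

    Assumes glibc-style names: language_COUNTRY[.encoding][@modifier].
    Names without an encoding or modifier (e.g. "C") form their own group.
    """
    groups = defaultdict(list)
    for name in names:
        # Drop the @modifier first, then the .encoding suffix.
        base = name.split("@", 1)[0].split(".", 1)[0]
        groups[base].append(name)
    return dict(groups)

# Hypothetical sample; in practice, pass the lines printed by `locale -a`.
sample = ["cs_CZ", "cs_CZ.iso88592", "cs_CZ.utf8", "sv_SE.utf8", "sr_RS@latin", "C"]
groups = group_locales(sample)
print(len(sample), "locales,", len(groups), "distinct combinations")
```

Run over the full locale -a output, the two printed numbers correspond to the locale count and the count of distinct combinations discussed above.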

I ran the regression tests with all of them, and got 95 failures (out of 668).

15 out of the 95 failures are initdb not completing because the encoding
specified by the locale is not supported by PostgreSQL.  But it appears that
at least xx_XX.utf8 works for each of these cases, so the language is
supported in some way.

The remaining 80 failures are more-or-less linguistic issues that belong to
the following 26 language/country combinations:

az_AZ    sorts k < q < l; Turkish i
br_FR    sorts ch separately
crh_UA    Turkish i
cs_CZ    sorts ch separately; sorts st = s
cy_GB    sorts ch separately
da_DK    sorts aa = å > z
es_EC    sorts ch separately
es_US    sorts ch separately
et_EE    sorts v = w
fo_FO    sorts aa = å > z
ha_NG    sorts sh separately
hsb_DE    sorts ch separately
ig_NG    sorts ch separately; sorts sh separately
ik_CA    sorts ch separately
kl_GL    sorts aa = å > z
nb_NO    sorts aa = å > z
nn_NO    sorts aa = å > z
om_ET    sorts ch separately (> z); sorts sh separately
om_KE    sorts ch separately (> z); sorts sh separately
pl_PL    (some other inexplicable sorting regression)
sk_SK    sorts ch separately; sorts st = s
sv_SE    sorts v = w
tk_TM    sorts v = w
tr_CY    Turkish i
tr_TR    Turkish i
tt_RU    sorts k < q < l

The "Turkish i" failures are in the tsearch tests.  I'm not entirely
convinced that tsearch is doing the right thing there.

We could easily get rid of the aa, ch, and v/w failures by adjusting the test
data, since the data is completely arbitrary anyway.  I propose to do
that, and document these issues so that they can be avoided in future tests.
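
One way such a guideline could be checked mechanically is sketched below. This is only an illustration: the function name risky_sequences and the pattern are hypothetical, and the pattern covers just the aa, ch, and v/w cases named above. It flags every v or w, which is deliberately conservative, since the v = w rule only matters when both letters occur in the same sorted data.

```python
import re

# Sequences that some locales collate specially: aa (= å), ch, and v/w.
RISKY = re.compile(r"aa|ch|[vw]", re.IGNORECASE)

def risky_sequences(value):
    """Return the sorted, lower-cased risky sequences found in a test string."""
    return sorted({m.group(0).lower() for m in RISKY.finditer(value)})

# Hypothetical test values, not taken from the actual regression data.
for value in ["Aachen", "view", "foo"]:
    print(value, risky_sequences(value))
```

Values for which the list is empty should sort the same way under all of the aa, ch, and v/w rules listed above.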

I'm not so worried about the other cases.

Also, some of these alternative sorting rules appear to be controversial
even among users of the language: we have had actual bug reports that the
es_EC rule is wrong, and the sv_SE rule is obsolete according to the
language regulators.  So it might be interesting to write a small test
program that tells users how their current locale behaves in known corner
cases.
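
A minimal sketch of what such a test program might look like, using Python's standard locale module. The probe pairs and the name probe_collation are assumptions chosen to match the sorting cases listed earlier; each pair is picked so that the first string sorts before the second in plain a-z order. The "Turkish i" case is not covered, because Python's str.lower() does not consult the locale, and the "st = s" case is left out because the message does not spell out the rule.

```python
import locale

# (a, b, description): in plain a-z order, a sorts before b.
PROBES = [
    ("aa", "z", "aa sorts after z"),
    ("ch", "cz", "ch sorts as a separate letter"),
    ("sh", "sz", "sh sorts as a separate letter"),
    ("vb", "wa", "v and w sort as the same letter"),
    ("l", "q", "q sorts before l"),
]

def probe_collation(locale_name=""):
    """Return {description: True if the locale deviates from plain a-z order}.

    An empty locale_name selects the locale from the environment.
    Raises locale.Error if the requested locale is not installed.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return {desc: locale.strcoll(a, b) >= 0 for a, b, desc in PROBES}
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # always restore

# "C" is always available; pass "" to probe the environment's locale instead.
for desc, deviates in probe_collation("C").items():
    print(("yes" if deviates else "no ").ljust(4), desc)
```

In the C locale every probe reports "no". What a given language locale reports depends on the installed locale definitions, which is exactly what the program is meant to reveal.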

