Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5); - Mailing list pgsql-bugs

From Reko Turja
Subject Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5);
Msg-id 651C85285757450B8B88749830A53680@Rivendell
List pgsql-bugs

Tom Lane wrote:

> Indeed.  To try to put some scope on the problem, I made an idiot little
> program that just generates some random UTF8 strings and sees whether
> strcoll and strxfrm sort them alike.  Attached are that program, an even
> more idiot little shell script that runs it over all available UTF8
> locales, and the results on my RHEL6 box.  While de_DE seems to be the
> worst-broken locale, it's far from the only one.
>
> Please try this on as many platforms as you can get hold of ...
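
For readers without the attachments, here is a minimal sketch of the kind of check described above: generate random strings, then see whether strcoll and strxfrm order every pair the same way. It is not the attached program. It goes through Python's locale module, which may call the C library's wide-character variants rather than strcoll/strxfrm themselves, so results need not match the attached test. The alphabet, the function names, and the "bad" and "not installed" labels are assumptions made for illustration.

```python
import locale
import random

# Assumption: a small mixed alphabet stands in for "random UTF8 strings".
DEFAULT_ALPHABET = "abcxyzABCXYZ äöüßéñ"


def random_strings(count, max_len=8, alphabet=DEFAULT_ALPHABET, seed=0):
    """Return `count` pseudo-random strings drawn from `alphabet`."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
        for _ in range(count)
    ]


def sign(n):
    """Reduce a comparison result to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def count_disagreements(strings):
    """Count ordered pairs that strcoll and strxfrm order differently
    under the currently active LC_COLLATE."""
    keys = [locale.strxfrm(s) for s in strings]
    bad = 0
    for i, a in enumerate(strings):
        for j, b in enumerate(strings):
            by_strcoll = sign(locale.strcoll(a, b))
            by_strxfrm = (keys[i] > keys[j]) - (keys[i] < keys[j])
            if by_strcoll != by_strxfrm:
                bad += 1
    return bad


def check_locale(name, strings):
    """Return True if both orderings agree, False if they differ,
    or None if the locale is not installed."""
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)  # sets LC_COLLATE and LC_CTYPE
    except locale.Error:
        return None
    try:
        return count_disagreements(strings) == 0
    finally:
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    sample = random_strings(200)
    # Hypothetical locale list; a real run would loop over `locale -a`.
    for name in ("C", "de_DE.UTF-8", "en_US.UTF-8"):
        result = check_locale(name, sample)
        if result is None:
            label = "not installed"
        else:
            label = "good" if result else "bad"
        print(name, label)
```

A locale counts as "good" here only if no pair disagrees. A clean run over random input is evidence, not proof, that the two functions agree.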

Platform - FreeBSD 10.2, everything built from source using clang:

./tryalllocales.sh
Using LC_COLLATE = "af_ZA.UTF-8"
Using LC_CTYPE = "af_ZA.UTF-8"
af_ZA.UTF-8 good
Using LC_COLLATE = "am_ET.UTF-8"
Using LC_CTYPE = "am_ET.UTF-8"
am_ET.UTF-8 good
Using LC_COLLATE = "be_BY.UTF-8"
Using LC_CTYPE = "be_BY.UTF-8"
be_BY.UTF-8 good
Using LC_COLLATE = "bg_BG.UTF-8"
Using LC_CTYPE = "bg_BG.UTF-8"
bg_BG.UTF-8 good
Using LC_COLLATE = "ca_AD.UTF-8"
Using LC_CTYPE = "ca_AD.UTF-8"
ca_AD.UTF-8 good
Using LC_COLLATE = "ca_ES.UTF-8"
Using LC_CTYPE = "ca_ES.UTF-8"
ca_ES.UTF-8 good
Using LC_COLLATE = "ca_FR.UTF-8"
Using LC_CTYPE = "ca_FR.UTF-8"
ca_FR.UTF-8 good
Using LC_COLLATE = "ca_IT.UTF-8"
Using LC_CTYPE = "ca_IT.UTF-8"
ca_IT.UTF-8 good
Using LC_COLLATE = "cs_CZ.UTF-8"
Using LC_CTYPE = "cs_CZ.UTF-8"
cs_CZ.UTF-8 good
Using LC_COLLATE = "da_DK.UTF-8"
Using LC_CTYPE = "da_DK.UTF-8"
da_DK.UTF-8 good
Using LC_COLLATE = "de_AT.UTF-8"
Using LC_CTYPE = "de_AT.UTF-8"
de_AT.UTF-8 good
Using LC_COLLATE = "de_CH.UTF-8"
Using LC_CTYPE = "de_CH.UTF-8"
de_CH.UTF-8 good
Using LC_COLLATE = "de_DE.UTF-8"
Using LC_CTYPE = "de_DE.UTF-8"
de_DE.UTF-8 good
Using LC_COLLATE = "el_GR.UTF-8"
Using LC_CTYPE = "el_GR.UTF-8"
el_GR.UTF-8 good
Using LC_COLLATE = "en_AU.UTF-8"
Using LC_CTYPE = "en_AU.UTF-8"
en_AU.UTF-8 good
Using LC_COLLATE = "en_CA.UTF-8"
Using LC_CTYPE = "en_CA.UTF-8"
en_CA.UTF-8 good
Using LC_COLLATE = "en_GB.UTF-8"
Using LC_CTYPE = "en_GB.UTF-8"
en_GB.UTF-8 good
Using LC_COLLATE = "en_IE.UTF-8"
Using LC_CTYPE = "en_IE.UTF-8"
en_IE.UTF-8 good
Using LC_COLLATE = "en_NZ.UTF-8"
Using LC_CTYPE = "en_NZ.UTF-8"
en_NZ.UTF-8 good
Using LC_COLLATE = "en_US.UTF-8"
Using LC_CTYPE = "en_US.UTF-8"
en_US.UTF-8 good
Using LC_COLLATE = "es_ES.UTF-8"
Using LC_CTYPE = "es_ES.UTF-8"
es_ES.UTF-8 good
Using LC_COLLATE = "et_EE.UTF-8"
Using LC_CTYPE = "et_EE.UTF-8"
et_EE.UTF-8 good
Using LC_COLLATE = "eu_ES.UTF-8"
Using LC_CTYPE = "eu_ES.UTF-8"
eu_ES.UTF-8 good
Using LC_COLLATE = "fi_FI.UTF-8"
Using LC_CTYPE = "fi_FI.UTF-8"
fi_FI.UTF-8 good
Using LC_COLLATE = "fr_BE.UTF-8"
Using LC_CTYPE = "fr_BE.UTF-8"
fr_BE.UTF-8 good
Using LC_COLLATE = "fr_CA.UTF-8"
Using LC_CTYPE = "fr_CA.UTF-8"
fr_CA.UTF-8 good
Using LC_COLLATE = "fr_CH.UTF-8"
Using LC_CTYPE = "fr_CH.UTF-8"
fr_CH.UTF-8 good
Using LC_COLLATE = "fr_FR.UTF-8"
Using LC_CTYPE = "fr_FR.UTF-8"
fr_FR.UTF-8 good
Using LC_COLLATE = "he_IL.UTF-8"
Using LC_CTYPE = "he_IL.UTF-8"
he_IL.UTF-8 good
Using LC_COLLATE = "hr_HR.UTF-8"
Using LC_CTYPE = "hr_HR.UTF-8"
hr_HR.UTF-8 good
Using LC_COLLATE = "hu_HU.UTF-8"
Using LC_CTYPE = "hu_HU.UTF-8"
hu_HU.UTF-8 good
Using LC_COLLATE = "hy_AM.UTF-8"
Using LC_CTYPE = "hy_AM.UTF-8"
hy_AM.UTF-8 good
Using LC_COLLATE = "is_IS.UTF-8"
Using LC_CTYPE = "is_IS.UTF-8"
is_IS.UTF-8 good
Using LC_COLLATE = "it_CH.UTF-8"
Using LC_CTYPE = "it_CH.UTF-8"
it_CH.UTF-8 good
Using LC_COLLATE = "it_IT.UTF-8"
Using LC_CTYPE = "it_IT.UTF-8"
it_IT.UTF-8 good
Using LC_COLLATE = "ja_JP.UTF-8"
Using LC_CTYPE = "ja_JP.UTF-8"
ja_JP.UTF-8 good
Using LC_COLLATE = "kk_KZ.UTF-8"
Using LC_CTYPE = "kk_KZ.UTF-8"
kk_KZ.UTF-8 good
Using LC_COLLATE = "ko_KR.UTF-8"
Using LC_CTYPE = "ko_KR.UTF-8"
ko_KR.UTF-8 good
Using LC_COLLATE = "lt_LT.UTF-8"
Using LC_CTYPE = "lt_LT.UTF-8"
lt_LT.UTF-8 good
Using LC_COLLATE = "lv_LV.UTF-8"
Using LC_CTYPE = "lv_LV.UTF-8"
lv_LV.UTF-8 good
Using LC_COLLATE = "mn_MN.UTF-8"
Using LC_CTYPE = "mn_MN.UTF-8"
mn_MN.UTF-8 good
Using LC_COLLATE = "nb_NO.UTF-8"
Using LC_CTYPE = "nb_NO.UTF-8"
nb_NO.UTF-8 good
Using LC_COLLATE = "nl_BE.UTF-8"
Using LC_CTYPE = "nl_BE.UTF-8"
nl_BE.UTF-8 good
Using LC_COLLATE = "nl_NL.UTF-8"
Using LC_CTYPE = "nl_NL.UTF-8"
nl_NL.UTF-8 good
Using LC_COLLATE = "nn_NO.UTF-8"
Using LC_CTYPE = "nn_NO.UTF-8"
nn_NO.UTF-8 good
Using LC_COLLATE = "no_NO.UTF-8"
Using LC_CTYPE = "no_NO.UTF-8"
no_NO.UTF-8 good
Using LC_COLLATE = "pl_PL.UTF-8"
Using LC_CTYPE = "pl_PL.UTF-8"
pl_PL.UTF-8 good
Using LC_COLLATE = "pt_BR.UTF-8"
Using LC_CTYPE = "pt_BR.UTF-8"
pt_BR.UTF-8 good
Using LC_COLLATE = "pt_PT.UTF-8"
Using LC_CTYPE = "pt_PT.UTF-8"
pt_PT.UTF-8 good
Using LC_COLLATE = "ro_RO.UTF-8"
Using LC_CTYPE = "ro_RO.UTF-8"
ro_RO.UTF-8 good
Using LC_COLLATE = "ru_RU.UTF-8"
Using LC_CTYPE = "ru_RU.UTF-8"
ru_RU.UTF-8 good
Using LC_COLLATE = "sk_SK.UTF-8"
Using LC_CTYPE = "sk_SK.UTF-8"
sk_SK.UTF-8 good
Using LC_COLLATE = "sl_SI.UTF-8"
Using LC_CTYPE = "sl_SI.UTF-8"
sl_SI.UTF-8 good
Using LC_COLLATE = "sr_YU.UTF-8"
Using LC_CTYPE = "sr_YU.UTF-8"
sr_YU.UTF-8 good
Using LC_COLLATE = "sv_SE.UTF-8"
Using LC_CTYPE = "sv_SE.UTF-8"
sv_SE.UTF-8 good
Using LC_COLLATE = "tr_TR.UTF-8"
Using LC_CTYPE = "tr_TR.UTF-8"
tr_TR.UTF-8 good
Using LC_COLLATE = "uk_UA.UTF-8"
Using LC_CTYPE = "uk_UA.UTF-8"
uk_UA.UTF-8 good
Using LC_COLLATE = "zh_CN.UTF-8"
Using LC_CTYPE = "zh_CN.UTF-8"
zh_CN.UTF-8 good
Using LC_COLLATE = "zh_HK.UTF-8"
Using LC_CTYPE = "zh_HK.UTF-8"
zh_HK.UTF-8 good
Using LC_COLLATE = "zh_TW.UTF-8"
Using LC_CTYPE = "zh_TW.UTF-8"
zh_TW.UTF-8 good

-Reko