Thread: Is this a BUG? Is there anyone has the same problem?
Providing there is simple table, called "test_xx", which has only one column, called "aa" CHAR(16), and the column is set as primary key. Put in some data such as following. ( _ stands for a space character) 'AA_B' 'AAB' 'BB_123' 'BB123' Execute following statement with psql. SELECT * FROM test_xx ORDER BY aa; I 've got ... 'AAB' 'AA_B' 'BB123' 'BB_123' But if excute SELECT subStr(aa, 1,4) FROM test_xx ORDER BY aa; I've got .. 'AA_B' 'AAB' BB_12' BB123' jack
"jack" <datactrl@tpg.com.au> writes: > [ funny ordering behavior ] What locale is your database running in? regards, tom lane
Thanks, Tom > What locale is your database running in? My locale is en_AU (on Redhat 7.2) jack
"jack" <datactrl@tpg.com.au> writes: >> What locale is your database running in? > My locale is en_AU (on Redhat 7.2) Hmph. It seems to be a peculiarity of the locale sorting rules for English. Using RedHat 7.2, I made a file containing 3 lines, the last of which has one trailing blank: [tgl@rh1 tgl]$ cat test AAB AA B AAB -- hmm, can't see the spaces very well, so do this: [tgl@rh1 tgl]$ sed 's/ /_/g' test AAB AA_B AAB_ -- Now sort under Aussie rules: [tgl@rh1 tgl]$ LANG=en_AU sort test AAB AA B AAB -- uh, let's try looking to see where the spaces are: [tgl@rh1 tgl]$ LANG=en_AU sort test | sed 's/ /_/g' AAB AA_B AAB_ -- Not too consistent, eh? I get the same results with en_US though: [tgl@rh1 tgl]$ LANG=en_US sort test | sed 's/ /_/g' AAB AA_B AAB_ -- but traditional "C" locale does this: [tgl@rh1 tgl]$ LANG=C sort test | sed 's/ /_/g' AA_B AAB AAB_ The reason that your SQL tests reflect this is that comparisons for type CHAR(n) remove any trailing blanks before comparing; but the result of substr() is of type TEXT, so it assumes trailing blanks are significant. So the data you were sorting were in the one case effectively 'AA B' 'AAB' 'BB 123' 'BB123' and in the other case 'AA B' 'AAB ' 'BB 1' 'BB12' and the locale sort rules treat 'AAB' differently from 'AAB '. If you think that's a bug, you can take it up with whoever maintains Linux's locale rules. It ain't our bug though. (You might prefer to initdb under C locale if you'd rather sort according to C rules.) regards, tom lane
Hi Tom. This seems to be a bit of a FAQ at the moment... Do you think it should be added to the release notes? Maybe the people that build the packages for different distributions of Linux need to have some kind of warning in the installation scripts which let the users pic the locale used for Postgres by default. Cheers... -- David Stanaway On Mon, 2002-04-22 at 11:35, Tom Lane wrote: > "jack" <datactrl@tpg.com.au> writes: > >> What locale is your database running in? > > > My locale is en_AU (on Redhat 7.2) > > Hmph. It seems to be a peculiarity of the locale sorting rules for > English. Using RedHat 7.2, I made a file containing 3 lines, the last > of which has one trailing blank: > > [tgl@rh1 tgl]$ cat test > AAB > AA B > AAB > > -- hmm, can't see the spaces very well, so do this: > > [tgl@rh1 tgl]$ sed 's/ /_/g' test > AAB > AA_B > AAB_ > > -- Now sort under Aussie rules: > > [tgl@rh1 tgl]$ LANG=en_AU sort test > AAB > AA B > AAB > > -- uh, let's try looking to see where the spaces are: > > [tgl@rh1 tgl]$ LANG=en_AU sort test | sed 's/ /_/g' > AAB > AA_B > AAB_ > > -- Not too consistent, eh? I get the same results with en_US though: > > [tgl@rh1 tgl]$ LANG=en_US sort test | sed 's/ /_/g' > AAB > AA_B > AAB_ > > -- but traditional "C" locale does this: > > [tgl@rh1 tgl]$ LANG=C sort test | sed 's/ /_/g' > AA_B > AAB > AAB_ > > > The reason that your SQL tests reflect this is that comparisons for type > CHAR(n) remove any trailing blanks before comparing; but the result of > substr() is of type TEXT, so it assumes trailing blanks are significant. > So the data you were sorting were in the one case effectively > > 'AA B' > 'AAB' > 'BB 123' > 'BB123' > > and in the other case > > 'AA B' > 'AAB ' > 'BB 1' > 'BB12' > > and the locale sort rules treat 'AAB' differently from 'AAB '. > > If you think that's a bug, you can take it up with whoever maintains > Linux's locale rules. It ain't our bug though. (You might prefer > to initdb under C locale if you'd rather sort according to C rules.) > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >
David Stanaway <david@stanaway.net> writes: > This seems to be a bit of a FAQ at the moment... Do you think it should > be added to the release notes? Maybe the people that build the packages > for different distributions of Linux need to have some kind of warning > in the installation scripts which let the users pic the locale used for > Postgres by default. I think it is mentioned someplace. Maybe we could make it more prominent. There's been some talk of having initdb always report the locale settings being used, but as of 7.2 it doesn't warn you about non-C locales. (7.1 did; 7.2 fails to because initdb is overenthusiastic about hiding backend output :-() regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
David Stanaway <david@stanaway.net> writes: > This seems to be a bit of a FAQ at the moment... Do you think it should > be added to the release notes? Maybe the people that build the packages > for different distributions of Linux need to have some kind of warning > in the installation scripts which let the users pic the locale used for > Postgres by default. I think it is mentioned someplace. Maybe we could make it more prominent. There's been some talk of having initdb always report the locale settings being used, but as of 7.2 it doesn't warn you about non-C locales. (7.1 did; 7.2 fails to because initdb is overenthusiastic about hiding backend output :-() regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html