Thread: illegal sort order
I insalled Postgres 8 in Windows XP with default settings I xreated database with encoding unicode. I noticed that the sort order of accented characters is B � � C. � � C this is totally incorrect! It is interesting that names beginning with C are not contiguous: between C. and C are accented characters. The correct sort order (by using character code values) shoud be B C C. � � � � How to force the correct sort order or at least move accented characters ���� to end of sorted list ? Andrus.
On Tue, Jul 12, 2005 at 20:45:37 +0300, Andrus <eetasoft@online.ee> wrote: > > How to force the correct sort order or at least move accented characters > ÕÄÖÜ to end of sorted list ? Sort order depends on the locale used in initdb. If you want data sorted by the codes used to represent the data, then you might want to initdb with a locale of "C". Doing an initdb will require a dump and reload.
> Sort order depends on the locale used in initdb. If you want data sorted > by the codes used to represent the data, then you might want to initdb > with a locale of "C". Doing an initdb will require a dump and reload. Bruno, thank you. SHOW ALL command returns the following: "client_encoding";"UNICODE" "lc_collate";"Estonian_Estonia.1257" "lc_ctype";"Estonian_Estonia.1257" "lc_messages";"Estonian_Estonia.1257" "lc_monetary";"Estonian_Estonia.1257" "lc_numeric";"Estonian_Estonia.1257" "lc_time";"Estonian_Estonia.1257" Unfortunately, the sort order is incorrect: "A" "S" "B" "C" "�" "�" "�" "�" "D" "E" "F" "G" accented charactes must be at the end of alphabet. Why Postgres uses VERY stange sort order ? I do'nt believe that this order exists in any locale. Also, UPPER() function causes error ERROR: invalid multibyte character for locale HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding. Is it possible to fix the sort order and use UPPER() function in this locale without dump and reload ? Andrus.
On Tue, Jul 12, 2005 at 22:09:40 +0300, Andrus <eetasoft@online.ee> wrote: > > Sort order depends on the locale used in initdb. If you want data sorted > > by the codes used to represent the data, then you might want to initdb > > with a locale of "C". Doing an initdb will require a dump and reload. > > Bruno, thank you. > > SHOW ALL command returns the following: > > "client_encoding";"UNICODE" > "lc_collate";"Estonian_Estonia.1257" > "lc_ctype";"Estonian_Estonia.1257" > "lc_messages";"Estonian_Estonia.1257" > "lc_monetary";"Estonian_Estonia.1257" > "lc_numeric";"Estonian_Estonia.1257" > "lc_time";"Estonian_Estonia.1257" Are you running this on Windows? There is a problem with Unicode there. I don't know what the exact symptoms are, so I don't know whether or not that explains your problem. > > Unfortunately, the sort order is incorrect: > > "A" > "S" > "B" > "C" > "Ü" > "Ö" > "Ä" > "Õ" > "D" > "E" > "F" > "G" > > accented charactes must be at the end of alphabet. > > Why Postgres uses VERY stange sort order ? I do'nt believe that this order > exists in any locale. Sort order is determined by the locale. If the locale doesn't have the correct sort order you need to fix the locale definition in your OS or use another locale. > > Also, UPPER() function causes error > > ERROR: invalid multibyte character for locale > HINT: The server's LC_CTYPE locale is probably incompatible with the > database encoding. > > Is it possible to fix the sort order and use UPPER() function in this > locale without dump and reload ? > > Andrus. > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
Bruno Wolff III <bruno@wolff.to> writes: > Andrus <eetasoft@online.ee> wrote: >> SHOW ALL command returns the following: >> >> "client_encoding";"UNICODE" >> "lc_collate";"Estonian_Estonia.1257" >> "lc_ctype";"Estonian_Estonia.1257" >> "lc_messages";"Estonian_Estonia.1257" >> "lc_monetary";"Estonian_Estonia.1257" >> "lc_numeric";"Estonian_Estonia.1257" >> "lc_time";"Estonian_Estonia.1257" > Are you running this on Windows? There is a problem with Unicode there. Windows or not, the locale name suggests *very* strongly that it is not a Unicode-using locale. If the database encoding doesn't match what the locale is expecting then strange sort order is exactly what you are likely to get. We should defend against these sorts of configuration problems more thoroughly, but it seems to be hard in general to determine what character set a locale is actually using :-( regards, tom lane
> I insalled Postgres 8 in Windows XP with default settings I > xreated database with encoding unicode. Unicode is not currently supported when the server runs on Win32, see http://www.postgresql.org/docs/faqs.FAQ_windows.html#2.6. You'll need to pick a different encoding if you need locale-aware sorting. This is something we hope to have fixed in 8.1. //Magnus
> Unicode is not currently supported when the server runs on Win32, see > http://www.postgresql.org/docs/faqs.FAQ_windows.html#2.6. You'll need to > pick a different encoding if you need locale-aware sorting. I need to upsize tables from other dbms. This dbms does not support unicode. I have tables which contain estonian language characters in one column (using windows-1257 encoding) and russian langugage characters in other column (using windows-1251 encoding) as single byte characters. I use Postgres through ODBC. I tried the following encodings for upsize this table with the following results: SQL_ASCII - non ascii characters are converted to question marks. This is problably by ODBC driver since pgAdmin shows them. LATIN1, LATIN4 - some characters cause error if I try to upsize my data to Postgres (I use odbc for upsize). If unicode is not working, I need fully transparent access through ODBC: all bytes in range 32 .. 255 must be stored in unmodified form in CHARACTER and TEXT fields and returned in unmodified form to my ODBC client. This does not allow UPPER() function to work but at least sorting is somewhat meaningful (when sorted by unsigned binary values.) Unfortunately it seems that ODBC driver replaces some charactes to question marks in case on SQL_ASCII . So I need SQL_UNSIGNED_BINARY or SQL_TRANSPARENT encoding. I tried to use binary data types insted of char and text types but pgADMIN show them as octal strings. I have no idea, meybe it is possible configure or modify odbc driver. I looked into odbc drivers sources but havent found place which converts chars>0x80 to question marks. Any idea ? When fixed unicode version (8.1?) will be available ? Andrus.