Thread: sort order (ORDER BY) hu_HU.UTF-8 locale with UTF-8 encoding is not working correctly on FreeBSD
sort order (ORDER BY) hu_HU.UTF-8 locale with UTF-8 encoding is not working correctly on FreeBSD
From
Adam PAPAI
Date:
Dear List, I'm trying to solve this problem for more than 2 days, but I have no other idea. My system is: FreeBSD titanium 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC 2010 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 Tested PgSQL versions are: 8.4 and 9.0.3 (fresh install using ports) initdb flags are: --encoding=UTF-8 --locale=hu_HU.UTF-8 Initdb output: The files belonging to this database system will be owned by user "pgsql". This user must also own the server process. The database cluster will be initialized with locale hu_HU.UTF-8. The default text search configuration will be set to "hungarian". creating directory /usr/local/pgsql/data ... ok creating subdirectories ... ok selecting default max_connections ... 40 selecting default shared_buffers ... 28MB creating configuration files ... ok creating template1 database in /usr/local/pgsql/data/base/1 ... ok initializing pg_authid ... ok initializing dependencies ... ok creating system views ... ok loading system objects' descriptions ... ok creating conversions ... ok creating dictionaries ... ok setting privileges on built-in objects ... ok creating information schema ... ok loading PL/pgSQL server-side language ... ok vacuuming database template1 ... ok copying template1 to template0 ... ok copying template1 to postgres ... ok step 1: createdb -U wooh test --encoding=UTF-8 --locale=hu_HU.UTF-8 step 2: psql -U wooh -l List of databases Name | Owner | Encoding | Collation | Ctype | Access privileges -----------+-------+----------+-------------+-------------+------------------- postgres | pgsql | UTF8 | hu_HU.UTF-8 | hu_HU.UTF-8 | template0 | pgsql | UTF8 | hu_HU.UTF-8 | hu_HU.UTF-8 | =c/pgsql + | | | | | pgsql=CTc/pgsql template1 | pgsql | UTF8 | hu_HU.UTF-8 | hu_HU.UTF-8 | =c/pgsql + | | | | | pgsql=CTc/pgsql test | wooh | UTF8 | hu_HU.UTF-8 | hu_HU.UTF-8 | Everything seems to be OK. let's create a table with idn, and name. step 3: test=# create table test (idn serial primary key not null, name varchar(255)); NOTICE: CREATE TABLE will create implicit sequence "test_idn_seq" for serial column "test.idn" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test" CREATE TABLE step 4: [root@titanium /home/wooh]# file insert.sql insert.sql: UTF-8 Unicode text step 5: [root@titanium /home/wooh]# cat insert.sql INSERT INTO test (name) values ( 'a' ); INSERT INTO test (name) values ( 'á' ); INSERT INTO test (name) values ( 'b' ); INSERT INTO test (name) values ( 'c' ); INSERT INTO test (name) values ( 'd' ); INSERT INTO test (name) values ( 'e' ); INSERT INTO test (name) values ( 'é' ); INSERT INTO test (name) values ( 'f' ); INSERT INTO test (name) values ( 'g' ); INSERT INTO test (name) values ( 'h' ); step 6: [root@titanium /home/wooh]# psql -U wooh test < insert.sql INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 INSERT 0 1 step 7: [root@titanium /home/wooh]# psql -U wooh test psql (9.0.3) Type "help" for help. test=# select * from test order by name asc; idn | name -----+------ 1 | a 3 | b 4 | c 5 | d 6 | e 8 | f 9 | g 10 | h 2 | á 7 | é (10 rows) test=# show lc_collate; lc_collate ------------- hu_HU.UTF-8 (1 row) test=# show lc_ctype; lc_ctype ------------- hu_HU.UTF-8 (1 row) The sort order is incorrect, and I cannot find out why. Everything seems to be Ok. (locale, collate, encoding) But this whole thing works if I chose "LATIN2" encoding with "hu_HU.ISO8859-2" locale. Unfortunately we use UTF-8 databases, so I need to use the UTF8 Encoding with hu_HU.UTF-8 Any ideas? Additional information: [root@titanium /home/wooh]# pg_controldata /usr/local/pgsql/data/ pg_control version number: 903 Catalog version number: 201008051 Database system identifier: 5570517093231621070 Database cluster state: in production pg_control last modified: Sun Feb 6 11:19:27 2011 Latest checkpoint location: 0/1610CA0 Prior checkpoint location: 0/15F8028 Latest checkpoint's REDO location: 0/1610CA0 Latest checkpoint's TimeLineID: 1 Latest checkpoint's NextXID: 0/675 Latest checkpoint's NextOID: 24576 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID: 654 Latest checkpoint's oldestXID's DB: 1 Latest checkpoint's oldestActiveXID: 0 Time of latest checkpoint: Sun Feb 6 11:19:19 2011 Minimum recovery ending location: 0/0 Backup start location: 0/0 Current wal_level setting: minimal Current max_connections setting: 40 Current max_prepared_xacts setting: 0 Current max_locks_per_xact setting: 64 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value -- Adam PAPAI BSD Support Service http://www.bsdsupportservice.hu E-mail: adam.papai@bsdsupportservice.hu Phone: +36 30 33-55-735 (Hungary)
Re: sort order (ORDER BY) hu_HU.UTF-8 locale with UTF-8 encoding is not working correctly on FreeBSD
From
Adam PAPAI
Date:
Adam PAPAI wrote: The correct encoding of the values are: INSERT INTO test (name) values ( 'a' ); INSERT INTO test (name) values ( 'á' ); INSERT INTO test (name) values ( 'b' ); INSERT INTO test (name) values ( 'c' ); INSERT INTO test (name) values ( 'd' ); INSERT INTO test (name) values ( 'e' ); INSERT INTO test (name) values ( 'é' ); INSERT INTO test (name) values ( 'f' ); INSERT INTO test (name) values ( 'g' ); INSERT INTO test (name) values ( 'h' ); I forgot to change the encoding of my mail client. test=# select * from test order by name asc; idn | name -----+------ 1 | a 3 | b 4 | c 5 | d 6 | e 8 | f 9 | g 10 | h 2 | á 7 | é (10 rows) -- Adam PAPAI BSD Support Service http://www.bsdsupportservice.hu E-mail: adam.papai@bsdsupportservice.hu Phone: +36 30 33-55-735 (Hungary)
Re: sort order (ORDER BY) hu_HU.UTF-8 locale with UTF-8 encoding is not working correctly on FreeBSD
From
Tom Lane
Date:
Adam PAPAI <adam.papai@bsdsupportservice.hu> writes: > [ hu_HU.UTF-8 locale doesn't sort as expected on ] > FreeBSD titanium 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 > UTC 2010 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 Do you get the expected sort ordering from the "sort" command either? I'm betting not. This is not something Postgres controls -- we just rely on the results of strcoll(). I don't know the state of affairs on FreeBSD for sure, but I do know that the UTF-8 locales are fairly broken on Mac OS X, which is a BSD derived platform too. It looks like nobody in BSD-land has bothered to put in the work to make multibyte characters sort properly. You might have better luck with hu_HU using a suitable LATINn character set instead of UTF8. If that still doesn't do what you want, you might have no choice but to move off BSD :-( regards, tom lane
Re: sort order (ORDER BY) hu_HU.UTF-8 locale with UTF-8 encoding is not working correctly on FreeBSD
From
Adam PAPAI
Date:
Tom Lane wrote: > Adam PAPAI <adam.papai@bsdsupportservice.hu> writes: >> [ hu_HU.UTF-8 locale doesn't sort as expected on ] >> FreeBSD titanium 8.1-RELEASE FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 >> UTC 2010 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 > > Do you get the expected sort ordering from the "sort" command either? > I'm betting not. This is not something Postgres controls -- we just > rely on the results of strcoll(). > > I don't know the state of affairs on FreeBSD for sure, but I do know > that the UTF-8 locales are fairly broken on Mac OS X, which is a BSD > derived platform too. It looks like nobody in BSD-land has bothered to > put in the work to make multibyte characters sort properly. > > You might have better luck with hu_HU using a suitable LATINn character > set instead of UTF8. If that still doesn't do what you want, you might > have no choice but to move off BSD :-( > > regards, tom lane Hi Tom, I've tested the "sort" command. It fails too. Perhaps it's really with the *BSD and not with PostgreSQL. Thanks, -- Adam PAPAI BSD Support Service http://www.bsdsupportservice.hu E-mail: adam.papai@bsdsupportservice.hu Phone: +36 30 33-55-735 (Hungary)
Re: sort order (ORDER BY) hu_HU.UTF-8 locale with UTF-8 encoding is not working correctly on FreeBSD
From
Peter Eisentraut
Date:
On sön, 2011-02-06 at 11:06 -0500, Tom Lane wrote: > I don't know the state of affairs on FreeBSD for sure, but I do know > that the UTF-8 locales are fairly broken on Mac OS X, which is a BSD > derived platform too. It looks like nobody in BSD-land has bothered > to put in the work to make multibyte characters sort properly. Yeah, same on FreeBSD. Using LATINn is the right choice there.
Re: sort order (ORDER BY) hu_HU.UTF-8 locale with UTF-8 encoding is not working correctly on FreeBSD
From
Sergey Burladyan
Date:
QWRhbSBQQVBBSSA8YWRhbS5wYXBhaUBic2RzdXBwb3J0c2VydmljZS5odT4g d3JpdGVzOg0KDQo+IEZyZWVCU0QgdGl0YW5pdW0gOC4xLVJFTEVBU0UgRnJl ZUJTRCA4LjEtUkVMRUFTRSAjMDogTW9uIEp1bCAxOSAwMjozNjo0OQ0KPiBV VEMgMjAxMCAgICAgcm9vdEBtYXNvbi5jc2UuYnVmZmFsby5lZHU6L3Vzci9v YmovdXNyL3NyYy9zeXMvR0VORVJJQyAgYW1kNjQNCj4NCj4gVGVzdGVkIFBn U1FMIHZlcnNpb25zIGFyZTogOC40IGFuZCA5LjAuMyAoZnJlc2ggaW5zdGFs bCB1c2luZyBwb3J0cykNCg0KRGlkIHlvdSB0cnkgSUNVIHBhdGNoPyBJTUhP IGl0IGNhbiBoZWxwIHdpdGggeW91ciBwcm9ibGVtLiBCdXQgaSBhbSBub3Qg c3VyZQ0KaG93IGl0IGlzIHN0YWJsZS4NCg0KIyBjZCAvdXNyL3BvcnRzL2Rh dGFiYXNlcy9wb3N0Z3Jlc3FsODQtc2VydmVyDQojIG1ha2UgY29uZmlnDQri lIzilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDi lIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDi lIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDi lIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDilIDi lIDilIDilIDilIDilIDilIDilIDilIDilIDilJANCuKUgiAgICAgICAgICAg ICAgT3B0aW9ucyBmb3IgcG9zdGdyZXNxbC1zZXJ2ZXIgOC40LjVfMiAgICAg ICAgICAgICAgICAg4pSCDQrilIIg4pSM4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSQIOKUgg0K 4pSCIOKUgltYXSBOTFMgICAgICAgICAgICAgICBVc2UgaW50ZXJuYXRpb25h bGl6ZWQgbWVzc2FnZXMgICAgICAgICAgICDilIIg4pSCDQrilIIg4pSCWyBd IFBBTSAgICAgICAgICAgICAgIEJ1aWxkIHdpdGggUEFNIHN1cHBvcnQgKHNl cnZlciBvbmx5KSAgICAgIOKUgiDilIINCuKUgiDilIJbIF0gTERBUCAgICAg ICAgICAgICAgQnVpbGQgd2l0aCBMREFQIGF1dGhlbnRpY2F0aW9uIHN1cHBv cnQgICAg4pSCIOKUgg0K4pSCIOKUglsgXSBNSVRfS1JCNSAgICAgICAgICBC dWlsZCB3aXRoIE1JVCdzIGtlcmJlcm9zIHN1cHBvcnQgICAgICAgICDilIIg 4pSCDQrilIIg4pSCWyBdIEhFSU1EQUxfS1JCNSAgICAgIEJ1aWxkcyB3aXRo IEhlaW1kYWwga2VyYmVyb3Mgc3VwcG9ydCAgICAgIOKUgiDilIINCuKUgiDi lIJbIF0gT1BUSU1JWkVEX0NGTEFHUyAgQnVpbGRzIHdpdGggY29tcGlsZXIg b3B0aW1pemF0aW9ucyAoLU8zKSAg4pSCIOKUgg0K4pSCIOKUgltYXSBYTUwg ICAgICAgICAgICAgICBCdWlsZCB3aXRoIFhNTCBkYXRhIHR5cGUgKHNlcnZl cikgICAgICAgICDilIIg4pSCDQrilIIg4pSCW1hdIFRaREFUQSAgICAgICAg ICAgIFVzZSBpbnRlcm5hbCB0aW1lem9uZSBkYXRhYmFzZSAoc2VydmVyKSAg IOKUgiDilIINCuKUgiDilIJbIF0gREVCVUcgICAgICAgICAgICAgQnVpbGRz IHdpdGggZGVidWdnaW5nIHN5bWJvbHMgICAgICAgICAgICAg4pSCIOKUgg0K 4pSCIOKUgltYXSBHU1NBUEkgICAgICAgICAgICBCVWlsZCB3aXRoIEdTU0FQ SSBzdXBwb3J0ICAgICAgICAgICAgICAgICDilIIg4pSCDQo9PT7ilIIg4pSC WyBdIElDVSAgICAgICAgICAgICAgIFVzZSBJQ1UgZm9yIHVuaWNvZGUgY29s bGF0aW9uIChzZXJ2ZXIpICAgIOKUgiDilIINCuKUgiDilIJbWF0gSU5UREFU RSAgICAgICAgICAgQnVpbGRzIHdpdGggNjQtYml0IGRhdGUvdGltZSB0eXBl IChzZXJ2ZXIp4pSCIOKUgg0K4pSCIOKUgiAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICDi lIIg4pSCDQrilIIg4pSCICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIOKUgiDilIINCuKU giDilIIgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAg4pSCIOKUgg0K4pSc4pSA4pSU4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSY4pSA4pSkDQrilIIgICAgICAgICAgICAgICAgICAgICAg IFsgIE9LICBdICAgICAgIENhbmNlbCAgICAgICAgICAgICAgICAgICAgICAg IOKUgg0K4pSU4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA 4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSA4pSYDQoNCi0tIA0K U2VyZ2V5IEJ1cmxhZHlhbg0K