Thread: Is it bad sorting in UTF ??
Why PG sort's my data in case insensitive manner? masterdb=# SELECT name FROM enterprises ORDER BY name; name -------------------------------------- ................ abc AKS 514 aks518 AKSFree11111 .......................... The worst (totally mess) example is: masterdb=# SELECT name, replace(name, ' ', '<20>') FROM enterprises ORDER BY name; name | replace -------------------------------------- +-------------------------------------------------------- ............................... alinatestfree | alinatestfree Alina Test Free | Alina<20>Test<20>Free alinatestfree220 | alinatestfree220 ............................... More info here: masterdb=# select version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 8.3.7 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27) (1 row) masterdb=# select name, setting from pg_settings WHERE name ilike 'lc %'; name | setting -------------+------------- lc_collate | en_US.UTF-8 lc_ctype | en_US.UTF-8 lc_messages | en_US.UTF-8 lc_monetary | en_US.UTF-8 lc_numeric | en_US.UTF-8 lc_time | en_US.UTF-8 (6 rows) masterdb=# \l List of databases Name | Owner | Encoding ------------------------------------+----------+---------- masterdb | postgres | UTF8 ............... masterdb=# \d+ enterprises Table "public.enterprises" Column | Type | Modifiers | Description --------------------+--------------------------+----------- +------------- ............... name | text | | ................... masterdb=# EXPLAIN ANALYZE SELECT name FROM enterprises ORDER BY name; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Sort (cost=29.62..30.68 rows=422 width=11) (actual time=1.537..1.596 rows=367 loops=1) Sort Key: name Sort Method: quicksort Memory: 44kB -> Seq Scan on enterprises (cost=0.00..11.22 rows=422 width=11) (actual time=0.012..0.181 rows=367 loops=1) Total runtime: 1.677 ms (5 rows)
wstrzalka wrote: > Why PG sort's my data in case insensitive manner? > masterdb=# select name, setting from pg_settings WHERE name ilike 'lc > %'; > name | setting > -------------+------------- > lc_collate | en_US.UTF-8 Because that's what en_US.UTF-8 does. If you want "computer-style" sorting choose the "C" locale. You'll need to dump all your databases an re-initdb to do that I'm afraid. I believe 8.4 is going to allow different locales for each database though. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > wstrzalka wrote: >> Why PG sort's my data in case insensitive manner? > >> masterdb=# select name, setting from pg_settings WHERE name ilike 'lc >> %'; >> name | setting >> -------------+------------- >> lc_collate | en_US.UTF-8 > > Because that's what en_US.UTF-8 does. If you want "computer-style" > sorting choose the "C" locale. You'll need to dump all your databases an > re-initdb to do that I'm afraid. I believe 8.4 is going to allow > different locales for each database though. Even if it were slower, having a way to specify that sorting be WITH a particular collation method/locale would be appealing. I guess this could be done with a custom operator/opclass implementing non-libc-based sorting and collation, which is probably where PostgreSQL will land up heading in the end. -- Craig Ringer
At 07:57 PM 4/14/2009, Richard Huxton wrote: >wstrzalka wrote: >>Why PG sort's my data in case insensitive manner? > >>masterdb=# select name, setting from pg_settings WHERE name ilike 'lc >>%'; >> name | setting >>-------------+------------- >> lc_collate | en_US.UTF-8 > >Because that's what en_US.UTF-8 does. If you want "computer-style" >sorting choose the "C" locale. You'll need to dump all your >databases an re-initdb to do that I'm afraid. I believe 8.4 is going >to allow different locales for each database though. Is it possible to have something like: select name from sometable order by collate(name,'en_US.UTF-8'); select name from sometable order by collate(name,'C'); Link.