Thread: ORDER BY different locales for 8.0

ORDER BY different locales for 8.0

From
Honza Pazdziora
Date:
Hello,

the nls_string function that makes it possible to sort by arbitrary
locale has been updated to reflect the changes in error handling in
PostgreSQL 8.0, due to users using the nls_string sorting on 7.4 and
requesting it for 8.0 as well. The distribution can be downloaded from
 http://www.fi.muni.cz/~adelton/l10n/
http://www.fi.muni.cz/~adelton/l10n/postgresql-nls-string/postgresql-nls-string-8.01.tar.gz

I'll appreciate any comments.

The README is as follows:

-----------------
Why this function:

PostgreSQL, at least until version 8.0, has rather weak support for
various collating sequences -- what you get when you do
 select ... order by column.

The sorting is closely tied to indexes used throughout the database
cluster and is specified by locale settings at the initdb time.
Yet, people asked for ways of specifying the collating rules at runtime,
even if the sorting will not use indexes. Just take the records and
sort them. It is reasonable request to want one select to order by
using English rules, another one to run with German rules and yet
another with Czech ones, without having to dump, initdb, restore.

------------
How it works:

In this distribution you will find file nls_string.c. It contains the
definition of function nls_string(text, text) which takes a string
parameter and a locale name and returns string describing the ordering.
So you can run
 select * from table order by nls_string(name, 'en_US.UTF-8')

or
 select * from table order by nls_string(name, 'cs_CZ.UTF-8')

or
 select * from table order by nls_string(name, 'C')

and get what you expect -- the result is sorted the same way as it
would be with LC_COLLATE=locate sort on the command line.

Internally, the function sets the locale for LC_COLLATE category, runs
strxfrm on the first parameter and encodes the result as octal values.
Thus, it depends on your PostgreSQL collate setting (that which you
did upon initdb, you can check it with show lc_collate) to sort
numbers in the natural way. I believe this is reasonable assumption.

------------
Installation:

Please check the INSTALL file.

---------
Versions:

This version of nls_string targets PostgreSQL server in version 8.0+.
To use nls_string on version 7.4, download nls_string 0.53.

-------------
Bugs and ToDo:

If your default collation settings does not sort numbers in the
natural way (eg., 0123 is not sorted before 1234), the nls_string
will not work.

Nonetheless, the function does the work for me.

--------------------
Support, bug reports:

This piece of software is provided as-is, in the hope that you will
find it useful. However, no warranty is provided.

I appreciate any bug reports, enhancement suggestions and patches.
Please, _please_, use a meaningful Subject line and describe the
situation in detail. Also make sure you've read and understood this
README and the PostgreSQL documentation concerning C-language
functions. I will not be helpful with installation problems if you did
not read the documentation.

-------------------
If it works for you:

If the function works for you, I'd appreciate a message from you.
Just curious for what tasks people use the software.

---------
Available:
 http://www.fi.muni.cz/~adelton/l10n/

------
Author:

Copyright: (c) 2004--2005 Jan Pazdziora, adelton@fi.muni.cz. All
rights reserved. Permission to use, distribute, modify, an copy this
software and this documentation for any purpose is hereby granted.

Contributors:
Karel Zak <zakkr@zff.jcu.cz>ftp://ftp2.zf.jcu.cz/users/zakkr/pg/

-- 
------------------------------------------------------------------------Honza Pazdziora | adelton@fi.muni.cz |
http://www.fi.muni.cz/~adelton/.project:Perl, mod_perl, DBI, Oracle, large Web systems, XML/XSL, ...    Only
self-confidentpeople can be simple.
 


Re: ORDER BY different locales for 8.0

From
Mario Weilguni
Date:
Am Dienstag, 19. April 2005 09:18 schrieb Honza Pazdziora:
> Hello,
>
> the nls_string function that makes it possible to sort by arbitrary
> locale has been updated to reflect the changes in error handling in
> PostgreSQL 8.0, due to users using the nls_string sorting on 7.4 and
> requesting it for 8.0 as well. The distribution can be downloaded from
>
>   http://www.fi.muni.cz/~adelton/l10n/
>  
> http://www.fi.muni.cz/~adelton/l10n/postgresql-nls-string/postgresql-nls-st
>ring-8.01.tar.gz
>
> I'll appreciate any comments.
>
> The README is as follows:
>
> -----------------
> Why this function:
>
> PostgreSQL, at least until version 8.0, has rather weak support for
> various collating sequences -- what you get when you do
>
>   select ... order by column.
>
> The sorting is closely tied to indexes used throughout the database
> cluster and is specified by locale settings at the initdb time.
> Yet, people asked for ways of specifying the collating rules at runtime,
> even if the sorting will not use indexes. Just take the records and
> sort them. It is reasonable request to want one select to order by
> using English rules, another one to run with German rules and yet
> another with Czech ones, without having to dump, initdb, restore.

Cool. I'm working on a multi-lingual database, so this is exactly what I 
looked for. The only disatvantage is the length of the resulting strings, 
it's around 9 time longer than the original string.

Just a suggestion:
Could it be done better by not returing a string, but a non-locale affected 
representation like some sort of byte array, so the size could be reduced 
alot? 

However, extremly useful. Thanks!