Re: locale-specific sort algorithms undocumented? - Mailing list pgsql-general

From Stephan Szabo
Subject Re: locale-specific sort algorithms undocumented?
Date
Msg-id 20040725170328.F42584@megazone.bigpanda.com
Whole thread Raw
In response to locale-specific sort algorithms undocumented?  (John Gunther <mail@bucksvsbytes.com>)
List pgsql-general
On Tue, 20 Jul 2004, John Gunther wrote:

> In the course of creating a new PG 7.4.3 server, I chose UTF-8 as my
> locale. I now find that sorting is very different with that setting: It

As a note, UTF-8 is not a locale.  My guess is that you are actually using
something like en_XX.UTF-8 (for some XX) given your comments below.

> appears, through trial and error, that all non-alphanumeric characters
> are completely ignored by ORDER BY. Thus, traditional tricks like
> prefixing a name with a leading space to force it to the head of an
> alphabetized list don't work anymore. This is unexpected and undesired,
> but curiously, I can't find a word written anywhere that defines the
> sort process for various locale settings. Can anyone point me to any
> documentation on this?

Pretty much we punt to your system documentation, because we rely on the
facilities provided by it for handling the collation.
 "PostgreSQL uses the standard ISO C and POSIX locale facilities provided
by the server operating system. For additional information refer to the
documentation of your system."

In practice, most systems seem to provide collation similar to
dictionaries for English (at least) which means that symbols and spaces
aren't significant for first pass comparisons.

> Second, can anyone offer any positive or negative implications of
> choosing a non-default locale (for English language databases)? I chose
> UTF because I perceive it as more of a standard encoding approach, but
> I'll have to reinit the server with a diffeent setting if I can't
> control its predilection to ignoring non-alphanumerics.

I think it mostly depends on what you want. en_US provides you dictionary
collation, which means that "A C" > "AB" but it also means that "AC" >
"Ab".

pgsql-general by date:

Previous
From: Geoff Caplan
Date:
Subject: Re: Sql injection attacks
Next
From: Tom Lane
Date:
Subject: Re: locale-specific sort algorithms undocumented?