Re: locale-specific sort algorithms undocumented? - Mailing list pgsql-general

From Peter Eisentraut
Subject Re: locale-specific sort algorithms undocumented?
Msg-id 200407261049.12346.peter_e@gmx.net
In response to Re: locale-specific sort algorithms undocumented?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom Lane wrote:
> > I now find that sorting is very different with that setting: It
> > appears, through trial and error, that all non-alphanumeric
> > characters are completely ignored by ORDER BY.
>
> I doubt they are ignored completely, but they probably are ignored in
> the first-order comparison.

The way this more or less works is:

First pass: letters, numbers
Second pass: accents
Third pass: upper/lower case
Fourth pass: punctuation characters
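To make the pass structure concrete, here is a minimal Python sketch of a four-level sort key. It is an illustration only: the name `sort_key` is hypothetical, and each level is approximated with simple rules (Unicode decomposition via the standard `unicodedata` module, lowercase-before-uppercase) rather than the real ISO/IEC 14651 or glibc locale tables. Accents are compared left to right, which is not what French collation does. Strings are compared level by level, so a later level only matters when all earlier levels tie.

```python
import unicodedata

def sort_key(s):
    """Hypothetical, simplified four-level collation key (not real locale data)."""
    primary, secondary, tertiary, quaternary = [], [], [], []
    # NFD splits accented letters into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            # Level 2: accent marks, attached to the preceding letter.
            if secondary:
                secondary[-1] += ch
        elif ch.isalnum():
            # Level 1: the letter or digit itself, ignoring case and accents.
            primary.append(ch.casefold())
            secondary.append("")
            # Level 3: case (lowercase sorts before uppercase here).
            tertiary.append(1 if ch.isupper() else 0)
        else:
            # Level 4: punctuation and spaces, ignored by the first three passes.
            quaternary.append(ch)
    # Tuples compare element by element, i.e. pass by pass.
    return primary, secondary, tertiary, quaternary

words = ["co-op", "Coop", "coop", "CO OP", "résumé", "resume"]
print(sorted(words, key=sort_key))
```

With these rules, "coop" and "co-op" tie on the first three passes and are separated only by the hyphen in the fourth, which is why punctuation can look "completely ignored" in an ORDER BY result.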

This is all enshrined in various standards such as ISO/IEC 14651,
national standards based on it, and independent technical standards
such as the Unicode Collation Algorithm.

The latter in fact offers what many people appear to be looking for: a
"variable weighting" option that can promote punctuation
characters to the first pass.  But I don't think any operating system
implements that yet.
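The effect of variable weighting can be sketched in Python as a switch between two of the UCA's options, "shifted" (punctuation pushed down to the fourth level) and "non-ignorable" (punctuation given a first-level weight that sorts before letters and digits). This is a simplified illustration under stated assumptions: the function name `sort_key_vw` and its `variable` parameter are hypothetical, and the weights are made-up stand-ins for the real UCA tables.

```python
import unicodedata

def sort_key_vw(s, variable="shifted"):
    """Hypothetical simplified collation key with a variable-weighting switch.

    "shifted": punctuation/spaces only count at level 4.
    "non-ignorable": punctuation/spaces get a level-1 weight that sorts
    before every letter and digit, so they take part in the first pass.
    """
    primary, secondary, tertiary, quaternary = [], [], [], []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            if secondary:
                secondary[-1] += ch          # level 2: accents
        elif ch.isalnum():
            primary.append((1, ch.casefold()))  # group 1: letters/digits
            secondary.append("")
            tertiary.append(1 if ch.isupper() else 0)
        elif variable == "non-ignorable":
            primary.append((0, ch))          # group 0: sorts before letters
            secondary.append("")
            tertiary.append(0)
        else:
            quaternary.append(ch)            # shifted: level 4 only
    return primary, secondary, tertiary, quaternary

names = ["de luca", "delta", "deluca", "de-luca"]
print(sorted(names, key=sort_key_vw))
print(sorted(names, key=lambda n: sort_key_vw(n, "non-ignorable")))
```

Under "shifted", "delta" sorts first because the space and hyphen are invisible to the first pass; under "non-ignorable", "de luca" and "de-luca" move ahead of "delta", which is the ordering people often expect.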

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

