Re: a strange order by behavior - Mailing list pgsql-sql

From Peter Eisentraut
Subject Re: a strange order by behavior
Date
Msg-id 1308773415.10498.6.camel@vanquo.pezone.net
In response to Re: a strange order by behavior  (Samuel Gendler <sgendler@ideasculptor.com>)
List pgsql-sql
On Wed, 2011-06-22 at 01:43 -0700, Samuel Gendler wrote:
> I seem to recall a thread here about it ignoring spaces entirely in that
> collation (and maybe ignoring capitalization, too?).

The way it works is that every collating element (a letter, other
character, or character group that you sort as a unit) is assigned four
weights (primary, secondary, tertiary, and quaternary).  The sort
first compares the primary weights across the whole string, then the
secondary weights, and so on, moving to the next level only when
everything before it is equal.  The primary weight typically indicates
the overall sort order, like A before B, the secondary weight has to do
with diacritic marks, the tertiary with letter case, and the quaternary
is only used in special cases.  So that's why it looks as though
capitalization is "ignored" unless both the primary and secondary
weights are the same.
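
To make the level-by-level comparison concrete, here is a minimal
sketch in Python.  The weight table is a made-up toy (the numbers and
the WEIGHTS and sort_key names are assumptions for illustration, not
the real ISO 14651 or glibc data); it only shows how comparing all
primary weights before any tertiary ones makes case matter only as a
tie-breaker.

```python
# Toy weight table: character -> (primary, secondary, tertiary, quaternary).
# These numbers are invented for illustration; real tables are far larger.
WEIGHTS = {
    "a": (1, 0, 0, 0),
    "A": (1, 0, 1, 0),       # same letter as "a", differs only in case
    "\u00e1": (1, 1, 0, 0),  # a with acute accent: differs at the secondary level
    "b": (2, 0, 0, 0),
    "B": (2, 0, 1, 0),
}


def sort_key(s):
    """Build a key of four tuples: all primary weights of the string,
    then all secondary, all tertiary, and all quaternary weights."""
    return tuple(tuple(WEIGHTS[ch][level] for ch in s) for level in range(4))


words = ["Ab", "aa", "ab", "AA"]

# Plain code point comparison, roughly what the C collation does.
print(sorted(words))                # ['AA', 'Ab', 'aa', 'ab']

# Multi-level comparison: case only breaks ties between equal letters.
print(sorted(words, key=sort_key))  # ['aa', 'AA', 'ab', 'Ab']
```

With the toy table, "AA" sorts between "aa" and "ab" because its
primary weights equal those of "aa"; the tertiary (case) weights are
consulted only after that.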

> This worked:
> 
> createdb  -E UTF-8 --lc-collate=C some_db
> 
> A quick google search
> reveals that there is some kind of standard for unicode collation (
> http://www.unicode.org/reports/tr10/ ) and I have no idea if that is what is
> represented by the en_US.UTF-8 collation or not.

At least the collate category of the en_US.UTF-8 locale on glibc is
unaltered from the ISO 14651 default ordering, which is equivalent to
the Unicode default ordering.  There are several other locales for which
that is also the case.  Unfortunately, this is not exposed outside of
the glibc source code.  So you can't just select "give me a neutral
default ordering".
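
If you want to see how the C collation and a locale such as
en_US.UTF-8 differ on your own system without creating a database,
something like the following Python sketch may help.  It uses the
standard locale module; sort_with_locale is a hypothetical helper name,
and the sketch assumes the named locale is installed (it returns None
when it is not).  The result for en_US.UTF-8 depends on your C library,
so no output is claimed for it here.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words using the collation of the given locale.

    Returns None if the locale is not available on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares
        # according to the active LC_COLLATE setting.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


words = ["ab", "Ab", "a b", "AA", "aa"]
print("C:          ", sort_with_locale(words, "C"))
print("en_US.UTF-8:", sort_with_locale(words, "en_US.UTF-8"))
```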



