Issues with german 'Umlaute' - Mailing list pgsql-bugs

From Nicolaus Erichsen
Subject Issues with german 'Umlaute'
Msg-id 200210171706.36054.nico.erichsen@hsh-berlin.com
Responses Re: Issues with german 'Umlaute'  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Hello everybody,

I recently found a problem with the sorting of German 'Umlaute'. I hope the
encoding of this mail works ;-) :

Postgres puts Umlaute (i.e., ÄäÖöÜü) at the very end of the alphabet, and
this is not the way it should be. I didn't check the special character
'ß', but it's probably similar.
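
One plausible explanation for this behaviour, sketched below in Python, is
that the strings are compared by raw byte value. This is an assumption, not
something confirmed in this message. In latin1 the Umlaute have byte values
above 'z', so a byte-wise sort places them after every unaccented letter. The
word list and the name bytewise_key are made up for illustration.

```python
# Assumption: strings are compared by raw byte value, as a "C"-style
# collation would do. The sample words are purely illustrative.
words = ["Zebra", "Ärger", "Apfel", "öffnen", "offen"]


def bytewise_key(word: str) -> bytes:
    """Return the latin1 bytes of the word, used as the comparison key."""
    return word.encode("latin-1")


bytewise_sorted = sorted(words, key=bytewise_key)

# Words starting with an Umlaut end up after all words starting with A-Z/a-z.
print(bytewise_sorted)
```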

The canonical sort order for Umlaute is to treat them as two characters, like
this:
ä -> ae
ö -> oe
ü -> ue
ß -> ss
(and the same for upper case 'ÄÖÜ'; 'ß' does not have an upper case form)
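
The mapping above can be sketched as a sort key in Python. This is only a
minimal illustration of the idea, not how any database implements it; the
names EXPANSIONS, expand_umlauts and expansion_key are hypothetical, and the
comparison is made case-insensitive as an extra assumption.

```python
# Two-character expansions taken from the mapping in the mail.
EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def expand_umlauts(word: str) -> str:
    """Replace each Umlaut (and ß) by its two-character expansion."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in word)


def expansion_key(word: str) -> str:
    """Sort key: expanded form, compared case-insensitively (an assumption)."""
    return expand_umlauts(word).casefold()


words = ["Müller", "Mueller", "Muster", "Mond"]
expansion_sorted = sorted(words, key=expansion_key)

# 'Müller' and 'Mueller' get the same key, so they sort next to each other.
print(expansion_sorted)
```

With this key, 'Müller' and 'Mueller' compare as equal; a real collation
would need a tie-breaking rule to give them a fixed order.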

Well, I guess this might be difficult to implement and might have quite an
impact on performance. The solution I know from other databases is to sort
ä directly after a, ö after o, ü after u and ß after s. AFAIK this is
generally accepted.
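
The simpler alternative can be sketched like this in Python: every character
is turned into a pair of base letter and rank, so that an Umlaut sorts
directly after its base letter. The names BASE_LETTER and after_base_key are
hypothetical, and folding to lower case first is an assumption made only to
keep the example short.

```python
# Base letter for each special character, as described in the mail.
BASE_LETTER = {"ä": "a", "ö": "o", "ü": "u", "ß": "s"}


def after_base_key(word: str) -> list:
    """Sort key that places ä after a, ö after o, ü after u and ß after s."""
    key = []
    for ch in word.lower():  # assumption: case is ignored when sorting
        if ch in BASE_LETTER:
            # Same base letter, but ranked after the plain letter.
            key.append((BASE_LETTER[ch], 1))
        else:
            key.append((ch, 0))
    return key


words = ["Zahl", "Ärger", "Arzt", "Azur", "Ofen", "Öl"]
after_base_sorted = sorted(words, key=after_base_key)

# 'Ärger' follows all words starting with a plain 'A' but precedes 'Ofen'.
print(after_base_sorted)
```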

upper() does not handle Umlaute correctly either. It leaves äöü unchanged
instead of converting them to upper case.

All this happens with a database created with encoding = 'latin1'. If a
different encoding gives better results (I haven't tried that yet), I'd
suggest adding some information about this to the documentation.

Thanks for your work,

N.Erichsen

--
HSH Soft-und Hardware Vertriebs GmbH
Rudolf-Diesel-Straße 2 - 16321 Lindenberg
Tel. (030) 94004 - 509  Fax (030) 94004 - 400
