Re: BUG #1775: UTF8 to ISO does not convert the german - Mailing list pgsql-bugs

From Tatsuo Ishii
Subject Re: BUG #1775: UTF8 to ISO does not convert the german
Date
Msg-id 20050720.172726.68162903.t-ishii@sra.co.jp
Whole thread Raw
In response to Re: BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß)  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: BUG #1775: UTF8 to ISO does not convert the german "sharp s" (ß)  (Marcus Raphelt <marcus@raphelt.de>)
List pgsql-bugs
> Hi Tatsuo,
>
> Am Mittwoch, 20. Juli 2005 um 01:00 schrieben Sie:
>
> TI> conversion tables. So if german umlauts are converted fine, there's no
> TI> reason the conversion for german sharp s does not work.
>
> TI> Marcus,
>
> TI> Can you give me the exact error message from PostgreSQL when the
> TI> conversio failed?
>
> Well - actually, there IS no error message, convert() just returns an
> empty sting. In detail, here's what I did:
>
> -Set up PostgreSQL (Ascii)
>
> -Imported the opengeodb
> http://sourceforge.net/project/showfiles.php?group_id=132421
> As they offer a PostgreSQL dump, I chose this one.
>
> -Imported the dump using pgAdminIII
>
> -Created a view that returns all german cities with ZIP and
> opengeodb-locationID:
>
> CREATE OR REPLACE VIEW orte_de AS
> SELECT code.text_val AS plz, code.loc_id, town.text_val AS ort
> FROM geodb_hierarchies hi, geodb_textdata state, geodb_textdata town, geodb_textdata code
> WHERE hi.id_lvl2 = state.loc_id AND state.text_val = 'DE'::text
> AND state.text_type = 500100001 AND town.loc_id = hi.loc_id
> AND town.text_type = 500100000 AND code.loc_id = town.loc_id
> AND code.text_type = 500300000;
>
> So now I've got three columns: "plz" (zip), "ort" (city) and loc_id.
> Assuming I want to retrieve cites in the Hamburg / Hannover area...
> (perfect for this task, as they have pretty strange city names there
> :)
>
> SELECT plz, loc_id, ort, convert(ort using utf_8_to_iso_8859_1) as
> ort_conv
> from orte_de
> where plz between 20000 and 30000
> order by ort_conv
>
> This query returns empty values for "ort_conv" if "ort" contains a
> sharp s.
> Btw, it seems like it is the same for "Ä" (Ä), have a look at
> loc_id 25182.

I see no problem with encoding conversion itself:

utf8=# \encoding latin1
utf8=# select * from t1;
 t
----
 $Bd_(B
(1 row)

> SELECT plz, loc_id, ort, convert(ort using utf_8_to_iso_8859_1) as
> ort_conv
> from orte_de
> where plz between 20000 and 30000
> order by ort_conv

Problem here is the result of convert(ort using utf_8_to_iso_8859_1)
is ISO-8859-1 but your database encoding is UTF-8, so the terminal
treats ISO-8859-1 chars as UTF-8 which will result in unexpected
characters displayed.

I guess what you want to do is:

SELECT plz, loc_id, ort from orte_de
where plz between 20000 and 30000
order by convert(ort using utf_8_to_iso_8859_1)
--
Tatsuo Ishii

pgsql-bugs by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: BUG #1775: UTF8 to ISO does not convert the german
Next
From: Tatsuo Ishii
Date:
Subject: Re: BUG #1775: UTF8 to ISO does not convert the german