Re: postgresql euc/sjis utf8 mappings - Mailing list pgsql-general

From Joel Rees
Subject Re: postgresql euc/sjis utf8 mappings
Msg-id 20020819184655.3A7F.JOEL@alpsgiken.gr.jp
In response to postgresql euc/sjis utf8 mappings  (Thomas O'Dowd <tom@nooper.com>)
List pgsql-general
Hmm.

> I've noted that in PostgreSQL 7.2.1 some of the utf8 mappings
> of sjis and euc characters were different. One example that caught me out
> was the double width ~.
>
> '〜' (double byte/double width ~)

That's not really a tilde. It's called a "wave dash", and in most of the
word-processing and e-mail data I've seen, that is how it's used.
(Tilde is a combining character, is it not?)

> euc:  0xa1c1 -> 0xe3809c utf8

That's U+301C, the Unicode wave dash.

> sjis: 0x8160 -> 0xefbd9e utf8

That's U+FF5E, the Unicode full-width tilde.
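To double-check the two UTF-8 sequences quoted above, here is a small Python sketch that decodes each one and asks the standard `unicodedata` module for the character's official name. The `describe` helper is just for illustration.

```python
import unicodedata


def describe(hex_bytes: str) -> str:
    """Decode a hex string as UTF-8 and return 'U+XXXX NAME' for the character."""
    ch = bytes.fromhex(hex_bytes).decode("utf-8")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The two UTF-8 sequences from the report (euc 0xa1c1 and sjis 0x8160).
for hex_bytes in ("e3809c", "efbd9e"):
    print(hex_bytes, "->", describe(hex_bytes))
# e3809c -> U+301C WAVE DASH
# efbd9e -> U+FF5E FULLWIDTH TILDE
```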

Now, if I were going by the names, I would map both of them to the
Unicode wave dash, 0xe3809c.

But if I went by the intent of the full-width block, I'd choose the
latter, 0xefbd9e, though I'd still wonder why the Unicode people
called it a full-width tilde.

Hmm.

At any rate, mapping the euc and s-jis codes to the same Unicode
character should be correct, since euc and s-jis are both just numerical
transforms of JIS with ASCII squeezed in. Here, 0xa1c1 and 0x8160 are
both the same JIS code, 0x2141.
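The "numerical transform" is easy to check for this character. Below is a minimal Python sketch of the usual arithmetic for two-byte JIS X 0208 codes only (it ignores three-byte EUC, JIS X 0212 and vendor extensions, and the function names are just illustrative). Both 0xa1c1 and 0x8160 come out as JIS 0x2141, i.e. row 1, cell 33.

```python
def euc_to_jis(b1: int, b2: int) -> tuple[int, int]:
    """Two-byte EUC-JP (JIS X 0208 area) -> JIS: clear the high bit of each byte."""
    if not (0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE):
        raise ValueError("not a two-byte JIS X 0208 EUC-JP code")
    return b1 & 0x7F, b2 & 0x7F


def sjis_to_jis(s1: int, s2: int) -> tuple[int, int]:
    """Two-byte Shift_JIS -> JIS, using the standard arithmetic transform."""
    if not (0x81 <= s1 <= 0x9F or 0xE0 <= s1 <= 0xEF):
        raise ValueError("not a Shift_JIS lead byte for JIS X 0208")
    if not (0x40 <= s2 <= 0xFC and s2 != 0x7F):
        raise ValueError("not a Shift_JIS trail byte")
    if s1 >= 0xE0:
        s1 -= 0x40                 # close the gap between lead bytes 0x9F and 0xE0
    j1 = (s1 - 0x81) * 2 + 0x21    # each lead byte covers a pair of JIS rows
    if s2 >= 0x9F:                 # trail bytes 0x9F-0xFC: second row of the pair
        return j1 + 1, s2 - 0x7E
    j2 = s2 - 0x1F                 # trail bytes 0x40-0x9E: first row of the pair
    if s2 > 0x7F:
        j2 -= 1                    # trail bytes skip 0x7F
    return j1, j2


euc = euc_to_jis(0xA1, 0xC1)
sjis = sjis_to_jis(0x81, 0x60)
print(f"{euc[0]:02X}{euc[1]:02X}", f"{sjis[0]:02X}{sjis[1]:02X}")  # 2141 2141
print("row", euc[0] - 0x20, "cell", euc[1] - 0x20)                 # row 1 cell 33
```

Since both encodings land on the same JIS code, a converter that goes through JIS cannot tell them apart, which is why a single Unicode target for both seems the consistent choice.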

> This caused me problems when a '〜' was loaded using euc and retrieved
> using sjis as there was no sjis mapping for 0xe3809c.

Another hmm. A mapping that doesn't round-trip like that is bound to
cause surprises now and then. Good reason to have the source code open.
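Tom's failure is what you get when the two conversion tables disagree. The toy Python sketch below uses hypothetical one-entry tables (not PostgreSQL's actual conversion code) that mimic the behaviour he reports for 7.2.1: EUC 0xa1c1 goes to U+301C and S-JIS 0x8160 goes to U+FF5E. In this sketch the outgoing S-JIS table is simply the inverse of the incoming one, so U+301C has nowhere to go.

```python
# Hypothetical one-entry tables mimicking the reported 7.2.1 behaviour.
EUC_TO_UTF8 = {b"\xa1\xc1": "\u301c"}   # WAVE DASH
SJIS_TO_UTF8 = {b"\x81\x60": "\uff5e"}  # FULLWIDTH TILDE

# Outgoing table: in this sketch, just the inverse of the incoming S-JIS table.
UTF8_TO_SJIS = {ch: code for code, ch in SJIS_TO_UTF8.items()}


def store_from_euc(raw: bytes) -> str:
    """Client sends EUC; the server stores Unicode."""
    return EUC_TO_UTF8[raw]


def fetch_as_sjis(text: str) -> bytes:
    """Server converts stored Unicode back out for an S-JIS client."""
    out = bytearray()
    for ch in text:
        if ch not in UTF8_TO_SJIS:
            raise LookupError(f"no S-JIS mapping for U+{ord(ch):04X}")
        out += UTF8_TO_SJIS[ch]
    return bytes(out)


stored = store_from_euc(b"\xa1\xc1")
try:
    fetch_as_sjis(stored)
except LookupError as err:
    print(err)  # no S-JIS mapping for U+301C

# Pointing both tables at the same code point (here the wave dash) fixes the round trip.
SJIS_TO_UTF8[b"\x81\x60"] = "\u301c"
UTF8_TO_SJIS = {ch: code for code, ch in SJIS_TO_UTF8.items()}
print(fetch_as_sjis(stored))  # b'\x81`'
```

Mapping both to U+FF5E instead would work equally well for the round trip; what matters is that EUC and S-JIS agree.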

(Just thinking out loud.)

Anyway, thanks for the heads-up, Tom.

--
Joel Rees <joel@alpsgiken.gr.jp>

