Re: Rep:Re: [BUGS] Encoding Problem? - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: Rep:Re: [BUGS] Encoding Problem?
Date
Msg-id 20020306000127E.t-ishii@sra.co.jp
List pgsql-hackers
> I guess you are inserting correct EUC Traditional Chinese (EUC-TW)
> characters, but it is hard to tell what is happening unless you show
> us the character sequences in hexadecimal format.
> --
> Tatsuo Ishii
> ===============================
> Many thanks! Tatsuo,
>
> Please see below.  Best Regards,
>
> CN
> ---------------
> linux:~$ cat /tmp/tt
> 1111
> ¦¨¥\
> ³\
> 2222
> linux:~$ od -t x /tmp/tt
> 0000000 31313131 a5a8a60a 5cb30a5c 3232320a
> 0000020 00000a32
> 0000022

Are you sure that they are EUC-TW? od -t x prints 4-byte words, so on a
little-endian machine the bytes within each word appear swapped. Taking
that into account, the actual byte sequence is:

0x31,0x31,0x31,0x31,0x0a,
0xa6,0xa8,0xa5,0x5c,0x0a,
0xb3,0x5c,0x0a,
0x32,0x32,0x32,0x32,0x0a
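
The unswapping above can be reproduced mechanically. The following is a minimal sketch, assuming the dump came from a little-endian machine and that od grouped the input into 4-byte words (its default for -t x); the helper name od_words_to_bytes is made up for illustration.

```python
import struct

# Words exactly as printed by `od -t x` in the quoted message (offset column dropped).
od_words = ["31313131", "a5a8a60a", "5cb30a5c", "3232320a", "00000a32"]

# od prints offsets in octal; the final offset 0000022 is the file size (18 bytes).
file_size = 0o22


def od_words_to_bytes(words, size):
    """Undo the little-endian 4-byte grouping and drop od's zero padding."""
    # Assumption: each word is a 32-bit value read on a little-endian host.
    raw = b"".join(struct.pack("<I", int(word, 16)) for word in words)
    return raw[:size]


data = od_words_to_bytes(od_words, file_size)
print(" ".join(f"{byte:02x}" for byte in data))
```

Running od -t x1 instead prints one byte per column and avoids the swapping entirely.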

Here we see a55c and b35c, which should never happen in EUC-TW: every
byte of a multibyte EUC-TW character has the high bit set, but in both
pairs the second byte (0x5c) is lower than 0x80.
I guess they are BIG5. If my guess is correct, you could set the
client encoding to BIG5 ("\encoding BIG5" in psql) and get the correct
result.
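
The reasoning above can be checked with a short script. This is only a sketch: looks_like_euc_tw is a hypothetical helper that tests byte ranges (two-byte characters with both bytes in 0xa1-0xfe, four-byte characters introduced by 0x8e), not a full EUC-TW validator, and the decoding step relies on Python's built-in "big5" codec rather than anything in PostgreSQL.

```python
# The 18 bytes from the dump, in file order.
data = bytes.fromhex("313131310aa6a8a55c0ab35c0a323232320a")


def looks_like_euc_tw(buf: bytes) -> bool:
    """Rough byte-range check for EUC-TW (hypothetical helper, not a full validator)."""
    i = 0
    while i < len(buf):
        lead = buf[i]
        if lead < 0x80:
            # Plain ASCII byte.
            i += 1
        elif lead == 0x8E:
            # Four-byte form: 0x8e, plane byte 0xa1-0xb0, then two bytes 0xa1-0xfe.
            tail = buf[i + 1:i + 4]
            if len(tail) != 3 or not 0xA1 <= tail[0] <= 0xB0:
                return False
            if not all(0xA1 <= t <= 0xFE for t in tail[1:]):
                return False
            i += 4
        elif 0xA1 <= lead <= 0xFE:
            # Two-byte form: the second byte must also be 0xa1-0xfe.
            if i + 1 >= len(buf) or not 0xA1 <= buf[i + 1] <= 0xFE:
                return False
            i += 2
        else:
            return False
    return True


is_euc_tw = looks_like_euc_tw(data)  # False: a5 is followed by 5c
text = data.decode("big5")           # decodes cleanly as BIG5
print(is_euc_tw)
print(text)
```

The second byte 0x5c is the ASCII backslash, which is why the terminal in the quoted session shows a trailing "\" on those two lines.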
--
Tatsuo Ishii

