Tony Caduto wrote:
> Arnaud Lesauvage wrote:
>>
>>
>>> I then try to import into PostgreSQL. The farthest I can get is when
>>> using the UNICODE export, and importing it using a client_encoding
>>> set to UTF8 (I tried WIN1252, LATIN9, LATIN1, ...).
>>> The copy then stops with an error :
>>> ERROR: invalid byte sequence for encoding "UTF8": 0xff
>>> SQL state: 22021
>>>
>>> The problematic character is the euro currency symbol.
>>
>>
> Exporting from MS SQL server as unicode is going to give you full
> Unicode, not UTF8. Full Unicode is 2 bytes per character and UTF8 is 1,
> same as ASCII.
> You will have to encode the Unicode data to UTF8.
Well, UTF8 is a minimum of one byte per character, but takes up to four
bytes for non-ASCII characters. The idea is that chars below 128 map to
ASCII. There's also UTF16, which uses two or four bytes per character,
and UTF32, which always uses four.
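
To make the sizes concrete, here is a small Python sketch (Python is my
choice for illustration, nobody in this thread used it). It prints how
many bytes one character takes in each encoding. The helper name
encoded_lengths is made up for this example. It also shows that the
UTF16 little-endian byte order mark begins with 0xff, which would be
consistent with the error quoted above if the export starts with such a
mark; that is an assumption about the export, not something I have
checked.

```python
import codecs

def encoded_lengths(ch):
    """Return the number of bytes one character takes in each encoding.

    The explicit little-endian codecs are used so that no byte order
    mark is counted in the lengths.
    """
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

# An ASCII letter, the euro sign, and a character outside the
# Basic Multilingual Plane (needs a surrogate pair in UTF-16).
for ch in ("A", "\u20ac", "\U0001F600"):
    print(repr(ch), encoded_lengths(ch))

# The UTF-16 little-endian byte order mark is the two bytes FF FE.
print(codecs.BOM_UTF16_LE.hex())
```

The euro sign is two bytes in UTF16 but three in UTF8, so the two
encodings are not interchangeable even for common European text.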
> I have done this in Delphi using its built-in UTF8 encoding and
> decoding routines. You can get a free copy of Delphi Turbo Explorer
> which includes components for MS SQL server and ODBC, so it would be
> pretty straightforward to get this working.
>
> The actual method in Delphi is system.UTF8Encode(widestring). This will
> encode Unicode to UTF8 which is compatible with a PostgreSQL UTF8 database.
Ah, that's useful to know. Windows just doesn't have the same quantity
of tools installed as a *nix platform.
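
For anyone without Delphi to hand, the same re-encoding step can be
sketched in Python. This is only a minimal sketch: it assumes the
"Unicode" export is a UTF16 text file that begins with a byte order
mark, and the function name utf16_to_utf8 and the file names in the
usage comment are hypothetical.

```python
def utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8, line by line.

    Assumption: the source file starts with a byte order mark, so the
    "utf-16" codec can detect the byte order and strip the mark.
    newline="" disables newline translation so line endings are copied
    through unchanged.
    """
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)

# Usage (hypothetical file names):
# utf16_to_utf8("export_unicode.txt", "export_utf8.txt")
```

The resulting file should then be loadable with client_encoding set to
UTF8, provided the rest of the export format matches what COPY expects.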
> I am sure Perl could do it also.
And in one line if you're clever enough no doubt ;-)
--
Richard Huxton
Archonet Ltd