Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text - Mailing list pgsql-odbc

From	Marc Herbert
Subject	Re: Unicode is not UTF-8. was :psqlODBC-Driver Test / text
Date	March 31, 2006 14:59:26
Msg-id	khjsloyigj5.fsf@meije.emic.fr
In response to	Re: psqlODBC-Driver Test / text fields ("Dave Page" <dpage@vale-housing.co.uk>)
List	pgsql-odbc

Johann Zuschlag <zuschlag2@online.de> writes:

> Hmm..., so Windows XP uses UCS-2, or to be more correct (as Bart
> mentioned) UTF-16 (which is nearly the same, except for the
> surrogates).
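To make the surrogate difference concrete, here is a minimal Python sketch of how UTF-16 encodes a single code point (the helper name `utf16_code_units` is hypothetical, for illustration only). Inside the BMP the result is one 16-bit unit, identical to what UCS-2 stores; above U+FFFF it becomes a surrogate pair, which UCS-2 cannot represent at all.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one code point (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        # BMP: a single 16-bit unit, exactly what UCS-2 would store
        return [cp]
    # Outside the BMP: split the remaining 20 bits into a surrogate pair
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # top 10 bits
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits
    return [high, low]


if __name__ == "__main__":
    # Latin letter, e acute, musical G clef (U+1D11E, outside the BMP)
    for ch in ("A", "\u00e9", "\U0001D11E"):
        units = utf16_code_units(ord(ch))
        # Cross-check against Python's own UTF-16 codec
        raw = ch.encode("utf-16-le")
        codec_units = [int.from_bytes(raw[i:i + 2], "little")
                       for i in range(0, len(raw), 2)]
        assert units == codec_units
        print(f"U+{ord(ch):04X} -> {[hex(u) for u in units]}")
```

For U+1D11E this yields the pair 0xD834 0xDD1E: two code units for one character, which is exactly the case UCS-2 code never had to handle.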

It's nearly the same... but that makes a huge difference.

The reason you use a fixed-length character encoding in memory is
speed. It saves you a lot of time when computing string lengths,
looking for characters (isalnum(), ...), collating, etc.
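The speed argument mostly comes down to indexing. A minimal Python sketch, assuming UTF-32 as the fixed-length encoding and UTF-8 as the variable-length one (the function names are hypothetical): finding the n-th character is plain arithmetic in the first case, but needs a scan from the start of the string in the second.

```python
def fixed_width_offset(n: int, unit_size: int = 4) -> int:
    """Byte offset of the n-th character in a fixed-length encoding: O(1)."""
    return n * unit_size


def utf8_offset(data: bytes, n: int) -> int:
    """Byte offset of the n-th code point in UTF-8 data: O(n) scan."""
    count = 0
    for i, b in enumerate(data):
        # Continuation bytes look like 10xxxxxx; any other byte starts a code point
        if (b & 0xC0) != 0x80:
            if count == n:
                return i
            count += 1
    raise IndexError(f"code point index {n} out of range")


if __name__ == "__main__":
    text = "h\u00e9llo \u20ac"  # "héllo €"
    utf32 = text.encode("utf-32-le")  # "-le": no byte-order mark
    utf8 = text.encode("utf-8")
    for n, ch in enumerate(text):
        # Fixed-length: jump straight to the character
        off32 = fixed_width_offset(n)
        assert utf32[off32:off32 + 4].decode("utf-32-le") == ch
        # Variable-length: walk the bytes to find where it starts
        off8 = utf8_offset(utf8, n)
        assert utf8[off8:].decode("utf-8")[0] == ch
    print("indexing checks passed")
```

The same asymmetry applies to length computation: a fixed-length string's length is its byte size divided by the unit size, while UTF-8 (and UTF-16 once surrogates appear) must be scanned.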

If you don't care about all this speed, then you are better off staying
with a variable-length encoding like UTF-8, which saves you A LOT of
space, especially with small Western alphabets.
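A quick way to see the space trade-off is to count bytes. This minimal Python sketch (the sample strings and the `encoded_sizes` helper are illustrative choices, not from the thread) compares UTF-8, UTF-16 and UTF-32 for an ASCII, a French and a Japanese sample.

```python
# "-le" variants are used so that no byte-order mark is counted
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict[str, int]:
    """Number of bytes needed to store `text` in each encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


if __name__ == "__main__":
    samples = {
        "ASCII": "Hello, world",
        "French": "D\u00e9j\u00e0 vu",      # "Déjà vu"
        "Japanese": "\u65e5\u672c\u8a9e",  # "日本語"
    }
    for label, text in samples.items():
        print(label, encoded_sizes(text))
```

For the ASCII sample UTF-8 needs 12 bytes against 24 for UTF-16; for "Déjà vu" it is 9 against 14. For the Japanese sample the order flips (9 against 6), which is why the gain is largest for small Western alphabets.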

I think that by moving from UCS-2 to UTF-16 you lose on BOTH sides:
you still pay 2 bytes for every ASCII character, yet you no longer
have a fixed-length encoding. [insert some missing benchmarks here]

And you can be sure that it brings a lot of bugs: one bug for every
piece of string code that was "forgotten", never updated, and still
assumes UCS-2.
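Here is what such a "forgotten" piece of code typically gets wrong. In this minimal Python sketch (all function names are hypothetical, for illustration), the legacy routine counts 16-bit units as characters, and truncating by code units splits a surrogate pair, leaving invalid UTF-16. The correct counter assumes well-formed input.

```python
def utf16_units(s: str) -> list[int]:
    """Split a string into its 16-bit UTF-16 code units."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def ucs2_style_length(units: list[int]) -> int:
    """Legacy assumption: one 16-bit unit == one character (wrong past the BMP)."""
    return len(units)


def code_point_length(units: list[int]) -> int:
    """Correct count for well-formed UTF-16: a low surrogate (DC00-DFFF)
    continues the previous character, so it is not counted."""
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


def units_to_bytes(units: list[int]) -> bytes:
    """Reassemble code units into little-endian UTF-16 bytes."""
    return b"".join(u.to_bytes(2, "little") for u in units)


if __name__ == "__main__":
    s = "a\U0001D11Eb"  # 'a', musical G clef (outside the BMP), 'b'
    units = utf16_units(s)
    print("UCS-2 style length:", ucs2_style_length(units))  # 4, not 3
    print("real length:", code_point_length(units))         # 3 == len(s)

    # Truncating to "2 characters" by code units cuts the surrogate pair in half
    try:
        units_to_bytes(units[:2]).decode("utf-16-le")
    except UnicodeDecodeError as exc:
        print("truncation produced invalid UTF-16:", exc.reason)
```

Neither bug shows up as long as all test data stays inside the BMP, which is why such code survives for so long.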

Anyway, those bugs only hit far-away and unknown countries whose
characters lie outside the BMP (Basic Multilingual Plane), so who
cares? :-/

So it really looks like a poor compatibility hack to me (Java does it
too).