Microsoft harmful extensions to 8859-X charsets (was: Continuing encoding fun....) - Mailing list pgsql-odbc

From Marc Herbert
Subject Microsoft harmful extensions to 8859-X charsets (was: Continuing encoding fun....)
Msg-id 878xveyw4w.fsf@meije.emic.fr
In response to Re: Continuing encoding fun....  ("Dave Page" <dpage@vale-housing.co.uk>)
List pgsql-odbc
"Dave Page" <dpage@vale-housing.co.uk> writes:

>> By the way 0x8A is not in the range of latin4
>> <http://czyborra.com/charsets/iso8859.html#ISO-8859-4>
>
> http://www.gar.no/home/mats/8859-4.htm says differently, however, I
> can't claim to know enough about encoding issues to refute
> either. I've been forced to learn what I can about the subject to help
> maintain this driver and certainly may have got the wrong end of the
> stick on one or more points!

The page from gar.no is just a dump of the *Microsoft-extended* latin4
charset.

The standards committee deliberately left a gap in all LATIN-X charsets
between 0x80 and 0x9F, because bytes in that range turn into (harmful)
control characters (0x00 to 0x1F) if their 8th bit gets stripped by
accident.
You can see that very clearly in this table for instance
 <http://en.wikipedia.org/wiki/ISO_8859-4>
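
To make the 8th-bit argument concrete, here is a minimal Python sketch.
The helper name strip_high_bit is made up for this illustration; it
only models a channel that clears the top bit of each byte.

```python
def strip_high_bit(byte: int) -> int:
    """Clear the 8th bit, as a 7-bit-only channel would do (hypothetical helper)."""
    return byte & 0x7F


# Every byte in the 0x80-0x9F gap lands in the control range 0x00-0x1F.
gap_after_stripping = [strip_high_bit(b) for b in range(0x80, 0xA0)]
assert all(0x00 <= b <= 0x1F for b in gap_after_stripping)

# 0x8A, the byte discussed in this thread, becomes 0x0A (line feed).
print(hex(strip_high_bit(0x8A)))
```

By contrast 0xA0, the first byte after the gap, maps to 0x20 (space),
which is not a control character.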

If you follow the links from gar.no itself, you can land here:
<http://en.wikipedia.org/wiki/ISO_8859> with tons of links (like the
ECMA standards for instance) showing this gap.

Microsoft, being Microsoft, jumped in that gap. Those non-standard
Microsoft characters now plague the web as clearly explained here:

<http://home.earthlink.net/~bobbau/platforms/specialchars/#windows>
or here:
<http://www.cs.tut.fi/~jkorpela/www/windows-chars.html>
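
A small Python sketch of the difference, using the codecs shipped with
Python. Note the assumption: Windows-1252 (Microsoft's extension of
latin1) is used here only as a well-known example of the same habit; I
am not claiming it is the exact table shown on the gar.no page.

```python
import unicodedata

raw = b"\x8a"  # the byte discussed in this thread

# Standard ISO 8859-4: 0x8A sits in the gap and decodes to a control code.
standard = raw.decode("iso8859_4")

# Windows-1252 (illustrative assumption): the same byte is a printable letter.
microsoft = raw.decode("cp1252")

print(unicodedata.category(standard))  # general category of the decoded code point
print(unicodedata.name(microsoft))     # Unicode name of the Microsoft character
```

So the same byte is a control code under the standard charset and a
letter under the Microsoft one, which is why data mixing the two gets
mangled.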


