Thread: ISO-8859-1 encoding not enforced?

ISO-8859-1 encoding not enforced?

From
Christopher Kings-Lynne
Date:
Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's 
the database encoding?

Because people using this database can happily insert any old non-LATIN1 
junk into the database, then when I export as XML, all XML validation 
fails because the encoding is not correct.

If this is not expected behaviour, I will submit an example script 
showing the problem...

Chris


Re: ISO-8859-1 encoding not enforced?

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's 
> the database encoding?

AFAIK, there are no illegal characters in 8859-1, except \0 which we
do reject.
        regards, tom lane
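
The point above can be checked outside the database with a short Python sketch. It assumes Python's `latin-1` codec is a fair stand-in for the server's LATIN1 handling (the server additionally rejects the zero byte), and `is_valid_utf8` is a hypothetical helper name used only for contrast.

```python
# Every possible byte value decodes under Latin-1, so a validator has nothing to reject.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")  # never raises for any byte value

# The round trip is lossless: each byte maps to exactly one character and back.
round_trip_ok = decoded.encode("latin-1") == all_bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper for contrast)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# By contrast, UTF-8 does have illegal byte sequences, so it can be enforced.
latin1_cafe = "caf\u00e9".encode("latin-1")  # b"caf\xe9"
print(round_trip_ok, is_valid_utf8(latin1_cafe))
```

Because any byte string is a well-formed Latin-1 string, "junk" in a LATIN1 database cannot be detected by encoding validation alone.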


Re: ISO-8859-1 encoding not enforced?

From
"Andrew Dunstan"
Date:
Tom Lane said:
> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if
>> that's  the database encoding?
>
> AFAIK, there are no illegal characters in 8859-1, except \0 which we do
> reject.
>

Perhaps Chris is confusing ISO/IEC 8859-1 with ISO-8859-1 a.k.a. Latin-1.

According to Wikipedia,

"The IANA has approved ISO-8859-1 (note the extra hyphen), a superset of
ISO/IEC 8859-1, for use on the Internet. This character map, or character
set or code page, supplements the assignments made by ISO/IEC 8859-1,
mapping control characters to code values 00-1F, 7F, and 80-9F. It thus
provides for 256 characters via every possible 8-bit value.
[snip]
The name Latin-1 is an informal alias [for ISO-8859-1] unrecognized by ISO
or the IANA, but is perhaps meaningful in some computer software."
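
As a rough illustration of the ranges named in the quote, the following Python sketch lists which byte values decode to control characters. It assumes Python's `latin-1` codec (byte N becomes code point U+00NN) matches the IANA ISO-8859-1 mapping described above.

```python
import unicodedata

# Decode every possible byte value as Latin-1.
chars = bytes(range(256)).decode("latin-1")

# Keep the byte values whose character has Unicode category "Cc" (control).
control_bytes = [ord(c) for c in chars if unicodedata.category(c) == "Cc"]

# Split them into the ranges named in the quote: 00-1F, 7F, and 80-9F.
c0 = [b for b in control_bytes if b <= 0x1F]
c1 = [b for b in control_bytes if 0x80 <= b <= 0x9F]
print(len(c0), 0x7F in control_bytes, len(c1))
```

All of these values are accepted in a LATIN1 string, which is why every 8-bit value except \0 gets through.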

But let's not start accepting \0 ;-)

cheers

andrew






Re: ISO-8859-1 encoding not enforced?

From
Christopher Kings-Lynne
Date:
>>Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's 
>>the database encoding?
> 
> AFAIK, there are no illegal characters in 8859-1, except \0 which we
> do reject.

Hmmm...

It turns out I was confused by the developer who reported this issue. 
Basically they have a requirement that they only want the parts of 
LATIN1 that can be converted to single-byte UTF-8 (i.e. 7-bit ASCII).

Only about 8 of these high bit characters existed in our database, so I 
replaced them and put in a CHECK constraint on a few fields like this:
 CHECK (description = convert(description, 'ISO-8859-1', 'UTF-8'))
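
The constraint works because converting Latin-1 to UTF-8 leaves the bytes unchanged only when every byte is below 0x80. A minimal Python sketch of the same idea follows; Python's codecs stand in for PostgreSQL's convert() here, and `passes_check` and `is_7bit_ascii` are hypothetical helper names.

```python
def passes_check(raw: bytes) -> bool:
    """Mimic the round-trip test: value = convert(value, 'ISO-8859-1', 'UTF-8')."""
    # Interpret the stored bytes as Latin-1 text, then re-encode that text as UTF-8.
    converted = raw.decode("latin-1").encode("utf-8")
    # High-bit characters become two bytes in UTF-8, so the comparison fails for them.
    return converted == raw


def is_7bit_ascii(raw: bytes) -> bool:
    """Direct test: every byte has its high bit clear."""
    return all(b < 0x80 for b in raw)


print(passes_check(b"plain text"))  # only ASCII bytes, unchanged by the conversion
print(passes_check(b"caf\xe9"))     # 0xE9 becomes 0xC3 0xA9, so the bytes differ
```

The two helpers agree for every input, so the CHECK constraint is effectively a 7-bit ASCII test.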

Can I put in a request for a '7 bit ascii' encoding for PostgreSQL :)

Chris


Re: ISO-8859-1 encoding not enforced?

From
Alvaro Herrera
Date:
On Wed, Apr 13, 2005 at 10:10:32AM +0800, Christopher Kings-Lynne wrote:

> Can I put in a request for a '7 bit ascii' encoding for PostgreSQL :)

Given all the problems with unwanted recoding I've seen, I think such an
encoding should be the default instead of unchecked-8-bits SQL_ASCII :-(

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Amanece.                                               (Ignacio Reyes)
 El Cerro San Cristóbal me mira, cínicamente, con ojos de virgen"
 


Re: ISO-8859-1 encoding not enforced?

From
Christopher Kings-Lynne
Date:
> Given all the problems with unwanted recoding I've seen, I think such an
> encoding should be the default instead of unchecked-8-bits SQL_ASCII :-(

I agree, but that would be a nightmare of backwards compatibility :D

Chris