Thread: ISO-8859-1 encoding not enforced?
Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's the database encoding? Because people using this database can happily insert any old non-LATIN1 junk into the database, then when I export as XML, all XML validation fails because the encoding is not correct. If this is not expected behaviour, I will submit an example script showing the problem... Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's
> the database encoding?

AFAIK, there are no illegal characters in 8859-1, except \0 which we do reject.

regards, tom lane
Tom Lane said:
> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if
>> that's the database encoding?
>
> AFAIK, there are no illegal characters in 8859-1, except \0 which we do
> reject.
>

Perhaps Chris is confusing ISO/IEC 8859-1 with ISO-8859-1 a.k.a. Latin-1. According to Wikipedia, "The IANA has approved ISO-8859-1 (note the extra hyphen), a superset of ISO/IEC 8859-1, for use on the Internet. This character map, or character set or code page, supplements the assignments made by ISO/IEC 8859-1, mapping control characters to code values 00-1F, 7F, and 80-9F. It thus provides for 256 characters via every possible 8-bit value. [snip] The name Latin-1 is an informal alias [for ISO-8859-1] unrecognized by ISO or the IANA, but is perhaps meaningful in some computer software."

But let's not start accepting \0 ;-)

cheers

andrew
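To illustrate the point that every 8-bit value is assigned in ISO-8859-1, here is a minimal Python sketch. It uses Python's built-in "latin-1" codec as a stand-in for the database encoding (an assumption; it is not PostgreSQL's own validation code), and the helper name decodes_as_latin1 is hypothetical. It shows that arbitrary bytes, including UTF-8 data inserted by mistake, always decode without error, so there is nothing for an encoding check to reject.

```python
def decodes_as_latin1(data: bytes) -> bool:
    """Return True if the bytes decode under the latin-1 codec (hypothetical helper)."""
    try:
        data.decode("latin-1")
        return True
    except UnicodeDecodeError:
        return False


# Every possible byte value 0x00-0xFF is assigned, so this never fails.
all_bytes = bytes(range(256))
print(decodes_as_latin1(all_bytes))

# UTF-8 bytes for the euro sign are not "invalid" Latin-1 either:
# they are simply read as three unrelated one-byte characters.
utf8_euro = "\u20ac".encode("utf-8")
print(decodes_as_latin1(utf8_euro))
print(utf8_euro.decode("latin-1") == "\u00e2\u0082\u00ac")
```

Note that the NUL byte also decodes here; rejecting \0 is a separate rule, not a property of the character set.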
>> Is PostgreSQL supposed to enforce a LATIN1/ISO-8859-1 encoding if that's
>> the database encoding?
>
> AFAIK, there are no illegal characters in 8859-1, except \0 which we
> do reject.

Hmmm... It turns out I was confused by the developer who reported this issue. Basically, they have a requirement that they only want the parts of LATIN1 that convert to single-byte UTF8 (i.e. 7-bit ASCII).

Only about 8 of these high-bit characters existed in our database, so I replaced them and put a CHECK constraint on a few fields, like this:

CHECK (description = convert(description, 'ISO-8859-1', 'UTF-8'))

Can I put in a request for a '7 bit ascii' encoding for PostgreSQL :)

Chris
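The idea behind that CHECK constraint can be sketched outside the database. The Python snippet below is only an illustration under stated assumptions: it uses Python's "latin-1" and "utf-8" codecs rather than PostgreSQL's convert(), and both function names are hypothetical. It shows that a Latin-1 byte string comes through conversion to UTF-8 unchanged exactly when every byte is below 0x80, because any higher byte becomes a two-byte UTF-8 sequence.

```python
def unchanged_by_latin1_to_utf8(raw: bytes) -> bool:
    """Mimic the CHECK: does re-encoding Latin-1 bytes as UTF-8 leave them identical?"""
    return raw.decode("latin-1").encode("utf-8") == raw


def is_7bit_ascii(raw: bytes) -> bool:
    """Direct test: every byte has its high bit clear."""
    return all(b < 0x80 for b in raw)


# The two tests agree for every single byte value.
for value in range(256):
    single = bytes([value])
    assert unchanged_by_latin1_to_utf8(single) == is_7bit_ascii(single)

print(unchanged_by_latin1_to_utf8(b"plain description"))  # passes the check
print(unchanged_by_latin1_to_utf8(b"caf\xe9"))            # 0xE9 (e-acute) fails it
```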
On Wed, Apr 13, 2005 at 10:10:32AM +0800, Christopher Kings-Lynne wrote:
> Can I put in a request for a '7 bit ascii' encoding for PostgreSQL :)

Given all the problems with unwanted recoding I've seen, I think such an encoding should be the default instead of unchecked-8-bits SQL_ASCII :-(

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Amanece." (Ignacio Reyes)
"El Cerro San Cristóbal me mira, cínicamente, con ojos de virgen"
> Given all the problems with unwanted recoding I've seen, I think such an
> encoding should be the default instead of unchecked-8-bits SQL_ASCII :-(

I agree, but that would be a nightmare of backwards compatibility :D

Chris