Encoding problems in PostgreSQL with XML data - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Encoding problems in PostgreSQL with XML data
Msg-id 200401091946.01930.peter_e@gmx.net
Responses Re: Encoding problems in PostgreSQL with XML data  (Rod Taylor <pg@rbt.ca>)
List pgsql-hackers
This is not directly related to current development, but it is something 
that might need a low-level solution.  I've been thinking for some time 
about how to enhance the current "XML support" (e.g., contrib/xml).

The central problem I have is this:  How do we deal with the fact that 
an XML datum carries its own encoding information?

Here's a scenario:  It is desirable to have validity checking on XML 
input, be it a special XML data type or some functions that take XML 
data.  Say we define a data type that stores XML documents and rejects 
documents that are not well-formed.  I want to insert something in 
psql:

CREATE TABLE test (
    description text,
    content     xml
);

INSERT INTO test VALUES ('test document', '<?xml version="1.0"?><doc><para>blah</para>...</doc>');
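As a rough illustration of the validity check such a type would perform, here is a minimal Python sketch (Python and the standard library's xml.etree.ElementTree are stand-ins chosen for illustration; xml_in is a hypothetical name, not an existing PostgreSQL function). It accepts only well-formed documents and stores the text unchanged. Note that it works on already-decoded text, which sidesteps exactly the byte-level encoding question raised below.

```python
import xml.etree.ElementTree as ET


def xml_in(value: str) -> str:
    """Hypothetical input function for an `xml` type: reject documents
    that are not well-formed, otherwise return the text unchanged."""
    try:
        ET.fromstring(value)  # parse only to check well-formedness
    except ET.ParseError as err:
        raise ValueError(f"invalid XML document: {err}") from None
    return value


# Accepted: a well-formed document is stored as-is.
print(xml_in('<?xml version="1.0"?><doc><para>blah</para></doc>'))

# Rejected: the <para> element is never closed.
try:
    xml_in("<doc><para>blah</doc>")
except ValueError as err:
    print(err)
```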

Since the declaration names no encoding, an XML parser will assume this 
document is in UTF-8, and let's say that at the client it really is.  
What if client_encoding=UNICODE but 
server_encoding=LATIN1?  Do we expect some layer to rewrite the <?xml?> 
declaration to contain the correct encoding information?  Or can the 
xml type bypass encoding conversion?  What about reading it back out of 
the database with yet another client encoding?
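The failure in this scenario can be reproduced outside the server. The Python sketch below assumes an example document containing one non-ASCII character (the original's document is elided, so this one is made up) and simulates the client-to-server conversion by re-encoding the characters to LATIN1 while leaving the <?xml?> declaration alone. A parser that accepts the client's bytes rejects the stored ones, because without an encoding declaration it must read them as UTF-8.

```python
import xml.etree.ElementTree as ET

# Example document (an assumption): non-ASCII content, no encoding
# declaration, so an XML parser must treat the bytes as UTF-8.
DOC = '<?xml version="1.0"?><doc><para>café</para></doc>'


def well_formed(data: bytes) -> bool:
    """True if an XML parser accepts the raw bytes as a document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Bytes sent by a client with client_encoding=UNICODE (UTF-8).
client_bytes = DOC.encode("utf-8")

# Simulated conversion to server_encoding=LATIN1: the characters are
# re-encoded, but the declaration still implies UTF-8.
server_bytes = client_bytes.decode("utf-8").encode("latin-1")

print(well_formed(client_bytes))  # True
print(well_formed(server_bytes))  # False: byte 0xE9 is not valid UTF-8
```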

Rewriting the <?xml?> declaration seems like a workable solution, but it 
would break the transparency of the client/server encoding conversion.  
Also, some people might dislike that their documents are being changed 
as they are stored.
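To make the trade-off concrete, here is a minimal Python sketch of the rewriting approach, assuming a regex-based helper (rewrite_declaration is hypothetical, and it only handles the common declaration forms). It makes the declaration name the encoding the bytes are actually stored in, so the parser reads them correctly, but the stored text is no longer what the client sent.

```python
import re
import xml.etree.ElementTree as ET

# An XML declaration at the very start of the document; group 2 is the
# version value, group 3 is everything after it (encoding, standalone).
XML_DECL = re.compile(r'^<\?xml\s+version\s*=\s*(["\'])([^"\']*)\1(.*?)\?>',
                      re.DOTALL)
# An existing encoding pseudo-attribute, with its leading whitespace.
ENCODING_ATTR = re.compile(r'\s+encoding\s*=\s*(["\'])[^"\']*\1')


def rewrite_declaration(text: str, encoding: str) -> str:
    """Hypothetical helper: make the <?xml?> declaration name `encoding`."""
    m = XML_DECL.match(text)
    if m is None:
        # No declaration at all: prepend one naming the encoding.
        return f'<?xml version="1.0" encoding="{encoding}"?>' + text
    rest = ENCODING_ATTR.sub("", m.group(3))  # drop any old encoding
    # The encoding declaration must precede standalone, so insert it
    # directly after the version.
    decl = f'<?xml version="{m.group(2)}" encoding="{encoding}"{rest}?>'
    return decl + text[m.end():]


doc = '<?xml version="1.0"?><doc><para>café</para></doc>'
stored_text = rewrite_declaration(doc, "ISO-8859-1")
stored_bytes = stored_text.encode("latin-1")  # as a LATIN1 server holds it

print(stored_text)
print(ET.fromstring(stored_bytes).find("para").text)  # café
```

Reading the value back under yet another client_encoding would mean rewriting the declaration again on output, which is where the transparency of the encoding conversion is lost.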

Any ideas?


