Re: Encoding problems in PostgreSQL with XML data - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Encoding problems in PostgreSQL with XML data
Date
Msg-id 3FFF129E.6020109@dunslane.net
In response to Re: Encoding problems in PostgreSQL with XML data  ("Merlin Moncure" <merlin.moncure@rcsonline.com>)
Responses Re: Encoding problems in PostgreSQL with XML data
List pgsql-hackers
Perhaps the document should be stored in canonical form. See 
http://www.w3.org/TR/xml-c14n

I think I agree with Rod's opinion elsewhere in this thread. I guess the 
"philosophical" question is this: If 2 XML documents with different 
encodings have the same canonical form, or perhaps produce the same DOM, 
are they equivalent? Merlin appears to want to say "no", and I think I 
want to say "yes".
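
To make the question concrete, here is a minimal sketch in Python (chosen
only for illustration; nothing in this thread prescribes it). It assumes
Python 3.8 or later, whose standard library canonicalize() implements
C14N 2.0 rather than the 1.0 recommendation linked above, and the sample
document and the helper name same_canonical_form are made up. It serializes
one document in two encodings and checks whether the canonical forms agree.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical sample document containing one non-ASCII character.
body = "<doc><name>Zo\u00eb</name></doc>"

# The same document serialized in two encodings, each with a matching
# <?xml?> declaration. The byte strings differ.
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")


def same_canonical_form(a: bytes, b: bytes) -> bool:
    """Return True if both serialized documents canonicalize identically."""
    # Feeding bytes lets the parser honour each document's own declaration.
    return canonicalize(a) == canonicalize(b)


canonical_latin1 = canonicalize(latin1_doc)
canonical_utf8 = canonicalize(utf8_doc)

print(latin1_doc == utf8_doc)                      # raw bytes compared
print(same_canonical_form(latin1_doc, utf8_doc))   # canonical forms compared
```

Under this reading the two documents are different as byte strings but
equivalent once canonicalized, which is the "yes" answer; treating the
datum as opaque binary gives the "no" answer.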

cheers

andrew

Merlin Moncure wrote:

>Peter Eisentraut wrote:
>
>>The central problem I have is this:  How do we deal with the fact that
>>an XML datum carries its own encoding information?
>
>Maybe I am misunderstanding your question, but IMO postgres should be
>treating xml documents as if they were binary data, unless the server
>takes on the role of a parser, in which case it should handle
>unspecified/unknown encodings just like a normal xml parser would (and
>this does *not* include changing the encoding!).
>
>According to me, an XML parser should not change one bit of a document,
>because that is not a 'parse', but a 'transformation'.
>
>>Rewriting the <?xml?> declaration seems like a workable solution, but it
>>would break the transparency of the client/server encoding conversion.
>>Also, some people might dislike that their documents are being changed
>>as they are stored.
>
>Right, your example begs the question: why does the server care what the
>encoding of the documents is (perhaps indexing)?  XML validation is a
>standardized operation which the server (or psql, I suppose) can
>subcontract out to another application.
>
>Just a side thought: what if the xml encoding type was built into the
>domain type itself?
>create domain xml_utf8 ...
>Which allows casting, etc. which is more natural than an implicit
>transformation.
>
>Regards,
>Merlin


