Perhaps the document should be stored in canonical form. See
http://www.w3.org/TR/xml-c14n
I think I agree with Rod's opinion elsewhere in this thread. I guess the
"philosophical" question is this: If two XML documents with different
encodings have the same canonical form, or perhaps produce the same DOM,
are they equivalent? Merlin appears to want to say "no", and I think I
want to say "yes".
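
To make the question concrete, here is a rough sketch (not a proposal for
how the server should do it). It assumes Python 3.8+, whose standard
library canonicalize() implements C14N 2.0 rather than the 1.0 spec linked
above; the sample document and the canonical() helper are made up for
illustration. The same document is serialized in two encodings, then the
raw bytes and the canonical forms are compared:

```python
import io
from xml.etree.ElementTree import canonicalize

# Hypothetical sample document; only the encoding differs between the two.
BODY = '<greeting lang="fr">caf\u00e9</greeting>'

doc_utf8 = ('<?xml version="1.0" encoding="UTF-8"?>' + BODY).encode("utf-8")
doc_latin1 = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + BODY).encode("iso-8859-1")


def canonical(raw: bytes) -> str:
    """Return the C14N 2.0 form of an XML document given as raw bytes.

    The parser reads the encoding from the <?xml?> declaration; the
    canonical form itself carries no declaration.
    """
    return canonicalize(from_file=io.BytesIO(raw))


# Byte-for-byte comparison (the "treat it as binary data" view).
same_bytes = doc_utf8 == doc_latin1

# Comparison after canonicalization (the "same canonical form" view).
same_canonical = canonical(doc_utf8) == canonical(doc_latin1)

print("identical bytes:         ", same_bytes)
print("identical canonical form:", same_canonical)
```

Under the binary view the two are different values; under the canonical
view they are the same document. Which of the two the server should mean
by "equal" is exactly the point of disagreement.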
cheers
andrew
Merlin Moncure wrote:
>Peter Eisentraut wrote:
>
>>The central problem I have is this: How do we deal with the fact that
>>an XML datum carries its own encoding information?
>>
>>
>
>Maybe I am misunderstanding your question, but IMO postgres should be
>treating xml documents as if they were binary data, unless the server
>takes on the role of a parser, in which case it should handle
>unspecified/unknown encodings just like a normal xml parser would (and
>this does *not* include changing the encoding!).
>
>According to me, an XML parser should not change one bit of a document,
>because that is not a 'parse', but a 'transformation'.
>
>>Rewriting the <?xml?> declaration seems like a workable solution, but it
>>would break the transparency of the client/server encoding conversion.
>>Also, some people might dislike that their documents are being changed
>>as they are stored.
>>
>>
>
>Right, your example begs the question: why does the server care what the
>encoding of the documents is (perhaps indexing)? XML validation is a
>standardized operation which the server (or psql, I suppose) can
>subcontract out to another application.
>
>Just a side thought: what if the xml encoding type was built into the
>domain type itself?
>create domain xml_utf8 ...
>Which allows casting, etc. which is more natural than an implicit
>transformation.
>
>Regards,
>Merlin