Re: Encoding problems in PostgreSQL with XML data - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: Encoding problems in PostgreSQL with XML data
Msg-id 303E00EBDD07B943924382E153890E5434AA4A@cuthbert.rcsinc.local
In response to Encoding problems in PostgreSQL with XML data  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
Andrew Dunstan wrote:
> I think I agree with Rod's opinion elsewhere in this thread. I guess the
> "philosophical" question is this: If 2 XML documents with different
> encodings have the same canonical form, or perhaps produce the same DOM,
> are they equivalent? Merlin appears to want to say "no", and I think I
> want to say "yes".

Er, yes, except for canonical XML.  Canonical XML neatly bypasses all
the encoding issues that I can see.
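
To illustrate the canonical XML point, here is a minimal sketch in Python (not anything the server does today). It assumes Python 3.8 or later, where the standard library provides xml.etree.ElementTree.canonicalize (C14N 2.0); the sample document and variable names are made up for the example. The same document is serialized in two encodings, the byte streams differ, and the canonical forms come out identical:

```python
import io
from xml.etree.ElementTree import canonicalize

# Hypothetical sample document; only the declared encoding differs below.
body = '<note lang="fr">caf\u00e9</note>'

doc_utf8 = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
doc_latin1 = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")

# The stored byte streams are different...
bytes_differ = doc_utf8 != doc_latin1

# ...but canonicalization drops the XML declaration and normalizes the
# character data, so both documents yield the same canonical text.
canon_utf8 = canonicalize(from_file=io.BytesIO(doc_utf8))
canon_latin1 = canonicalize(from_file=io.BytesIO(doc_latin1))

print(bytes_differ, canon_utf8 == canon_latin1)
```

Whether "same canonical form" should count as "equivalent" for a stored XML value is still the open question; the sketch only shows that the comparison itself sidesteps the encoding.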

Maybe I am still not getting the basic point, but the part I was not
quite clear on is why the server would need to parse the document at
all, much less change the encoding.  Sure, it doesn't necessarily hurt
to do it, but why bother?  An external parser could handle both the
parsing and the validation.  From Peter's post, he seems to be
primarily concerned with an automatic XML validation trigger that comes
built in with the XML 'type'.

*unless*

1. The server needs to parse the document and extract values from it
for indexing/key generation purposes.  Then the encoding becomes very
important (especially considering joins between XML and non-XML data
types).
2. There are plans to integrate XPath expressions into queries.
3. The server wants to compose generated XML documents from stored
XML/non-XML sources, with (substantial) additions to the query language
to facilitate this, i.e. a nested data extraction replacement for psql.
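
A rough sketch of why case 1 makes the encoding matter, again in Python rather than anything server-side. The extract_key helper, the sample order documents, and the customers mapping are all hypothetical; the point is only that a key pulled out of a parsed document is comparable with non-XML data, while the raw stored bytes are not:

```python
import xml.etree.ElementTree as ET

def extract_key(doc: bytes, path: str) -> str:
    """Parse an XML document and return the text at `path` as a Unicode string.

    Hypothetical helper: the parser honors the encoding named in the XML
    declaration, so the result no longer depends on how the bytes were stored.
    """
    root = ET.fromstring(doc)
    value = root.findtext(path)
    if value is None:
        raise KeyError(path)
    return value.strip()

body = '<order><customer>M\u00fcller</customer></order>'
order_latin1 = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")
order_utf8 = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")

# Stand-in for a non-XML column that the extracted key is joined against.
customers = {"M\u00fcller": 42}

key_latin1 = extract_key(order_latin1, "customer")
key_utf8 = extract_key(order_utf8, "customer")

# Both keys match the same row once decoded; a byte-wise search would not.
print(customers[key_latin1], customers[key_utf8])
print(b"M\xfcller" in order_latin1, b"M\xfcller" in order_utf8)
```

If the server never looks inside the document, none of this decoding is needed, which is the "why bother?" above.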

But, since I'm wishing for things, I may as well ask for a hockey rink
in my living room :)

Merlin

