Thread: XML validation of whitespace values
HI all,
I’ve had an interesting problem trying to perform an UPDATE based on the results of processing an xpath against a field of type xml.
Specifically, my query would fail with:
ERROR: could not parse XML documentDETAIL: line 1: Start tag expected, '<' not found
I thought this strange, since the xml type is (I believe) supposed to guarantee that the value is well formed (wether it’s a document or content).
After some amount of digging to locate the rogue entry, it turns out the field value is actually a single whitespace character – which apparently is well formed..
Thus I’m left with the following situation…
clone=# select xml_is_well_formed(' ');
xml_is_well_formed
--------------------
t
(1 row)
clone=# select xpath_exists (‘//test', ' ');
ERROR: could not parse XML document
DETAIL: line 1: Start tag expected, '<' not found
clone=# create temp table xml_test (document xml);
CREATE TABLE
clone=# insert into xml_test values (' ');
INSERT 0 1
clone=# select xml_is_well_formed(document::text) from xml_test;
xml_is_well_formed
--------------------
t
(1 row)
clone=# select xpath_exists ('test', document) from xml_test;
ERROR: could not parse XML document
DETAIL: line 1: Start tag expected, '<' not found
This is confusing, since… if the xml is well formed – it probably shouldn’t cause the parser to error. And if it isn’t well formed, then the instantiation of the value as an xml type should fail.
On 3/14/14, 11:12 AM, Tim Kane wrote: > clone=# select xml_is_well_formed(' '); > xml_is_well_formed > -------------------- > t > (1 row) > > > clone=# select xpath_exists (‘//test', ' '); > ERROR: could not parse XML document > DETAIL: line 1: Start tag expected, '<' not found There are several issues at work here: - contrib/xml2 has a slightly different notion of what is an OK xml value than the built-in xml type. - A string consisting of whitespace is well-formed XML content, but not a well-formed XML document. Compare xmlparse(document ' ') vs xmlparse(content ' '). contrib/xml2 (apparently) follows the latter interpretation. - xpath queries only work properly on XML documents. It might be better if you wrote your code without using contrib/xml2, and instead caught any parse exceptions in, say, plpgsql code.
Tim Kane wrote > clone=# create temp table xml_test (document xml); > CREATE TABLE If you know you need to use xpath on this content then you should do one of the following: SELECT CASE WHEN document IS DOCUMENT THEN xpath(...) ELSE default_value_for_missing_data END; CREATE TABLE xml_test ( document xml NOT NULL CHECK (document IS DOCUMENT) ); For better and worse in the name of "simplicity" both xml and json(b?) types are usable for both the embedded content and the entire document. Unless you think you have a good reason for a single column to represent both content and documents you should probably constrain stored data to be whichever you are expecting. This is the root of your confusion since, as Peter said, xpathing only operates on documents - not content fragments. "The second argument must be a well formed XML document. In particular, it must have a single root node element." Unfortunately the decision to treat documents and content as the same types means this cannot be discovered at parse-time but is data dependent and thus a run-time error. By putting in the constraint you ensure that any use of said column in an xpath query will either always succeed or always fail. Furthermore: "xml_is_well_formed does the former if the xmloption configuration parameter is set to DOCUMENT, or the latter if it is set to CONTENT" - the default for xmloption is CONTENT... Both quotes above come from: http://www.postgresql.org/docs/9.3/interactive/functions-xml.html David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/XML-validation-of-whitespace-values-tp5796092p5803594.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.