Re: Question about xmloption and pg_restore - Mailing list pgsql-hackers

From Chapman Flack
Subject Re: Question about xmloption and pg_restore
Msg-id 5BD1C44B.6040300@anastigmatix.net
In response to Re: Question about xmloption and pg_restore  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 10/25/18 05:02, Tom Lane wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> a difference between the 2003 SQL/XML standard (which PG implements) and
>> the later versions, which changed the data model so there really is a
>> containment relationship between 'content' and 'document'.
>> https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XML_OPTION
> 
> See also
> https://www.postgresql.org/message-id/flat/153478795159.1302.9617586466368699403%40wrigleys.postgresql.org
> 
> It's odd that people are just reporting this now when it's been like that
> for quite a few years, but anyway we've got a problem.  Sounds like maybe
> adopting the later standards' definitions would fix it?  Although I have
> no idea how complicated that'd be.

Supporting the later standards fully would be a commendable thing,
but real work:

https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Possible_ways_forward

OTOH, making the current XML parsing not fail in this particular case
(which could be viewed as adopting the later standards' relationship
of CONTENT to DOCUMENT) might just be as simple as having the current
parsing code for CONTENT detect whether the string "starts with" a
<!DOCTYPE and fall back to the existing parsing code for DOCUMENT
if it does.

... where "starts with" actually means "possibly following some
whitespace, comments, or PIs, but you can stop looking if you see
a start-element", so essentially a port to C of:

https://github.com/tada/pljava/blob/V1_5_1/pljava/src/main/java/org/postgresql/pljava/jdbc/SQLXMLImpl.java#L409

which decides whether the input should be passed straight to the
DOCUMENT-style parser or somehow treated specially to parse as CONTENT.
In Java the special treatment involves a wrapping element; in xml.c it
involves calling a different libxml2 function, xmlParseBalancedChunkMemory.
Either way, the choice of which method to apply is the same choice.
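A minimal C sketch of that "starts with" test and the resulting choice,
assuming a NUL-terminated input string. The names starts_with_doctype,
parse_mode and choose_parse_mode are hypothetical and not anything in
xml.c; the sketch only picks a path and does not call libxml2 itself.

```c
#include <stdbool.h>
#include <string.h>

/* Which existing parsing path the input should take (hypothetical). */
typedef enum { PARSE_AS_DOCUMENT, PARSE_AS_CONTENT } parse_mode;

/*
 * Hypothetical helper: true if s, after skipping any XML whitespace,
 * comments and processing instructions (including an <?xml ...?>
 * declaration), begins with "<!DOCTYPE". Anything else, such as a
 * start-element, character data, or end of input, stops the scan.
 */
static bool
starts_with_doctype(const char *s)
{
    for (;;)
    {
        /* XML whitespace is space, tab, CR and LF */
        while (*s == ' ' || *s == '\t' || *s == '\r' || *s == '\n')
            s++;

        if (strncmp(s, "<!--", 4) == 0)
        {
            /* "--" may not appear inside a comment, so the first "-->" ends it */
            const char *end = strstr(s + 4, "-->");

            if (end == NULL)
                return false;   /* unterminated: let the real parser complain */
            s = end + 3;
        }
        else if (strncmp(s, "<?", 2) == 0)
        {
            /* a PI ends at the first "?>" */
            const char *end = strstr(s + 2, "?>");

            if (end == NULL)
                return false;
            s = end + 2;
        }
        else
            return strncmp(s, "<!DOCTYPE", 9) == 0;
    }
}

/* Hypothetical dispatch: DOCTYPE-bearing input goes to the DOCUMENT parser. */
static parse_mode
choose_parse_mode(const char *s)
{
    return starts_with_doctype(s) ? PARSE_AS_DOCUMENT : PARSE_AS_CONTENT;
}
```

A caller would send PARSE_AS_DOCUMENT input down the existing DOCUMENT
path and everything else down the existing CONTENT path. An unterminated
comment or PI is treated as "no DOCTYPE", so the real parser still gets
to report the error.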

IIRC, XML comments don't nest, so it may be that "possibly following
some whitespace, comments, or PIs" could be shown to be a regular language,
and checked with a regex. I did it the more explicit way in Java for
clarity, and because the API was there, and so I wouldn't have to think
about it.

-Chap

