Re: [PATCH] Add pretty-printed XML output option - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCH] Add pretty-printed XML output option
Date
Msg-id 2752578.1678815625@sss.pgh.pa.us
In response to Re: [PATCH] Add pretty-printed XML output option  (Jim Jones <jim.jones@uni-muenster.de>)
Responses Re: [PATCH] Add pretty-printed XML output option  (Jim Jones <jim.jones@uni-muenster.de>)
List pgsql-hackers
Jim Jones <jim.jones@uni-muenster.de> writes:
> [ v22-0001-Add-pretty-printed-XML-output-option.patch ]

I poked at this for a while and ran into a problem that I'm not sure
how to solve: it misbehaves for input with an embedded DOCTYPE.

regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent);
 xmlserialize 
--------------
 <!DOCTYPE a>+
 <a></a>     +
 
(1 row)

regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent);
 xmlserialize 
--------------
 
(1 row)

The bad result for CONTENT is because xml_parse() decides to
parse_as_document, but xmlserialize_indent has no idea that happened
and tries to use the content_nodes list anyway.  I don't especially
care for the laissez-faire "maybe we'll set *content_nodes and maybe
we won't" API you adopted for xml_parse, which seems to be contributing
to the mess.  We could pass back more info so that xmlserialize_indent
knows what really happened.  However, that won't fix the bogus output
for the DOCUMENT case.  Are we perhaps passing incorrect flags to
xmlSaveToBuffer?
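
To illustrate the "pass back more info" idea, here is a minimal standalone sketch in which the parser reports which mode it actually used, so the caller never has to guess whether the content-node list is usable. Every name here (SketchXmlOption, SketchParseResult, sketch_parse, sketch_serialize_path) is hypothetical and not part of PostgreSQL or libxml2, and the DOCTYPE check is a toy approximation of whatever xml_parse really does to decide on parse_as_document. It says nothing about the DOCUMENT-case output, which is a separate problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
typedef enum { SKETCH_DOCUMENT, SKETCH_CONTENT } SketchXmlOption;

typedef struct
{
    bool parsed_as_document;   /* what the parser actually did */
    bool content_nodes_valid;  /* true only if a content-node list was built */
} SketchParseResult;

/*
 * Toy check (an assumption, not the real logic): skip leading whitespace
 * and see whether the input starts with a DOCTYPE declaration.
 */
static bool
sketch_has_doctype(const char *input)
{
    while (*input == ' ' || *input == '\t' || *input == '\n' || *input == '\r')
        input++;
    return strncmp(input, "<!DOCTYPE", 9) == 0;
}

/*
 * Parse stub: always tells the caller which mode was used, instead of
 * "maybe" filling in an output parameter.
 */
static SketchParseResult
sketch_parse(const char *input, SketchXmlOption requested)
{
    SketchParseResult result;

    assert(input != NULL);
    result.parsed_as_document =
        (requested == SKETCH_DOCUMENT) || sketch_has_doctype(input);
    result.content_nodes_valid = !result.parsed_as_document;
    return result;
}

/* The caller chooses its serialization path from the reported mode. */
static const char *
sketch_serialize_path(const char *input, SketchXmlOption requested)
{
    SketchParseResult result = sketch_parse(input, requested);

    return result.parsed_as_document ? "document" : "content-nodes";
}
```

With something of this shape, a CONTENT request whose input carries a DOCTYPE would be routed to the document path rather than walking an unset node list.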

            regards, tom lane


