Re: [PATCH] Add pretty-printed XML output option - Mailing list pgsql-hackers

From Jim Jones
Subject Re: [PATCH] Add pretty-printed XML output option
Date
Msg-id abd25443-ef6d-7b8a-c593-a2a991d3e5ce@uni-muenster.de
In response to Re: [PATCH] Add pretty-printed XML output option  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCH] Add pretty-printed XML output option  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 14.03.23 18:40, Tom Lane wrote:
> Jim Jones <jim.jones@uni-muenster.de> writes:
>> [ v22-0001-Add-pretty-printed-XML-output-option.patch ]
> I poked at this for awhile and ran into a problem that I'm not sure
> how to solve: it misbehaves for input with embedded DOCTYPE.
>
> regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent);
>   xmlserialize
> --------------
>   <!DOCTYPE a>+
>   <a></a>     +
>   
> (1 row)

The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements 
to be serialized with start-end tag pairs. Removing it did the trick ...

postgres=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDENT);
  xmlserialize
--------------
  <!DOCTYPE a>+
  <a/>        +

(1 row)

... but as a side effect, empty start-end tag pairs will now be serialized
as empty-element tags:

postgres=# SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text INDENT);
  xmlserialize
--------------
  <foo>       +
    <bar/>    +
  </foo>
(1 row)

This seems to be the standard behavior of other XML indentation tools
(including Oracle).

> regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent);
>   xmlserialize
> --------------
>   
> (1 row)
>
> The bad result for CONTENT is because xml_parse() decides to
> parse_as_document, but xmlserialize_indent has no idea that happened
> and tries to use the content_nodes list anyway.  I don't especially
> care for the laissez faire "maybe we'll set *content_nodes and maybe
> we won't" API you adopted for xml_parse, which seems to be contributing
> to the mess.  We could pass back more info so that xmlserialize_indent
> knows what really happened.

I added a new (nullable) output parameter to the xml_parse function that
returns the actual XmlOptionType used to parse the XML data. Now
xmlserialize_indent knows how the data was really parsed:

postgres=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDENT);
  xmlserialize
--------------
  <!DOCTYPE a>+
  <a/>        +

(1 row)
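
For readers who have not opened the patch, the idea of the nullable
output parameter can be sketched roughly as below. This is only a toy
illustration, not the code in v23: all sketch_* names and the
SketchXmlOption enum are hypothetical stand-ins, and the "starts with
<!DOCTYPE" check is a simplification of whatever xml_parse really does
to decide on parse_as_document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's XmlOptionType. */
typedef enum
{
    SKETCH_DOCUMENT,
    SKETCH_CONTENT
} SketchXmlOption;

/*
 * Toy "parser": it only decides which parse mode applies.
 * Assumption for this sketch: CONTENT input that starts with a DOCTYPE
 * declaration is parsed as a document instead.
 *
 * parsed_as may be NULL; if it is not, the mode that was actually used
 * is stored there.  Returns 0 on success, -1 for empty input.
 */
int
sketch_xml_parse(const char *data, SketchXmlOption requested,
                 SketchXmlOption *parsed_as)
{
    SketchXmlOption actual = requested;

    assert(data != NULL);

    if (data[0] == '\0')
        return -1;

    if (requested == SKETCH_CONTENT &&
        strncmp(data, "<!DOCTYPE", 9) == 0)
        actual = SKETCH_DOCUMENT;

    /* Callers that do not care about the mode simply pass NULL. */
    if (parsed_as != NULL)
        *parsed_as = actual;

    return 0;
}

/*
 * Caller in the role of xmlserialize_indent: it chooses its
 * serialization path from the reported mode, not the requested one.
 */
const char *
sketch_serialize_path(const char *data, SketchXmlOption requested)
{
    SketchXmlOption parsed_as;

    if (sketch_xml_parse(data, requested, &parsed_as) != 0)
        return "error";

    return (parsed_as == SKETCH_DOCUMENT) ? "document" : "content-nodes";
}
```

In this sketch, sketch_serialize_path("<!DOCTYPE a><a/>", SKETCH_CONTENT)
takes the "document" path, while existing callers that pass NULL for
parsed_as are unaffected.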

I added test cases for these queries.

v23 attached.

Thanks!

Best, Jim
