Re: [PATCH] Add pretty-printed XML output option - Mailing list pgsql-hackers

From: Jim Jones
Subject: Re: [PATCH] Add pretty-printed XML output option
Msg-id: efe0b19b-d41c-f8ab-f3b8-afb0108f3706@uni-muenster.de
In response to: Re: [PATCH] Add pretty-printed XML output option (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [PATCH] Add pretty-printed XML output option (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On 10.03.23 15:32, Tom Lane wrote:
> Jim Jones <jim.jones@uni-muenster.de> writes:
>> On 09.03.23 21:21, Tom Lane wrote:
>>> I've looked through this now, and have some minor complaints and a major
>>> one.  The major one is that it doesn't work for XML that doesn't satisfy
>>> IS DOCUMENT.  For example,
>> How do you suggest the output should look like?
> I'd say a series of node trees, each starting on a separate line.

v22 (attached) enables INDENT for XML that is not singly rooted.

postgres=# SELECT xmlserialize (CONTENT '<bar><val x="y">42</val></bar><foo>73</foo>' AS text INDENT);
     xmlserialize      
-----------------------
 <bar>                +
   <val x="y">42</val>+
 </bar>               +
 <foo>73</foo>
(1 row)

I tried several libxml2 dump functions, and none of them coped well with an XML string that has no root node. So I attach the parsed nodes to a temporary root node, iterate over its children, and append them one by one (formatted) to the output buffer.
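
To make the child-by-child loop concrete, here is a minimal, self-contained sketch. It is not the patch code and it does not call libxml2: DemoNode, demo_adopt_children and demo_dump_children are hypothetical stand-ins, and each node simply carries its already-formatted markup. It only illustrates hanging a sibling list under a temporary root and writing each child on its own line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a parsed XML node; only the links used here. */
typedef struct DemoNode
{
    const char *markup;         /* already-formatted text of this subtree */
    struct DemoNode *children;  /* first child, or NULL */
    struct DemoNode *next;      /* next sibling, or NULL */
} DemoNode;

/* Hang a list of sibling nodes under a temporary root node. */
static void
demo_adopt_children(DemoNode *root, DemoNode *first)
{
    root->children = first;
}

/*
 * Append every child of the temporary root to "out", one per line, with no
 * trailing newline. Returns 0 on success, -1 if "out" is too small.
 */
static int
demo_dump_children(const DemoNode *root, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';

    for (const DemoNode *n = root->children; n != NULL; n = n->next)
    {
        size_t len = strlen(n->markup);
        size_t sep = (n != root->children) ? 1 : 0; /* newline between trees */

        if (used + sep + len + 1 > outsize)
            return -1;
        if (sep)
            out[used++] = '\n';
        memcpy(out + used, n->markup, len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

With two sibling nodes holding the formatted <bar> and <foo> trees from the example above, demo_dump_children fills the buffer with both trees, each starting on a separate line.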

I slightly modified the existing xml_parse() function to return the list of nodes parsed by xmlParseBalancedChunkMemory:

xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
          int encoding, Node *escontext, xmlNodePtr *parsed_nodes)
      
res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
                                       utf8string + count, parsed_nodes);

>> I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
>> uses the encoding of the given doc and UTF8 if not provided.
> Mmmm .... doing this differently from what we do elsewhere does not
> sound like the right path forward.  The input *is* (or had better be)
> in the database encoding.
I changed that behavior. It now uses GetDatabaseEncoding().

Thanks!

Best, Jim

