On 10.03.23 15:32, Tom Lane wrote:
> Jim Jones <jim.jones@uni-muenster.de> writes:
>> On 09.03.23 21:21, Tom Lane wrote:
>>> I've looked through this now, and have some minor complaints and a major
>>> one. The major one is that it doesn't work for XML that doesn't satisfy
>>> IS DOCUMENT. For example,
>> What do you think the output should look like?
> I'd say a series of node trees, each starting on a separate line.

The attached v22 enables the use of INDENT with XML that is not singly rooted:
postgres=# SELECT xmlserialize (CONTENT '<bar><val x="y">42</val></bar><foo>73</foo>' AS text INDENT);
     xmlserialize
-----------------------
 <bar>                +
   <val x="y">42</val>+
 </bar>               +
 <foo>73</foo>
(1 row)
I tried several libxml2 dump functions, and none of them coped well with an XML string that has no root node. So I added the nodes to a temporary root node, which lets me iterate over its children and append them one by one (formatted) to the output buffer.
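A minimal sketch of the temporary-root idea, written in plain C without libxml2 so that it stands alone. ToyNode, append(), dump_node() and serialize_content() are hypothetical stand-ins for xmlNode and the libxml2 dump functions, not code from the patch; the two-space indentation is an assumption chosen to mirror the psql output above. The top-level nodes are hung under a throwaway root, and each child of that root is then dumped on its own line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy node type standing in for libxml2's xmlNode. */
typedef struct ToyNode {
    const char *tag;          /* element name */
    const char *attrs;        /* raw attribute text, e.g. x="y", or NULL */
    const char *text;         /* text content of a leaf element, or NULL */
    struct ToyNode *children; /* first child element, or NULL */
    struct ToyNode *next;     /* next sibling, or NULL */
} ToyNode;

/* Append s to the NUL-terminated buf without writing past cap bytes. */
static void append(char *buf, size_t cap, const char *s)
{
    size_t len = strlen(buf);
    if (len + 1 < cap)
        strncat(buf, s, cap - len - 1);
}

/* Dump one element, indenting two spaces per nesting level. */
static void dump_node(char *buf, size_t cap, const ToyNode *n, int level)
{
    for (int i = 0; i < level; i++)
        append(buf, cap, "  ");
    append(buf, cap, "<");
    append(buf, cap, n->tag);
    if (n->attrs) {
        append(buf, cap, " ");
        append(buf, cap, n->attrs);
    }
    append(buf, cap, ">");
    if (n->children == NULL) {
        /* leaf element: text stays on the same line as its tags */
        if (n->text)
            append(buf, cap, n->text);
    } else {
        /* element children: each on its own line, one level deeper */
        for (const ToyNode *c = n->children; c; c = c->next) {
            append(buf, cap, "\n");
            dump_node(buf, cap, c, level + 1);
        }
        append(buf, cap, "\n");
        for (int i = 0; i < level; i++)
            append(buf, cap, "  ");
    }
    append(buf, cap, "</");
    append(buf, cap, n->tag);
    append(buf, cap, ">");
}

/* Serialize a list of top-level siblings: attach them to a temporary
   root, then dump each child of that root, separated by newlines
   (no trailing newline). cap must be at least 1. */
static void serialize_content(char *buf, size_t cap, ToyNode *nodes)
{
    ToyNode tmp_root = { "tmp", NULL, NULL, nodes, NULL };

    buf[0] = '\0';
    for (const ToyNode *c = tmp_root.children; c; c = c->next) {
        if (c != tmp_root.children)
            append(buf, cap, "\n");
        dump_node(buf, cap, c, 0);
    }
}
```

For example, building bar (with child val) followed by its sibling foo and calling serialize_content(out, sizeof out, &bar) yields the four lines of the psql output above, without psql's "+" continuation markers. The temporary root is never printed itself; it only gives the loop a single parent whose children are the original top-level nodes.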
I slightly modified the existing xml_parse() function so that it also returns the list of nodes parsed by xmlParseBalancedChunkMemory:

xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
          int encoding, Node *escontext, xmlNodePtr *parsed_nodes)

res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
                                       utf8string + count, parsed_nodes);

>> I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
>> uses the encoding of the given doc, or UTF8 if none is provided.
> Mmmm .... doing this differently from what we do elsewhere does not
> sound like the right path forward. The input *is* (or had better be)
> in the database encoding.

I changed that back: it now uses GetDatabaseEncoding().
Thanks!
Best, Jim