Re: [PATCH] Add pretty-printed XML output option - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCH] Add pretty-printed XML output option
Date
Msg-id 3622266.1678914816@sss.pgh.pa.us
Whole thread Raw
In response to Re: [PATCH] Add pretty-printed XML output option  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Memory leak in libxml2 (was Re: [PATCH] Add pretty-printed XML output option)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [PATCH] Add pretty-printed XML output option  (Jim Jones <jim.jones@uni-muenster.de>)
List pgsql-hackers
I wrote:
> Huh, interesting.  That is a legitimate pretty-fication of the input,
> I suppose, but some people might think it goes beyond the charter of
> "indentation".  I'm okay with it personally; anyone want to object?

Hearing no objections to that, I moved ahead with this.

It occurred to me to test v23 for memory leaks, and it had bad ones:

* the "newline" node used in the CONTENT case never got freed.
Making another one for each line wasn't helping, either.

* libxml, at least in the 2.9.7 version I have here, turns out to
leak memory if you pass a non-null encoding to xmlSaveToBuffer.
But AFAICS we don't really need to do that, because the last thing
we want is for libxml to try to do any encoding conversion.

After cleaning that up, I saw that we were indeed doing essentially
duplicative xml_parse calls for the DOCUMENT check and the indentation
work, so I refactored to allow just one call to serve.

Pushed with those changes and some other cosmetic cleanup.
Thanks for working so hard on this!

(Now to keep an eye on the buildfarm, to see if other versions of
libxml work like mine ...)

BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like

do $$
declare x xml; t text;
begin
x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
for i in 1..10000000 loop
  t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;

That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad.  That problem is not the fault of this patch,
I don't think.  I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?

            regards, tom lane



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Add support for DEFAULT specification in COPY FROM
Next
From: Thomas Munro
Date:
Subject: Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED