Home > mailing lists

Re: [PATCH] Add pretty-printed XML output option - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: [PATCH] Add pretty-printed XML output option
Date	March 16, 2023 00:13:36
Msg-id	3622266.1678914816@sss.pgh.pa.us Whole thread Raw
In response to	Re: [PATCH] Add pretty-printed XML output option (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Memory leak in libxml2 (was Re: [PATCH] Add pretty-printed XML output option) Re: [PATCH] Add pretty-printed XML output option
List	pgsql-hackers

Tree view

I wrote:
> Huh, interesting.  That is a legitimate pretty-fication of the input,
> I suppose, but some people might think it goes beyond the charter of
> "indentation".  I'm okay with it personally; anyone want to object?

Hearing no objections to that, I moved ahead with this.

It occurred to me to test v23 for memory leaks, and it had bad ones:

* the "newline" node used in the CONTENT case never got freed.
Making another one for each line wasn't helping, either.

* libxml, at least in the 2.9.7 version I have here, turns out to
leak memory if you pass a non-null encoding to xmlSaveToBuffer.
But AFAICS we don't really need to do that, because the last thing
we want is for libxml to try to do any encoding conversion.

After cleaning that up, I saw that we were indeed doing essentially
duplicative xml_parse calls for the DOCUMENT check and the indentation
work, so I refactored to allow just one call to serve.

Pushed with those changes and some other cosmetic cleanup.
Thanks for working so hard on this!

(Now to keep an eye on the buildfarm, to see if other versions of
libxml work like mine ...)

BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like

do $$
declare x xml; t text;
begin
x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
for i in 1..10000000 loop
  t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;

That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad.  That problem is not the fault of this patch,
I don't think.  I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?

            regards, tom lane

pgsql-hackers by date:

From: Andrew Dunstan
Date: 15 March 2023, 23:43:33
Subject: Re: Add support for DEFAULT specification in COPY FROM

From: Thomas Munro
Date: 16 March 2023, 00:13:56
Subject: Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED

Re: [PATCH] Add pretty-printed XML output option - Mailing list pgsql-hackers

Previous

Next