I wrote:
> Huh, interesting. That is a legitimate pretty-fication of the input,
> I suppose, but some people might think it goes beyond the charter of
> "indentation". I'm okay with it personally; anyone want to object?
Hearing no objections to that, I moved ahead with this.
It occurred to me to test v23 for memory leaks, and it had bad ones:
* the "newline" node used in the CONTENT case never got freed.
Making another one for each line wasn't helping, either.
* libxml, at least in the 2.9.7 version I have here, turns out to
leak memory if you pass a non-null encoding to xmlSaveToBuffer.
But AFAICS we don't really need to do that, because the last thing
we want is for libxml to try to do any encoding conversion.
After cleaning that up, I saw that we were indeed doing essentially
duplicative xml_parse calls for the DOCUMENT check and the indentation
work, so I refactored to allow just one call to serve.
Pushed with those changes and some other cosmetic cleanup.
Thanks for working so hard on this!
(Now to keep an eye on the buildfarm, to see if other versions of
libxml work like mine ...)
BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like
do $$
declare x xml; t text;
begin
x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
for i in 1..10000000 loop
t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;
That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad. That problem is not the fault of this patch,
I don't think. I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?
regards, tom lane