Re: Regression with large XML data input - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Regression with large XML data input
Date
Msg-id 997668.1753802857@sss.pgh.pa.us
Whole thread Raw
In response to Re: Regression with large XML data input  (Jim Jones <jim.jones@uni-muenster.de>)
Responses Re: Regression with large XML data input
List pgsql-hackers
Jim Jones <jim.jones@uni-muenster.de> writes:
> On 29.07.25 14:11, Tom Lane wrote:
>> In the original coding, there was a hazard of the node list getting
>> leaked if the caller passed parsed_nodes == NULL.  Or at least I
>> thought there was.  It may be that all releases of libxml2 are smart
>> enough to free the node list if there's no way to pass it back,
>> but I guess we had reason not to trust it.  Possibly there's something
>> about that in the discussion that led up to 6082b3d5d, though I see
>> I neglected to mention it in the commit message.

> I see.. thanks for explaining.

Re-reading the prior thread, I see that my memory above is quite
faulty: we added the node_list intermediate variable as a way to
detect errors when the return code from xmlParseBalancedChunkMemory
couldn't be trusted.  So I think you're right to question whether we
still need it.  I tried reverting to just passing parsed_nodes, and
I don't see any leak in either the normal or error paths --- so at
least with the quite-old version of libxml2 I'm testing, there is no
such bug.

> I went through the discussions and the libxml2 issue, and I also think
> it is prudent to keep it like that :)

I've got mixed feelings about it now.  I think the $64 question
is whether there are any cases in which xmlParseBalancedChunkMemory
thinks things are fine (and returns a node list) but then we conclude
there's an error, perhaps as a consequence of xmlerrcxt->err_occurred
having become set earlier.  That's a little bit of a stretch.

In any case, I now realize that I broke that scenario yesterday
because I forgot that xml_errsave could throw a longjmp --- so freeing
the node list after calling it is too late :-(

On the whole I'm inclined to revert to the previous coding without
a node_list variable.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Making pgfdw_report_error statically analyzable
Next
From: Andrew Dunstan
Date:
Subject: Re: split func.sgml to separated individual sgml files