Thread: BUG #18943: Return value of a function 'xmlBufferCreate' is dereferenced at xpath.c:177 without checking for NUL

The following bug has been logged on the website:

Bug reference:      18943
Logged by:          Karavaev Alexey
Email address:      maralist86@mail.ru
PostgreSQL version: 17.5
Operating system:   alt workstation 10.4 x86_64
Description:

Return value of a function 'xmlBufferCreate' is dereferenced at xpath.c:177
without checking for NULL, but it is usually checked for this function.
The function 'xmlBufferCreate' may return NULL, and in some cases the code
xmlStrdup(buf->content) will throw an error.


On Tue, Jun 03, 2025 at 01:15:33PM -0400, Tom Lane wrote:
> Thanks for taking this on --- contrib/xml2 is really a mess so far as
> error handling goes.  Your patch looks like an improvement, although
> I do have one concern: the routines in xml.c that use an xmlerrcxt
> seem to check xmlerrcxt->err_occurred pretty frequently, eg xmlStrdup
> is used like this:
>
>             doc->encoding = xmlStrdup((const xmlChar *) "UTF-8");
>             if (doc->encoding == NULL || xmlerrcxt->err_occurred)
>                 xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
>                             "could not allocate XML document");
>
> Not sure if that's needed here.

Yes, I was also wondering a bit about this part a couple of days ago
while looking at the module's code and xml.c.  Looking at cacd42d62cb2
and its thread (particularly [1]), I think that we need to bite the
bullet or we are going to miss some error context.

PgXmlErrorContext can remain private in xml.c as we can reuse
pg_xml_error_occurred() to do the job for out-of-core code.

> There's more that could be looked at, if you feel like it:

Well, now that you mention these things, I do feel like it while I
have my hands on this area of the code.

> xml_encode_special_chars seems to need a PG_TRY block to avoid
> possibly leaking "tt".  I also wonder if it's really not possible
> for xmlEncodeSpecialChars to fail and return NULL.  (xmltext()
> doesn't believe that either, but maybe it's wrong too.)

Oh, missed this call.  xmlEncodeSpecialChars() relies on
xmlEscapeText(), which can return NULL on allocation failures based
on a check in the upstream code (first argument of the routine is not
used, only second is).  So xmltext() and xml_encode_special_chars()
are both wrong; we need a try/catch block for the edge case where
cstring_to_text_with_len() or cstring_to_text() could fail an
allocation or we would leak what could have been allocated by libxml.

> The usage of cleanup_workspace seems quite a mess: one caller
> uses a PG_TRY block to ensure it's called, but the rest don't.
> I also find it confusing that pgxml_xpath returns a value that is
> also referenced in the workspace and cleanup_workspace is responsible
> for freeing.  I wonder if there's a better way to do that.

Yes, that's not optimal.  This comes down to the fact that we need
workspace->res to not be free'd by libxml until the result of these
SQL functions is generated, and even with my previous patch we could
leak it if one of the allocations done for the function results fail.

I've been wondering about a few approaches, like adding the error
context to the workspace, but that did not feel right as we require
the error context before the try/catch block, and the workspace and
its internals allocated by libxml need to be fully handled in the
scope of the try/catch.  So I've finished with the attached, where the
workspace and its cleanup routine gain volatile declarations, keeping
pg_xml_done() outside the workspace logic.  This is a rather
mechanical change if done this way with the try/catch blocks moved one
level higher, but as long as we need to hold on the result inside
the workspace, I'm feeling that this is a cleaner approach in the
long-run for xml2, because it's impossible to miss what's in the scope
of the catch cleanup with the cleanup policy enforced in the
definition of cleanup_workspace().

> In the end of xslt_process, I wonder why
>
>     if (resstr)
>         xmlFree((xmlChar *) resstr);
>
> isn't done before the pg_xml_done call.

Good point.  Fixed.

All that is rather unlikely going to be a problem in practice, so I
don't really feel a strong reason to backpatch any of that.  v18 would
be OK, but we could just also wait for v19 based on how these are
unlikely going to be problems in the field.

Or it's worth worrying about a backpatch of the xml.c code paths which
are in the core backend?  The xmltext() case is isolated, so this part
is not invasive.

[1]: https://www.postgresql.org/message-id/23388.1311118974%40sss.pgh.pa.us
--
Michael

Attachment
On 05.06.25 11:38, Jim Jones wrote:
> 
> 
> Hi Michael
> 
> Am Do., 5. Juni 2025 um 10:11 Uhr schrieb Michael Paquier
> <michael@paquier.xyz <mailto:michael@paquier.xyz>>:
> 
>     On Tue, Jun 03, 2025 at 01:15:33PM -0400, Tom Lane wrote:
>     > Thanks for taking this on --- contrib/xml2 is really a mess so far as
>     > error handling goes.  Your patch looks like an improvement, although
>     > I do have one concern: the routines in xml.c that use an xmlerrcxt
>     > seem to check xmlerrcxt->err_occurred pretty frequently, eg xmlStrdup
>     > is used like this:
>     >
>     >             doc->encoding = xmlStrdup((const xmlChar *) "UTF-8");
>     >             if (doc->encoding == NULL || xmlerrcxt->err_occurred)
>     >                 xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
>     >                             "could not allocate XML document");
>     >
>     > Not sure if that's needed here.
> 
>     Yes, I was also wondering a bit about this part a couple of days ago
>     while looking at the module's code and xml.c.  Looking at cacd42d62cb2
>     and its thread (particularly [1]), I think that we need to bite the
>     bullet or we are going to miss some error context.
> 
>     PgXmlErrorContext can remain private in xml.c as we can reuse
>     pg_xml_error_occurred() to do the job for out-of-core code. 
> 
>     > There's more that could be looked at, if you feel like it:
> 
>     Well, now that you mention these things, I do feel like it while I
>     have my hands on this area of the code.
> 
>     > xml_encode_special_chars seems to need a PG_TRY block to avoid
>     > possibly leaking "tt".  I also wonder if it's really not possible
>     > for xmlEncodeSpecialChars to fail and return NULL.  (xmltext()
>     > doesn't believe that either, but maybe it's wrong too.)
> 
>     Oh, missed this call.  xmlEncodeSpecialChars() relies on
>     xmlEscapeText(), which can return NULL on allocation failures based
>     on a check in the upstream code (first argument of the routine is not
>     used, only second is).  So xmltext() and xml_encode_special_chars()
>     are both wrong; we need a try/catch block for the edge case where
>     cstring_to_text_with_len() or cstring_to_text() could fail an
>     allocation or we would leak what could have been allocated by libxml.
> 
> 
> Yeah, xmlEscapeText() does return NULL in some cases[1,2] and
> xmlEncodeSpecialChars() doesn't mind.
> 
> Taking a further look at xml.c I am wondering if other functions might
> also need some attention in this regard:
> 
> * xmlTextWriterStartElement [3]
> * xmlTextWriterWriteAttribute [4]
> * xmlTextWriterWriteRaw [5]
> * xmlTextWriterEndAttribute [6]
> 
> We're assuming they never fail. Perhaps something like this?
>  ...
>  nbytes = xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);
>  if (nbytes == -1 || xmlerrcxt->err_occurred)
>     xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
>                         "could not allocate xmlTextWriterStartElement");
> 
> Best regards, Jim
> 
> 1 -
> https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlIO.c#L205
<https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlIO.c#L205>
> 2 -
> https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlIO.c#L284
<https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlIO.c#L284>
> 3 -
> https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlwriter.c#L967
<https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlwriter.c#L967>
> 4 -
> https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlwriter.c#L1950
<https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlwriter.c#L1950>
> 5 -
> https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlwriter.c#L1323
<https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlwriter.c#L1323>
> 6 -
> https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlwriter.c#L1839
<https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/xmlwriter.c#L1839> 
> 

Oups .. just replied to Michael.
Sorry!

Jim



On 05.06.25 11:47, Jim Jones wrote:
> Taking a further look at xml.c I am wondering if other functions might
> also need some attention in this regard:
> 
> * xmlTextWriterStartElement [3]
> * xmlTextWriterWriteAttribute [4]
> * xmlTextWriterWriteRaw [5]
> * xmlTextWriterEndAttribute [6]
> 
> We're assuming they never fail. Perhaps something like this?
>  ...
>  nbytes = xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);
>  if (nbytes == -1 || xmlerrcxt->err_occurred)
>     xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
>                         "could not allocate xmlTextWriterStartElement");
> 

There is also a further xmlXPathCastNodeToString() call in xml.c at
xml_xmlnodetoxmltype() - it calls xmlNodeGetContent() and it can return
NULL.

xmlChar    *str;
str = xmlXPathCastNodeToString(cur);

PG_TRY();
{
  /* Here we rely on XML having the same representation as TEXT */
  char       *escaped = escape_xml((char *) str);

  result = (xmltype *) cstring_to_text(escaped);
  pfree(escaped);
}
PG_FINALLY();
{
  xmlFree(str);
}
PG_END_TRY();

The function pgxmlNodeSetToText() also calls xmlXPathCastNodeToString(),
but apparently xmlBufferAdd() can handle NULL values.[1]

Best regards, Jim

1 -
https://github.com/GNOME/libxml2/blob/2b6b3945f2df548b56f2c73c490dda9781f92eb2/buf.c#L989






On Thu, Jun 05, 2025 at 04:15:19PM +0200, Jim Jones wrote:
> On 05.06.25 11:47, Jim Jones wrote:
>> Taking a further look at xml.c I am wondering if other functions might
>> also need some attention in this regard:
>>
>> * xmlTextWriterStartElement [3]
>> * xmlTextWriterWriteAttribute [4]
>> * xmlTextWriterWriteRaw [5]
>> * xmlTextWriterEndAttribute [6]

It seems to me that you mean xmlTextWriterEndElement() and not
xmlTextWriterEndAttribute() for the last one.

I've been looking at the rest of the callers (this took some time),
like:
- xmlTextWriterWriteBase64, OK.
- xmlTextWriterWriteBinHex, OK.
- xmlNewStringInputStream, which is intentional in xmlPgEntityLoader()
- xmlAddChildList, where we expect valid input.
- xmlXPathCompiledEval, where valid input is expected.
- xmlXPathNewContext, which is incorrect, could fail an allocation.
- xmlReadMemory, looks OK.
- xmlBufferWriteChar, which could fail on OOM if they need to grow
memory, but let's leave these as they are; I suspect that
xmlBufferCreate() would fail anyway.

>> We're assuming they never fail. Perhaps something like this?
>>  ...
>>  nbytes = xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);
>>  if (nbytes == -1 || xmlerrcxt->err_occurred)
>>     xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
>>                         "could not allocate xmlTextWriterStartElement");

Oh.  These can return XML_ERR_NO_MEMORY as well, with more error
patterns..

> There is also a further xmlXPathCastNodeToString() call in xml.c at
> xml_xmlnodetoxmltype() - it calls xmlNodeGetContent() and it can return
> NULL.

And it's documented as a routine that returns an allocated string, so
yes, we would miss allocation failures but we should not.  I think
that we should move the call of xmlXPathCastNodeToString() inside the
PG_TRY block and rely on the xmlerrcxt given by the caller to log an
error if an allocation fails, marking xmlChar *str as volatile to free
it in the finally block if required.

> The function pgxmlNodeSetToText() also calls xmlXPathCastNodeToString(),
> but apparently xmlBufferAdd() can handle NULL values.[1]

Yeah, the patch I have posted upthread gets that done better.

What do you think?
--
Michael

Attachment
On 06.06.25 07:54, Michael Paquier wrote:
> On Thu, Jun 05, 2025 at 04:15:19PM +0200, Jim Jones wrote:
>> On 05.06.25 11:47, Jim Jones wrote:
>>> Taking a further look at xml.c I am wondering if other functions might
>>> also need some attention in this regard:
>>>
>>> * xmlTextWriterStartElement [3]
>>> * xmlTextWriterWriteAttribute [4]
>>> * xmlTextWriterWriteRaw [5]
>>> * xmlTextWriterEndAttribute [6]
> 
> It seems to me that you mean xmlTextWriterEndElement() and not
> xmlTextWriterEndAttribute() for the last one.

+1
The return value of the xmlTextWriter* functions are now properly evaluated.

> - xmlBufferWriteChar, which could fail on OOM if they need to grow
> memory, but let's leave these as they are; I suspect that
> xmlBufferCreate() would fail anyway.
> 

I also think we're safe in this case. xmlBufferAdd() can return
XML_ERR_ARGUMENT if the xmlBuffer* or the xmlChar* are NULL, which
cannot happen in this part of the code. XML_ERR_NO_MEMORY is also
unlikely to happen, since xmlBufferWriteChar() and xmlBufferWriteCHAR()
call xmlBufferAdd() with a -1 length.


>>> We're assuming they never fail. Perhaps something like this?
>>>  ...
>>>  nbytes = xmlTextWriterStartElement(writer, (xmlChar *) xexpr->name);
>>>  if (nbytes == -1 || xmlerrcxt->err_occurred)
>>>     xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
>>>                         "could not allocate xmlTextWriterStartElement");
> 
> Oh.  These can return XML_ERR_NO_MEMORY as well, with more error
> patterns..

+1

> 
>> There is also a further xmlXPathCastNodeToString() call in xml.c at
>> xml_xmlnodetoxmltype() - it calls xmlNodeGetContent() and it can return
>> NULL.
> 
> And it's documented as a routine that returns an allocated string, so
> yes, we would miss allocation failures but we should not.  I think
> that we should move the call of xmlXPathCastNodeToString() inside the
> PG_TRY block and rely on the xmlerrcxt given by the caller to log an
> error if an allocation fails, marking xmlChar *str as volatile to free
> it in the finally block if required.

+1

> 
>> The function pgxmlNodeSetToText() also calls xmlXPathCastNodeToString(),
>> but apparently xmlBufferAdd() can handle NULL values.[1]
> 
> Yeah, the patch I have posted upthread gets that done better.
> 
> What do you think?

Going through xml.c once again, I stumbled upon xmlAddChildList(), which
returns the last child or NULL in case of error. [1]

...
xmlAddChildList(root, content_nodes);


So, perhaps this?

if (xmlAddChildList(root, content_nodes) == NULL ||
    xmlerrcxt->err_occurred)
  xml_ereport(xmlerrcxt,
     ERROR, ERRCODE_OUT_OF_MEMORY,
     "could not add content nodes to root element");



-- 
Jim


1-
https://github.com/GNOME/libxml2/blob/c6206c93872fc91d98fbc61215c5618b629bf915/tree.c#L2976