Thread: [PATCH] contrib/xml2: xslt_process() should report XSLT-related error details

Dear PG hackers,

here is a patch that improves the xslt_process() error handling. Error
handling was one of the points criticized earlier [1] and as far as I
understand it is one of the reasons that xslt_process() was never adopted
into core and contrib/xml2 is considered deprecated since PG 8. However the
point is also still on the TODO list ("Report errors returned by the XSLT
library") and xslt_process() has been asked for several times on the mailing
list. Memory handling bugs that have been reported earlier [2] have obviously
long been fixed.

The point raised in [1] was that xslt_process() would not report missing
stylesheet parameters at all. This hasn't been the case at the current HEAD.
However, xslt_process() still wouldn't report any details about errors
related to XSLT processing. This is what is fixed by this patch. It now
installs an XSLT error handler. See the patch comment for details. Since the
goal is to eventually get xslt_process() adopted into core, I also got rid of
PG_XML_STRICTNESS_LEGACY.

Perhaps you can tell me what else is preventing adoption into core. I believe
that xslt_process() should also accept the `xml` type as an alternative to
strings. Strings should be kept for backwards compatibility, though. Also,
the stylesheet parameter passing is very crude and limited. I suggest, that
the current method of passing them as strings (see parse_params()) is
deprecated and we don't try to tackle its limitations, but will instead
accept hstores. If you agree, I will work on these two tasks next.

Yours sincerely,
Robin Haberkorn

[1]: https://www.postgresql.org/message-id/4C5ECEF9.3030806%40mlfowler.com
[2]: https://www.postgresql.org/message-id/flat/4A6A276A.6090405%40dunslane.net

--
Robin Haberkorn
Senior Software Engineer

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537



I forgot the patch of course...

--
Robin Haberkorn
Senior Software Engineer

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537

Attachment
"Robin Haberkorn" <haberkorn@b1-systems.de> writes:
> I forgot the patch of course...

Hi Robin, thank you for the patch!  This has arrived too late to
be considered for PG v18, but please add the thread to the open
commitfest

https://commitfest.postgresql.org/53/

so that we don't forget about it for v19, and so that it will
receive automated testing from our cfbot.

Just from a quick eyeballing of the patch, it seems generally
sane, but I wonder why you have xml_generic_error_handler
creating and then destroying an errorBuf instead of just
appending the results directly to xmlerrcxt->err_buf.  The
extra buffer doesn't seem to be doing much except increasing
our odds of an OOM failure right there.

I'm also a little suspicious of

-    xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
+    xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);

as that looks like it is probably changing the behavior of
the module in ways bigger than just providing more error
details.  This code has been in legacy status for so long
that I feel a little hesitant about doing that.  If we are
willing to do it, should we go further and clean out the use
of PG_XML_STRICTNESS_LEGACY in xml2/xpath.c as well?

            regards, tom lane



Hello Tom!

On Tue Apr 22, 2025 at 18:29:02 GMT +03, Tom Lane wrote:
> Hi Robin, thank you for the patch!  This has arrived too late to
> be considered for PG v18, but please add the thread to the open
> commitfest

I will do that. But what about the other changes I proposed (xml data type
and hstore support)? Should they all be discussed in this thread or should I
create separate threads and consequently separate patch entries for the
Commitfest? And should they all be based on the master branch or can they be
based upon each other? Do you think they will help to bring xslt_process()
closer to adoption into core?

> Just from a quick eyeballing of the patch, it seems generally
> sane, but I wonder why you have xml_generic_error_handler
> creating and then destroying an errorBuf instead of just
> appending the results directly to xmlerrcxt->err_buf.  The
> extra buffer doesn't seem to be doing much except increasing
> our odds of an OOM failure right there.

You are right. It was a leftover from thinning out xml_errorHandler().
I attached a new version of the patch where this is fixed.
I also ran all touched files through pgindent.

> I'm also a little suspicious of
>
> -    xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_LEGACY);
> +    xmlerrcxt = pgxml_parser_init(PG_XML_STRICTNESS_ALL);
>
> as that looks like it is probably changing the behavior of
> the module in ways bigger than just providing more error
> details.  This code has been in legacy status for so long
> that I feel a little hesitant about doing that.

You wouldn't actually need that change to get more error details as the
details are provided by xml_generic_error_handler() which ignores the
strictness level. The reason is that libxml generic error handlers cannot
discern between warnings and errors. The only thing that the strictness level
does is affecting how xml_errorHandler() works: With STRICTNESS_ALL, it can
actually log warnings without appending anything to the err_buf and set the
err_occurred flag which is apparently useful because some libxml functions
can result in error-level messages without the function calls signalling
failure. At least, otherwise the code doesn't make sense to me. In other
words, using the STRICTNESS_ALL should allow you to see warnings, you
wouldn't otherwise see and catch errors, you wouldn't otherwise catch -- at
least if they occur in libxml itself. As far as I understand from the code
comment xml_errorHandler(), STRICTNESS_LEGACY was introduced, just so you
don't have to touch the existing contrib/xml2 code that did not check
err_occurred:

/*
 * Legacy error handling mode.  err_occurred is never set, we just add the
 * message to err_buf.  This mode exists because the xml2 contrib module
 * uses our error-handling infrastructure, but we don't want to change its
 * behaviour since it's deprecated anyway.  This is also why we don't
 * distinguish between notices, warnings and errors here --- the old-style
 * generic error handler wouldn't have done that either.
 */

Since we now check that flag in xslt_proc.c, we should be fine.
But I might be overlooking something.

> If we are willing to do it, should we go further and clean out the use
> of PG_XML_STRICTNESS_LEGACY in xml2/xpath.c as well?

I think this is a separate construction site. We don't necessarily have
to touch xpath.c and I didn't plan to do so in the near future.
We should first determine whether the XPath support is actually worth
keeping (should be adopted into core) or rather stay deprecated.

Best regards,
Robin Haberkorn

--
Robin Haberkorn
Senior Software Engineer

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537

Attachment
On Wed Apr 23, 2025 at 00:42:06 GMT +03, Robin Haberkorn wrote:
> /*
>  * Legacy error handling mode.  err_occurred is never set, we just add the
>  * message to err_buf.  This mode exists because the xml2 contrib module
>  * uses our error-handling infrastructure, but we don't want to change its
>  * behaviour since it's deprecated anyway.  This is also why we don't
>  * distinguish between notices, warnings and errors here --- the old-style
>  * generic error handler wouldn't have done that either.
>  */

btw. you wrote that comment in cacd42d62cb2ddf32135b151f627780a5509780f
back in 2011.
The commit message also contains some background information.

--
Robin Haberkorn
Senior Software Engineer

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537



"Robin Haberkorn" <haberkorn@b1-systems.de> writes:
> btw. you wrote that comment in cacd42d62cb2ddf32135b151f627780a5509780f
> back in 2011.
> The commit message also contains some background information.

Hm.  I wonder if we couldn't get rid of the
HAVE_XMLSTRUCTUREDERRORCONTEXT conditionalization.  If the
libxml2 versions that lacked that were "old" in 2011, surely
they are extinct in the wild by now.  I'm thinking that we
really don't want to be using the generic handler at all, given
the commit log's comment that it can't distinguish between
error, warning, and notice cases.

            regards, tom lane



On Thu Apr 24, 2025 at 01:59:10 GMT +03, Tom Lane wrote:
> Hm.  I wonder if we couldn't get rid of the
> HAVE_XMLSTRUCTUREDERRORCONTEXT conditionalization.  If the
> libxml2 versions that lacked that were "old" in 2011, surely
> they are extinct in the wild by now.  I'm thinking that we
> really don't want to be using the generic handler at all, given
> the commit log's comment that it can't distinguish between
> error, warning, and notice cases.

libxml 2.7.4 was released 15 years ago, so yes these #ifdefs can
probably be removed already. What's the oldest distribution/OS you
want to support PG on?
Even Ubuntu 14.04 from 2014 already had libxml 2.9.1.

If you like, I will prepare another patch to remove
HAVE_XMLSTRUCTUREDERRORCONTEXT in a separate thread.

PS: I added the libxslt error handling patch to the next commitfest:
https://commitfest.postgresql.org/patch/5718/

--
Robin Haberkorn
Senior Software Engineer

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / https://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt, HRB 3537