BUG #18274: Error 'invalid XML content' - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18274: Error 'invalid XML content'
Msg-id 18274-98d16bc03520665f@postgresql.org
Responses Re: BUG #18274: Error 'invalid XML content'  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18274
Logged by:          Dmitry Koval
Email address:      d.koval@postgrespro.ru
PostgreSQL version: 16.1
Operating system:   Ubuntu 22.04
Description:

Hello!
It's easy to get an 'invalid XML content' error when casting a long string
of multibyte UTF-8 characters to xml:

>select length((repeat('ї', 10 * 1000 * 1000))::xml::text::bytea);
ERROR:  invalid XML content
DETAIL:  line 1: xmlSAX2Characters: huge text node
їїїїїїїїїїїїїїїїїїїїїїїїїїїїїїїїїїїїїїїї

This error is not directly related to UTF-8, since this query is processed
without an error:

>select length((repeat('a', 100 * 1000 * 1000))::xml::text::bytea);
  length   
-----------
 100000000
(1 row)
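
For scale, the byte sizes of the two test strings can be checked with a rough sketch like the one below. It is in Python only for convenience and assumes the database encoding is UTF-8; the helper name utf8_size is made up for illustration. The failing input turns out to be the smaller of the two in bytes.

```python
# Compare the UTF-8 byte sizes of the two test strings from the queries above.
# Assumption: the database encoding is UTF-8, so the sizes match what the
# server hands to the XML parser.

FAILING_CHARS = 10 * 1000 * 1000    # repeat count from the failing 'ї' query
PASSING_CHARS = 100 * 1000 * 1000   # repeat count from the passing 'a' query


def utf8_size(ch: str, count: int) -> int:
    """Bytes needed for `count` copies of `ch` in UTF-8 (hypothetical helper).

    Multiplies instead of building the string, so it uses no extra memory.
    """
    return len(ch.encode("utf-8")) * count


failing_bytes = utf8_size("ї", FAILING_CHARS)   # 'ї' takes 2 bytes in UTF-8
passing_bytes = utf8_size("a", PASSING_CHARS)   # 'a' takes 1 byte in UTF-8

print(failing_bytes)  # 20000000
print(passing_bytes)  # 100000000
```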


The problem lies in the libxml2 library: PostgreSQL parses XML content with
the xmlParseBalancedChunkMemory function, which does not support the
XML_PARSE_HUGE flag.
There have been attempts to correct this problem [1].
Apparently they were unsuccessful because the libxml2 maintainers declined
to fix the xmlParseBalancedChunkMemory function.

I'd like to know what the community's opinion is regarding this error:
1) the error is correct and nothing needs to be changed;
2) the fix should be made in the libxml2 library;
3) the fix should be made in PostgreSQL (perhaps by no longer using the
xmlParseBalancedChunkMemory function, or by some other change);
4) ...?

[1] https://gitlab.gnome.org/GNOME/libxml2/-/issues/167
----
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com

