Thread: XML element with special characters can be created, serialized, but not deserialized


Hello,

I am using PostgreSQL 13.8 and I think that I found an issue with XML serialization and deserialization.

Text that contains certain special characters cannot be converted to XML, even when that text was produced by serializing an XML element.

In our case the string contains a control character with ASCII code 19 (decimal), placed between the letters i and p.
A simple statement that serializes an XML element works:
select xmlelement(name "street",'i p')::text

When the same text is converted back to XML, it fails with an error:

select xmlelement(name "street",'i p')::text::xml


The error message is

SQL Error [2200N]: ERROR: invalid XML content
  Detail: line 1: PCDATA invalid Char value 19
<street>i p</street>
         ^
line 1: chunk is not well balanced
<street>i p</street>
                    ^

The expected behaviour would be to successfully parse an XML element that was created and serialized by the same engine. 
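
For readers without a PostgreSQL instance at hand, the same kind of round-trip failure can be sketched with Python's standard xml.etree.ElementTree module. This is only an analogy and not PostgreSQL's code path: it assumes that the serializer writes the control character out unescaped and that the parser (expat) then rejects it. The helper name parses() is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Build <street> with a control character (code 19) between "i" and "p".
street = ET.Element("street")
street.text = "i" + chr(19) + "p"

# Serialization succeeds; the control character is written out as-is.
serialized = ET.tostring(street, encoding="unicode")


def parses(xml_text):
    """Return True if xml_text can be parsed back into an element."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Parsing the serializer's own output fails because of the raw control character.
round_trip_ok = parses(serialized)
```

Here round_trip_ok ends up False, while the same element with a plain space instead of the control character parses without complaint.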

Best regards,
--
Serghei Ignat

Sergiu Ignat <sergiu@bitsoftware.ro> writes:
> I am using PostgreSQL 13.8 and I think that I found an issue with XML
> serialization and deserialization.

Hmm.  The root cause here seems to be that escape_xml() thinks it
doesn't need to escape ASCII control characters, other than CR (\r).
Which is a bit backwards, because after some googling I conclude that
XML 1.1 requires all C0 and C1 control characters to be represented as
numeric escapes *except* CR, LF, and TAB [1].

What we probably ought to do is escape all except LF and TAB.
However, I'm a bit hesitant to back-patch such a behavioral change.
Maybe change this in HEAD (v16) only?
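
As a rough illustration of the proposed rule (escape every control character except LF and TAB), here is a minimal Python sketch. It is not the escape_xml() implementation from the PostgreSQL sources; the function name escape_xml_text, the set of markup characters handled, and the hexadecimal form of the references are assumptions made for this example. Whether a parser accepts the resulting references depends on the XML version it implements, which is not checked here.

```python
def escape_xml_text(text):
    """Escape markup characters and control characters in text.

    Hypothetical sketch: LF and TAB are passed through unchanged, while
    every other C0 (below 0x20) and C1 (0x80-0x9F) control character,
    including CR, becomes a numeric character reference.
    """
    markup = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}
    out = []
    for ch in text:
        code = ord(ch)
        if ch in markup:
            out.append(markup[ch])
        elif ch in "\n\t":
            out.append(ch)                 # kept literally
        elif code < 0x20 or 0x80 <= code <= 0x9F:
            out.append(f"&#x{code:X};")    # numeric escape
        else:
            out.append(ch)
    return "".join(out)


escaped = escape_xml_text("i" + chr(19) + "p")
```

With this rule the string from the report would be serialized with a numeric reference in place of the raw control character, so the serialized text no longer carries the byte that the parser complained about.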

            regards, tom lane

[1] https://www.w3.org/International/questions/qa-controls