Thread: XML element with special characters can be created, serialized, but not deserialized
From: Sergiu Ignat
Hello,
I am using PostgreSQL 13.8 and I think that I found an issue with XML serialization and deserialization.
Text containing special characters cannot be converted back to XML, even when that text was produced by serializing an XML element.
In our case the string contains the control character with ASCII code 19, chr(19), placed between the letters i and p.
This statement, which serializes an XML element, works:
select xmlelement(name "street", 'i' || chr(19) || 'p')::text
When the same text is converted back to XML, it fails with an error:
select xmlelement(name "street", 'i' || chr(19) || 'p')::text::xml
The error message is:
SQL Error [2200N]: ERROR: invalid XML content
Detail: line 1: PCDATA invalid Char value 19
<street>i p</street>
^
line 1: chunk is not well balanced
<street>i p</street>
^
The expected behaviour would be to successfully parse an XML element that was created and serialized by the same engine.
Best regards,
-- Serghei Ignat
Re: XML element with special characters can be created, serialized, but not deserialized
From: Tom Lane
Sergiu Ignat <sergiu@bitsoftware.ro> writes:
> I am using PostgreSQL 13.8 and I think that I found an issue with XML
> serialization and deserialization.

Hmm. The root cause here seems to be that escape_xml() thinks it doesn't need to escape ASCII control characters, other than CR (\r). Which is a bit backwards, because after some googling I conclude that XML 1.1 requires all C0 and C1 control characters to be represented as numeric escapes *except* CR, LF, and TAB [1].

What we probably ought to do is escape all except LF and TAB. However, I'm a bit hesitant to back-patch such a behavioral change. Maybe change this in HEAD (v16) only?

regards, tom lane

[1] https://www.w3.org/International/questions/qa-controls