Thread: BUG #16277: xmlelement allows invalid XML characters when XML version is set to 1.0

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16277
Logged by:          Andreas Lennartsson
Email address:      andreas@apkudo.com
PostgreSQL version: 10.7
Operating system:   Ubuntu
Description:

The following example:
SELECT
  xmlroot(
    xmlelement(name "test", CHR(26)),
    version '1.0'
  );

This produces XML containing the ASCII control character 26 (0x1A), which is not allowed in XML 1.0.

The documentation states:
Element content, if specified, will be formatted according to its data type.
If the content is itself of type xml, complex XML documents can be
constructed.
Content of other types will be formatted into valid XML character data. This
means in particular that the characters <, >, and & will be converted to
entities. Binary data (data type bytea) will be represented in base64 or hex
encoding, depending on the setting of the configuration parameter xmlbinary.
The particular behavior for individual data types is expected to evolve in
order to align the SQL and PostgreSQL data types with the XML Schema
specification, at which point a more precise description will appear.


PG Bug reporting form <noreply@postgresql.org> writes:
> The following example:
> SELECT
>   xmlroot (
>      xmlelement (name "test", CHR(26))
>   , version '1.0'
>   )

> Produces xml with the invalid ASCII character 26.

On what grounds do you call it invalid?  What other behavior
would you expect?

            regards, tom lane



Re: BUG #16277: xmlelement allows invalid XML characters when XML version is set to 1.0

From
Andreas Lennartsson
Date:
> On what grounds do you call it invalid?
Based on which control characters are valid in XML 1.0: https://en.wikipedia.org/wiki/Valid_characters_in_XML

> What other behavior would you expect?
I would expect valid XML 1.0 to be generated on success.
If that is not possible, I would expect an error.

Thanks,

Andreas 

On Tue, Feb 25, 2020 at 2:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [...]



From
Tom Lane

Andreas Lennartsson <andreas@apkudo.com> writes:
>> On what grounds do you call it invalid?

> Based on the valid control characters for XML 1.0
> https://en.wikipedia.org/wiki/Valid_characters_in_XML

Hm.  According to that, C0 control characters *are* legal in XML 1.1,
which would mean that to do this strictly correctly we'd have to
understand the differences between different XML versions, which we
don't --- and, as best I can tell in some quick testing, libxml2
doesn't either.  At least, it will happily take random values for the
document version.

xmlroot() just wraps the given XML text in a new outer <?xml ...?> declaration,
without any regard for whether the new version number allows or disallows
things that the possibly-implicit version would've allowed before.  That
seems of a piece with the generally cavalier treatment of the version
in the rest of xml.c, though.

TBH, it's unlikely that anyone is going to care about this enough
to fix it, even if you could get consensus that making the code
more strict was a good idea.  (Backwards compatibility would argue
against that, so I'm not sure such consensus would be easy to get.)
But if you're sufficiently excited about it, you could try submitting
a patch and see what happens.

            regards, tom lane



Re: BUG #16277: xmlelement allows invalid XML characters when XML version is set to 1.0

From
Andreas Lennartsson
Date:
Thanks for the feedback. I see your point about backwards compatibility. Perhaps the documentation could be updated to make clear what is going on?

On Tue, Feb 25, 2020 at 5:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [...]