On 27.02.23 14:16, I wrote:
> Hi,
>
> In order to compare pairs of XML documents for equivalence it is
> necessary to convert them first to their canonical form, as described
> at W3C Canonical XML 1.1.[1] This spec basically defines a standard
> physical representation of xml documents that have more then one
> possible representation, so that it is possible to compare them, e.g.
> forcing UTF-8 encoding, entity reference replacement, attributes
> normalization, etc.
>
> Although it is not part of the XML/SQL standard, it would be nice to
> have the option CANONICAL in xmlserialize. Additionally, we could also
> add the attribute WITH [NO] COMMENTS to keep or remove xml comments
> from the documents.
>
> Something like this:
>
> WITH t(col) AS (
> VALUES
> ('<?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE doc SYSTEM "doc.dtd" [
> <!ENTITY val "42">
> <!ATTLIST xyz attr CDATA "default">
> ]>
>
> <!-- ordering of attributes -->
> <foo ns:c = "3" ns:b = "2" ns:a = "1"
> xmlns:ns="http://postgresql.org">
>
> <!-- Normalization of whitespace in start and end tags -->
> <!-- Elimination of superfluous namespace declarations,
> as already declared in <foo> -->
> <bar xmlns:ns="http://postgresql.org" >&val;</bar >
>
> <!-- Empty element conversion to start-end tag pair -->
> <empty/>
>
> <!-- Effect of transcoding from a sample encoding to UTF-8 -->
> <iso8859>©</iso8859>
>
> <!-- Addition of default attribute -->
> <!-- Whitespace inside tag preserved -->
> <xyz> 321 </xyz>
> </foo>
> <!-- comment outside doc -->'::xml)
> )
> SELECT xmlserialize(DOCUMENT col AS text CANONICAL) FROM t;
> xmlserialize
>
--------------------------------------------------------------------------------------------------------------------------------------------------------
>
> <foo xmlns:ns="http://postgresql.org" ns:a="1" ns:b="2"
> ns:c="3"><bar>42</bar><empty></empty><iso8859>©</iso8859><xyz
> attr="default"> 321 </xyz></foo>
> (1 row)
>
> -- using WITH COMMENTS
>
> WITH t(col) AS (
> VALUES
> (' <foo ns:c = "3" ns:b = "2" ns:a = "1"
> xmlns:ns="http://postgresql.org">
> <!-- very important comment -->
> <xyz> 321 </xyz>
> </foo>'::xml)
> )
> SELECT xmlserialize(DOCUMENT col AS text CANONICAL WITH COMMENTS) FROM t;
> xmlserialize
>
------------------------------------------------------------------------------------------------------------------------
>
> <foo xmlns:ns="http://postgresql.org" ns:a="1" ns:b="2" ns:c="3"><!--
> very important comment --><xyz> 321 </xyz></foo>
> (1 row)
>
>
> Another option would be to simply create a new function, e.g.
> xmlcanonical(doc xml, keep_comments boolean), but I'm not sure if this
> would be the right approach.
>
> Attached a very short draft. What do you think?
>
> Best, Jim
>
> 1- https://www.w3.org/TR/xml-c14n11/
The attached version includes documentation and tests to the patch.
I hope things are clearer now :)
Best, Jim