
Re: [PATCH] Add CANONICAL option to xmlserialize - Mailing list pgsql-hackers

From	Pavel Stehule
Subject	Re: [PATCH] Add CANONICAL option to xmlserialize
Date	August 30 00:50:33
Msg-id	CAFj8pRDrgOoJzxxOAswGcr7E+JZ-1SOoX+Oy3_RTPV=Jg4YGHw@mail.gmail.com
In response to	Re: [PATCH] Add CANONICAL option to xmlserialize (Jim Jones <jim.jones@uni-muenster.de>)
Responses	Re: [PATCH] Add CANONICAL option to xmlserialize
List	pgsql-hackers


On Tue, 27 Aug 2024 at 13:57, Jim Jones <jim.jones@uni-muenster.de> wrote:

On 26.08.24 16:59, Pavel Stehule wrote:
>
> 1. What about the behaviour of NO INDENT? The implementation is not too
> old, so it can be changed if we want (I think), and it is better to do
> it early than too late.

While checking the feasibility of removing indentation with NO INDENT, I
may have found a bug in XMLSERIALIZE ... INDENT.
xmlSaveToBuffer seems to leave elements unindented if there is whitespace
between them:

SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>' AS text INDENT);
xmlserialize
-----------------
<foo>          +
   <bar>42</bar>+
</foo>         +

(1 row)

SELECT xmlserialize(DOCUMENT '<foo> <bar>42</bar> </foo>'::xml AS text INDENT);
        xmlserialize
----------------------------
<foo> <bar>42</bar> </foo>+

(1 row)

I'll take a look at it.

Regarding removing indentation: yes, it would be possible with libxml2.
The question is whether it would be right to do so.
> 2. Are we able to implement SQL/XML syntax with libxml2?
>
> 3. Are we able to implement Oracle syntax with libxml2? And are there
> benefits other than possibly higher compatibility?
I guess it would be beneficial if you're migrating from Oracle to
Postgres, or the other way around. It certainly wouldn't hurt, but so
far I personally have had little use for Oracle's extra XMLSERIALIZE
features.
>
> 4. Can there be some possible collision (functionality, syntax) with
> CANONICAL?
I couldn't find anything in the SQL/XML spec that might refer to
canonical XML.
>
> 5. SQL/XML XMLSERIALIZE supports target types other than varchar. I
> can imagine XMLSERIALIZE with CANONICAL to bytea (then we wouldn't need
> to force the database encoding). Does it make sense? Are the results
> comparable?
As of PG16, bytea is not supported. Currently the target type can be
character, character varying, or text, as well as their other flavours
like 'name'.

I know, but theoretically there could be some benefit for CANONICAL if PG supported bytea there. A lot of databases still use a non-UTF-8 encoding.

It is a more theoretical question: if PG supports other target types there in the future (because of SQL/XML or Oracle), can CANONICAL be used with all of them, or only with text? And are you sure that you can compare text with text, instead of xml with xml?
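To make the encoding argument concrete (a sketch only, not PostgreSQL code): the W3C Canonical XML specifications define the canonical form as UTF-8 with the XML declaration removed, so canonical output is naturally a byte string. The example uses Python's standard-library xml.etree.ElementTree.canonicalize(), which implements C14N 2.0 and is only a stand-in for whichever libxml2 C14N mode the patch uses; the two documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document stored in a non-UTF-8 encoding (ISO-8859-1), as raw bytes.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><a>caf\u00e9</a>'.encode("iso-8859-1")

# The same content stored as UTF-8.
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><a>caf\u00e9</a>'.encode("utf-8")

# canonicalize() honours the encoding declaration while parsing and returns
# a str with the XML declaration dropped.
canonical_text = ET.canonicalize(latin1_doc)

# Canonical XML is defined as UTF-8, so encode the result that way.
canonical_bytes = canonical_text.encode("utf-8")

# Both inputs yield identical canonical bytes, whatever their source encoding.
assert ET.canonicalize(utf8_doc).encode("utf-8") == canonical_bytes
print(canonical_bytes)
```

Because both inputs produce the same UTF-8 bytes, a bytea result would not depend on the database encoding, whereas a text result always has to be converted to it.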

+SELECT xmlserialize(CONTENT doc AS text CANONICAL) = xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize;
+ ?column?
+----------
+ t
+ t
+(2 rows)

Maybe I am a little confused by these regression tests, because in the end this one is not very useful: you compare two identical XML documents, and WITH COMMENTS and WITHOUT COMMENTS are tested elsewhere. I tried to find the point of this test. It would be better to use really different documents (columns) instead.
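A hedged sketch of what "really different documents" could check, again using Python's xml.etree.ElementTree.canonicalize() as a stand-in for the patch's libxml2-based CANONICAL option (the two documents are hypothetical test rows, and C14N 2.0 may differ from libxml2's C14N in edge cases): documents that differ only in attribute order, whitespace inside the start tag, and a comment should compare equal without comments and unequal with them.

```python
import xml.etree.ElementTree as ET

# Hypothetical test rows: different serializations of "the same" document.
doc_a = '<foo b="2" a="1"><!-- note --><bar>42</bar></foo>'
doc_b = '<foo a="1"  b="2"><bar>42</bar></foo>'

# Without comments: attributes are sorted, whitespace between them is
# normalized, and the comment is dropped, so the canonical forms match.
plain_a = ET.canonicalize(doc_a)
plain_b = ET.canonicalize(doc_b)
assert plain_a == plain_b

# With comments: only doc_a keeps its comment, so the results differ.
with_a = ET.canonicalize(doc_a, with_comments=True)
with_b = ET.canonicalize(doc_b, with_comments=True)
assert with_a != with_b

print(plain_a)
print(with_a)
```

A test built this way exercises both the normalization and the effect of WITH COMMENTS in a single comparison, which two identical documents cannot do.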

Regards

Pavel


--
Jim
