Re: [PATCH] Add CANONICAL option to xmlserialize - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: [PATCH] Add CANONICAL option to xmlserialize
Msg-id CAFj8pRDtW94U2inD3tXiF++JvL3gzYrCfqZh4hHTHovkES+Ezg@mail.gmail.com
In response to Re: [PATCH] Add CANONICAL option to xmlserialize  (Jim Jones <jim.jones@uni-muenster.de>)
Responses Re: [PATCH] Add CANONICAL option to xmlserialize
Re: [PATCH] Add CANONICAL option to xmlserialize
List pgsql-hackers
Hi

On Sat, 24 Aug 2024 at 7:40, Jim Jones <jim.jones@uni-muenster.de> wrote:

On 19.06.24 10:59, Jim Jones wrote:
> On 09.02.24 14:19, Jim Jones wrote:
>> v9 attached with rebase due to changes done to primnodes.h in 615f5f6
>>
> v10 attached with rebase due to changes in primnodes, parsenodes.h, and
> gram.y
>
v11 attached with rebase due to changes in xml.c

I tried to review this patch.

The patch contains unwanted whitespace:

-<-><--><-->xmlFreeDoc(doc);
+<->else if (format == XMLSERIALIZE_CANONICAL || format == XMLSERIALIZE_CANONICAL_WITH_NO_COMMENTS)
+ <>{
+<-><-->xmlChar    *xmlbuf = NULL;
+<-><-->int         nbytes;
+<-><-->int    
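In the excerpt, `<->` and `<-->` appear to be an editor's rendering of tab characters, so the `+ <>{` line seems to begin with a space followed by a tab. `git diff --check` reports this kind of problem. The following is a minimal C sketch of the same check for a single line; the function name `has_space_before_tab` is hypothetical and not part of the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: return true if the leading indentation of `line`
 * contains a space immediately followed by a tab (git's "space-before-tab"
 * whitespace error). Scanning stops at the first non-blank character.
 */
static bool has_space_before_tab(const char *line)
{
    for (const char *p = line; *p == ' ' || *p == '\t'; p++) {
        if (p[0] == ' ' && p[1] == '\t')
            return true;
    }
    return false;
}
```

For example, `" \t{"` is flagged, while purely tab-indented lines such as `"\t\txmlChar *xmlbuf = NULL;"` are not.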

1. The XML is always serialized to a UTF-8 string, but when the target type is varchar or text, the result should always be converted to the database encoding. A UTF-8 string cannot be stored in a varchar in a LATIN2 database.
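Point 1 can be illustrated outside the backend. This is a minimal sketch, assuming POSIX `iconv` is available, that converts a UTF-8 serialization into LATIN2 (ISO-8859-2); the function name `convert_utf8_to` is hypothetical and not part of the patch. Inside the backend, PostgreSQL's own conversion routines (for example `pg_any_to_server()`) would be the natural choice rather than iconv.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a NUL-terminated UTF-8 string to `target`
 * (an iconv encoding name such as "ISO-8859-2"). Returns a malloc'd,
 * NUL-terminated buffer, or NULL if the conversion fails (for example
 * when a character cannot be represented in the target encoding).
 */
static char *convert_utf8_to(const char *utf8, const char *target)
{
    iconv_t cd = iconv_open(target, "UTF-8");
    if (cd == (iconv_t) -1)
        return NULL;

    size_t inleft = strlen(utf8);
    size_t outsize = inleft * 4 + 1;    /* room for up to 4 bytes per input byte */
    char *out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *) utf8;           /* iconv() takes char ** for input */
    char *outp = out;
    size_t outleft = outsize - 1;       /* keep one byte for the terminator */

    if (iconv(cd, &in, &inleft, &outp, &outleft) == (size_t) -1) {
        free(out);                      /* e.g. errno == EILSEQ */
        iconv_close(cd);
        return NULL;
    }

    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

Converting `"<a>Žluťoučký</a>"` (20 bytes of UTF-8) yields 16 single-byte LATIN2 characters. With glibc, a character that LATIN2 cannot represent, such as €, makes `iconv()` fail with `EILSEQ` and the function returns NULL; a backend implementation would presumably report an error in that case.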

2. The proposed feature could add to the confusion around the implementation of NO INDENT. I am not an expert in this area, so I checked other databases. DB2 does not have anything similar, but Oracle's NO INDENT clause is very similar to the proposed CANONICAL. Unfortunately, NO INDENT behaves differently: Oracle really removes formatting, whereas PostgreSQL does nothing.
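To make the difference in point 2 concrete, here is a deliberately naive sketch of what "removing formatting" could mean: dropping whitespace-only runs between a closing `>` and the next `<`. It is not Oracle's algorithm and not part of the patch; the name `strip_inter_tag_whitespace` is hypothetical. A real implementation would work on the parsed tree rather than raw text (libxml2, which PostgreSQL already uses, offers an `XML_PARSE_NOBLANKS` parser option for dropping blank nodes).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, naive sketch: copy `xml`, skipping any whitespace-only run
 * that sits between '>' and '<'. It ignores comments, CDATA, processing
 * instructions and xml:space, so it is only an illustration.
 * Returns a malloc'd string or NULL on allocation failure.
 */
static char *strip_inter_tag_whitespace(const char *xml)
{
    size_t len = strlen(xml);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        out[o++] = xml[i];
        if (xml[i] == '>') {
            size_t j = i + 1;
            while (j < len && isspace((unsigned char) xml[j]))
                j++;
            /* whitespace-only run followed by a tag: skip over it */
            if (j < len && xml[j] == '<')
                i = j - 1;
        }
    }
    out[o] = '\0';
    return out;
}
```

An indented document such as `"<a>\n  <b>x</b>\n</a>"` becomes `"<a><b>x</b></a>"`, while text that contains non-whitespace characters, as in `"<a> text </a>"`, is left unchanged.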

Regards

Pavel



--
Jim