Re: [PATCH] Add CANONICAL option to xmlserialize - Mailing list pgsql-hackers

From Jim Jones
Subject Re: [PATCH] Add CANONICAL option to xmlserialize
Msg-id 48c21bef-11be-b7dd-3aa1-308c89882ff8@uni-muenster.de
In response to Re: [PATCH] Add CANONICAL option to xmlserialize  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: [PATCH] Add CANONICAL option to xmlserialize  (Jim Jones <jim.jones@uni-muenster.de>)
List pgsql-hackers
On 06.03.23 00:32, Thomas Munro wrote:
> I couldn't reproduce that locally either, but I just tested on CI with
> your patch applied, saw the failure, and then removed
> "PYTHONCOERCECLOCALE=0 LANG=C" and it's all green:
>
> https://github.com/macdice/postgres/commit/91999f5d13ac2df6f7237a301ed6cf73f2bb5b6d
>
> Without looking too closely, my first guess would have been that this
> just isn't going to work without UTF-8 database encoding, so you might
> need to skip the test (see for example
> src/test/regress/expected/unicode_1.out).  It's annoying that "xml"
> already has 3 expected variants... hmm.  BTW shouldn't it be failing
> in a more explicit way somewhere sooner if the database encoding is
> not UTF-8, rather than getting confused?

I guess this confusion is happening because xml_parse() was being called 
with the database encoding from GetDatabaseEncoding().

Before calling xml_parse(), I now read the encoding declared in the XML 
document and pass that instead, falling back to UTF-8 when the document 
declares no encoding:

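/* Read the encoding named in the document's XML declaration, if any */
parse_xml_decl(xml_text2xmlChar(data), NULL, NULL, &encodingStr, NULL);
/* Map it to a server encoding ID; with no declared encoding, assume UTF-8 */
encoding = encodingStr ? xmlChar_to_encoding(encodingStr) : PG_UTF8;

/* Parse with the document's encoding rather than the database encoding */
doc = xml_parse(data, XMLOPTION_DOCUMENT, false, encoding, NULL);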

v2 attached.

Thanks!

Best, Jim
