On 06.03.23 00:32, Thomas Munro wrote:
> I couldn't reproduce that locally either, but I just tested on CI with
> your patch applied, saw the failure, and then removed
> "PYTHONCOERCECLOCALE=0 LANG=C" and it's all green:
>
> https://github.com/macdice/postgres/commit/91999f5d13ac2df6f7237a301ed6cf73f2bb5b6d
>
> Without looking too closely, my first guess would have been that this
> just isn't going to work without UTF-8 database encoding, so you might
> need to skip the test (see for example
> src/test/regress/expected/unicode_1.out). It's annoying that "xml"
> already has 3 expected variants... hmm. BTW shouldn't it be failing
> in a more explicit way somewhere sooner if the database encoding is
> not UTF-8, rather than getting confused?

I guess this confusion happens because xml_parse() was being called
with the database encoding from GetDatabaseEncoding().

I added a check before calling xml_parse() that reads the encoding from
the XML declaration and falls back to UTF-8 if none is declared:

    parse_xml_decl(xml_text2xmlChar(data), NULL, NULL, &encodingStr, NULL);
    encoding = encodingStr ? xmlChar_to_encoding(encodingStr) : PG_UTF8;
    doc = xml_parse(data, XMLOPTION_DOCUMENT, false, encoding, NULL);

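
To illustrate the fallback in isolation, here is a rough standalone sketch.
It is not the code in the patch: declared_encoding_or_utf8() is a
hypothetical helper that only does a simplified string scan, whereas the
patch relies on parse_xml_decl() and xmlChar_to_encoding().

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code: copy the encoding name found in
 * a leading XML declaration into out, or leave "UTF-8" there when no
 * encoding is declared.  The scan is deliberately simplistic (no whitespace
 * around '=' is accepted, for example).
 */
static void
declared_encoding_or_utf8(const char *xml, char *out, size_t outlen)
{
	const char *end;
	const char *p;
	const char *close;
	char		quote;
	size_t		len;

	/* Default, analogous to the "encodingStr ? ... : PG_UTF8" fallback. */
	snprintf(out, outlen, "%s", "UTF-8");

	/* Only look inside a leading <?xml ... ?> declaration. */
	if (strncmp(xml, "<?xml", 5) != 0)
		return;
	end = strstr(xml, "?>");
	if (end == NULL)
		return;

	p = strstr(xml, "encoding=");
	if (p == NULL || p >= end)
		return;
	p += strlen("encoding=");

	/* The value must be quoted with either ' or ". */
	quote = *p;
	if (quote != '"' && quote != '\'')
		return;
	p++;

	close = strchr(p, quote);
	if (close == NULL || close >= end)
		return;

	/* Keep the default if the name is empty or does not fit. */
	len = (size_t) (close - p);
	if (len == 0 || len >= outlen)
		return;
	memcpy(out, p, len);
	out[len] = '\0';
}
```

With a 32-byte buffer, a document declaring encoding="ISO-8859-1" yields
that name, and a document without a declaration yields "UTF-8".
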
v2 attached.
Thanks!
Best, Jim