Thread: XML test error on Arch Linux
Hello hackers, i compile postgresql just for fun. My system is a Arch Linux. I get after upgrade the libxml2 package (from 2.12.7-1 to 2.13.1-1) test errors for xml: --- ... ok 200 + largeobject 553 ms ok 201 + with 819 ms not ok 202 + xml 1464 ms # parallel group (15 tests): hash_part predicate partition_info explain reloptions memoize compression partition_merge statspartition_split partition_aggregate tuplesort partition_join partition_prune indexing ok 203 + partition_merge 1282 ms ok 204 + partition_split 1533 ms ok 205 + partition_join 2229 ms ... ok 221 - fast_default 230 ms ok 222 - tablespace 498 ms 1..222 # 1 of 222 tests failed. # The differences that caused some tests to fail can be viewed in the file "/home/frastr/devel/postgresql/src/test/regress/regression.diffs". # A copy of the test summary that you see above is saved in the file "/home/frastr/devel/postgresql/src/test/regress/regression.out". make[1]: *** [GNUmakefile:118: check] Fehler 1 make[1]: Verzeichnis „/home/frastr/devel/postgresql/src/test/regress“ wird verlassen make: *** [GNUmakefile:69: check] Fehler 2 --- What can I do? Who can help? Best regards, Frank
Attachment
On 2024-07-05 15:33 +0200, Frank Streitzig wrote: > My system is a Arch Linux. > I get after upgrade the libxml2 package (from 2.12.7-1 to 2.13.1-1) > test errors for xml: > > not ok 202 + xml 1464 ms > [...snip...] > # 1 of 222 tests failed. > # The differences that caused some tests to fail can be viewed in the file "/home/frastr/devel/postgresql/src/test/regress/regression.diffs". > # A copy of the test summary that you see above is saved in the file "/home/frastr/devel/postgresql/src/test/regress/regression.out". > make[1]: *** [GNUmakefile:118: check] Fehler 1 > make[1]: Verzeichnis „/home/frastr/devel/postgresql/src/test/regress“ wird verlassen > make: *** [GNUmakefile:69: check] Fehler 2 Hmm, I did not get this error after upgrading libxml2 on my Arch machine a couple of weeks ago. Did you just run make check after upgrading, without recompiling? Please try make clean && make && make check -- Erik
>> My system is a Arch Linux. >> I get after upgrade the libxml2 package (from 2.12.7-1 to 2.13.1-1) >> test errors for xml: >> >> not ok 202 + xml 1464 ms >> [...snip...] >> # 1 of 222 tests failed. >> # The differences that caused some tests to fail can be viewed in the file "/home/frastr/devel/postgresql/src/test/regress/regression.diffs". >> # A copy of the test summary that you see above is saved in the file "/home/frastr/devel/postgresql/src/test/regress/regression.out". >> make[1]: *** [GNUmakefile:118: check] Fehler 1 >> make[1]: Verzeichnis „/home/frastr/devel/postgresql/src/test/regress“ wird verlassen >> make: *** [GNUmakefile:69: check] Fehler 2 > > Hmm, I did not get this error after upgrading libxml2 on my Arch machine > a couple of weeks ago. Did you just run make check after upgrading, > without recompiling? Please try make clean && make && make check > Hi, i get same error. Frank
On 2024-07-06 11:57 +0200, Frank Streitzig wrote: > >> My system is a Arch Linux. > >> I get after upgrade the libxml2 package (from 2.12.7-1 to 2.13.1-1) > >> test errors for xml: > >> > >> not ok 202 + xml 1464 ms > >> [...snip...] > >> # 1 of 222 tests failed. > >> # The differences that caused some tests to fail can be viewed in the file "/home/frastr/devel/postgresql/src/test/regress/regression.diffs". > >> # A copy of the test summary that you see above is saved in the file "/home/frastr/devel/postgresql/src/test/regress/regression.out". > >> make[1]: *** [GNUmakefile:118: check] Fehler 1 > >> make[1]: Verzeichnis „/home/frastr/devel/postgresql/src/test/regress“ wird verlassen > >> make: *** [GNUmakefile:69: check] Fehler 2 > > > > Hmm, I did not get this error after upgrading libxml2 on my Arch machine > > a couple of weeks ago. Did you just run make check after upgrading, > > without recompiling? Please try make clean && make && make check > > i get same error. Ah! I forgot to run ./configure --with-libxml. I thought it was enabled by default if libxml2 is available. Now I get the same errors. Also with 2.13.2-1 which is currently still in core-testing. So, there must be breaking changes in 2.13.0: https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.13.0 Maybe you can downgrade in the meantime, if you still have 2.12.7 in the cache: pacman -U /var/cache/pacman/pkg/libxml2-2.12.7-1-x86_64.pkg.tar.zst -- Erik
Erik Wienhold <ewie@ewie.name> writes: > So, there must be breaking changes in 2.13.0: > https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.13.0 Yeah, apparently --- I get what look like the same diffs with libxml2 2.13.0 recently supplied by MacPorts. Grumble. Somebody's going to have to look into that. regards, tom lane
On 2024-07-06 16:25 +0200, Tom Lane wrote: > Erik Wienhold <ewie@ewie.name> writes: > > So, there must be breaking changes in 2.13.0: > > https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.13.0 > > Yeah, apparently --- I get what look like the same diffs with > libxml2 2.13.0 recently supplied by MacPorts. Grumble. > Somebody's going to have to look into that. Here's a patch that fixes just the xmlserialize and namespace errors. Use xmlAddChildList instead of xmlAddChild for xmlserialize. That also works with 2.12.7, but I don't know about older libxml2 versions. Maybe add a version check to be safe: #if LIBXML_VERSION >= 21300 xmlAddChildList(root, content_nodes); #else xmlAddChild(root, content_nodes); #endif I don't know if using xmlAddChild in this context was ever correct. The namespace errors are tricky because xmlParseBalancedChunkMemory now returns res_code != 0 for invalid or unknown namespaces (probably other errors as well). So I just added an additional check to ignore those errors for >=2.13. But that's rather hackish. I don't know how to handle it in xml_errorHandler where those error codes are already dealt with in order to compensate for differences in error reporting across different libxml2 versions. Looks like xmlerrcxt is ignored by xmlParseBalancedChunkMemory. No idea how to deal with the remaining errors for invalid and undefined entities which appear to include less details now. That seems to be expected, judging from the release notes: > A few error messages were improved and consolidated. Please update > downstream test suites accordingly. How to deal with that in a manner that still works for pre-2.13, other than filtering out those details that are no longer included in 2.13? Or just \set VERBOSITY terse for those few test cases? But that omits the entire error detail. -- Erik
Attachment
Erik Wienhold <ewie@ewie.name> writes: > On 2024-07-06 16:25 +0200, Tom Lane wrote: >> Yeah, apparently --- I get what look like the same diffs with >> libxml2 2.13.0 recently supplied by MacPorts. Grumble. >> Somebody's going to have to look into that. > Here's a patch that fixes just the xmlserialize and namespace errors. One angle that ought to be considered is that some of this stuff may be flat-out bugs in 2.13.0. I see at https://gitlab.gnome.org/GNOME/libxml2/-/releases that both 2.13.1 and 2.13.2 contain fixes for "regressions" in 2.13.0. I'm disinclined to spend much effort on working around dot-zero bugs. > Use xmlAddChildList instead of xmlAddChild for xmlserialize. That also > works with 2.12.7, but I don't know about older libxml2 versions. Maybe > ... > I don't know if using xmlAddChild in this context was ever correct. Good question. A look at https://github.com/ConradIrwin/libxml2/blame/master/include/libxml/tree.h shows that xmlAddChildList has been there for decades, and I also see in the 2.13.0 notes * tree: Align xmlAddChild with other node insertion functions so it sort of looks like this is something that we got away with but it wasn't right. I'm inclined to just replace the call and see what the buildfarm says. > The namespace errors are tricky because xmlParseBalancedChunkMemory now > returns res_code != 0 for invalid or unknown namespaces (probably other > errors as well). So I just added an additional check to ignore those > errors for >=2.13. But that's rather hackish. I don't know how to > handle it in xml_errorHandler where those error codes are already dealt > with in order to compensate for differences in error reporting across > different libxml2 versions. Looks like xmlerrcxt is ignored by > xmlParseBalancedChunkMemory. 2.13.0 mentions having added some new error-handler-setting functions; maybe we need to use one of those to get xmlParseBalancedChunkMemory's attention now? > No idea how to deal with the remaining errors for invalid and undefined > entities which appear to include less details now. That's fairly annoying. One answer is to create a variant expected-file, but the long-term maintenance costs could be high. On the other hand, our xml support is a bit of a backwater, so maybe it wouldn't be that bad. regards, tom lane
I wrote: > One angle that ought to be considered is that some of this stuff may > be flat-out bugs in 2.13.0. I see at > https://gitlab.gnome.org/GNOME/libxml2/-/releases > that both 2.13.1 and 2.13.2 contain fixes for "regressions" in 2.13.0. > I'm disinclined to spend much effort on working around dot-zero bugs. Meh ... 2.13.1, at least, changes nothing here. regards, tom lane
On 2024-07-06 20:43 +0200, Tom Lane wrote: > One angle that ought to be considered is that some of this stuff may > be flat-out bugs in 2.13.0. I see at > > https://gitlab.gnome.org/GNOME/libxml2/-/releases > > that both 2.13.1 and 2.13.2 contain fixes for "regressions" in 2.13.0. > I'm disinclined to spend much effort on working around dot-zero bugs. Found an open issue about ABI compatibility that affects 2.12.7 and possibly also 2.13: https://gitlab.gnome.org/GNOME/libxml2/-/issues/751. Maybe just wait this one out, in case they'll bump SONAME. > 2.13.0 mentions having added some new error-handler-setting functions; > maybe we need to use one of those to get xmlParseBalancedChunkMemory's > attention now? I tried xmlCtxtSetErrorHandler but xmlParseBalancedChunkMemory doesn't report to that. It's a known issue: https://gitlab.gnome.org/GNOME/libxml2/-/issues/727 -- Erik
Erik Wienhold <ewie@ewie.name> writes: > On 2024-07-06 20:43 +0200, Tom Lane wrote: >> I'm disinclined to spend much effort on working around dot-zero bugs. > Found an open issue about ABI compatibility that affects 2.12.7 and > possibly also 2.13: https://gitlab.gnome.org/GNOME/libxml2/-/issues/751. > Maybe just wait this one out, in case they'll bump SONAME. Whether they bump SONAME or put back the removed functions, there's no way that production-oriented distros are going to ship 2.13.x until something's done about that. That's good news for us, in that it gives us an excuse to not work around any dot-zero issues that get fixed before that point. But we'd better analyze our concerns and make sure anything we don't want to work around gets fixed by then. > I tried xmlCtxtSetErrorHandler but xmlParseBalancedChunkMemory doesn't > report to that. It's a known issue: > https://gitlab.gnome.org/GNOME/libxml2/-/issues/727 I saw that one. It would be good to have a replacement for xmlParseBalancedChunkMemory, because after looking at the libxml2 sources I realize that that's classed as a SAX1 function, which means it will likely go away at some point (maybe it's already not there in some builds). That's a long-term consideration though. In the short term, I dug in the libxml2 sources and found that xmlParseBalancedChunkMemory got pretty heavily refactored between 2.12 and 2.13. The reason for our immediate problem is that the return value in the old code is defined by if (!ctxt->wellFormed) { if (ctxt->errNo == 0) ret = 1; else ret = ctxt->errNo; } else { ret = 0; } whereas the new code just returns ctxt->errNo unconditionally. This does not agree with the phraseology in libxml2's documentation: 0 if the chunk is well balanced, -1 in case of args problem and the parser error code otherwise so I filed an issue at https://gitlab.gnome.org/GNOME/libxml2/-/issues/765 We'll see what they say about that. I think we could work around it as attached. This relies on seeing that the 2.13 code will return a node list if and only if ctxt->wellFormed is true (and we already eliminated the empty-input case, so an empty node list shouldn't happen). But it's not a lot less ugly than your proposal. Also, this only fixes the two wrong-output-from-xmlserialize test cases. I'm not so stressed about the cases where the errdetail changes, but I think we need to find an answer for the places where it fails and didn't before, like: SELECT xmlparse(content '<invalidns xmlns=''<''/>'); - xmlparse ---------------------------- - <invalidns xmlns='<'/> -(1 row) - +ERROR: invalid XML content regards, tom lane diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index d75f765de0..d5fa83e736 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -1840,6 +1840,18 @@ xml_parse(text *data, XmlOptionType xmloption_arg, res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0, utf8string + count, parsed_nodes); + + /* + * libxml2 2.13.x incorrectly returns parser errors even if + * the chunk is well-balanced. As a workaround, ignore the + * return code if a node list was returned. + * https://gitlab.gnome.org/GNOME/libxml2/-/issues/765 + */ +#if LIBXML_VERSION >= 21300 + if (res_code > 0 && parsed_nodes != NULL) + res_code = 0; +#endif + if (res_code != 0 || xmlerrcxt->err_occurred) { xml_errsave(escontext, xmlerrcxt,
I wrote: > I think we could work around it as attached. This relies on seeing > that the 2.13 code will return a node list if and only if > ctxt->wellFormed is true (and we already eliminated the empty-input > case, so an empty node list shouldn't happen). But it's not a lot > less ugly than your proposal. > Also, this only fixes the two wrong-output-from-xmlserialize > test cases. I'm not so stressed about the cases where the errdetail > changes, but I think we need to find an answer for the places where > it fails and didn't before, like: > SELECT xmlparse(content '<invalidns xmlns=''<''/>'); > - xmlparse > ---------------------------- > - <invalidns xmlns='<'/> > -(1 row) > - > +ERROR: invalid XML content Oh! That's actually the same bug, and my patch was faulty because I didn't think about the case where the caller of xml_parse passes parsed_nodes = NULL. (And it wasn't doing the right thing in the other case either :-(.) The attached works significantly better, and cleans up these bogus errors. We're still left with missing "chunk is not well balanced" errcontext entries, which we could live without if we have to, but I wonder why those are not there. regards, tom lane diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index d75f765de0..311de91110 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -1837,16 +1837,36 @@ xml_parse(text *data, XmlOptionType xmloption_arg, /* allow empty content */ if (*(utf8string + count)) { + xmlNodePtr node_list = NULL; + res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0, utf8string + count, - parsed_nodes); + &node_list); + + /* + * libxml2 2.13.x incorrectly returns parser errors even if + * the chunk is well-balanced. As a workaround, ignore the + * return code if a node list was returned. + * https://gitlab.gnome.org/GNOME/libxml2/-/issues/765 + */ +#if LIBXML_VERSION >= 21300 + if (res_code > 0 && node_list != NULL) + res_code = 0; +#endif + if (res_code != 0 || xmlerrcxt->err_occurred) { + xmlFreeNodeList(node_list); xml_errsave(escontext, xmlerrcxt, ERRCODE_INVALID_XML_CONTENT, "invalid XML content"); goto fail; } + + if (parsed_nodes != NULL) + *parsed_nodes = node_list; + else + xmlFreeNodeList(node_list); } }
I wrote: > I saw that one. It would be good to have a replacement for > xmlParseBalancedChunkMemory, because after looking at the libxml2 > sources I realize that that's classed as a SAX1 function, which means > it will likely go away at some point (maybe it's already not there in > some builds). That's a long-term consideration though. Actually ... after nosing around in libxml2 some more, I noticed xmlParseInNodeContext, which is the only other function specified to parse a Well Balanced Chunk. It requires a context node, but AFAICS we can just gin up a dummy root node and use that. It's existed for plenty long enough for our purposes, and it's not semi-deprecated, and it lacks the bug at hand. So I'm now thinking about the attached. As far as the errcontext changes go: I think we have to just bite the bullet and accept them. It looks like 2.13 has a completely different mechanism than prior versions for deciding when to issue XML_ERR_NOT_WELL_BALANCED. And it's not even clear that it's wrong; for example, in our first failing case DETAIL: line 1: xmlParseEntityRef: no name <invalidentity>&</invalidentity> ^ -line 1: chunk is not well balanced -<invalidentity>&</invalidentity> - ^ it's kind of hard to argue that the chunk isn't well-balanced. So we can either suppress errdetails from the expected output, or set up an additional expected-file. I'm leaning to the "\set VERBOSITY terse" solution. regards, tom lane diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index d75f765de0..4a5517fd75 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -1822,6 +1822,8 @@ xml_parse(text *data, XmlOptionType xmloption_arg, } else { + xmlNodePtr root; + doc = xmlNewDoc(version); if (doc == NULL || xmlerrcxt->err_occurred) xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY, @@ -1834,19 +1836,39 @@ xml_parse(text *data, XmlOptionType xmloption_arg, "could not allocate XML document"); doc->standalone = standalone; + root = xmlNewNode(NULL, (const xmlChar *) "content-root"); + if (root == NULL || xmlerrcxt->err_occurred) + xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY, + "could not allocate xml node"); + /* This attaches root to doc, so we need not free it separately. */ + xmlDocSetRootElement(doc, root); + /* allow empty content */ if (*(utf8string + count)) { - res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0, - utf8string + count, - parsed_nodes); - if (res_code != 0 || xmlerrcxt->err_occurred) + xmlNodePtr node_list = NULL; + xmlParserErrors res; + + res = xmlParseInNodeContext(root, + (char *) utf8string + count, + strlen((char *) utf8string + count), + XML_PARSE_NOENT | XML_PARSE_DTDATTR + | (preserve_whitespace ? 0 : XML_PARSE_NOBLANKS), + &node_list); + + if (res != XML_ERR_OK || xmlerrcxt->err_occurred) { + xmlFreeNodeList(node_list); xml_errsave(escontext, xmlerrcxt, ERRCODE_INVALID_XML_CONTENT, "invalid XML content"); goto fail; } + + if (parsed_nodes != NULL) + *parsed_nodes = node_list; + else + xmlFreeNodeList(node_list); } }
On 2024-07-07 22:43 +0200, Tom Lane wrote: > As far as the errcontext changes go: I think we have to just bite > the bullet and accept them. It looks like 2.13 has a completely > different mechanism than prior versions for deciding when to issue > XML_ERR_NOT_WELL_BALANCED. And it's not even clear that it's wrong; > for example, in our first failing case > > DETAIL: line 1: xmlParseEntityRef: no name > <invalidentity>&</invalidentity> > ^ > -line 1: chunk is not well balanced > -<invalidentity>&</invalidentity> > - ^ > > it's kind of hard to argue that the chunk isn't well-balanced. > > So we can either suppress errdetails from the expected output, > or set up an additional expected-file. I'm leaning to the > "\set VERBOSITY terse" solution. +1 for \set VERBOSITY terse as a last resort. But it looks to me as if "chunk is not well balanced" is just noise because libxml2 reports more specific errors before that. For example: SELECT xmlparse(content '<twoerrors>&idontexist;</unbalanced>'); ERROR: invalid XML content DETAIL: line 1: Entity 'idontexist' not defined <twoerrors>&idontexist;</unbalanced> ^ line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced <twoerrors>&idontexist;</unbalanced> ^ line 1: chunk is not well balanced <twoerrors>&idontexist;</unbalanced> ^ Here, "Opening and ending tag mismatch" already covers the unbalanced closing tag. So how about just ignoring XML_ERR_NOT_WELL_BALANCED like in the attached? This also adds test cases for an unclosed tag because I wanted to see if I can trigger just "chunk is not well balanced", but without success. SELECT xmlparse(content '<unclosed>'); ERROR: invalid XML content DETAIL: line 1: Premature end of data in tag unclosed line 1 <unclosed> ^ line 1: chunk is not well balanced <unclosed> ^ libxml2 2.13 doesn't report "chunk ..." here either. There's also this more explicit test case for unbalanced tags: <parent><child></parent></child> But I'm not sure if that's really necessary if we already have: <twoerrors>&idontexist;</unbalanced> The error messages are the same, except for the additional entity error. -- Erik
Attachment
Erik Wienhold <ewie@ewie.name> writes: > So how about just ignoring XML_ERR_NOT_WELL_BALANCED like in the > attached? Oh, that's a good idea. Given that we know they've changed the behavior around this at least once, I'm not sure that it's safe to unconditionally ignore this error --- but we could ignore it as long as we've already recorded some libxml2 error. I dug through the 2.13 sources and I think that in that version, it is actually impossible to get XML_ERR_NOT_WELL_BALANCED without any prior error. It's only issued if parsing stops short of the end of input, and that only happens if PARSER_STOPPED() becomes true, and that only happens if xmlHaltParser() is called, and all the calls to that seem to follow other errors being issued. But the 2.12 code looks quite different and I'm not sure that there's no such code path there; much less that it can't happen in older versions. > This also adds test cases for an unclosed tag because I > wanted to see if I can trigger just "chunk is not well balanced", but > without success. No particular objection to adding these. > But I'm not sure if that's really necessary if we already have: > <twoerrors>&idontexist;</unbalanced> > The error messages are the same, except for the additional entity error. I think the point of that test is different: it's showing that we actually can report multiple errors out of a single libxml2 parsing call. Before seeing your message I was thinking we'd have to "\set VERBOSITY terse" that test, which was annoying me mightily because it could no longer prove any such thing. I've updated indri's host to current MacPorts including libxml2 2.13.1, and as expected it's now showing failures: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=indri&dt=2024-07-09%2018%3A17%3A23 I'll push this change in a little bit (still gotta write commit message) and indri should go back to green. Unless one of the other animals complains, I'll set about back-patching in a day or two. regards, tom lane
I wrote: > I'll push this change in a little bit (still gotta write commit > message) and indri should go back to green. Unless one of the > other animals complains, I'll set about back-patching in a > day or two. Well, that didn't take long: several animals are reporting different error text for one or both of those new test cases. It looks like I did indeed guess wrong about the output for the libxml2 versions that match xml_2.out; but more interestingly we have https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=widowbird&dt=2024-07-09%2019%3A30%3A07 @@ -287,7 +287,7 @@ SELECT xmlparse(content '<unclosed>'); ERROR: invalid XML content -DETAIL: line 1: Premature end of data in tag unclosed line 1 +DETAIL: line 1: chunk is not well balanced <unclosed> ^ SELECT xmlparse(content '<parent><child></parent></child>'); @@ -358,7 +358,7 @@ SELECT xmlparse(document '<unclosed>'); ERROR: invalid XML document -DETAIL: line 1: Premature end of data in tag unclosed line 1 +DETAIL: line 1: EndTag: '</' not found <unclosed> ^ SELECT xmlparse(document '<parent><child></parent></child>'); which proves that whatever version widowbird is running is indeed capable of emitting XML_ERR_NOT_WELL_BALANCED with no other error. So my instinct to not suppress that unconditionally was right. At the moment I'm thinking that we should just remove those new test cases again. They are not valuable enough to justify a new variant expected-file, while suppressing the errdetail would remove whatever value they do have. regards, tom lane
On 2024-07-09 21:53 +0200, Tom Lane wrote: > Well, that didn't take long: several animals are reporting > different error text for one or both of those new test cases. > [...snip...] > At the moment I'm thinking that we should just remove those > new test cases again. +1 -- Erik