Thread: [PATCH] Add pretty-printed XML output option
Hi, This small patch introduces a XML pretty print function. It basically takes advantage of the indentation feature of xmlDocDumpFormatMemory from libxml2 to format XML strings. postgres=# SELECT xmlpretty('<foo id="x"><bar id="y"><var id="z">42</var></bar></foo>'); xmlpretty -------------------------- <foo id="x"> + <bar id="y"> + <var id="z">42</var>+ </bar> + </foo> + (1 row) The patch also contains regression tests and documentation. Feedback is very welcome! Jim
Attachment
The system somehow returns different error messages in Linux and MacOS/Windows, which was causing the cfbot to fail. SELECT xmlpretty('<foo>')::xml; ^ -DETAIL: line 1: chunk is not well balanced +DETAIL: line 1: Premature end of data in tag foo line 1 Test removed in v2. On 02.02.23 21:35, Jim Jones wrote: > Hi, > > This small patch introduces a XML pretty print function. It basically > takes advantage of the indentation feature of xmlDocDumpFormatMemory > from libxml2 to format XML strings. > > postgres=# SELECT xmlpretty('<foo id="x"><bar id="y"><var > id="z">42</var></bar></foo>'); > xmlpretty > -------------------------- > <foo id="x"> + > <bar id="y"> + > <var id="z">42</var>+ > </bar> + > </foo> + > > (1 row) > > > The patch also contains regression tests and documentation. > > Feedback is very welcome! > > Jim
Attachment
Hi, The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing the regression tests with the error: ERROR: unsupported XML feature DETAIL: This functionality requires the server to be built with libxml support. Is there anything I can do to enable libxml to run my regression tests? v3 adds a missing xmlFree call. Best, Jim
Attachment
Jim Jones <jim.jones@uni-muenster.de> writes: > The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing > the regression tests with the error: > ERROR: unsupported XML feature > DETAIL: This functionality requires the server to be built with libxml > support. > Is there anything I can do to enable libxml to run my regression tests? Your patch needs to also update expected/xml_1.out to match the output the test produces when run without --with-libxml. regards, tom lane
On 06.02.23 17:23, Tom Lane wrote: >> Your patch needs to also update expected/xml_1.out to match the output >> the test produces when run without --with-libxml. Thanks a lot! It seems to do the trick. xml_1.out updated in v4. Best, Jim
Attachment
while working on another item of the TODO list I realized that I should be using a PG_TRY() block in he xmlDocDumpFormatMemory call. Fixed in v5. Best regards, Jim
Attachment
On Thu, Feb 9, 2023 at 7:31 AM Jim Jones <jim.jones@uni-muenster.de> wrote: > > while working on another item of the TODO list I realized that I should > be using a PG_TRY() block in he xmlDocDumpFormatMemory call. > > Fixed in v5. > I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the other xmlFreeDoc(doc) is not. As the doc is assigned outside the PG_TRY shouldn't those both be the same? ------ Kind Regards, Peter Smith. Fujitsu Australia.
On 09.02.23 00:09, Peter Smith wrote: > I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the > other xmlFreeDoc(doc) is not. As the doc is assigned outside the > PG_TRY shouldn't those both be the same? Hi Peter, My logic there was the following: if program reached that part of the code it means that the xml_parse() and xmlDocDumpFormatMemory() worked, which consequently means that the variables doc and xmlbuf are != NULL, therefore not needing to be checked. Am I missing something? Thanks a lot for the review! Best, Jim
Attachment
On Thu, Feb 9, 2023 at 10:42 AM Jim Jones <jim.jones@uni-muenster.de> wrote: > > On 09.02.23 00:09, Peter Smith wrote: > > I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the > > other xmlFreeDoc(doc) is not. As the doc is assigned outside the > > PG_TRY shouldn't those both be the same? > > Hi Peter, > > My logic there was the following: if program reached that part of the > code it means that the xml_parse() and xmlDocDumpFormatMemory() worked, > which consequently means that the variables doc and xmlbuf are != NULL, > therefore not needing to be checked. Am I missing something? > Thanks. I think I understand it better now -- I expect xmlDocDumpFormatMemory will cope OK when passed a NULL doc (see this source [1]), but it will return nbytes of 0, but your code will still throw ERROR, meaning the guard for doc NULL is necessary for the PG_CATCH. In that case, everything LGTM. ~ OTOH, if you are having to check for NULL doc anyway, maybe it's just as easy only doing that up-front. Then you could quick-exit the function without calling xmlDocDumpFormatMemory etc. in the first place. For example: doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL); if (!doc) return 0; ------ Kind Regards, Peter Smith. Fujitsu Australia.
On 09.02.23 02:01, Peter Smith wrote: > OTOH, if you are having to check for NULL doc anyway, maybe it's just > as easy only doing that up-front. Then you could quick-exit the > function without calling xmlDocDumpFormatMemory etc. in the first > place. For example: > > doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL); > if (!doc) > return 0; I see your point. If I got it right, you're suggesting the following change in the PG_TRY(); PG_TRY(); { int nbytes; if(!doc) xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR, "could not parse the given XML document"); xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); if(!nbytes || xmlerrcxt->err_occurred) xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR, "could not indent the given XML document"); initStringInfo(&buf); appendStringInfoString(&buf, (const char *)xmlbuf); } .. which will catch the doc == NULL before calling xmlDocDumpFormatMemory. Is it what you suggest? Thanks a lot for the thorough review! Best, Jim
Jim Jones <jim.jones@uni-muenster.de> writes: > I see your point. If I got it right, you're suggesting the following > change in the PG_TRY(); > PG_TRY(); > { > int nbytes; > if(!doc) > xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR, > "could not parse the given XML document"); > xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); > if(!nbytes || xmlerrcxt->err_occurred) > xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR, > "could not indent the given XML document"); > initStringInfo(&buf); > appendStringInfoString(&buf, (const char *)xmlbuf); > } > .. which will catch the doc == NULL before calling xmlDocDumpFormatMemory. Um ... why are you using PG_TRY here at all? It seems like you have full control of the actually likely error cases. The only plausible error out of the StringInfo calls is OOM, and you probably don't want to trap that at all. regards, tom lane
On 09.02.23 08:23, Tom Lane wrote: > Um ... why are you using PG_TRY here at all? It seems like > you have full control of the actually likely error cases. > The only plausible error out of the StringInfo calls is OOM, > and you probably don't want to trap that at all. My intention was to catch any unexpected error from xmlDocDumpFormatMemory and handle it properly. But I guess you're right, I can control the likely error cases by checking doc and nbytes. You suggest something along these lines? xmlDocPtr doc; xmlChar *xmlbuf = NULL; text *arg = PG_GETARG_TEXT_PP(0); StringInfoData buf; int nbytes; doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL); if(!doc) elog(ERROR, "could not parse the given XML document"); xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); xmlFreeDoc(doc); if(!nbytes) elog(ERROR, "could not indent the given XML document"); initStringInfo(&buf); appendStringInfoString(&buf, (const char *)xmlbuf); xmlFree(xmlbuf); PG_RETURN_XML_P(stringinfo_to_xmltype(&buf)); Thanks! Best, Jim
Attachment
On 02.02.23 21:35, Jim Jones wrote: > This small patch introduces a XML pretty print function. It basically > takes advantage of the indentation feature of xmlDocDumpFormatMemory > from libxml2 to format XML strings. I suggest we call it "xmlformat", which is an established term for this.
On Thu, Feb 9, 2023 at 7:17 PM Jim Jones <jim.jones@uni-muenster.de> wrote: > > On 09.02.23 08:23, Tom Lane wrote: > > Um ... why are you using PG_TRY here at all? It seems like > > you have full control of the actually likely error cases. > > The only plausible error out of the StringInfo calls is OOM, > > and you probably don't want to trap that at all. > > My intention was to catch any unexpected error from > xmlDocDumpFormatMemory and handle it properly. But I guess you're right, > I can control the likely error cases by checking doc and nbytes. > > You suggest something along these lines? > > xmlDocPtr doc; > xmlChar *xmlbuf = NULL; > text *arg = PG_GETARG_TEXT_PP(0); > StringInfoData buf; > int nbytes; > > doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, > GetDatabaseEncoding(), NULL); > > if(!doc) > elog(ERROR, "could not parse the given XML document"); > > xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); > > xmlFreeDoc(doc); > > if(!nbytes) > elog(ERROR, "could not indent the given XML document"); > > initStringInfo(&buf); > appendStringInfoString(&buf, (const char *)xmlbuf); > > xmlFree(xmlbuf); > > PG_RETURN_XML_P(stringinfo_to_xmltype(&buf)); > > > Thanks! > Something like that LGTM, but here are some other minor comments. ====== 1. FYI, there are some whitespace warnings applying the v5 patch [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:26: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:29: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:33: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:37: trailing whitespace. ../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:41: trailing whitespace. warning: squelched 48 whitespace errors warning: 53 lines add whitespace errors. ====== src/test/regress/sql/xml.sql 2. The v5 patch was already testing NULL, but it might be good to add more tests to verify the function is behaving how you want for edge cases. For example, +-- XML pretty print: NULL, empty string, spaces only... SELECT xmlpretty(NULL); SELECT xmlpretty(''); SELECT xmlpretty(' '); ~~ 3. The function is returning XML anyway, so is the '::xml' casting in these tests necessary? e.g. SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL); ====== src/include/catalog/pg_proc.dat 4. + { oid => '4642', descr => 'Indented text from xml', + proname => 'xmlpretty', prorettype => 'xml', + proargtypes => 'xml', prosrc => 'xmlpretty' }, Spurious leading space for this new entry. ====== doc/src/sgml/func.sgml 5. + <foo id="x"> + <bar id="y"> + <var id="z">42</var> + </bar> + </foo> + +(1 row) + +]]></screen> A spurious blank line in the example after the "(1 row)" ~~~ 6. Does this function docs belong in section 9.15.1 "Producing XML Content"? Or (since it is not really producing content) should it be moved to the 9.15.3 "Processing XML" section? ------ Kind Regards, Peter Smith. Fujitsu Australia
On 10.02.23 02:10, Peter Smith wrote: > On Thu, Feb 9, 2023 at 7:17 PM Jim Jones <jim.jones@uni-muenster.de> wrote: > 1. > FYI, there are some whitespace warnings applying the v5 patch > Trailing whitespaces removed. The patch applies now without warnings. > ====== > src/test/regress/sql/xml.sql > > 2. > The v5 patch was already testing NULL, but it might be good to add > more tests to verify the function is behaving how you want for edge > cases. For example, > > +-- XML pretty print: NULL, empty string, spaces only... > SELECT xmlpretty(NULL); > SELECT xmlpretty(''); > SELECT xmlpretty(' '); Test cases added. > 3. > The function is returning XML anyway, so is the '::xml' casting in > these tests necessary? > > e.g. > SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL); It is indeed not necessary. Most likely I used it for testing and forgot to remove it afterwards. Now removed. > ====== > src/include/catalog/pg_proc.dat > > 4. > > + { oid => '4642', descr => 'Indented text from xml', > + proname => 'xmlpretty', prorettype => 'xml', > + proargtypes => 'xml', prosrc => 'xmlpretty' }, > > Spurious leading space for this new entry. Removed. > > ====== > doc/src/sgml/func.sgml > > 5. > + <foo id="x"> > + <bar id="y"> > + <var id="z">42</var> > + </bar> > + </foo> > + > +(1 row) > + > +]]></screen> > > A spurious blank line in the example after the "(1 row)" Removed. > ~~~ > > 6. > Does this function docs belong in section 9.15.1 "Producing XML > Content"? Or (since it is not really producing content) should it be > moved to the 9.15.3 "Processing XML" section? Moved to the section 9.15.3 Following the suggestion of Peter Eisentraut I renamed the function to xmlformat(). v6 attached. Thanks a lot for the review. Best, Jim
Attachment
Something is misbehaving. Using the latest HEAD, and before applying the v6 patch, 'make check' is working OK. But after applying the v6 patch, some XML regression tests are failing because the DETAIL part of the message indicating the wrong syntax position is not getting displayed. Not only for your new tests -- but for other XML tests too. My ./configure looks like this: ./configure --prefix=/usr/local/pg_oss --with-libxml --enable-debug --enable-cassert --enable-tap-tests CFLAGS="-ggdb -O0 -g3 -fno-omit-frame-pointer" resulting in: checking whether to build with XML support... yes checking for libxml-2.0 >= 2.6.23... yes ~ e.g.(these are a sample of errors) xml ... FAILED 2561 ms @@ -344,8 +326,6 @@ <twoerrors>&idontexist;</unbalanced> ^ line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced -<twoerrors>&idontexist;</unbalanced> - ^ SELECT xmlparse(document '<nosuchprefix:tag/>'); xmlparse --------------------- @@ -1696,14 +1676,8 @@ SELECT xmlformat(''); ERROR: invalid XML document DETAIL: line 1: switching encoding : no input - -^ line 1: Document is empty - -^ -- XML format: invalid string (whitespaces) SELECT xmlformat(' '); ERROR: invalid XML document DETAIL: line 1: Start tag expected, '<' not found - - ^ ~~ Separately (but maybe it's related?), the CF-bot test also reported a failure [1] with strange error detail differences. diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000 +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000 @@ -1695,10 +1695,7 @@ -- XML format: empty string SELECT xmlformat(''); ERROR: invalid XML document -DETAIL: line 1: switching encoding : no input - -^ -line 1: Document is empty +DETAIL: line 1: Document is empty ^ -- XML format: invalid string (whitespaces) ------ [1] https://api.cirrus-ci.com/v1/artifact/task/4802219812323328/testrun/build/testrun/regress/regress/regression.diffs Kind Regards, Peter Smith. Fujitsu Australia
Something is misbehaving. Using the latest HEAD, and before applying the v6 patch, 'make check' is working OK. But after applying the v6 patch, some XML regression tests are failing because the DETAIL part of the message indicating the wrong syntax position is not getting displayed. Not only for your new tests -- but for other XML tests too.
Yes, I noticed it yesterday ... and I'm not sure how to solve it. It seems that in the system is returning a different error message in the FreeBSD patch tester, which is causing a regression test in this particular OS to fail.
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000 +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000 @@ -1695,10 +1695,7 @@ -- XML format: empty string SELECT xmlformat(''); ERROR: invalid XML document -DETAIL: line 1: switching encoding : no input - -^ -line 1: Document is empty +DETAIL: line 1: Document is empty ^ -- XML format: invalid string (whitespaces)
Does anyone know if there is anything I can do to make the error messages be consistent among different OS?
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000 +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000 @@ -1695,10 +1695,7 @@ -- XML format: empty string SELECT xmlformat(''); ERROR: invalid XML document -DETAIL: line 1: switching encoding : no input - -^ -line 1: Document is empty +DETAIL: line 1: Document is empty ^ -- XML format: invalid string (whitespaces)
I couldn't figure out why the error messages are different -- I'm wondering if the issue is the test environment itself. I just removed the troubling test case for now
SELECT xmlformat('');
v7 attached.
Thanks for reviewing this patch!
Best, Jim
Attachment
On Wed, Feb 15, 2023 at 8:55 AM Jim Jones <jim.jones@uni-muenster.de> wrote: > > On 13.02.23 13:15, Jim Jones wrote: > > diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out > --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000 > +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000 > @@ -1695,10 +1695,7 @@ > -- XML format: empty string > SELECT xmlformat(''); > ERROR: invalid XML document > -DETAIL: line 1: switching encoding : no input > - > -^ > -line 1: Document is empty > +DETAIL: line 1: Document is empty > > ^ > -- XML format: invalid string (whitespaces) > > I couldn't figure out why the error messages are different -- I'm wondering if the issue is the test environment itself.I just removed the troubling test case for now > > SELECT xmlformat(''); > > v7 attached. > > Thanks for reviewing this patch! > Yesterday I looked at those cfbot configs and noticed all those machines have different versions of libxml. 2.10.3 2.6.23 2.9.10 2.9.13 But I don't if version numbers have any impact on the different error details or not. ~ The thing that puzzled me most is that in MY environment (CentOS7; libxml 20901; PG --with-libxml build) I get this behaviour. - Without your v6 patch 'make check' is all OK. - With your v6 patch other XML tests (not only yours) of 'make check' failed with different error messages. - Similarly, if I keep the v6 patch but just change (in xmlformat) the #ifdef USE_LIBXML to be #if 0, then only the new xmlformat tests fail, but the other XML tests are working OK again. Those results implied to me that this function code (in my environment anyway) is somehow introducing a side effect causing the *other* XML tests to fail. But so far I was unable to identify the reason. Sorry, I don't know this XML API well enough to help more. ------ Kind Regards, Peter Smith. Fujitsu Austalia.
On 14.02.23 23:45, Peter Smith wrote: > Those results implied to me that this function code (in my environment > anyway) is somehow introducing a side effect causing the *other* XML > tests to fail. I believe I've found the issue. It is probably related to the XML OPTION settings, as it seems to deliver different error messages when set to DOCUMENT or CONTENT: postgres=# SET XML OPTION CONTENT; SET postgres=# SELECT xmlformat(''); ERROR: invalid XML document DETAIL: line 1: switching encoding : no input ^ line 1: Document is empty ^ postgres=# SET XML OPTION DOCUMENT; SET postgres=# SELECT xmlformat(''); ERROR: invalid XML document LINE 1: SELECT xmlformat(''); ^ DETAIL: line 1: switching encoding : no input ^ line 1: Document is empty ^ v8 attached reintroduces the SELECT xmlformat('') test case and adds SET XML OPTION DOCUMENT to the regression tests. Best, Jim
Attachment
On Wed, Feb 15, 2023 at 11:05 AM Jim Jones <jim.jones@uni-muenster.de> wrote: > > On 14.02.23 23:45, Peter Smith wrote: > > Those results implied to me that this function code (in my environment > > anyway) is somehow introducing a side effect causing the *other* XML > > tests to fail. > > I believe I've found the issue. It is probably related to the XML OPTION > settings, as it seems to deliver different error messages when set to > DOCUMENT or CONTENT: > > postgres=# SET XML OPTION CONTENT; > SET > postgres=# SELECT xmlformat(''); > ERROR: invalid XML document > DETAIL: line 1: switching encoding : no input > > ^ > line 1: Document is empty > > ^ > postgres=# SET XML OPTION DOCUMENT; > SET > postgres=# SELECT xmlformat(''); > ERROR: invalid XML document > LINE 1: SELECT xmlformat(''); > ^ > DETAIL: line 1: switching encoding : no input > > ^ > line 1: Document is empty > > ^ > > v8 attached reintroduces the SELECT xmlformat('') test case and adds SET > XML OPTION DOCUMENT to the regression tests. > With v8, in my environment, in psql I see something slightly different: test_pub=# SET XML OPTION CONTENT; SET test_pub=# SELECT xmlformat(''); ERROR: invalid XML document DETAIL: line 1: switching encoding : no input line 1: Document is empty test_pub=# SET XML OPTION DOCUMENT; SET test_pub=# SELECT xmlformat(''); ERROR: invalid XML document LINE 1: SELECT xmlformat(''); ^ DETAIL: line 1: switching encoding : no input line 1: Document is empty ~~ test_pub=# SET XML OPTION CONTENT; SET test_pub=# INSERT INTO xmltest VALUES (3, '<wrong'); ERROR: relation "xmltest" does not exist LINE 1: INSERT INTO xmltest VALUES (3, '<wrong'); ^ test_pub=# SET XML OPTION DOCUMENT; SET test_pub=# INSERT INTO xmltest VALUES (3, '<wrong'); ERROR: relation "xmltest" does not exist LINE 1: INSERT INTO xmltest VALUES (3, '<wrong'); ^ ~~ Because the expected extra detail stuff is missing the regression tests are still failing for me. ------ Kind Regards, Peter Smith. Fujitsu Austalia.
On 15.02.23 02:09, Peter Smith wrote: > With v8, in my environment, in psql I see something slightly different: > > > test_pub=# SET XML OPTION CONTENT; > SET > test_pub=# SELECT xmlformat(''); > ERROR: invalid XML document > DETAIL: line 1: switching encoding : no input > line 1: Document is empty > test_pub=# SET XML OPTION DOCUMENT; > SET > test_pub=# SELECT xmlformat(''); > ERROR: invalid XML document > LINE 1: SELECT xmlformat(''); > ^ > DETAIL: line 1: switching encoding : no input > line 1: Document is empty > > ~~ > > test_pub=# SET XML OPTION CONTENT; > SET > test_pub=# INSERT INTO xmltest VALUES (3, '<wrong'); > ERROR: relation "xmltest" does not exist > LINE 1: INSERT INTO xmltest VALUES (3, '<wrong'); > ^ > test_pub=# SET XML OPTION DOCUMENT; > SET > test_pub=# INSERT INTO xmltest VALUES (3, '<wrong'); > ERROR: relation "xmltest" does not exist > LINE 1: INSERT INTO xmltest VALUES (3, '<wrong'); > ^ > > ~~ Yes... a cfbot also complained about the same thing. Setting the VERBOSITY to terse might solve this issue: postgres=# \set VERBOSITY terse postgres=# SELECT xmlformat(''); ERROR: invalid XML document postgres=# \set VERBOSITY default postgres=# SELECT xmlformat(''); ERROR: invalid XML document DETAIL: line 1: switching encoding : no input ^ line 1: Document is empty ^ v9 wraps the corner test cases with VERBOSITY terse to reduce the error message output. Thanks! Best, Jim
Attachment
On Wed, Feb 15, 2023 at 6:10 PM Jim Jones <jim.jones@uni-muenster.de> wrote: > > On 15.02.23 02:09, Peter Smith wrote: > > With v8, in my environment, in psql I see something slightly different: > > > > > > test_pub=# SET XML OPTION CONTENT; > > SET > > test_pub=# SELECT xmlformat(''); > > ERROR: invalid XML document > > DETAIL: line 1: switching encoding : no input > > line 1: Document is empty > > test_pub=# SET XML OPTION DOCUMENT; > > SET > > test_pub=# SELECT xmlformat(''); > > ERROR: invalid XML document > > LINE 1: SELECT xmlformat(''); > > ^ > > DETAIL: line 1: switching encoding : no input > > line 1: Document is empty > > > > ~~ > > > > test_pub=# SET XML OPTION CONTENT; > > SET > > test_pub=# INSERT INTO xmltest VALUES (3, '<wrong'); > > ERROR: relation "xmltest" does not exist > > LINE 1: INSERT INTO xmltest VALUES (3, '<wrong'); > > ^ > > test_pub=# SET XML OPTION DOCUMENT; > > SET > > test_pub=# INSERT INTO xmltest VALUES (3, '<wrong'); > > ERROR: relation "xmltest" does not exist > > LINE 1: INSERT INTO xmltest VALUES (3, '<wrong'); > > ^ > > > > ~~ > > Yes... a cfbot also complained about the same thing. > > Setting the VERBOSITY to terse might solve this issue: > > postgres=# \set VERBOSITY terse > postgres=# SELECT xmlformat(''); > ERROR: invalid XML document > > postgres=# \set VERBOSITY default > postgres=# SELECT xmlformat(''); > ERROR: invalid XML document > DETAIL: line 1: switching encoding : no input > > ^ > line 1: Document is empty > > ^ > > v9 wraps the corner test cases with VERBOSITY terse to reduce the error > message output. > Bingo!! Your v9 patch now passes all 'make check' tests for me. But I'll leave it to a committer to decide if this VERBOSITY toggle is the best fix. (I don't understand, maybe someone can explain, how the patch managed to mess verbosity of the existing tests.) ------ Kind Regards, Peter Smith. Fujitsu Austalia.
On 15.02.23 10:06, Peter Smith wrote: > Bingo!! Your v9 patch now passes all 'make check' tests for me. Nice! It also passed all tests in the patch tester. > But I'll leave it to a committer to decide if this VERBOSITY toggle is > the best fix. I see now other test cases in the xml.sql file that also use this option, so it might be a known "issue". Shall we now set the patch to "Ready for Commiter"? Thanks a lot for the review! Best, Jim
Attachment
On 2023-Feb-13, Peter Smith wrote: > Something is misbehaving. > > Using the latest HEAD, and before applying the v6 patch, 'make check' > is working OK. > > But after applying the v6 patch, some XML regression tests are failing > because the DETAIL part of the message indicating the wrong syntax > position is not getting displayed. Not only for your new tests -- but > for other XML tests too. Note that there's another file, xml_2.out, which does not contain the additional part of the DETAIL message. I suspect in Peter's case it's xml_2.out that was originally being used as a comparison basis (not xml.out), but that one is not getting patched, and ultimately the diff is reported for him against xml.out because none of them matches. An easy way forward might be to manually apply the patch from xml.out to xml_2.out, and edit it by hand to remove the additional lines. See commit 085423e3e326 for a bit of background. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On 15.02.23 11:11, Alvaro Herrera wrote: > Note that there's another file, xml_2.out, which does not contain the > additional part of the DETAIL message. I suspect in Peter's case it's > xml_2.out that was originally being used as a comparison basis (not > xml.out), but that one is not getting patched, and ultimately the diff > is reported for him against xml.out because none of them matches. > > An easy way forward might be to manually apply the patch from xml.out to > xml_2.out, and edit it by hand to remove the additional lines. > > See commit 085423e3e326 for a bit of background. Hi Álvaro, As my test cases were not specifically about how the error message looks like, I thought that suppressing part of the error messages by setting VERBOSITY terse would suffice.[1] Is this approach not recommended? Thanks! Best, Jim 1 - v9 patch: https://www.postgresql.org/message-id/CAHut%2BPtoH8zkBHxv44XyO%2Bo4kW_nZdGhNdVaJ_cpEjrckKDqtw%40mail.gmail.com
Attachment
On 2023-Feb-15, Jim Jones wrote: > Hi Álvaro, > > As my test cases were not specifically about how the error message looks > like, I thought that suppressing part of the error messages by setting > VERBOSITY terse would suffice.[1] Is this approach not recommended? Well, I don't see why we would depart from what we've been doing in the rest of the XML tests. I did see that patch and I thought it was taking the wrong approach. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)
On 15.02.23 12:11, Alvaro Herrera wrote: > Well, I don't see why we would depart from what we've been doing in the > rest of the XML tests. I did see that patch and I thought it was taking > the wrong approach. Fair point. v10 patches the xml.out to xml_2.out - manually removing the additional lines. Thanks for the review! Best, Jim
Attachment
Accidentally left the VERBOSE settings out -- sorry! Now it matches the approach used in a xpath test in xml.sql, xml.out, xml_1.out and xml_2.out -- Since different libxml versions emit slightly different -- error messages, we suppress the DETAIL in this test. \set VERBOSITY terse SELECT xpath('/*', '<invalidns xmlns=''<''/>'); ERROR: could not parse XML document \set VERBOSITY default v11 now correctly sets xml_2.out. Best, Jim
Attachment
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Note that there's another file, xml_2.out, which does not contain the > additional part of the DETAIL message. I suspect in Peter's case it's > xml_2.out that was originally being used as a comparison basis (not > xml.out), but that one is not getting patched, and ultimately the diff > is reported for him against xml.out because none of them matches. > See commit 085423e3e326 for a bit of background. Yeah. That's kind of sad, because it means there are still broken libxml2s out there in 2023. Nonetheless, since there are, it is not optional to fix all three expected-files. regards, tom lane
On Thu, Feb 16, 2023 at 12:49 AM Jim Jones <jim.jones@uni-muenster.de> wrote: > > Accidentally left the VERBOSE settings out -- sorry! > > Now it matches the approach used in a xpath test in xml.sql, xml.out, > xml_1.out and xml_2.out > > -- Since different libxml versions emit slightly different > -- error messages, we suppress the DETAIL in this test. > \set VERBOSITY terse > SELECT xpath('/*', '<invalidns xmlns=''<''/>'); > ERROR: could not parse XML document > \set VERBOSITY default > > v11 now correctly sets xml_2.out. > > Best, Jim Firstly, Sorry it seems like I made a mistake and was premature calling bingo above for v9. - today I repeated v9 'make check' and found it failing still. - the new xmlformat tests are OK, but some pre-existing xmlparse tests are broken. - see attached file pretty-v9-results ---- OTOH, the absence of xml_2.out from this patch appears to be the correct explanation for why my results have been differing. ---- Today I fetched and tried the latest v11. It is failing too, but only just. - see attached file pretty-v11-results It looks only due to a whitespace EOF issue in the xml_2.out @@ -1679,4 +1679,4 @@ -- XML format: empty string SELECT xmlformat(''); ERROR: invalid XML document -\set VERBOSITY default \ No newline at end of file +\set VERBOSITY default ------ The attached patch update (v12-0002) fixes the xml_2.out for me. ------ Kind Regards, Peter Smith. Fujitsu Australia
Attachment
I suggest we call it "xmlformat", which is an established term for this.
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:I suggest we call it "xmlformat", which is an established term for this.Some very-very old, rusted memory told me that there was something in standard – and indeed, it seems it described an optional Feature X069, “XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing should go there, to XMLSERIALIZE, to follow the standard?Oracle also has an option for it in XMLSERIALIZE, although in a slightly different form, with ability to specify the number of spaces for indentation https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.
Hi Nikolay,
My first thought was to call it xmlpretty, to make it consistent with the jsonb equivalent "jsonb_pretty". But yes, you make a good observation .. xmlserialize seems to be a much better candidate.
I would be willing to refactor my patch if we agree on xmlserialize.
Thanks for the suggestion!
Jim
On 16.02.23 00:13, Peter Smith wrote: > Today I fetched and tried the latest v11. > > It is failing too, but only just. > - see attached file pretty-v11-results > > It looks only due to a whitespace EOF issue in the xml_2.out > > @@ -1679,4 +1679,4 @@ > -- XML format: empty string > SELECT xmlformat(''); > ERROR: invalid XML document > -\set VERBOSITY default > \ No newline at end of file > +\set VERBOSITY default > > ------ > > The attached patch update (v12-0002) fixes the xml_2.out for me. It works for me too. Thanks for the quick fix! Jim
On 16.02.23 00:13, Peter Smith wrote: > Today I fetched and tried the latest v11. > It is failing too, but only just. > - see attached file pretty-v11-results > > It looks only due to a whitespace EOF issue in the xml_2.out > > @@ -1679,4 +1679,4 @@ > -- XML format: empty string > SELECT xmlformat(''); > ERROR: invalid XML document > -\set VERBOSITY default > \ No newline at end of file > +\set VERBOSITY default > > ------ > > The attached patch update (v12-0002) fixes the xml_2.out for me. I'm squashing v12-0001 and v12-0002 (v13 attached). There is still an open discussion on renaming the function to xmlserialize,[1] but it shouldn't be too difficult to change it later in case we reach a consensus :) Thanks! Jim 1- https://www.postgresql.org/message-id/CANNMO%2BKwb4_87G8qDeN%2BVk1B1vX3HvgoGW%2B13fJ-b6rj7qbAww%40mail.gmail.com
Attachment
On Thu, Feb 16, 2023 at 2:12 PM Jim Jones <jim.jones@uni-muenster.de> wrote: > > I'm squashing v12-0001 and v12-0002 (v13 attached). I've looked into the patch. The code looks to conform to usual expectations. One nit: this comment should have just one asterisk. + /** And I have a dumb question: is this function protected from using external XML namespaces? What if the user passes some xmlns that will force it to read namespace data from the server filesystem? Or is it not possible? I see there are a lot of calls to xml_parse() anyway, but still... Best regards, Andrey Borodin.
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:I suggest we call it "xmlformat", which is an established term for this.Some very-very old, rusted memory told me that there was something in standard – and indeed, it seems it described an optional Feature X069, “XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing should go there, to XMLSERIALIZE, to follow the standard?Oracle also has an option for it in XMLSERIALIZE, although in a slightly different form, with ability to specify the number of spaces for indentation https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.
After your comment I'm studying the possibility to extend the existing xmlserialize function to add the indentation feature. If so, how do you think it should look like? An extra parameter? e.g.
SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>'::XML AS text, true);
.. or more or like Oracle does it
SELECT XMLSERIALIZE(DOCUMENT xmltype('<foo><bar>42</bar></foo>') AS BLOB INDENT)
FROM dual;
Thanks!
Best, Jim
On Thu, Feb 16, 2023 at 2:12 PM Jim Jones <jim.jones@uni-muenster.de> wrote:I've looked into the patch. The code looks to conform to usual expectations.One nit: this comment should have just one asterisk. + /**
Thanks for reviewing! Asterisk removed in v14.
And I have a dumb question: is this function protected from using external XML namespaces? What if the user passes some xmlns that will force it to read namespace data from the server filesystem? Or is it not possible? I see there are a lot of calls to xml_parse() anyway, but still...
According to the documentation,[1] such validations are not supported.
"The xml
type does not validate input values against a document type declaration (DTD), even when the input value specifies a DTD. There is also currently no built-in support for validating against other XML schema languages such as XML Schema."
But I'll have a look at the code to be sure :)
Best, Jim
Attachment
After your comment I'm studying the possibility to extend the existing xmlserialize function to add the indentation feature. If so, how do you think it should look like? An extra parameter? e.g.SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>'::XML AS text, true);
.. or more or like Oracle does it
SELECT XMLSERIALIZE(DOCUMENT xmltype('<foo><bar>42</bar></foo>') AS BLOB INDENT)
FROM dual;
<XML character string serialization> ::= XMLSERIALIZE <left paren> [ <document or content> ]
<XML value expression> AS <data type> [ <XML serialize bom> ] [ <XML serialize version> ] [ <XML declaration option> ]
[ <XML serialize indent> ] <right paren>
<XML serialize indent> ::= [ NO ] INDENT
On 17.02.23 23:24, Nikolay Samokhvalov wrote: > > My idea was to follow the SQL standard (part 14, SQL/XML); > unfortunately, there is no free version, but there are drafts at > http://www.wiscorp.com/SQLStandards.html > <http://www.wiscorp.com/SQLStandards.html>. > > <XML character string serialization> ::= XMLSERIALIZE <left paren> [ > <document or content> ] > > <XML value expression> AS <data type> [ <XML serialize bom> ] [ <XML > serialize version> ] [ <XML declaration option> ] > > [ <XML serialize indent> ] <right paren> > > <XML serialize indent> ::= [ NO ] INDENT Good find. It would be better to use this standard syntax.
On 18.02.23 19:09, Peter Eisentraut wrote: > On 17.02.23 23:24, Nikolay Samokhvalov wrote: >> >> My idea was to follow the SQL standard (part 14, SQL/XML); >> unfortunately, there is no free version, but there are drafts at >> http://www.wiscorp.com/SQLStandards.html >> <http://www.wiscorp.com/SQLStandards.html>. >> >> <XML character string serialization> ::= XMLSERIALIZE <left paren> [ >> <document or content> ] >> >> <XML value expression> AS <data type> [ <XML serialize bom> ] [ <XML >> serialize version> ] [ <XML declaration option> ] >> >> [ <XML serialize indent> ] <right paren> >> >> <XML serialize indent> ::= [ NO ] INDENT > > Good find. It would be better to use this standard syntax. As suggested by Peter and Nikolay, v15 now removes the xmlformat function from the catalog and adds the [NO] INDENT option to xmlserialize, as described in X069. postgres=# SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT); xmlserialize ---------------------------------------- <?xml version="1.0" encoding="UTF-8"?>+ <foo> + <bar> + <val x="y">42</val> + </bar> + </foo> + (1 row) postgres=# SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); xmlserialize ------------------------------------------- <foo><bar><val x="y">42</val></bar></foo> (1 row) Although the indent feature is designed to work with xml strings of type DOCUMENT, this implementation also allows the usage of CONTENT type strings as long as it contains a well-formed xml. It will throw an error otherwise. Thanks! Best, Jim
Attachment
As suggested by Peter and Nikolay, v15 now removes the xmlformat
function from the catalog and adds the [NO] INDENT option to
xmlserialize, as described in X069.\
xmlserialize
----------------------------------------
<?xml version="1.0" encoding="UTF-8"?>+
<xml> +
<more>13</more> +
</xml> +
(1 row)
gitpod=# select xmlserialize(document '<xml>text<more>13</more></xml>' as text indent);
xmlserialize
----------------------------------------
<?xml version="1.0" encoding="UTF-8"?>+
<xml>text<more>13</more></xml> +
Here are some review comments for patch v15-0001 FYI, the patch applies clean and tests OK for me. ====== doc/src/sgml/datatype.sgml 1. XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ { NO INDENT | INDENT } ] ) ~ Another/shorter way to write that syntax is like below. For me, it is easier to read. YMMV. XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [NO] INDENT ] ) ====== src/backend/executor/execExprInterp.c 2. ExecEvalXmlExpr @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) { Datum *argvalue = op->d.xmlexpr.argvalue; bool *argnull = op->d.xmlexpr.argnull; - + bool indent = op->d.xmlexpr.xexpr->indent; + text *data; /* argument type is known to be xml */ Assert(list_length(xexpr->args) == 1); Missing whitespace after the variable declarations ~~~ 3. + + data = xmltotext_with_xmloption(DatumGetXmlP(value), + xexpr->xmloption); + if(indent) + *op->resvalue = PointerGetDatum(xmlformat(data)); + else + *op->resvalue = PointerGetDatum(data); + } Unnecessary blank line at the end. ====== src/backend/utils/adt/xml. 4. xmlformat +#else + NO_XML_SUPPORT(); +return 0; +#endif Wrong indentation (return 0) in the indentation function? ;-) ------ Kind Regards, Peter Smith. Fujitsu Australia
But is this as expected? Shouldn't it be like this:<xml>text<more>13</more></xml>?
Oracle and other parsers I know also do not work well with mixed contents.[1,2] I believe libxml2's parser does not know where to put the newline, as mixed values can contain more than one text node:
<xml>text<more>13</more> text2 text3</xml> [3]
And applying this logic the output could look like this ..
<xml>text
<more>13</more>text2 text3
</xml>
or even this
<xml>
text
<more>13</more>
text2 text3
</xml>
.. which doesn't seem right either. Perhaps a note about mixed contents in the docs would make things clearer?
Thanks for the review!
Jim
Attachment
On 22.02.23 08:20, Peter Smith wrote: > Here are some review comments for patch v15-0001 > > FYI, the patch applies clean and tests OK for me. > > ====== > doc/src/sgml/datatype.sgml > > 1. > XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> > AS <replaceable>type</replaceable> [ { NO INDENT | INDENT } ] ) > > ~ > > Another/shorter way to write that syntax is like below. For me, it is > easier to read. YMMV. > > XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> > AS <replaceable>type</replaceable> [ [NO] INDENT ] ) Indeed simpler to read. > ====== > src/backend/executor/execExprInterp.c > > 2. ExecEvalXmlExpr > > @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) > { > Datum *argvalue = op->d.xmlexpr.argvalue; > bool *argnull = op->d.xmlexpr.argnull; > - > + bool indent = op->d.xmlexpr.xexpr->indent; > + text *data; > /* argument type is known to be xml */ > Assert(list_length(xexpr->args) == 1); > Missing whitespace after the variable declarations Whitespace added. > ~~~ > > 3. > + > + data = xmltotext_with_xmloption(DatumGetXmlP(value), > + xexpr->xmloption); > + if(indent) > + *op->resvalue = PointerGetDatum(xmlformat(data)); > + else > + *op->resvalue = PointerGetDatum(data); > + > } > > Unnecessary blank line at the end. blank line removed. > ====== > src/backend/utils/adt/xml. > > 4. xmlformat > > +#else > + NO_XML_SUPPORT(); > +return 0; > +#endif > > Wrong indentation (return 0) in the indentation function? ;-) indentation corrected. v16 attached. Thanks a lot! Jim
Attachment
Here are some review comments for patch v16-0001. ====== > src/backend/executor/execExprInterp.c > > 2. ExecEvalXmlExpr > > @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) > { > Datum *argvalue = op->d.xmlexpr.argvalue; > bool *argnull = op->d.xmlexpr.argnull; > - > + bool indent = op->d.xmlexpr.xexpr->indent; > + text *data; > /* argument type is known to be xml */ > Assert(list_length(xexpr->args) == 1); > Missing whitespace after the variable declarations Whitespace added. ~ Oh, I meant something different to that fix. I meant there is a missing blank line after the last ('data') variable declaration. ====== Test code. I wondered if there ought to be a test that demonstrates explicitly saying NO INDENT will give the identical result to just omitting it. For example: test=# -- no indent is default test=# SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); ?column? ---------- t (1 row) test=# SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); ?column? ---------- t (1 row) ------ Kind Regards, Peter Smith. Fujitsu Australia
On 22.02.23 23:45, Peter Smith wrote: > src/backend/executor/execExprInterp.c >> 2. ExecEvalXmlExpr >> >> @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) >> { >> Datum *argvalue = op->d.xmlexpr.argvalue; >> bool *argnull = op->d.xmlexpr.argnull; >> - >> + bool indent = op->d.xmlexpr.xexpr->indent; >> + text *data; >> /* argument type is known to be xml */ >> Assert(list_length(xexpr->args) == 1); >> Missing whitespace after the variable declarations > Whitespace added. > > ~ > > Oh, I meant something different to that fix. I meant there is a > missing blank line after the last ('data') variable declaration. I believe I see it now (it took me a while) :) > ====== > Test code. > > I wondered if there ought to be a test that demonstrates explicitly > saying NO INDENT will give the identical result to just omitting it. > > For example: > > test=# -- no indent is default > test=# SELECT xmlserialize(DOCUMENT '<foo><bar><val > x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT > '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); > ?column? > ---------- > t > (1 row) > > test=# SELECT xmlserialize(CONTENT '<foo><bar><val > x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT > '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); > ?column? > ---------- > t > (1 row) Actually NO INDENT just ignores this feature and doesn't call the function at all, so in this particular case the result sets will always be identical. But yes, I totally agree that a test case for that is also important. v17 attached. Thanks! Best, Jim
Attachment
Here are my review comments for patch v17-0001. ====== src/test/regress/sql/xml.sql The blank line(s) which previously separated the xmlserialize tests from the xml IS [NOT] DOCUMENT tests are now missing... e.g. -- indent different encoding (returns UTF-8) SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT); SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS text INDENT); -- 'no indent' = not using 'no indent' SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); SELECT xml '<foo>bar</foo>' IS DOCUMENT; SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT; SELECT xml '<abc/>' IS NOT DOCUMENT; SELECT xml 'abc' IS NOT DOCUMENT; SELECT '<>' IS NOT DOCUMENT; ~~ Apart from that, patch v17 LGTM. ------ Kind Regards, Peter Smith. Fujitsu Australia
On 23.02.23 02:52, Peter Smith wrote: > Here are my review comments for patch v17-0001. > > ====== > src/test/regress/sql/xml.sql > > The blank line(s) which previously separated the xmlserialize tests > from the xml IS [NOT] DOCUMENT tests are now missing... v18 adds a new line in the xml.sql file to separate the xmlserialize test cases from the rest. Thanks! Best, Jim
Attachment
On 23.02.23 07:36, Jim Jones wrote: > On 23.02.23 02:52, Peter Smith wrote: >> Here are my review comments for patch v17-0001. >> >> ====== >> src/test/regress/sql/xml.sql >> >> The blank line(s) which previously separated the xmlserialize tests >> from the xml IS [NOT] DOCUMENT tests are now missing... > > v18 adds a new line in the xml.sql file to separate the xmlserialize > test cases from the rest. In kwlist.h you have PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL) but you can actually make it BARE_LABEL, which is preferable. More importantly, you need to add the new keyword to the bare_label_keyword production in gram.y. I thought we had some tooling in the build system to catch this kind of omission, but it's apparently not working right now. Elsewhere, let's rename the xmlformat() C function to xmlserialize() (or maybe something like xmlserialize_indent()), so the association is clearer.
On 23.02.23 08:51, Peter Eisentraut wrote: > In kwlist.h you have > > PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL) > > but you can actually make it BARE_LABEL, which is preferable. > > More importantly, you need to add the new keyword to the > bare_label_keyword production in gram.y. I thought we had some > tooling in the build system to catch this kind of omission, but it's > apparently not working right now. Entry in kwlist.h changed to BARE_LABEL. > > Elsewhere, let's rename the xmlformat() C function to xmlserialize() > (or maybe something like xmlserialize_indent()), so the association is > clearer. > xmlserialize_indent sounds much better and makes the association indeed clearer. Changed in v19. v19 attached. Thanks for the review! Best, Jim
Attachment
The patch v19 LGTM. - v19 applies cleanly for me - Full clean build OK - HTML docs build and render OK - The 'make check' tests all pass for me - Also cfbot reports latest patch has no errors -- http://cfbot.cputube.org/ So, I marked it a "Ready for Committer" in the CF -- https://commitfest.postgresql.org/42/4162/ ------ Kind Regards, Peter Smith. Fujitsu Australia
While reviewing this patch, I started to wonder why we don't eliminate the maintenance hassle of xml_1.out by putting in a short-circuit at the top of the test, similar to those in some other scripts: /* skip test if XML support not compiled in */ SELECT '<value>one</value>'::xml; \if :ERROR \quit \endif (and I guess xmlmap.sql could get the same treatment). The only argument I can think of against it is that the current approach ensures we produce a clean error (and not, say, a crash) for all xml.c entry points not just xml_in. I'm not sure how much that's worth though. The compiler/linker would tell us if we miss compiling out every reference to libxml2. Thoughts? regards, tom lane
On 09.03.23 18:38, Tom Lane wrote: > While reviewing this patch, I started to wonder why we don't eliminate > the maintenance hassle of xml_1.out by putting in a short-circuit > at the top of the test, similar to those in some other scripts: > > /* skip test if XML support not compiled in */ > SELECT '<value>one</value>'::xml; > \if :ERROR > \quit > \endif > > (and I guess xmlmap.sql could get the same treatment). > > The only argument I can think of against it is that the current > approach ensures we produce a clean error (and not, say, a crash) > for all xml.c entry points not just xml_in. I'm not sure how much > that's worth though. The compiler/linker would tell us if we miss > compiling out every reference to libxml2. > > Thoughts? > > regards, tom lane Hi Tom, I agree it would make things easier and it could indeed save some time (and some CI runs ;)). However, checking in the absence of libxml2 if an error message is raised, and checking if this error message is the one we expect, is IMHO also a very nice test. But I guess I could also live with skipping the whole thing. Best, Jim
Attachment
Peter Smith <smithpb2250@gmail.com> writes: > The patch v19 LGTM. I've looked through this now, and have some minor complaints and a major one. The major one is that it doesn't work for XML that doesn't satisfy IS DOCUMENT. For example, regression=# select '<bar><val x="y">42</val></bar><foo></foo>'::xml is document; ?column? ---------- f (1 row) regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text); xmlserialize ------------------------------------------- <bar><val x="y">42</val></bar><foo></foo> (1 row) regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text indent); ERROR: invalid XML document DETAIL: line 1: Extra content at the end of the document <bar><val x="y">42</val></bar><foo></foo> ^ This is not what the documentation promises, and I don't think it's good enough --- the SQL spec has no restriction saying you can't use INDENT with CONTENT. I tried adjusting things so that we call xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag, but all that got me was empty output (except for a document header). It seems like xmlDocDumpFormatMemory is not the thing to use, at least not in the CONTENT case. But libxml2 has a few other "dump" functions, so maybe we can use a different one? I see we are using xmlNodeDump elsewhere, and that has a format option, so maybe there's a way forward there. A lesser issue is that INDENT tacks on a document header (XML declaration) whether there was one or not. I'm not sure whether that's an appropriate thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT case. We have code that can strip off the header again, but we need to figure out exactly when to apply it. I also suspect that it's outright broken to attach a header claiming the data is now in UTF8 encoding. If the database encoding isn't UTF8, then either that's a lie or we now have an encoding violation. Another thing that's mildly irking me is that the current factorization of this code will result in xml_parse'ing the data twice, if you have both DOCUMENT and INDENT specified. We could consider avoiding that if we merged the indentation functionality into xmltotext_with_xmloption, but it's probably premature to do so when we haven't figured out how to get the output right --- we might end up needing two xml_parse calls anyway with different parameters, perhaps. I also had a bunch of cosmetic complaints (mostly around this having a bad case of add-at-the-end-itis), which I've cleaned up in the attached v20. This doesn't address any of the above, however. regards, tom lane diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 467b49b199..53d59662b9 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4460,14 +4460,18 @@ xml '<foo>bar</foo>' <type>xml</type>, uses the function <function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm> <synopsis> -XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> ) +XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [NO] INDENT] ) </synopsis> <replaceable>type</replaceable> can be <type>character</type>, <type>character varying</type>, or <type>text</type> (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type <type>xml</type> and character types, but PostgreSQL also allows - you to simply cast the value. + you to simply cast the value. The option <type>INDENT</type> allows to + indent the serialized xml output - the default is <type>NO INDENT</type>. + It is designed to indent XML strings of type <type>DOCUMENT</type>, but it can also + be used with <type>CONTENT</type> as long as <replaceable>value</replaceable> + contains a well-formed XML. </para> <para> diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt index 0fb9ab7533..bb4c135a7f 100644 --- a/src/backend/catalog/sql_features.txt +++ b/src/backend/catalog/sql_features.txt @@ -621,7 +621,7 @@ X061 XMLParse: character string input and DOCUMENT option YES X065 XMLParse: binary string input and CONTENT option NO X066 XMLParse: binary string input and DOCUMENT option NO X068 XMLSerialize: BOM NO -X069 XMLSerialize: INDENT NO +X069 XMLSerialize: INDENT YES X070 XMLSerialize: character string serialization and CONTENT option YES X071 XMLSerialize: character string serialization and DOCUMENT option YES X072 XMLSerialize: character string serialization YES diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 19351fe34b..3dcd15d5f0 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) { Datum *argvalue = op->d.xmlexpr.argvalue; bool *argnull = op->d.xmlexpr.argnull; + text *result; /* argument type is known to be xml */ Assert(list_length(xexpr->args) == 1); @@ -3837,8 +3838,12 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) return; value = argvalue[0]; - *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value), - xexpr->xmloption)); + result = xmltotext_with_xmloption(DatumGetXmlP(value), + xexpr->xmloption); + if (xexpr->indent) + result = xmlserialize_indent(result); + + *op->resvalue = PointerGetDatum(result); *op->resnull = false; } break; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index a0138382a1..efe88ccf9d 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -613,7 +613,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <node> xml_root_version opt_xml_root_standalone %type <node> xmlexists_argument %type <ival> document_or_content -%type <boolean> xml_whitespace_option +%type <boolean> xml_indent_option xml_whitespace_option %type <list> xmltable_column_list xmltable_column_option_list %type <node> xmltable_column_el %type <defelt> xmltable_column_option_el @@ -702,7 +702,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); HANDLER HAVING HEADER_P HOLD HOUR_P IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE - INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P + INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION @@ -15532,13 +15532,14 @@ func_expr_common_subexpr: $$ = makeXmlExpr(IS_XMLROOT, NULL, NIL, list_make3($3, $5, $6), @1); } - | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')' + | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename xml_indent_option ')' { XmlSerialize *n = makeNode(XmlSerialize); n->xmloption = $3; n->expr = $4; n->typeName = $6; + n->indent = $7; n->location = @1; $$ = (Node *) n; } @@ -15592,6 +15593,11 @@ document_or_content: DOCUMENT_P { $$ = XMLOPTION_DOCUMENT; } | CONTENT_P { $$ = XMLOPTION_CONTENT; } ; +xml_indent_option: INDENT { $$ = true; } + | NO INDENT { $$ = false; } + | /*EMPTY*/ { $$ = false; } + ; + xml_whitespace_option: PRESERVE WHITESPACE_P { $$ = true; } | STRIP_P WHITESPACE_P { $$ = false; } | /*EMPTY*/ { $$ = false; } @@ -16828,6 +16834,7 @@ unreserved_keyword: | INCLUDE | INCLUDING | INCREMENT + | INDENT | INDEX | INDEXES | INHERIT @@ -17384,6 +17391,7 @@ bare_label_keyword: | INCLUDE | INCLUDING | INCREMENT + | INDENT | INDEX | INDEXES | INHERIT diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 78221d2e0f..2331417552 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -2331,6 +2331,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs) typenameTypeIdAndMod(pstate, xs->typeName, &targetType, &targetTypmod); xexpr->xmloption = xs->xmloption; + xexpr->indent = xs->indent; xexpr->location = xs->location; /* We actually only need these to be able to parse back the expression. */ xexpr->type = targetType; diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 079bcb1208..4d2549ed03 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -631,6 +631,39 @@ xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg) } +text * +xmlserialize_indent(text *data) +{ +#ifdef USE_LIBXML + text *result; + xmlDocPtr doc; + xmlChar *xmlbuf; + int nbytes; + + doc = xml_parse(data, XMLOPTION_DOCUMENT, false, + GetDatabaseEncoding(), NULL); + Assert(doc); + + /* Reformat with indenting requested */ + xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); + + xmlFreeDoc(doc); + + if (!nbytes) + elog(ERROR, "could not indent the given XML document"); + + result = cstring_to_text_with_len((const char *) xmlbuf, nbytes); + + xmlFree(xmlbuf); + + return result; +#else + NO_XML_SUPPORT(); + return NULL; +#endif +} + + xmltype * xmlelement(XmlExpr *xexpr, Datum *named_argvalue, bool *named_argnull, diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 371aa0ffc5..028588fb33 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -840,6 +840,7 @@ typedef struct XmlSerialize XmlOptionType xmloption; /* DOCUMENT or CONTENT */ Node *expr; TypeName *typeName; + bool indent; /* [NO] INDENT */ int location; /* token location, or -1 if unknown */ } XmlSerialize; diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h index 4220c63ab7..8fb5b4b919 100644 --- a/src/include/nodes/primnodes.h +++ b/src/include/nodes/primnodes.h @@ -1464,7 +1464,7 @@ typedef enum XmlExprOp IS_XMLPARSE, /* XMLPARSE(text, is_doc, preserve_ws) */ IS_XMLPI, /* XMLPI(name [, args]) */ IS_XMLROOT, /* XMLROOT(xml, version, standalone) */ - IS_XMLSERIALIZE, /* XMLSERIALIZE(is_document, xmlval) */ + IS_XMLSERIALIZE, /* XMLSERIALIZE(is_document, xmlval, indent) */ IS_DOCUMENT /* xmlval IS DOCUMENT */ } XmlExprOp; @@ -1489,6 +1489,8 @@ typedef struct XmlExpr List *args; /* DOCUMENT or CONTENT */ XmlOptionType xmloption pg_node_attr(query_jumble_ignore); + /* INDENT option for XMLSERIALIZE */ + bool indent; /* target type/typmod for XMLSERIALIZE */ Oid type pg_node_attr(query_jumble_ignore); int32 typmod pg_node_attr(query_jumble_ignore); diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index bb36213e6f..753e9ee174 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -205,6 +205,7 @@ PG_KEYWORD("in", IN_P, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("include", INCLUDE, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL) diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h index 311da06cd6..a1dfe4c631 100644 --- a/src/include/utils/xml.h +++ b/src/include/utils/xml.h @@ -78,6 +78,7 @@ extern xmltype *xmlpi(const char *target, text *arg, bool arg_is_null, bool *res extern xmltype *xmlroot(xmltype *data, text *version, int standalone); extern bool xml_is_document(xmltype *arg); extern text *xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg); +extern text *xmlserialize_indent(text *data); extern char *escape_xml(const char *str); extern char *map_sql_identifier_to_xml_name(const char *ident, bool fully_escaped, bool escape_period); diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out index ad852dc2f7..ddbf0ca16b 100644 --- a/src/test/regress/expected/xml.out +++ b/src/test/regress/expected/xml.out @@ -486,6 +486,112 @@ SELECT xmlserialize(content 'good' as char(10)); SELECT xmlserialize(document 'bad' as text); ERROR: not an XML document +-- indent +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT); + xmlserialize +---------------------------------------- + <?xml version="1.0" encoding="UTF-8"?>+ + <foo> + + <bar> + + <val x="y">42</val> + + </bar> + + </foo> + + +(1 row) + +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT); + xmlserialize +---------------------------------------- + <?xml version="1.0" encoding="UTF-8"?>+ + <foo> + + <bar> + + <val x="y">42</val> + + </bar> + + </foo> + + +(1 row) + +-- no indent +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); + xmlserialize +------------------------------------------- + <foo><bar><val x="y">42</val></bar></foo> +(1 row) + +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); + xmlserialize +------------------------------------------- + <foo><bar><val x="y">42</val></bar></foo> +(1 row) + +\set VERBOSITY terse +-- indent malformed xml +SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT); +ERROR: not an XML document +SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT); +ERROR: invalid XML document +-- indent empty string +SELECT xmlserialize(DOCUMENT '' AS text INDENT); +ERROR: not an XML document +SELECT xmlserialize(CONTENT '' AS text INDENT); +ERROR: invalid XML document +-- whitespaces +SELECT xmlserialize(DOCUMENT ' ' AS text INDENT); +ERROR: not an XML document +SELECT xmlserialize(CONTENT ' ' AS text INDENT); +ERROR: invalid XML document +\set VERBOSITY default +-- indent null +SELECT xmlserialize(DOCUMENT NULL AS text INDENT); + xmlserialize +-------------- + +(1 row) + +SELECT xmlserialize(CONTENT NULL AS text INDENT); + xmlserialize +-------------- + +(1 row) + +-- indent different encoding (returns UTF-8) +SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AStext INDENT); + xmlserialize +---------------------------------------- + <?xml version="1.0" encoding="UTF-8"?>+ + <foo> + + <bar> + + <val>42</val> + + </bar> + + </foo> + + +(1 row) + +SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AStext INDENT); + xmlserialize +---------------------------------------- + <?xml version="1.0" encoding="UTF-8"?>+ + <foo> + + <bar> + + <val>42</val> + + </bar> + + </foo> + + +(1 row) + +-- 'no indent' = not using 'no indent' +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); + ?column? +---------- + t +(1 row) + +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); + ?column? +---------- + t +(1 row) + SELECT xml '<foo>bar</foo>' IS DOCUMENT; ?column? ---------- diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out index 70fe34a04f..2944f84103 100644 --- a/src/test/regress/expected/xml_1.out +++ b/src/test/regress/expected/xml_1.out @@ -309,6 +309,80 @@ ERROR: unsupported XML feature LINE 1: SELECT xmlserialize(document 'bad' as text); ^ DETAIL: This functionality requires the server to be built with libxml support. +-- indent +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT); +ERROR: unsupported XML feature +LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><... + ^ +DETAIL: This functionality requires the server to be built with libxml support. +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT); +ERROR: unsupported XML feature +LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><... + ^ +DETAIL: This functionality requires the server to be built with libxml support. +-- no indent +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); +ERROR: unsupported XML feature +LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><... + ^ +DETAIL: This functionality requires the server to be built with libxml support. +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); +ERROR: unsupported XML feature +LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val><... + ^ +DETAIL: This functionality requires the server to be built with libxml support. +\set VERBOSITY terse +-- indent malformed xml +SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT); +ERROR: unsupported XML feature at character 30 +SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT); +ERROR: unsupported XML feature at character 30 +-- indent empty string +SELECT xmlserialize(DOCUMENT '' AS text INDENT); +ERROR: unsupported XML feature at character 30 +SELECT xmlserialize(CONTENT '' AS text INDENT); +ERROR: unsupported XML feature at character 30 +-- whitespaces +SELECT xmlserialize(DOCUMENT ' ' AS text INDENT); +ERROR: unsupported XML feature at character 30 +SELECT xmlserialize(CONTENT ' ' AS text INDENT); +ERROR: unsupported XML feature at character 30 +\set VERBOSITY default +-- indent null +SELECT xmlserialize(DOCUMENT NULL AS text INDENT); + xmlserialize +-------------- + +(1 row) + +SELECT xmlserialize(CONTENT NULL AS text INDENT); + xmlserialize +-------------- + +(1 row) + +-- indent different encoding (returns UTF-8) +SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AStext INDENT); +ERROR: unsupported XML feature +LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="... + ^ +DETAIL: This functionality requires the server to be built with libxml support. +SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AStext INDENT); +ERROR: unsupported XML feature +LINE 1: SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="... + ^ +DETAIL: This functionality requires the server to be built with libxml support. +-- 'no indent' = not using 'no indent' +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); +ERROR: unsupported XML feature +LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><... + ^ +DETAIL: This functionality requires the server to be built with libxml support. +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); +ERROR: unsupported XML feature +LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></... + ^ +DETAIL: This functionality requires the server to be built with libxml support. SELECT xml '<foo>bar</foo>' IS DOCUMENT; ERROR: unsupported XML feature LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT; diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out index 4f029d0072..60dcb3d36a 100644 --- a/src/test/regress/expected/xml_2.out +++ b/src/test/regress/expected/xml_2.out @@ -466,6 +466,112 @@ SELECT xmlserialize(content 'good' as char(10)); SELECT xmlserialize(document 'bad' as text); ERROR: not an XML document +-- indent +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT); + xmlserialize +---------------------------------------- + <?xml version="1.0" encoding="UTF-8"?>+ + <foo> + + <bar> + + <val x="y">42</val> + + </bar> + + </foo> + + +(1 row) + +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT); + xmlserialize +---------------------------------------- + <?xml version="1.0" encoding="UTF-8"?>+ + <foo> + + <bar> + + <val x="y">42</val> + + </bar> + + </foo> + + +(1 row) + +-- no indent +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); + xmlserialize +------------------------------------------- + <foo><bar><val x="y">42</val></bar></foo> +(1 row) + +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); + xmlserialize +------------------------------------------- + <foo><bar><val x="y">42</val></bar></foo> +(1 row) + +\set VERBOSITY terse +-- indent malformed xml +SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT); +ERROR: not an XML document +SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT); +ERROR: invalid XML document +-- indent empty string +SELECT xmlserialize(DOCUMENT '' AS text INDENT); +ERROR: not an XML document +SELECT xmlserialize(CONTENT '' AS text INDENT); +ERROR: invalid XML document +-- whitespaces +SELECT xmlserialize(DOCUMENT ' ' AS text INDENT); +ERROR: not an XML document +SELECT xmlserialize(CONTENT ' ' AS text INDENT); +ERROR: invalid XML document +\set VERBOSITY default +-- indent null +SELECT xmlserialize(DOCUMENT NULL AS text INDENT); + xmlserialize +-------------- + +(1 row) + +SELECT xmlserialize(CONTENT NULL AS text INDENT); + xmlserialize +-------------- + +(1 row) + +-- indent different encoding (returns UTF-8) +SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AStext INDENT); + xmlserialize +---------------------------------------- + <?xml version="1.0" encoding="UTF-8"?>+ + <foo> + + <bar> + + <val>42</val> + + </bar> + + </foo> + + +(1 row) + +SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AStext INDENT); + xmlserialize +---------------------------------------- + <?xml version="1.0" encoding="UTF-8"?>+ + <foo> + + <bar> + + <val>42</val> + + </bar> + + </foo> + + +(1 row) + +-- 'no indent' = not using 'no indent' +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); + ?column? +---------- + t +(1 row) + +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); + ?column? +---------- + t +(1 row) + SELECT xml '<foo>bar</foo>' IS DOCUMENT; ?column? ---------- diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql index 24e40d2653..fea875adfd 100644 --- a/src/test/regress/sql/xml.sql +++ b/src/test/regress/sql/xml.sql @@ -132,6 +132,32 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest; SELECT xmlserialize(content 'good' as char(10)); SELECT xmlserialize(document 'bad' as text); +-- indent +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT); +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT); +-- no indent +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT); +\set VERBOSITY terse +-- indent malformed xml +SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT); +SELECT xmlserialize(CONTENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT); +-- indent empty string +SELECT xmlserialize(DOCUMENT '' AS text INDENT); +SELECT xmlserialize(CONTENT '' AS text INDENT); +-- whitespaces +SELECT xmlserialize(DOCUMENT ' ' AS text INDENT); +SELECT xmlserialize(CONTENT ' ' AS text INDENT); +\set VERBOSITY default +-- indent null +SELECT xmlserialize(DOCUMENT NULL AS text INDENT); +SELECT xmlserialize(CONTENT NULL AS text INDENT); +-- indent different encoding (returns UTF-8) +SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AStext INDENT); +SELECT xmlserialize(CONTENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AStext INDENT); +-- 'no indent' = not using 'no indent' +SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT '<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); +SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT '<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); SELECT xml '<foo>bar</foo>' IS DOCUMENT; SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
Thanks a lot for the review! On 09.03.23 21:21, Tom Lane wrote: > I've looked through this now, and have some minor complaints and a major > one. The major one is that it doesn't work for XML that doesn't satisfy > IS DOCUMENT. For example, > > regression=# select '<bar><val x="y">42</val></bar><foo></foo>'::xml is document; > ?column? > ---------- > f > (1 row) > > regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text); > xmlserialize > ------------------------------------------- > <bar><val x="y">42</val></bar><foo></foo> > (1 row) > > regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text indent); > ERROR: invalid XML document > DETAIL: line 1: Extra content at the end of the document > <bar><val x="y">42</val></bar><foo></foo> > ^ I assumed it should fail because the XML string doesn't have a singly-rooted XML. Oracle has this feature implemented and it does not seem to allow non singly-rooted strings either[1]. Also, some the tools I use also fail in this case[2,3] How do you suggest the output should look like? Does the SQL spec also define it? I can't find it online :( > This is not what the documentation promises, and I don't think it's > good enough --- the SQL spec has no restriction saying you can't > use INDENT with CONTENT. I tried adjusting things so that we call > xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag, > but all that got me was empty output (except for a document header). > It seems like xmlDocDumpFormatMemory is not the thing to use, at least > not in the CONTENT case. But libxml2 has a few other "dump" > functions, so maybe we can use a different one? I see we are using > xmlNodeDump elsewhere, and that has a format option, so maybe there's > a way forward there. > > A lesser issue is that INDENT tacks on a document header (XML declaration) > whether there was one or not. I'm not sure whether that's an appropriate > thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT > case. We have code that can strip off the header again, but we > need to figure out exactly when to apply it. I replaced xmlDocDumpFormatMemory with xmlSaveToBuffer and used to option XML_SAVE_NO_DECL for input docs with XML declaration. It no longer returns a XML declaration if the input doc does not contain it. > I also suspect that it's outright broken to attach a header claiming > the data is now in UTF8 encoding. If the database encoding isn't > UTF8, then either that's a lie or we now have an encoding violation. I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now uses the encoding of the given doc and UTF8 if not provided. > Another thing that's mildly irking me is that the current > factorization of this code will result in xml_parse'ing the data > twice, if you have both DOCUMENT and INDENT specified. We could > consider avoiding that if we merged the indentation functionality > into xmltotext_with_xmloption, but it's probably premature to do so > when we haven't figured out how to get the output right --- we might > end up needing two xml_parse calls anyway with different parameters, > perhaps. > > I also had a bunch of cosmetic complaints (mostly around this having > a bad case of add-at-the-end-itis), which I've cleaned up in the > attached v20. This doesn't address any of the above, however. I swear to god I have no idea what "add-at-the-end-itis" means :) > regards, tom lane Thanks a lot! Best, Jim 1 - https://dbfiddle.uk/WUOWtjBU 2 - https://www.samltool.com/prettyprint.php 3 - https://xmlpretty.com/xmlpretty
Attachment
Jim Jones <jim.jones@uni-muenster.de> writes: > On 09.03.23 21:21, Tom Lane wrote: >> I've looked through this now, and have some minor complaints and a major >> one. The major one is that it doesn't work for XML that doesn't satisfy >> IS DOCUMENT. For example, > How do you suggest the output should look like? I'd say a series of node trees, each starting on a separate line. >> I also suspect that it's outright broken to attach a header claiming >> the data is now in UTF8 encoding. If the database encoding isn't >> UTF8, then either that's a lie or we now have an encoding violation. > I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now > uses the encoding of the given doc and UTF8 if not provided. Mmmm .... doing this differently from what we do elsewhere does not sound like the right path forward. The input *is* (or had better be) in the database encoding. regards, tom lane
Jim Jones <jim.jones@uni-muenster.de> writes:On 09.03.23 21:21, Tom Lane wrote:I've looked through this now, and have some minor complaints and a major one. The major one is that it doesn't work for XML that doesn't satisfy IS DOCUMENT. For example,How do you suggest the output should look like?I'd say a series of node trees, each starting on a separate line.
v22 attached enables the usage of INDENT with non singly-rooted xml.
postgres=# SELECT xmlserialize (CONTENT '<bar><val x="y">42</val></bar><foo>73</foo>' AS text INDENT);
xmlserialize
-----------------------
<bar> +
<val x="y">42</val>+
</bar> +
<foo>73</foo>
(1 row)
I tried several libxml2 dump functions and none of them could cope very well with an xml string without a root node. So added them into a temporary root node, so that I could iterate over its children and add them one by one (formatted) into the output buffer.
I slightly modified the existing xml_parse() function to return the list of nodes parsed by xmlParseBalancedChunkMemory:
xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
int encoding, Node *escontext, xmlNodePtr *parsed_nodes)
res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
utf8string + count, parsed_nodes);
I changed that behavior. It now uses GetDatabaseEncoding();I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now uses the encoding of the given doc and UTF8 if not provided.Mmmm .... doing this differently from what we do elsewhere does not sound like the right path forward. The input *is* (or had better be) in the database encoding.
Thanks!
Best, Jim
Attachment
On 09.03.23 21:21, Tom Lane wrote: > Peter Smith <smithpb2250@gmail.com> writes: >> The patch v19 LGTM. > Another thing that's mildly irking me is that the current > factorization of this code will result in xml_parse'ing the data > twice, if you have both DOCUMENT and INDENT specified. We could > consider avoiding that if we merged the indentation functionality > into xmltotext_with_xmloption, but it's probably premature to do so > when we haven't figured out how to get the output right --- we might > end up needing two xml_parse calls anyway with different parameters, > perhaps. Just a thought: since xmlserialize_indent also calls xml_parse() to build the xmlDocPtr, couldn't we simply bypass xmltotext_with_xmloption() in case of INDENT is specified? Something like this: diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 19351fe..ea808dd 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) { Datum *argvalue = op->d.xmlexpr.argvalue; bool *argnull = op->d.xmlexpr.argnull; + text *result; /* argument type is known to be xml */ Assert(list_length(xexpr->args) == 1); @@ -3837,8 +3838,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) return; value = argvalue[0]; - *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value), - xexpr->xmloption)); + if (xexpr->indent) + result = xmlserialize_indent(DatumGetXmlP(value), + xexpr->xmloption); + else + result = xmltotext_with_xmloption(DatumGetXmlP(value), + xexpr->xmloption); + + *op->resvalue = PointerGetDatum(result); *op->resnull = false; } break;
Jim Jones <jim.jones@uni-muenster.de> writes: > [ v22-0001-Add-pretty-printed-XML-output-option.patch ] I poked at this for awhile and ran into a problem that I'm not sure how to solve: it misbehaves for input with embedded DOCTYPE. regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent); xmlserialize -------------- <!DOCTYPE a>+ <a></a> + (1 row) regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent); xmlserialize -------------- (1 row) The bad result for CONTENT is because xml_parse() decides to parse_as_document, but xmlserialize_indent has no idea that happened and tries to use the content_nodes list anyway. I don't especially care for the laissez faire "maybe we'll set *content_nodes and maybe we won't" API you adopted for xml_parse, which seems to be contributing to the mess. We could pass back more info so that xmlserialize_indent knows what really happened. However, that won't fix the bogus output for the DOCUMENT case. Are we perhaps passing incorrect flags to xmlSaveToBuffer? regards, tom lane
On 14.03.23 18:40, Tom Lane wrote: > Jim Jones <jim.jones@uni-muenster.de> writes: >> [ v22-0001-Add-pretty-printed-XML-output-option.patch ] > I poked at this for awhile and ran into a problem that I'm not sure > how to solve: it misbehaves for input with embedded DOCTYPE. > > regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent); > xmlserialize > -------------- > <!DOCTYPE a>+ > <a></a> + > > (1 row) The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements to be serialized with start-end tag pairs. Removing it did the trick ... postgres=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDENT); xmlserialize -------------- <!DOCTYPE a>+ <a/> + (1 row) ... but as a side effect empty start-end tags will be now serialized as empty elements postgres=# SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text INDENT); xmlserialize -------------- <foo> + <bar/> + </foo> (1 row) It seems to be the standard behavior of other xml indent tools (including Oracle) > regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent); > xmlserialize > -------------- > > (1 row) > > The bad result for CONTENT is because xml_parse() decides to > parse_as_document, but xmlserialize_indent has no idea that happened > and tries to use the content_nodes list anyway. I don't especially > care for the laissez faire "maybe we'll set *content_nodes and maybe > we won't" API you adopted for xml_parse, which seems to be contributing > to the mess. We could pass back more info so that xmlserialize_indent > knows what really happened. I added a new (nullable) parameter to the xml_parse function that will return the actual XmlOptionType used to parse the xml data. Now xmlserialize_indent knows how the data was really parsed: postgres=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDENT); xmlserialize -------------- <!DOCTYPE a>+ <a/> + (1 row) I added test cases for these queries. v23 attached. Thanks! Best, Jim
Attachment
Jim Jones <jim.jones@uni-muenster.de> writes: > On 14.03.23 18:40, Tom Lane wrote: >> I poked at this for awhile and ran into a problem that I'm not sure >> how to solve: it misbehaves for input with embedded DOCTYPE. > The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements > to be serialized with start-end tag pairs. Removing it did the trick ... > ... but as a side effect empty start-end tags will be now serialized as > empty elements > postgres=# SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text > INDENT); > xmlserialize > -------------- > <foo> + > <bar/> + > </foo> > (1 row) Huh, interesting. That is a legitimate pretty-fication of the input, I suppose, but some people might think it goes beyond the charter of "indentation". I'm okay with it personally; anyone want to object? regards, tom lane
I wrote: > Huh, interesting. That is a legitimate pretty-fication of the input, > I suppose, but some people might think it goes beyond the charter of > "indentation". I'm okay with it personally; anyone want to object? Hearing no objections to that, I moved ahead with this. It occurred to me to test v23 for memory leaks, and it had bad ones: * the "newline" node used in the CONTENT case never got freed. Making another one for each line wasn't helping, either. * libxml, at least in the 2.9.7 version I have here, turns out to leak memory if you pass a non-null encoding to xmlSaveToBuffer. But AFAICS we don't really need to do that, because the last thing we want is for libxml to try to do any encoding conversion. After cleaning that up, I saw that we were indeed doing essentially duplicative xml_parse calls for the DOCUMENT check and the indentation work, so I refactored to allow just one call to serve. Pushed with those changes and some other cosmetic cleanup. Thanks for working so hard on this! (Now to keep an eye on the buildfarm, to see if other versions of libxml work like mine ...) BTW, the libxml leak problem seems to extend to other cases too. I tested with code like do $$ declare x xml; t text; begin x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>'; for i in 1..10000000 loop t := xmlserialize(document x as text); end loop; raise notice 't = %', t; end; $$; That case is fine, but if you change the encoding spec to "latin1", it leaks like mad. That problem is not the fault of this patch, I don't think. I wonder if we need to do something to prevent libxml from seeing encoding declarations other than utf8? regards, tom lane
I wrote: > BTW, the libxml leak problem seems to extend to other cases too. > I tested with code like > do $$ > declare x xml; t text; > begin > x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>'; > for i in 1..10000000 loop > t := xmlserialize(document x as text); > end loop; > raise notice 't = %', t; > end; > $$; > That case is fine, but if you change the encoding spec to "latin1", > it leaks like mad. That problem is not the fault of this patch, > I don't think. I wonder if we need to do something to prevent > libxml from seeing encoding declarations other than utf8? After a bit of further testing: the leak is present in libxml2 2.9.7 which is what I have on this RHEL8 box, but it seems not to occur in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora isn't carrying any relevant local patch). So maybe it's worth working around that, or maybe it isn't. regards, tom lane
Re: Memory leak in libxml2 (was Re: [PATCH] Add pretty-printed XML output option)
> On 15 Mar 2023, at 22:38, Tom Lane <tgl@sss.pgh.pa.us> wrote: > After a bit of further testing: the leak is present in libxml2 2.9.7 > which is what I have on this RHEL8 box, but it seems not to occur > in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora > isn't carrying any relevant local patch). > > So maybe it's worth working around that, or maybe it isn't. 2.9.7 is from November 2017 and 2.10.3 is from October 2022, so depending on when in that timespan the issue was fixed it might be in a release which will be with us for quite some time. The lack of reports (that I was able to find) indicate that it might be rare in production though? -- Daniel Gustafsson
On 15.03.23 22:13, Tom Lane wrote: > I wrote: > It occurred to me to test v23 for memory leaks, and it had bad ones: > * the "newline" node used in the CONTENT case never got freed. > Making another one for each line wasn't helping, either. Oh, I did really miss that one. Thanks! > Pushed with those changes and some other cosmetic cleanup. > Thanks for working so hard on this! Great! Thank you, Peter and Andrey for the very nice reviews. > BTW, the libxml leak problem seems to extend to other cases too. > I tested with code like > > do $$ > declare x xml; t text; > begin > x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>'; > for i in 1..10000000 loop > t := xmlserialize(document x as text); > end loop; > raise notice 't = %', t; > end; > $$; > > That case is fine, but if you change the encoding spec to "latin1", > it leaks like mad. That problem is not the fault of this patch, > I don't think. I wonder if we need to do something to prevent > libxml from seeing encoding declarations other than utf8? In my environment (libxml2 v2.9.10 and Ubuntu 22.04) I couldn't reproduce this memory leak. It's been most likely fixed in further libxml2 versions. Unfortunately their gitlab page has no release notes from versions prior to 2.9.13 :( Best, Jim