Thread: [PATCH] Add pretty-printed XML output option

[PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
Hi,

This small patch introduces a XML pretty print function. It basically 
takes advantage of the indentation feature of xmlDocDumpFormatMemory 
from libxml2 to format XML strings.

postgres=# SELECT xmlpretty('<foo id="x"><bar id="y"><var 
id="z">42</var></bar></foo>');
         xmlpretty
--------------------------
  <foo id="x">            +
    <bar id="y">          +
      <var id="z">42</var>+
    </bar>                +
  </foo>                  +

(1 row)


The patch also contains regression tests and documentation.

Feedback is very welcome!

Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
The system somehow returns different error messages in Linux and 
MacOS/Windows, which was causing the cfbot to fail.

SELECT xmlpretty('<foo>')::xml;
                           ^
-DETAIL:  line 1: chunk is not well balanced
+DETAIL:  line 1: Premature end of data in tag foo line 1

Test removed in v2.

On 02.02.23 21:35, Jim Jones wrote:
> Hi,
>
> This small patch introduces a XML pretty print function. It basically 
> takes advantage of the indentation feature of xmlDocDumpFormatMemory 
> from libxml2 to format XML strings.
>
> postgres=# SELECT xmlpretty('<foo id="x"><bar id="y"><var 
> id="z">42</var></bar></foo>');
>         xmlpretty
> --------------------------
>  <foo id="x">            +
>    <bar id="y">          +
>      <var id="z">42</var>+
>    </bar>                +
>  </foo>                  +
>
> (1 row)
>
>
> The patch also contains regression tests and documentation.
>
> Feedback is very welcome!
>
> Jim
Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
Hi,

The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing 
the regression tests with the error:

ERROR:  unsupported XML feature
DETAIL:  This functionality requires the server to be built with libxml 
support.

Is there anything I can do to enable libxml to run my regression tests?

v3 adds a missing xmlFree call.

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Tom Lane
Date:
Jim Jones <jim.jones@uni-muenster.de> writes:
> The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing 
> the regression tests with the error:

> ERROR:  unsupported XML feature
> DETAIL:  This functionality requires the server to be built with libxml 
> support.

> Is there anything I can do to enable libxml to run my regression tests?

Your patch needs to also update expected/xml_1.out to match the output
the test produces when run without --with-libxml.

            regards, tom lane



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 06.02.23 17:23, Tom Lane wrote:
>> Your patch needs to also update expected/xml_1.out to match the output
>> the test produces when run without --with-libxml.

Thanks a lot! It seems to do the trick.

xml_1.out updated in v4.

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
while working on another item of the TODO list I realized that I should 
be using a PG_TRY() block in he xmlDocDumpFormatMemory call.

Fixed in v5.

Best regards, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
On Thu, Feb 9, 2023 at 7:31 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
>
> while working on another item of the TODO list I realized that I should
> be using a PG_TRY() block in he xmlDocDumpFormatMemory call.
>
> Fixed in v5.
>

I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 09.02.23 00:09, Peter Smith wrote:
> I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
> other xmlFreeDoc(doc) is not. As the doc is assigned outside the
> PG_TRY shouldn't those both be the same?

Hi Peter,

My logic there was the following: if program reached that part of the 
code it means that the xml_parse() and xmlDocDumpFormatMemory() worked, 
which consequently means that the variables doc and xmlbuf are != NULL, 
therefore not needing to be checked. Am I missing something?

Thanks a lot for the review!

Best, Jim




Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
On Thu, Feb 9, 2023 at 10:42 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
>
> On 09.02.23 00:09, Peter Smith wrote:
> > I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
> > other xmlFreeDoc(doc) is not. As the doc is assigned outside the
> > PG_TRY shouldn't those both be the same?
>
> Hi Peter,
>
> My logic there was the following: if program reached that part of the
> code it means that the xml_parse() and xmlDocDumpFormatMemory() worked,
> which consequently means that the variables doc and xmlbuf are != NULL,
> therefore not needing to be checked. Am I missing something?
>

Thanks. I think I understand it better now -- I expect
xmlDocDumpFormatMemory will cope OK when passed a NULL doc (see this
source [1]), but it will return nbytes of 0, but your code will still
throw ERROR, meaning the guard for doc NULL is necessary for the
PG_CATCH.

In that case, everything LGTM.

~

OTOH, if you are having to check for NULL doc anyway, maybe it's just
as easy only doing that up-front. Then you could quick-exit the
function without calling xmlDocDumpFormatMemory etc. in the first
place. For example:

doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
if (!doc)
   return 0;

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 09.02.23 02:01, Peter Smith wrote:
> OTOH, if you are having to check for NULL doc anyway, maybe it's just
> as easy only doing that up-front. Then you could quick-exit the
> function without calling xmlDocDumpFormatMemory etc. in the first
> place. For example:
>
> doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
> if (!doc)
>     return 0;

I see your point. If I got it right, you're suggesting the following 
change in the PG_TRY();

    PG_TRY();
     {

         int nbytes;

         if(!doc)
             xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
                     "could not parse the given XML document");

         xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);

         if(!nbytes || xmlerrcxt->err_occurred)
             xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
                     "could not indent the given XML document");


         initStringInfo(&buf);
         appendStringInfoString(&buf, (const char *)xmlbuf);

     }

.. which will catch the doc == NULL before calling xmlDocDumpFormatMemory.

Is it what you suggest?

Thanks a lot for the thorough review!

Best, Jim




Re: [PATCH] Add pretty-printed XML output option

From
Tom Lane
Date:
Jim Jones <jim.jones@uni-muenster.de> writes:
> I see your point. If I got it right, you're suggesting the following
> change in the PG_TRY();

>     PG_TRY();
>      {

>          int nbytes;

>          if(!doc)
>              xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
>                      "could not parse the given XML document");

>          xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);

>          if(!nbytes || xmlerrcxt->err_occurred)
>              xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
>                      "could not indent the given XML document");


>          initStringInfo(&buf);
>          appendStringInfoString(&buf, (const char *)xmlbuf);

>      }

> .. which will catch the doc == NULL before calling xmlDocDumpFormatMemory.

Um ... why are you using PG_TRY here at all?  It seems like
you have full control of the actually likely error cases.
The only plausible error out of the StringInfo calls is OOM,
and you probably don't want to trap that at all.

            regards, tom lane



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 09.02.23 08:23, Tom Lane wrote:
> Um ... why are you using PG_TRY here at all?  It seems like
> you have full control of the actually likely error cases.
> The only plausible error out of the StringInfo calls is OOM,
> and you probably don't want to trap that at all.

My intention was to catch any unexpected error from 
xmlDocDumpFormatMemory and handle it properly. But I guess you're right, 
I can control the likely error cases by checking doc and nbytes.

You suggest something along these lines?

     xmlDocPtr  doc;
     xmlChar    *xmlbuf = NULL;
     text       *arg = PG_GETARG_TEXT_PP(0);
     StringInfoData buf;
     int nbytes;

     doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, 
GetDatabaseEncoding(), NULL);

     if(!doc)
         elog(ERROR, "could not parse the given XML document");

     xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);

     xmlFreeDoc(doc);

     if(!nbytes)
         elog(ERROR, "could not indent the given XML document");

     initStringInfo(&buf);
     appendStringInfoString(&buf, (const char *)xmlbuf);

     xmlFree(xmlbuf);

     PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));


Thanks!

Best, Jim


Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Peter Eisentraut
Date:
On 02.02.23 21:35, Jim Jones wrote:
> This small patch introduces a XML pretty print function. It basically 
> takes advantage of the indentation feature of xmlDocDumpFormatMemory 
> from libxml2 to format XML strings.

I suggest we call it "xmlformat", which is an established term for this.




Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
On Thu, Feb 9, 2023 at 7:17 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
>
> On 09.02.23 08:23, Tom Lane wrote:
> > Um ... why are you using PG_TRY here at all?  It seems like
> > you have full control of the actually likely error cases.
> > The only plausible error out of the StringInfo calls is OOM,
> > and you probably don't want to trap that at all.
>
> My intention was to catch any unexpected error from
> xmlDocDumpFormatMemory and handle it properly. But I guess you're right,
> I can control the likely error cases by checking doc and nbytes.
>
> You suggest something along these lines?
>
>      xmlDocPtr  doc;
>      xmlChar    *xmlbuf = NULL;
>      text       *arg = PG_GETARG_TEXT_PP(0);
>      StringInfoData buf;
>      int nbytes;
>
>      doc = xml_parse(arg, XMLOPTION_DOCUMENT, false,
> GetDatabaseEncoding(), NULL);
>
>      if(!doc)
>          elog(ERROR, "could not parse the given XML document");
>
>      xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
>
>      xmlFreeDoc(doc);
>
>      if(!nbytes)
>          elog(ERROR, "could not indent the given XML document");
>
>      initStringInfo(&buf);
>      appendStringInfoString(&buf, (const char *)xmlbuf);
>
>      xmlFree(xmlbuf);
>
>      PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
>
>
> Thanks!
>

Something like that LGTM, but here are some other minor comments.

======

1.
FYI, there are some whitespace warnings applying the v5 patch

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:26:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:29:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:33:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:37:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:41:
trailing whitespace.

warning: squelched 48 whitespace errors
warning: 53 lines add whitespace errors.

======
src/test/regress/sql/xml.sql

2.
The v5 patch was already testing NULL, but it might be good to add
more tests to verify the function is behaving how you want for edge
cases. For example,

+-- XML pretty print: NULL, empty string, spaces only...
SELECT xmlpretty(NULL);
SELECT xmlpretty('');
SELECT xmlpretty('     ');

~~

3.
The function is returning XML anyway, so is the '::xml' casting in
these tests necessary?

e.g.
SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL);

======
src/include/catalog/pg_proc.dat

4.

+  { oid => '4642', descr => 'Indented text from xml',
+  proname => 'xmlpretty', prorettype => 'xml',
+  proargtypes => 'xml', prosrc => 'xmlpretty' },

Spurious leading space for this new entry.

======
doc/src/sgml/func.sgml

5.
+ <foo id="x">
+   <bar id="y">
+     <var id="z">42</var>
+   </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>

A spurious blank line in the example after the "(1 row)"

~~~

6.
Does this function docs belong in section 9.15.1 "Producing XML
Content"? Or (since it is not really producing content) should it be
moved to the 9.15.3 "Processing XML" section?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 10.02.23 02:10, Peter Smith wrote:
> On Thu, Feb 9, 2023 at 7:17 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
> 1.
> FYI, there are some whitespace warnings applying the v5 patch
>
Trailing whitespaces removed. The patch applies now without warnings.
> ======
> src/test/regress/sql/xml.sql
>
> 2.
> The v5 patch was already testing NULL, but it might be good to add
> more tests to verify the function is behaving how you want for edge
> cases. For example,
>
> +-- XML pretty print: NULL, empty string, spaces only...
> SELECT xmlpretty(NULL);
> SELECT xmlpretty('');
> SELECT xmlpretty('     ');

Test cases added.

> 3.
> The function is returning XML anyway, so is the '::xml' casting in
> these tests necessary?
>
> e.g.
> SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL);
It is indeed not necessary. Most likely I used it for testing and forgot 
to remove it afterwards. Now removed.
> ======
> src/include/catalog/pg_proc.dat
>
> 4.
>
> +  { oid => '4642', descr => 'Indented text from xml',
> +  proname => 'xmlpretty', prorettype => 'xml',
> +  proargtypes => 'xml', prosrc => 'xmlpretty' },
>
> Spurious leading space for this new entry.
Removed.
>
> ======
> doc/src/sgml/func.sgml
>
> 5.
> + <foo id="x">
> +   <bar id="y">
> +     <var id="z">42</var>
> +   </bar>
> + </foo>
> +
> +(1 row)
> +
> +]]></screen>
>
> A spurious blank line in the example after the "(1 row)"
Removed.
> ~~~
>
> 6.
> Does this function docs belong in section 9.15.1 "Producing XML
> Content"? Or (since it is not really producing content) should it be
> moved to the 9.15.3 "Processing XML" section?
Moved to the section 9.15.3

Following the suggestion of Peter Eisentraut I renamed the function to 
xmlformat().

v6 attached.

Thanks a lot for the review.

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
Something is misbehaving.

Using the latest HEAD, and before applying the v6 patch, 'make check'
is working OK.

But after applying the v6 patch, some XML regression tests are failing
because the DETAIL part of the message indicating the wrong syntax
position is not getting displayed. Not only for your new tests -- but
for other XML tests too.

My ./configure looks like this:
./configure --prefix=/usr/local/pg_oss --with-libxml --enable-debug
--enable-cassert --enable-tap-tests  CFLAGS="-ggdb -O0 -g3
-fno-omit-frame-pointer"

resulting in:
checking whether to build with XML support... yes
checking for libxml-2.0 >= 2.6.23... yes

~

e.g.(these are a sample of errors)

xml                          ... FAILED     2561 ms

@@ -344,8 +326,6 @@
 <twoerrors>&idontexist;</unbalanced>
                        ^
 line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
-<twoerrors>&idontexist;</unbalanced>
-                                    ^
 SELECT xmlparse(document '<nosuchprefix:tag/>');
       xmlparse
 ---------------------
@@ -1696,14 +1676,8 @@
 SELECT xmlformat('');
 ERROR:  invalid XML document
 DETAIL:  line 1: switching encoding : no input
-
-^
 line 1: Document is empty
-
-^
 -- XML format: invalid string (whitespaces)
 SELECT xmlformat('   ');
 ERROR:  invalid XML document
 DETAIL:  line 1: Start tag expected, '<' not found
-
-   ^

~~

Separately (but maybe it's related?), the CF-bot test also reported a
failure [1] with strange error detail differences.

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12
09:02:57.077569000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
2023-02-12 09:05:45.148100000 +0000
@@ -1695,10 +1695,7 @@
 -- XML format: empty string
 SELECT xmlformat('');
 ERROR:  invalid XML document
-DETAIL:  line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL:  line 1: Document is empty

 ^
 -- XML format: invalid string (whitespaces)

------
[1] https://api.cirrus-ci.com/v1/artifact/task/4802219812323328/testrun/build/testrun/regress/regress/regression.diffs

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 13.02.23 02:15, Peter Smith wrote:
Something is misbehaving.

Using the latest HEAD, and before applying the v6 patch, 'make check'
is working OK.

But after applying the v6 patch, some XML regression tests are failing
because the DETAIL part of the message indicating the wrong syntax
position is not getting displayed. Not only for your new tests -- but
for other XML tests too.

Yes, I noticed it yesterday ... and I'm not sure how to solve it. It seems that in the system is returning a different error message in the FreeBSD patch tester, which is causing a regression test in this particular OS to fail.


diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out	2023-02-12 09:02:57.077569000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out	2023-02-12 09:05:45.148100000 +0000
@@ -1695,10 +1695,7 @@ -- XML format: empty string SELECT xmlformat(''); ERROR:  invalid XML document
-DETAIL:  line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL:  line 1: Document is empty  ^ -- XML format: invalid string (whitespaces)


Does anyone know if there is anything I can do to make the error messages be consistent among different OS?

Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 13.02.23 13:15, Jim Jones wrote:
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out	2023-02-12 09:02:57.077569000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out	2023-02-12 09:05:45.148100000 +0000
@@ -1695,10 +1695,7 @@ -- XML format: empty string SELECT xmlformat(''); ERROR:  invalid XML document
-DETAIL:  line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL:  line 1: Document is empty  ^ -- XML format: invalid string (whitespaces)

I couldn't figure out why the error messages are different -- I'm wondering if the issue is the test environment itself. I just removed the troubling test case for now

SELECT xmlformat('');

v7 attached.

Thanks for reviewing this patch!

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
On Wed, Feb 15, 2023 at 8:55 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
>
> On 13.02.23 13:15, Jim Jones wrote:
>
> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
> --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000
> +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000
> @@ -1695,10 +1695,7 @@
>  -- XML format: empty string
>  SELECT xmlformat('');
>  ERROR:  invalid XML document
> -DETAIL:  line 1: switching encoding : no input
> -
> -^
> -line 1: Document is empty
> +DETAIL:  line 1: Document is empty
>
>  ^
>  -- XML format: invalid string (whitespaces)
>
> I couldn't figure out why the error messages are different -- I'm wondering if the issue is the test environment
itself.I just removed the troubling test case for now
 
>
> SELECT xmlformat('');
>
> v7 attached.
>
> Thanks for reviewing this patch!
>

Yesterday I looked at those cfbot configs and noticed all those
machines have different versions of libxml.

2.10.3
2.6.23
2.9.10
2.9.13

But I don't if version numbers have any impact on the different error
details or not.

~

The thing that puzzled me most is that in MY environment (CentOS7;
libxml 20901; PG --with-libxml build) I get this behaviour.

- Without your v6 patch 'make check' is all OK.

- With your v6 patch other XML tests (not only yours) of 'make check'
failed with different error messages.

- Similarly, if I keep the v6 patch but just change (in xmlformat) the
#ifdef USE_LIBXML to be #if 0, then only the new xmlformat tests fail,
but the other XML tests are working OK again.

Those results implied to me that this function code (in my environment
anyway) is somehow introducing a side effect causing the *other* XML
tests to fail.

But so far I was unable to identify the reason. Sorry, I don't know
this XML API well enough to help more.

------
Kind Regards,
Peter Smith.
Fujitsu Austalia.



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 14.02.23 23:45, Peter Smith wrote:
> Those results implied to me that this function code (in my environment
> anyway) is somehow introducing a side effect causing the *other* XML
> tests to fail.

I believe I've found the issue. It is probably related to the XML OPTION 
settings, as it seems to deliver different error messages when set to 
DOCUMENT or CONTENT:

postgres=# SET XML OPTION CONTENT;
SET
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document
DETAIL:  line 1: switching encoding : no input

^
line 1: Document is empty

^
postgres=# SET XML OPTION DOCUMENT;
SET
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document
LINE 1: SELECT xmlformat('');
                          ^
DETAIL:  line 1: switching encoding : no input

^
line 1: Document is empty

^

v8 attached reintroduces the SELECT xmlformat('') test case and adds SET 
XML OPTION DOCUMENT to the regression tests.

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
On Wed, Feb 15, 2023 at 11:05 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
>
> On 14.02.23 23:45, Peter Smith wrote:
> > Those results implied to me that this function code (in my environment
> > anyway) is somehow introducing a side effect causing the *other* XML
> > tests to fail.
>
> I believe I've found the issue. It is probably related to the XML OPTION
> settings, as it seems to deliver different error messages when set to
> DOCUMENT or CONTENT:
>
> postgres=# SET XML OPTION CONTENT;
> SET
> postgres=# SELECT xmlformat('');
> ERROR:  invalid XML document
> DETAIL:  line 1: switching encoding : no input
>
> ^
> line 1: Document is empty
>
> ^
> postgres=# SET XML OPTION DOCUMENT;
> SET
> postgres=# SELECT xmlformat('');
> ERROR:  invalid XML document
> LINE 1: SELECT xmlformat('');
>                           ^
> DETAIL:  line 1: switching encoding : no input
>
> ^
> line 1: Document is empty
>
> ^
>
> v8 attached reintroduces the SELECT xmlformat('') test case and adds SET
> XML OPTION DOCUMENT to the regression tests.
>

With v8, in my environment, in psql I see something slightly different:


test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# SELECT xmlformat('');
ERROR:  invalid XML document
DETAIL:  line 1: switching encoding : no input
line 1: Document is empty
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# SELECT xmlformat('');
ERROR:  invalid XML document
LINE 1: SELECT xmlformat('');
                         ^
DETAIL:  line 1: switching encoding : no input
line 1: Document is empty

~~

test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
ERROR:  relation "xmltest" does not exist
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                    ^
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
ERROR:  relation "xmltest" does not exist
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                    ^

~~

Because the expected extra detail stuff is missing the regression
tests are still failing for me.

------
Kind Regards,
Peter Smith.
Fujitsu Austalia.



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 15.02.23 02:09, Peter Smith wrote:
> With v8, in my environment, in psql I see something slightly different:
>
>
> test_pub=# SET XML OPTION CONTENT;
> SET
> test_pub=# SELECT xmlformat('');
> ERROR:  invalid XML document
> DETAIL:  line 1: switching encoding : no input
> line 1: Document is empty
> test_pub=# SET XML OPTION DOCUMENT;
> SET
> test_pub=# SELECT xmlformat('');
> ERROR:  invalid XML document
> LINE 1: SELECT xmlformat('');
>                           ^
> DETAIL:  line 1: switching encoding : no input
> line 1: Document is empty
>
> ~~
>
> test_pub=# SET XML OPTION CONTENT;
> SET
> test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
> ERROR:  relation "xmltest" does not exist
> LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
>                      ^
> test_pub=# SET XML OPTION DOCUMENT;
> SET
> test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
> ERROR:  relation "xmltest" does not exist
> LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
>                      ^
>
> ~~

Yes... a cfbot also complained about the same thing.

Setting the VERBOSITY to terse might solve this issue:

postgres=# \set VERBOSITY terse
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document

postgres=# \set VERBOSITY default
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document
DETAIL:  line 1: switching encoding : no input

^
line 1: Document is empty

^

v9 wraps the corner test cases with VERBOSITY terse to reduce the error 
message output.

Thanks!

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
On Wed, Feb 15, 2023 at 6:10 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
>
> On 15.02.23 02:09, Peter Smith wrote:
> > With v8, in my environment, in psql I see something slightly different:
> >
> >
> > test_pub=# SET XML OPTION CONTENT;
> > SET
> > test_pub=# SELECT xmlformat('');
> > ERROR:  invalid XML document
> > DETAIL:  line 1: switching encoding : no input
> > line 1: Document is empty
> > test_pub=# SET XML OPTION DOCUMENT;
> > SET
> > test_pub=# SELECT xmlformat('');
> > ERROR:  invalid XML document
> > LINE 1: SELECT xmlformat('');
> >                           ^
> > DETAIL:  line 1: switching encoding : no input
> > line 1: Document is empty
> >
> > ~~
> >
> > test_pub=# SET XML OPTION CONTENT;
> > SET
> > test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
> > ERROR:  relation "xmltest" does not exist
> > LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
> >                      ^
> > test_pub=# SET XML OPTION DOCUMENT;
> > SET
> > test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
> > ERROR:  relation "xmltest" does not exist
> > LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
> >                      ^
> >
> > ~~
>
> Yes... a cfbot also complained about the same thing.
>
> Setting the VERBOSITY to terse might solve this issue:
>
> postgres=# \set VERBOSITY terse
> postgres=# SELECT xmlformat('');
> ERROR:  invalid XML document
>
> postgres=# \set VERBOSITY default
> postgres=# SELECT xmlformat('');
> ERROR:  invalid XML document
> DETAIL:  line 1: switching encoding : no input
>
> ^
> line 1: Document is empty
>
> ^
>
> v9 wraps the corner test cases with VERBOSITY terse to reduce the error
> message output.
>

Bingo!! Your v9 patch now passes all 'make check' tests for me.

But I'll leave it to a committer to decide if this VERBOSITY toggle is
the best fix.

(I don't understand, maybe someone can explain, how the patch managed
to mess verbosity of the existing tests.)

------
Kind Regards,
Peter Smith.
Fujitsu Austalia.



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 15.02.23 10:06, Peter Smith wrote:
> Bingo!! Your v9 patch now passes all 'make check' tests for me.
Nice! It also passed all tests in the patch tester.
> But I'll leave it to a committer to decide if this VERBOSITY toggle is
> the best fix.

I see now other test cases in the xml.sql file that also use this 
option, so it might be a known "issue".

Shall we now set the patch to "Ready for Commiter"?

Thanks a lot for the review!

Best, Jim



Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Alvaro Herrera
Date:
On 2023-Feb-13, Peter Smith wrote:

> Something is misbehaving.
> 
> Using the latest HEAD, and before applying the v6 patch, 'make check'
> is working OK.
> 
> But after applying the v6 patch, some XML regression tests are failing
> because the DETAIL part of the message indicating the wrong syntax
> position is not getting displayed. Not only for your new tests -- but
> for other XML tests too.

Note that there's another file, xml_2.out, which does not contain the
additional part of the DETAIL message.  I suspect in Peter's case it's
xml_2.out that was originally being used as a comparison basis (not
xml.out), but that one is not getting patched, and ultimately the diff
is reported for him against xml.out because none of them matches.

An easy way forward might be to manually apply the patch from xml.out to
xml_2.out, and edit it by hand to remove the additional lines.

See commit 085423e3e326 for a bit of background.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 15.02.23 11:11, Alvaro Herrera wrote:
> Note that there's another file, xml_2.out, which does not contain the
> additional part of the DETAIL message.  I suspect in Peter's case it's
> xml_2.out that was originally being used as a comparison basis (not
> xml.out), but that one is not getting patched, and ultimately the diff
> is reported for him against xml.out because none of them matches.
>
> An easy way forward might be to manually apply the patch from xml.out to
> xml_2.out, and edit it by hand to remove the additional lines.
>
> See commit 085423e3e326 for a bit of background.

Hi Álvaro,

As my test cases were not specifically about how the error message looks 
like, I thought that suppressing part of the error messages by setting 
VERBOSITY terse would suffice.[1] Is this approach not recommended?

Thanks!

Best, Jim

1 - v9 patch: 
https://www.postgresql.org/message-id/CAHut%2BPtoH8zkBHxv44XyO%2Bo4kW_nZdGhNdVaJ_cpEjrckKDqtw%40mail.gmail.com


Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Alvaro Herrera
Date:
On 2023-Feb-15, Jim Jones wrote:

> Hi Álvaro,
> 
> As my test cases were not specifically about how the error message looks
> like, I thought that suppressing part of the error messages by setting
> VERBOSITY terse would suffice.[1] Is this approach not recommended?

Well, I don't see why we would depart from what we've been doing in the
rest of the XML tests.  I did see that patch and I thought it was taking
the wrong approach.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 15.02.23 12:11, Alvaro Herrera wrote:
> Well, I don't see why we would depart from what we've been doing in the
> rest of the XML tests.  I did see that patch and I thought it was taking
> the wrong approach.

Fair point.

v10 patches the xml.out to xml_2.out - manually removing the additional 
lines.

Thanks for the review!

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
Accidentally left the VERBOSE settings out -- sorry!

Now it matches the approach used in a xpath test in xml.sql, xml.out, 
xml_1.out and xml_2.out

-- Since different libxml versions emit slightly different
-- error messages, we suppress the DETAIL in this test.
\set VERBOSITY terse
SELECT xpath('/*', '<invalidns xmlns=''<''/>');
ERROR:  could not parse XML document
\set VERBOSITY default

v11 now correctly sets xml_2.out.

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Note that there's another file, xml_2.out, which does not contain the
> additional part of the DETAIL message.  I suspect in Peter's case it's
> xml_2.out that was originally being used as a comparison basis (not
> xml.out), but that one is not getting patched, and ultimately the diff
> is reported for him against xml.out because none of them matches.
> See commit 085423e3e326 for a bit of background.

Yeah.  That's kind of sad, because it means there are still broken
libxml2s out there in 2023.  Nonetheless, since there are, it is not
optional to fix all three expected-files.

            regards, tom lane



Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
On Thu, Feb 16, 2023 at 12:49 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
>
> Accidentally left the VERBOSE settings out -- sorry!
>
> Now it matches the approach used in a xpath test in xml.sql, xml.out,
> xml_1.out and xml_2.out
>
> -- Since different libxml versions emit slightly different
> -- error messages, we suppress the DETAIL in this test.
> \set VERBOSITY terse
> SELECT xpath('/*', '<invalidns xmlns=''<''/>');
> ERROR:  could not parse XML document
> \set VERBOSITY default
>
> v11 now correctly sets xml_2.out.
>
> Best, Jim


Firstly, Sorry it seems like I made a mistake and was premature
calling bingo above for v9.
- today I repeated v9 'make check' and found it failing still.
- the new xmlformat tests are OK, but some pre-existing xmlparse tests
are broken.
- see attached file pretty-v9-results

----

OTOH, the absence of xml_2.out from this patch appears to be the
correct explanation for why my results have been differing.

----

Today I fetched and tried the latest v11.

It is failing too, but only just.
- see attached file pretty-v11-results

It looks only due to a whitespace EOF issue in the xml_2.out

@@ -1679,4 +1679,4 @@
 -- XML format: empty string
 SELECT xmlformat('');
 ERROR:  invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default

------

The attached patch update (v12-0002) fixes the xml_2.out for me.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Nikolay Samokhvalov
Date:
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
I suggest we call it "xmlformat", which is an established term for this.

Some very-very old, rusted memory told me that there was something in standard – and indeed, it seems it described an optional Feature X069, “XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing should go there, to XMLSERIALIZE, to follow the standard?

Oracle also has an option for it in XMLSERIALIZE, although in a slightly different form, with ability to specify the number of spaces for indentation https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.

Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 16.02.23 05:38, Nikolay Samokhvalov wrote:
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
I suggest we call it "xmlformat", which is an established term for this.

Some very-very old, rusted memory told me that there was something in standard – and indeed, it seems it described an optional Feature X069, “XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing should go there, to XMLSERIALIZE, to follow the standard?

Oracle also has an option for it in XMLSERIALIZE, although in a slightly different form, with ability to specify the number of spaces for indentation https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.

Hi Nikolay,

My first thought was to call it xmlpretty, to make it consistent with the jsonb equivalent "jsonb_pretty". But yes, you make a good observation .. xmlserialize seems to be a much better candidate.

I would be willing to refactor my patch if we agree on xmlserialize.

Thanks for the suggestion!

Jim

Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 16.02.23 00:13, Peter Smith wrote:
> Today I fetched and tried the latest v11.
>
> It is failing too, but only just.
> - see attached file pretty-v11-results
>
> It looks only due to a whitespace EOF issue in the xml_2.out
>
> @@ -1679,4 +1679,4 @@
>   -- XML format: empty string
>   SELECT xmlformat('');
>   ERROR:  invalid XML document
> -\set VERBOSITY default
> \ No newline at end of file
> +\set VERBOSITY default
>
> ------
>
> The attached patch update (v12-0002) fixes the xml_2.out for me.

It works for me too.

Thanks for the quick fix!

Jim




Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 16.02.23 00:13, Peter Smith wrote:
> Today I fetched and tried the latest v11.
> It is failing too, but only just.
> - see attached file pretty-v11-results
>
> It looks only due to a whitespace EOF issue in the xml_2.out
>
> @@ -1679,4 +1679,4 @@
>   -- XML format: empty string
>   SELECT xmlformat('');
>   ERROR:  invalid XML document
> -\set VERBOSITY default
> \ No newline at end of file
> +\set VERBOSITY default
>
> ------
>
> The attached patch update (v12-0002) fixes the xml_2.out for me.

I'm squashing v12-0001 and v12-0002 (v13 attached). There is still an 
open discussion on renaming the function to xmlserialize,[1] but it 
shouldn't be too difficult to change it later in case we reach a 
consensus :)

Thanks!

Jim

1- 
https://www.postgresql.org/message-id/CANNMO%2BKwb4_87G8qDeN%2BVk1B1vX3HvgoGW%2B13fJ-b6rj7qbAww%40mail.gmail.com

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Andrey Borodin
Date:
On Thu, Feb 16, 2023 at 2:12 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
>
> I'm squashing v12-0001 and v12-0002 (v13 attached).
I've looked into the patch. The code looks to conform to usual expectations.
One nit: this comment should have just one asterisk.
+ /**
And I have a dumb question: is this function protected from using
external XML namespaces? What if the user passes some xmlns that will
force it to read namespace data from the server filesystem? Or is it
not possible? I see there are a lot of calls to xml_parse() anyway,
but still...

Best regards, Andrey Borodin.



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 16.02.23 05:38, Nikolay Samokhvalov wrote:
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
I suggest we call it "xmlformat", which is an established term for this.

Some very-very old, rusted memory told me that there was something in standard – and indeed, it seems it described an optional Feature X069, “XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing should go there, to XMLSERIALIZE, to follow the standard?

Oracle also has an option for it in XMLSERIALIZE, although in a slightly different form, with ability to specify the number of spaces for indentation https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.

After your comment I'm studying the possibility to extend the existing xmlserialize function to add the indentation feature. If so, how do you think it should look like? An extra parameter? e.g.

SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>'::XML AS text, true);

.. or more or like Oracle does it

SELECT XMLSERIALIZE(DOCUMENT xmltype('<foo><bar>42</bar></foo>') AS BLOB INDENT)
FROM dual;

Thanks!

Best, Jim

Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 17.02.23 01:08, Andrey Borodin wrote:
On Thu, Feb 16, 2023 at 2:12 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
I've looked into the patch. The code looks to conform to usual expectations.
One nit: this comment should have just one asterisk.
+ /**

Thanks for reviewing! Asterisk removed in v14.

And I have a dumb question: is this function protected from using
external XML namespaces? What if the user passes some xmlns that will
force it to read namespace data from the server filesystem? Or is it
not possible? I see there are a lot of calls to xml_parse() anyway,
but still...

According to the documentation,[1] such validations are not supported.

"The xml type does not validate input values against a document type declaration (DTD), even when the input value specifies a DTD. There is also currently no built-in support for validating against other XML schema languages such as XML Schema."

But I'll have a look at the code to be sure :)

Best, Jim

1- https://www.postgresql.org/docs/15/datatype-xml.html

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Nikolay Samokhvalov
Date:


On Fri, Feb 17, 2023 at 1:14 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
After your comment I'm studying the possibility to extend the existing xmlserialize function to add the indentation feature. If so, how do you think it should look like? An extra parameter? e.g.

SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>'::XML AS text, true);

.. or more or like Oracle does it

SELECT XMLSERIALIZE(DOCUMENT xmltype('<foo><bar>42</bar></foo>') AS BLOB INDENT)
FROM dual;


My idea was to follow the SQL standard (part 14, SQL/XML); unfortunately, there is no free version, but there are drafts at http://www.wiscorp.com/SQLStandards.html.
<XML character string serialization> ::=  XMLSERIALIZE <left paren> [ <document or content> ]
      <XML value expression> AS <data type>      [ <XML serialize bom> ]      [ <XML serialize version> ]      [ <XML declaration option> ]
      [ <XML serialize indent> ]      <right paren>
<XML serialize indent> ::=  [ NO ] INDENT

Oracle's extension SIZE=n also seems interesting (including a special case SIZE=0, which means using new-line characters without spaces for each line).

Re: [PATCH] Add pretty-printed XML output option

From
Peter Eisentraut
Date:
On 17.02.23 23:24, Nikolay Samokhvalov wrote:
> 
> My idea was to follow the SQL standard (part 14, SQL/XML); 
> unfortunately, there is no free version, but there are drafts at 
> http://www.wiscorp.com/SQLStandards.html 
> <http://www.wiscorp.com/SQLStandards.html>.
> 
> <XML character string serialization> ::= XMLSERIALIZE <left paren> [ 
> <document or content> ]
> 
> <XML value expression> AS <data type> [ <XML serialize bom> ] [ <XML 
> serialize version> ] [ <XML declaration option> ]
> 
> [ <XML serialize indent> ] <right paren>
> 
> <XML serialize indent> ::= [ NO ] INDENT

Good find.  It would be better to use this standard syntax.



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 18.02.23 19:09, Peter Eisentraut wrote:
> On 17.02.23 23:24, Nikolay Samokhvalov wrote:
>>
>> My idea was to follow the SQL standard (part 14, SQL/XML); 
>> unfortunately, there is no free version, but there are drafts at 
>> http://www.wiscorp.com/SQLStandards.html 
>> <http://www.wiscorp.com/SQLStandards.html>.
>>
>> <XML character string serialization> ::= XMLSERIALIZE <left paren> [ 
>> <document or content> ]
>>
>> <XML value expression> AS <data type> [ <XML serialize bom> ] [ <XML 
>> serialize version> ] [ <XML declaration option> ]
>>
>> [ <XML serialize indent> ] <right paren>
>>
>> <XML serialize indent> ::= [ NO ] INDENT
>
> Good find.  It would be better to use this standard syntax.

As suggested by Peter and Nikolay, v15 now removes the xmlformat 
function from the catalog and adds the [NO] INDENT option to 
xmlserialize, as described in X069.

postgres=# SELECT xmlserialize(DOCUMENT '<foo><bar><val 
x="y">42</val></bar></foo>' AS text INDENT);
               xmlserialize
----------------------------------------
  <?xml version="1.0" encoding="UTF-8"?>+
  <foo>                                 +
    <bar>                               +
      <val x="y">42</val>               +
    </bar>                              +
  </foo>                                +

(1 row)

postgres=# SELECT xmlserialize(DOCUMENT '<foo><bar><val 
x="y">42</val></bar></foo>' AS text NO INDENT);
                xmlserialize
-------------------------------------------
  <foo><bar><val x="y">42</val></bar></foo>
(1 row)

Although the indent feature is designed to work with xml strings of type 
DOCUMENT, this implementation also allows the usage of CONTENT type 
strings as long as it contains a well-formed xml. It will throw an error 
otherwise.

Thanks!

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Nikolay Samokhvalov
Date:
On Mon, Feb 20, 2023 at 3:06 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
As suggested by Peter and Nikolay, v15 now removes the xmlformat
function from the catalog and adds the [NO] INDENT option to
xmlserialize, as described in X069.\

Great. I'm checking this patch and it seems, indentation stops working if we have a text node inside:

gitpod=# select xmlserialize(document '<xml><more>13</more></xml>' as text indent);
              xmlserialize              
----------------------------------------
 <?xml version="1.0" encoding="UTF-8"?>+
 <xml>                                 +
   <more>13</more>                     +
 </xml>                                +
 
(1 row)

gitpod=# select xmlserialize(document '<xml>text<more>13</more></xml>' as text indent);
              xmlserialize              
----------------------------------------
 <?xml version="1.0" encoding="UTF-8"?>+
 <xml>text<more>13</more></xml>        +
 
(1 row) 

Worth to mention, Oracle behaves similarly -- indentation doesn't work: https://dbfiddle.uk/hRz5sXdM.

But is this as expected? Shouldn't it be like this:
<xml>
  text
  <more>13</more>
</xml>
?

Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
Here are some review comments for patch v15-0001

FYI, the patch applies clean and tests OK for me.

======
doc/src/sgml/datatype.sgml

1.
XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
AS <replaceable>type</replaceable> [ { NO INDENT | INDENT } ] )

~

Another/shorter way to write that syntax is like below. For me, it is
easier to read. YMMV.

XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
AS <replaceable>type</replaceable> [ [NO] INDENT ] )

======
src/backend/executor/execExprInterp.c

2. ExecEvalXmlExpr

@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
  {
  Datum    *argvalue = op->d.xmlexpr.argvalue;
  bool    *argnull = op->d.xmlexpr.argnull;
-
+ bool    indent = op->d.xmlexpr.xexpr->indent;
+ text    *data;
  /* argument type is known to be xml */
  Assert(list_length(xexpr->args) == 1);
Missing whitespace after the variable declarations

~~~

3.
+
+ data = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if(indent)
+ *op->resvalue = PointerGetDatum(xmlformat(data));
+ else
+ *op->resvalue = PointerGetDatum(data);
+
  }

Unnecessary blank line at the end.
======
src/backend/utils/adt/xml.

4. xmlformat

+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif

Wrong indentation (return 0) in the indentation function? ;-)

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 22.02.23 08:05, Nikolay Samokhvalov wrote:

But is this as expected? Shouldn't it be like this:
<xml>
  text
  <more>13</more>
</xml>
?

Oracle and other parsers I know also do not work well with mixed contents.[1,2] I believe libxml2's parser does not know where to put the newline, as mixed values can contain more than one text node:

<xml>text<more>13</more> text2 text3</xml> [3]

And applying this logic the output could look like this ..

<xml>text
  <more>13</more>text2 text3
</xml>

or even this

<xml>
  text
  <more>13</more>
  text2 text3
</xml>

.. which doesn't seem right either. Perhaps a note about mixed contents in the docs would make things clearer?

Thanks for the review!

Jim

1- https://xmlpretty.com/

2- https://www.samltool.com/prettyprint.php

3- https://dbfiddle.uk/_CcC8h3I

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 22.02.23 08:20, Peter Smith wrote:
> Here are some review comments for patch v15-0001
>
> FYI, the patch applies clean and tests OK for me.
>
> ======
> doc/src/sgml/datatype.sgml
>
> 1.
> XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
> AS <replaceable>type</replaceable> [ { NO INDENT | INDENT } ] )
>
> ~
>
> Another/shorter way to write that syntax is like below. For me, it is
> easier to read. YMMV.
>
> XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
> AS <replaceable>type</replaceable> [ [NO] INDENT ] )
Indeed simpler to read.
> ======
> src/backend/executor/execExprInterp.c
>
> 2. ExecEvalXmlExpr
>
> @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
>    {
>    Datum    *argvalue = op->d.xmlexpr.argvalue;
>    bool    *argnull = op->d.xmlexpr.argnull;
> -
> + bool    indent = op->d.xmlexpr.xexpr->indent;
> + text    *data;
>    /* argument type is known to be xml */
>    Assert(list_length(xexpr->args) == 1);
> Missing whitespace after the variable declarations
Whitespace added.
> ~~~
>
> 3.
> +
> + data = xmltotext_with_xmloption(DatumGetXmlP(value),
> + xexpr->xmloption);
> + if(indent)
> + *op->resvalue = PointerGetDatum(xmlformat(data));
> + else
> + *op->resvalue = PointerGetDatum(data);
> +
>    }
>
> Unnecessary blank line at the end.
blank line removed.
> ======
> src/backend/utils/adt/xml.
>
> 4. xmlformat
>
> +#else
> + NO_XML_SUPPORT();
> +return 0;
> +#endif
>
> Wrong indentation (return 0) in the indentation function? ;-)

indentation corrected.

v16 attached.

Thanks a lot!

Jim


Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
Here are some review comments for patch v16-0001.

======
> src/backend/executor/execExprInterp.c
>
> 2. ExecEvalXmlExpr
>
> @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
>    {
>    Datum    *argvalue = op->d.xmlexpr.argvalue;
>    bool    *argnull = op->d.xmlexpr.argnull;
> -
> + bool    indent = op->d.xmlexpr.xexpr->indent;
> + text    *data;
>    /* argument type is known to be xml */
>    Assert(list_length(xexpr->args) == 1);
> Missing whitespace after the variable declarations
Whitespace added.

~

Oh, I meant something different to that fix. I meant there is a
missing blank line after the last ('data') variable declaration.

======
Test code.

I wondered if there ought to be a test that demonstrates explicitly
saying NO INDENT will give the identical result to just omitting it.

For example:

test=# -- no indent is default
test=# SELECT xmlserialize(DOCUMENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
 ?column?
----------
 t
(1 row)

test=# SELECT xmlserialize(CONTENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
 ?column?
----------
 t
(1 row)

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 22.02.23 23:45, Peter Smith wrote:
> src/backend/executor/execExprInterp.c
>> 2. ExecEvalXmlExpr
>>
>> @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
>>     {
>>     Datum    *argvalue = op->d.xmlexpr.argvalue;
>>     bool    *argnull = op->d.xmlexpr.argnull;
>> -
>> + bool    indent = op->d.xmlexpr.xexpr->indent;
>> + text    *data;
>>     /* argument type is known to be xml */
>>     Assert(list_length(xexpr->args) == 1);
>> Missing whitespace after the variable declarations
> Whitespace added.
>
> ~
>
> Oh, I meant something different to that fix. I meant there is a
> missing blank line after the last ('data') variable declaration.
I believe I see it now (it took me a while) :)
> ======
> Test code.
>
> I wondered if there ought to be a test that demonstrates explicitly
> saying NO INDENT will give the identical result to just omitting it.
>
> For example:
>
> test=# -- no indent is default
> test=# SELECT xmlserialize(DOCUMENT '<foo><bar><val
> x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
> '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
>   ?column?
> ----------
>   t
> (1 row)
>
> test=# SELECT xmlserialize(CONTENT '<foo><bar><val
> x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
> '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
>   ?column?
> ----------
>   t
> (1 row)

Actually NO INDENT just ignores this feature and doesn't call the 
function at all, so in this particular case the result sets will always 
be identical. But yes, I totally agree that a test case for that is also 
important.

v17 attached.

Thanks!

Best, Jim


Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
Here are my review comments for patch v17-0001.

======
src/test/regress/sql/xml.sql

The blank line(s) which previously separated the xmlserialize tests
from the xml IS [NOT] DOCUMENT tests are now missing...


e.g.

-- indent different encoding (returns UTF-8)
SELECT xmlserialize(DOCUMENT '<?xml version="1.0"
encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS
text INDENT);
SELECT xmlserialize(CONTENT  '<?xml version="1.0"
encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>' AS
text INDENT);
-- 'no indent' = not using 'no indent'
SELECT xmlserialize(DOCUMENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xmlserialize(CONTENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
SELECT xml '<abc/>' IS NOT DOCUMENT;
SELECT xml 'abc' IS NOT DOCUMENT;
SELECT '<>' IS NOT DOCUMENT;

~~

Apart from that, patch v17 LGTM.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 23.02.23 02:52, Peter Smith wrote:
> Here are my review comments for patch v17-0001.
>
> ======
> src/test/regress/sql/xml.sql
>
> The blank line(s) which previously separated the xmlserialize tests
> from the xml IS [NOT] DOCUMENT tests are now missing...

v18 adds a new line in the xml.sql file to separate the xmlserialize 
test cases from the rest.

Thanks!

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Peter Eisentraut
Date:
On 23.02.23 07:36, Jim Jones wrote:
> On 23.02.23 02:52, Peter Smith wrote:
>> Here are my review comments for patch v17-0001.
>>
>> ======
>> src/test/regress/sql/xml.sql
>>
>> The blank line(s) which previously separated the xmlserialize tests
>> from the xml IS [NOT] DOCUMENT tests are now missing...
> 
> v18 adds a new line in the xml.sql file to separate the xmlserialize 
> test cases from the rest.

In kwlist.h you have

     PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)

but you can actually make it BARE_LABEL, which is preferable.

More importantly, you need to add the new keyword to the 
bare_label_keyword production in gram.y.  I thought we had some tooling 
in the build system to catch this kind of omission, but it's apparently 
not working right now.

Elsewhere, let's rename the xmlformat() C function to xmlserialize() (or 
maybe something like xmlserialize_indent()), so the association is clearer.




Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 23.02.23 08:51, Peter Eisentraut wrote:
> In kwlist.h you have
>
>     PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)
>
> but you can actually make it BARE_LABEL, which is preferable.
>
> More importantly, you need to add the new keyword to the 
> bare_label_keyword production in gram.y.  I thought we had some 
> tooling in the build system to catch this kind of omission, but it's 
> apparently not working right now.
Entry in kwlist.h changed to BARE_LABEL.
>
> Elsewhere, let's rename the xmlformat() C function to xmlserialize() 
> (or maybe something like xmlserialize_indent()), so the association is 
> clearer.
>
xmlserialize_indent sounds much better and makes the association indeed 
clearer. Changed in v19.

v19 attached.

Thanks for the review!

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Peter Smith
Date:
The patch v19 LGTM.

- v19 applies cleanly for me
- Full clean build OK
- HTML docs build and render OK
- The 'make check' tests all pass for me
- Also cfbot reports latest patch has no errors -- http://cfbot.cputube.org/

So, I marked it a "Ready for Committer" in the CF --
https://commitfest.postgresql.org/42/4162/

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: [PATCH] Add pretty-printed XML output option

From
Tom Lane
Date:
While reviewing this patch, I started to wonder why we don't eliminate
the maintenance hassle of xml_1.out by putting in a short-circuit
at the top of the test, similar to those in some other scripts:

/* skip test if XML support not compiled in */
SELECT '<value>one</value>'::xml;
\if :ERROR
\quit
\endif

(and I guess xmlmap.sql could get the same treatment).

The only argument I can think of against it is that the current
approach ensures we produce a clean error (and not, say, a crash)
for all xml.c entry points not just xml_in.  I'm not sure how much
that's worth though.  The compiler/linker would tell us if we miss
compiling out every reference to libxml2.

Thoughts?

            regards, tom lane



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 09.03.23 18:38, Tom Lane wrote:
> While reviewing this patch, I started to wonder why we don't eliminate
> the maintenance hassle of xml_1.out by putting in a short-circuit
> at the top of the test, similar to those in some other scripts:
>
> /* skip test if XML support not compiled in */
> SELECT '<value>one</value>'::xml;
> \if :ERROR
> \quit
> \endif
>
> (and I guess xmlmap.sql could get the same treatment).
>
> The only argument I can think of against it is that the current
> approach ensures we produce a clean error (and not, say, a crash)
> for all xml.c entry points not just xml_in.  I'm not sure how much
> that's worth though.  The compiler/linker would tell us if we miss
> compiling out every reference to libxml2.
>
> Thoughts?
>
>             regards, tom lane

Hi Tom,

I agree it would make things easier and it could indeed save some time 
(and some CI runs ;)).

However, checking in the absence of libxml2 if an error message is 
raised, and checking if this error message is the one we expect, is IMHO 
also a very nice test. But I guess I could also live with skipping the 
whole thing.

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Tom Lane
Date:
Peter Smith <smithpb2250@gmail.com> writes:
> The patch v19 LGTM.

I've looked through this now, and have some minor complaints and a major
one.  The major one is that it doesn't work for XML that doesn't satisfy
IS DOCUMENT.  For example,

regression=# select '<bar><val x="y">42</val></bar><foo></foo>'::xml is document;
 ?column?
----------
 f
(1 row)

regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text);
               xmlserialize
-------------------------------------------
 <bar><val x="y">42</val></bar><foo></foo>
(1 row)

regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text indent);
ERROR:  invalid XML document
DETAIL:  line 1: Extra content at the end of the document
<bar><val x="y">42</val></bar><foo></foo>
                              ^

This is not what the documentation promises, and I don't think it's
good enough --- the SQL spec has no restriction saying you can't
use INDENT with CONTENT.  I tried adjusting things so that we call
xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag,
but all that got me was empty output (except for a document header).
It seems like xmlDocDumpFormatMemory is not the thing to use, at least
not in the CONTENT case.  But libxml2 has a few other "dump"
functions, so maybe we can use a different one?  I see we are using
xmlNodeDump elsewhere, and that has a format option, so maybe there's
a way forward there.

A lesser issue is that INDENT tacks on a document header (XML declaration)
whether there was one or not.  I'm not sure whether that's an appropriate
thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT
case.  We have code that can strip off the header again, but we
need to figure out exactly when to apply it.

I also suspect that it's outright broken to attach a header claiming
the data is now in UTF8 encoding.  If the database encoding isn't
UTF8, then either that's a lie or we now have an encoding violation.

Another thing that's mildly irking me is that the current
factorization of this code will result in xml_parse'ing the data
twice, if you have both DOCUMENT and INDENT specified.  We could
consider avoiding that if we merged the indentation functionality
into xmltotext_with_xmloption, but it's probably premature to do so
when we haven't figured out how to get the output right --- we might
end up needing two xml_parse calls anyway with different parameters,
perhaps.

I also had a bunch of cosmetic complaints (mostly around this having
a bad case of add-at-the-end-itis), which I've cleaned up in the
attached v20.  This doesn't address any of the above, however.

            regards, tom lane

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml '<foo>bar</foo>'
     <type>xml</type>, uses the function
     <function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
 <synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> [ [NO]
INDENT] ) 
 </synopsis>
     <replaceable>type</replaceable> can be
     <type>character</type>, <type>character varying</type>, or
     <type>text</type> (or an alias for one of those).  Again, according
     to the SQL standard, this is the only way to convert between type
     <type>xml</type> and character types, but PostgreSQL also allows
-    you to simply cast the value.
+    you to simply cast the value. The option <type>INDENT</type> allows to
+    indent the serialized xml output - the default is <type>NO INDENT</type>.
+    It is designed to indent XML strings of type <type>DOCUMENT</type>, but it can also
+   be used with <type>CONTENT</type> as long as <replaceable>value</replaceable>
+   contains a well-formed XML.
    </para>

    <para>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 0fb9ab7533..bb4c135a7f 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -621,7 +621,7 @@ X061    XMLParse: character string input and DOCUMENT option            YES
 X065    XMLParse: binary string input and CONTENT option            NO
 X066    XMLParse: binary string input and DOCUMENT option            NO
 X068    XMLSerialize: BOM            NO
-X069    XMLSerialize: INDENT            NO
+X069    XMLSerialize: INDENT            YES
 X070    XMLSerialize: character string serialization and CONTENT option            YES
 X071    XMLSerialize: character string serialization and DOCUMENT option            YES
 X072    XMLSerialize: character string serialization            YES
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..3dcd15d5f0 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
             {
                 Datum       *argvalue = op->d.xmlexpr.argvalue;
                 bool       *argnull = op->d.xmlexpr.argnull;
+                text       *result;

                 /* argument type is known to be xml */
                 Assert(list_length(xexpr->args) == 1);
@@ -3837,8 +3838,12 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
                     return;
                 value = argvalue[0];

-                *op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
-                                                                         xexpr->xmloption));
+                result = xmltotext_with_xmloption(DatumGetXmlP(value),
+                                                  xexpr->xmloption);
+                if (xexpr->indent)
+                    result = xmlserialize_indent(result);
+
+                *op->resvalue = PointerGetDatum(result);
                 *op->resnull = false;
             }
             break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..efe88ccf9d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -613,7 +613,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <node>    xml_root_version opt_xml_root_standalone
 %type <node>    xmlexists_argument
 %type <ival>    document_or_content
-%type <boolean> xml_whitespace_option
+%type <boolean>    xml_indent_option xml_whitespace_option
 %type <list>    xmltable_column_list xmltable_column_option_list
 %type <node>    xmltable_column_el
 %type <defelt>    xmltable_column_option_el
@@ -702,7 +702,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
     HANDLER HAVING HEADER_P HOLD HOUR_P

     IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
-    INCLUDING INCREMENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
+    INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
     INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
     INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION

@@ -15532,13 +15532,14 @@ func_expr_common_subexpr:
                     $$ = makeXmlExpr(IS_XMLROOT, NULL, NIL,
                                      list_make3($3, $5, $6), @1);
                 }
-            | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename ')'
+            | XMLSERIALIZE '(' document_or_content a_expr AS SimpleTypename xml_indent_option ')'
                 {
                     XmlSerialize *n = makeNode(XmlSerialize);

                     n->xmloption = $3;
                     n->expr = $4;
                     n->typeName = $6;
+                    n->indent = $7;
                     n->location = @1;
                     $$ = (Node *) n;
                 }
@@ -15592,6 +15593,11 @@ document_or_content: DOCUMENT_P                        { $$ = XMLOPTION_DOCUMENT; }
             | CONTENT_P                                { $$ = XMLOPTION_CONTENT; }
         ;

+xml_indent_option: INDENT                            { $$ = true; }
+            | NO INDENT                                { $$ = false; }
+            | /*EMPTY*/                                { $$ = false; }
+        ;
+
 xml_whitespace_option: PRESERVE WHITESPACE_P        { $$ = true; }
             | STRIP_P WHITESPACE_P                    { $$ = false; }
             | /*EMPTY*/                                { $$ = false; }
@@ -16828,6 +16834,7 @@ unreserved_keyword:
             | INCLUDE
             | INCLUDING
             | INCREMENT
+            | INDENT
             | INDEX
             | INDEXES
             | INHERIT
@@ -17384,6 +17391,7 @@ bare_label_keyword:
             | INCLUDE
             | INCLUDING
             | INCREMENT
+            | INDENT
             | INDEX
             | INDEXES
             | INHERIT
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 78221d2e0f..2331417552 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -2331,6 +2331,7 @@ transformXmlSerialize(ParseState *pstate, XmlSerialize *xs)
     typenameTypeIdAndMod(pstate, xs->typeName, &targetType, &targetTypmod);

     xexpr->xmloption = xs->xmloption;
+    xexpr->indent = xs->indent;
     xexpr->location = xs->location;
     /* We actually only need these to be able to parse back the expression. */
     xexpr->type = targetType;
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 079bcb1208..4d2549ed03 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -631,6 +631,39 @@ xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg)
 }


+text *
+xmlserialize_indent(text *data)
+{
+#ifdef USE_LIBXML
+    text       *result;
+    xmlDocPtr    doc;
+    xmlChar    *xmlbuf;
+    int            nbytes;
+
+    doc = xml_parse(data, XMLOPTION_DOCUMENT, false,
+                    GetDatabaseEncoding(), NULL);
+    Assert(doc);
+
+    /* Reformat with indenting requested */
+    xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
+
+    xmlFreeDoc(doc);
+
+    if (!nbytes)
+        elog(ERROR, "could not indent the given XML document");
+
+    result = cstring_to_text_with_len((const char *) xmlbuf, nbytes);
+
+    xmlFree(xmlbuf);
+
+    return result;
+#else
+    NO_XML_SUPPORT();
+    return NULL;
+#endif
+}
+
+
 xmltype *
 xmlelement(XmlExpr *xexpr,
            Datum *named_argvalue, bool *named_argnull,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 371aa0ffc5..028588fb33 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -840,6 +840,7 @@ typedef struct XmlSerialize
     XmlOptionType xmloption;    /* DOCUMENT or CONTENT */
     Node       *expr;
     TypeName   *typeName;
+    bool        indent;            /* [NO] INDENT */
     int            location;        /* token location, or -1 if unknown */
 } XmlSerialize;

diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 4220c63ab7..8fb5b4b919 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1464,7 +1464,7 @@ typedef enum XmlExprOp
     IS_XMLPARSE,                /* XMLPARSE(text, is_doc, preserve_ws) */
     IS_XMLPI,                    /* XMLPI(name [, args]) */
     IS_XMLROOT,                    /* XMLROOT(xml, version, standalone) */
-    IS_XMLSERIALIZE,            /* XMLSERIALIZE(is_document, xmlval) */
+    IS_XMLSERIALIZE,            /* XMLSERIALIZE(is_document, xmlval, indent) */
     IS_DOCUMENT                    /* xmlval IS DOCUMENT */
 } XmlExprOp;

@@ -1489,6 +1489,8 @@ typedef struct XmlExpr
     List       *args;
     /* DOCUMENT or CONTENT */
     XmlOptionType xmloption pg_node_attr(query_jumble_ignore);
+    /* INDENT option for XMLSERIALIZE */
+    bool        indent;
     /* target type/typmod for XMLSERIALIZE */
     Oid            type pg_node_attr(query_jumble_ignore);
     int32        typmod pg_node_attr(query_jumble_ignore);
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index bb36213e6f..753e9ee174 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -205,6 +205,7 @@ PG_KEYWORD("in", IN_P, RESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("include", INCLUDE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("including", INCLUDING, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("increment", INCREMENT, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/utils/xml.h b/src/include/utils/xml.h
index 311da06cd6..a1dfe4c631 100644
--- a/src/include/utils/xml.h
+++ b/src/include/utils/xml.h
@@ -78,6 +78,7 @@ extern xmltype *xmlpi(const char *target, text *arg, bool arg_is_null, bool *res
 extern xmltype *xmlroot(xmltype *data, text *version, int standalone);
 extern bool xml_is_document(xmltype *arg);
 extern text *xmltotext_with_xmloption(xmltype *data, XmlOptionType xmloption_arg);
+extern text *xmlserialize_indent(text *data);
 extern char *escape_xml(const char *str);

 extern char *map_sql_identifier_to_xml_name(const char *ident, bool fully_escaped, bool escape_period);
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index ad852dc2f7..ddbf0ca16b 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -486,6 +486,112 @@ SELECT xmlserialize(content 'good' as char(10));

 SELECT xmlserialize(document 'bad' as text);
 ERROR:  not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+              xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo>                                 +
+   <bar>                               +
+     <val x="y">42</val>               +
+   </bar>                              +
+ </foo>                                +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT  '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+              xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo>                                 +
+   <bar>                               +
+     <val x="y">42</val>               +
+   </bar>                              +
+ </foo>                                +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+               xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT  '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+               xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR:  not an XML document
+SELECT xmlserialize(CONTENT  '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR:  invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR:  not an XML document
+SELECT xmlserialize(CONTENT  '' AS text INDENT);
+ERROR:  invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT '  ' AS text INDENT);
+ERROR:  not an XML document
+SELECT xmlserialize(CONTENT  '  ' AS text INDENT);
+ERROR:  invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT  NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>'
AStext INDENT); 
+              xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo>                                 +
+   <bar>                               +
+     <val>42</val>                     +
+   </bar>                              +
+ </foo>                                +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT  '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>'
AStext INDENT); 
+              xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo>                                 +
+   <bar>                               +
+     <val>42</val>                     +
+   </bar>                              +
+ </foo>                                +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
'<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); 
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
'<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); 
+ ?column?
+----------
+ t
+(1 row)
+
 SELECT xml '<foo>bar</foo>' IS DOCUMENT;
  ?column?
 ----------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index 70fe34a04f..2944f84103 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -309,6 +309,80 @@ ERROR:  unsupported XML feature
 LINE 1: SELECT xmlserialize(document 'bad' as text);
                                      ^
 DETAIL:  This functionality requires the server to be built with libxml support.
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR:  unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+                                     ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT  '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+ERROR:  unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT  '<foo><bar><val x="y">42</val><...
+                                     ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR:  unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+                                     ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT  '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+ERROR:  unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT  '<foo><bar><val x="y">42</val><...
+                                     ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR:  unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT  '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR:  unsupported XML feature at character 30
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR:  unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT  '' AS text INDENT);
+ERROR:  unsupported XML feature at character 30
+-- whitespaces
+SELECT xmlserialize(DOCUMENT '  ' AS text INDENT);
+ERROR:  unsupported XML feature at character 30
+SELECT xmlserialize(CONTENT  '  ' AS text INDENT);
+ERROR:  unsupported XML feature at character 30
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT  NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>'
AStext INDENT); 
+ERROR:  unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="...
+                                     ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT  '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>'
AStext INDENT); 
+ERROR:  unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT  '<?xml version="1.0" encoding="...
+                                     ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
'<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); 
+ERROR:  unsupported XML feature
+LINE 1: SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val><...
+                                     ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
'<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); 
+ERROR:  unsupported XML feature
+LINE 1: SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></...
+                                    ^
+DETAIL:  This functionality requires the server to be built with libxml support.
 SELECT xml '<foo>bar</foo>' IS DOCUMENT;
 ERROR:  unsupported XML feature
 LINE 1: SELECT xml '<foo>bar</foo>' IS DOCUMENT;
diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out
index 4f029d0072..60dcb3d36a 100644
--- a/src/test/regress/expected/xml_2.out
+++ b/src/test/regress/expected/xml_2.out
@@ -466,6 +466,112 @@ SELECT xmlserialize(content 'good' as char(10));

 SELECT xmlserialize(document 'bad' as text);
 ERROR:  not an XML document
+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+              xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo>                                 +
+   <bar>                               +
+     <val x="y">42</val>               +
+   </bar>                              +
+ </foo>                                +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT  '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+              xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo>                                 +
+   <bar>                               +
+     <val x="y">42</val>               +
+   </bar>                              +
+ </foo>                                +
+
+(1 row)
+
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+               xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+SELECT xmlserialize(CONTENT  '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+               xmlserialize
+-------------------------------------------
+ <foo><bar><val x="y">42</val></bar></foo>
+(1 row)
+
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR:  not an XML document
+SELECT xmlserialize(CONTENT  '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+ERROR:  invalid XML document
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+ERROR:  not an XML document
+SELECT xmlserialize(CONTENT  '' AS text INDENT);
+ERROR:  invalid XML document
+-- whitespaces
+SELECT xmlserialize(DOCUMENT '  ' AS text INDENT);
+ERROR:  not an XML document
+SELECT xmlserialize(CONTENT  '  ' AS text INDENT);
+ERROR:  invalid XML document
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+SELECT xmlserialize(CONTENT  NULL AS text INDENT);
+ xmlserialize
+--------------
+
+(1 row)
+
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>'
AStext INDENT); 
+              xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo>                                 +
+   <bar>                               +
+     <val>42</val>                     +
+   </bar>                              +
+ </foo>                                +
+
+(1 row)
+
+SELECT xmlserialize(CONTENT  '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>'
AStext INDENT); 
+              xmlserialize
+----------------------------------------
+ <?xml version="1.0" encoding="UTF-8"?>+
+ <foo>                                 +
+   <bar>                               +
+     <val>42</val>                     +
+   </bar>                              +
+ </foo>                                +
+
+(1 row)
+
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
'<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); 
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
'<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); 
+ ?column?
+----------
+ t
+(1 row)
+
 SELECT xml '<foo>bar</foo>' IS DOCUMENT;
  ?column?
 ----------
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 24e40d2653..fea875adfd 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -132,6 +132,32 @@ SELECT xmlserialize(content data as character varying(20)) FROM xmltest;
 SELECT xmlserialize(content 'good' as char(10));
 SELECT xmlserialize(document 'bad' as text);

+-- indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+SELECT xmlserialize(CONTENT  '<foo><bar><val x="y">42</val></bar></foo>' AS text INDENT);
+-- no indent
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+SELECT xmlserialize(CONTENT  '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
+\set VERBOSITY terse
+-- indent malformed xml
+SELECT xmlserialize(DOCUMENT '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+SELECT xmlserialize(CONTENT  '<foo></foo><bar><val x="y">42</val></bar>' AS text INDENT);
+-- indent empty string
+SELECT xmlserialize(DOCUMENT '' AS text INDENT);
+SELECT xmlserialize(CONTENT  '' AS text INDENT);
+-- whitespaces
+SELECT xmlserialize(DOCUMENT '  ' AS text INDENT);
+SELECT xmlserialize(CONTENT  '  ' AS text INDENT);
+\set VERBOSITY default
+-- indent null
+SELECT xmlserialize(DOCUMENT NULL AS text INDENT);
+SELECT xmlserialize(CONTENT  NULL AS text INDENT);
+-- indent different encoding (returns UTF-8)
+SELECT xmlserialize(DOCUMENT '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>'
AStext INDENT); 
+SELECT xmlserialize(CONTENT  '<?xml version="1.0" encoding="ISO-8859-1"?><foo><bar><val>42</val></bar></foo>'
AStext INDENT); 
+-- 'no indent' = not using 'no indent'
+SELECT xmlserialize(DOCUMENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
'<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); 
+SELECT xmlserialize(CONTENT '<foo><bar><val x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
'<foo><bar><valx="y">42</val></bar></foo>' AS text NO INDENT); 

 SELECT xml '<foo>bar</foo>' IS DOCUMENT;
 SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;

Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
Thanks a lot for the review!

On 09.03.23 21:21, Tom Lane wrote:
> I've looked through this now, and have some minor complaints and a major
> one.  The major one is that it doesn't work for XML that doesn't satisfy
> IS DOCUMENT.  For example,
>
> regression=# select '<bar><val x="y">42</val></bar><foo></foo>'::xml is document;
>   ?column?
> ----------
>   f
> (1 row)
>
> regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text);
>                 xmlserialize
> -------------------------------------------
>   <bar><val x="y">42</val></bar><foo></foo>
> (1 row)
>
> regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text indent);
> ERROR:  invalid XML document
> DETAIL:  line 1: Extra content at the end of the document
> <bar><val x="y">42</val></bar><foo></foo>
>                                ^

I assumed it should fail because the XML string doesn't have a 
singly-rooted XML. Oracle has this feature implemented and it does not 
seem to allow non singly-rooted strings either[1]. Also, some the tools 
I use also fail in this case[2,3]

How do you suggest the output should look like? Does the SQL spec also 
define it? I can't find it online :(

> This is not what the documentation promises, and I don't think it's
> good enough --- the SQL spec has no restriction saying you can't
> use INDENT with CONTENT.  I tried adjusting things so that we call
> xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag,
> but all that got me was empty output (except for a document header).
> It seems like xmlDocDumpFormatMemory is not the thing to use, at least
> not in the CONTENT case.  But libxml2 has a few other "dump"
> functions, so maybe we can use a different one?  I see we are using
> xmlNodeDump elsewhere, and that has a format option, so maybe there's
> a way forward there.
>
> A lesser issue is that INDENT tacks on a document header (XML declaration)
> whether there was one or not.  I'm not sure whether that's an appropriate
> thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT
> case.  We have code that can strip off the header again, but we
> need to figure out exactly when to apply it.
I replaced xmlDocDumpFormatMemory with xmlSaveToBuffer and used to 
option XML_SAVE_NO_DECL for input docs with XML declaration. It no 
longer returns a XML declaration if the input doc does not contain it.
> I also suspect that it's outright broken to attach a header claiming
> the data is now in UTF8 encoding.  If the database encoding isn't
> UTF8, then either that's a lie or we now have an encoding violation.
I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now 
uses the encoding of the given doc and UTF8 if not provided.
> Another thing that's mildly irking me is that the current
> factorization of this code will result in xml_parse'ing the data
> twice, if you have both DOCUMENT and INDENT specified.  We could
> consider avoiding that if we merged the indentation functionality
> into xmltotext_with_xmloption, but it's probably premature to do so
> when we haven't figured out how to get the output right --- we might
> end up needing two xml_parse calls anyway with different parameters,
> perhaps.
>
> I also had a bunch of cosmetic complaints (mostly around this having
> a bad case of add-at-the-end-itis), which I've cleaned up in the
> attached v20.  This doesn't address any of the above, however.
I swear to god I have no idea what "add-at-the-end-itis" means :)
>             regards, tom lane

Thanks a lot!

Best, Jim

1 - https://dbfiddle.uk/WUOWtjBU

2 - https://www.samltool.com/prettyprint.php

3 - https://xmlpretty.com/xmlpretty

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Tom Lane
Date:
Jim Jones <jim.jones@uni-muenster.de> writes:
> On 09.03.23 21:21, Tom Lane wrote:
>> I've looked through this now, and have some minor complaints and a major
>> one.  The major one is that it doesn't work for XML that doesn't satisfy
>> IS DOCUMENT.  For example,

> How do you suggest the output should look like?

I'd say a series of node trees, each starting on a separate line.

>> I also suspect that it's outright broken to attach a header claiming
>> the data is now in UTF8 encoding.  If the database encoding isn't
>> UTF8, then either that's a lie or we now have an encoding violation.

> I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now 
> uses the encoding of the given doc and UTF8 if not provided.

Mmmm .... doing this differently from what we do elsewhere does not
sound like the right path forward.  The input *is* (or had better be)
in the database encoding.

            regards, tom lane



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 10.03.23 15:32, Tom Lane wrote:
Jim Jones <jim.jones@uni-muenster.de> writes:
On 09.03.23 21:21, Tom Lane wrote:
I've looked through this now, and have some minor complaints and a major
one.  The major one is that it doesn't work for XML that doesn't satisfy
IS DOCUMENT.  For example,
How do you suggest the output should look like?
I'd say a series of node trees, each starting on a separate line.

v22 attached enables the usage of INDENT with non singly-rooted xml.

postgres=# SELECT xmlserialize (CONTENT '<bar><val x="y">42</val></bar><foo>73</foo>' AS text INDENT);
     xmlserialize      
-----------------------
 <bar>                +
   <val x="y">42</val>+
 </bar>               +
 <foo>73</foo>
(1 row)

I tried several libxml2 dump functions and none of them could cope very well with an xml string without a root node. So added them into a temporary root node, so that I could iterate over its children and add them one by one (formatted) into the output buffer.

I slightly modified the existing xml_parse() function to return the list of nodes parsed by xmlParseBalancedChunkMemory:

xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
          int encoding, Node *escontext, xmlNodePtr *parsed_nodes)
      
res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
                                                       utf8string + count, parsed_nodes);

I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now 
uses the encoding of the given doc and UTF8 if not provided.
Mmmm .... doing this differently from what we do elsewhere does not
sound like the right path forward.  The input *is* (or had better be)
in the database encoding.
I changed that behavior. It now uses GetDatabaseEncoding();

Thanks!

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 09.03.23 21:21, Tom Lane wrote:
> Peter Smith <smithpb2250@gmail.com> writes:
>> The patch v19 LGTM.
> Another thing that's mildly irking me is that the current
> factorization of this code will result in xml_parse'ing the data
> twice, if you have both DOCUMENT and INDENT specified.  We could
> consider avoiding that if we merged the indentation functionality
> into xmltotext_with_xmloption, but it's probably premature to do so
> when we haven't figured out how to get the output right --- we might
> end up needing two xml_parse calls anyway with different parameters,
> perhaps.

Just a thought: since xmlserialize_indent also calls xml_parse() to 
build the xmlDocPtr, couldn't we simply bypass 
xmltotext_with_xmloption() in case of INDENT is specified?

Something like this:

diff --git a/src/backend/executor/execExprInterp.c 
b/src/backend/executor/execExprInterp.c
index 19351fe..ea808dd 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
         {
                 Datum      *argvalue = op->d.xmlexpr.argvalue;
                 bool       *argnull = op->d.xmlexpr.argnull;
+                               text       *result;

                 /* argument type is known to be xml */
                 Assert(list_length(xexpr->args) == 1);
@@ -3837,8 +3838,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
                         return;
                 value = argvalue[0];

-                               *op->resvalue = 
PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
+                               if (xexpr->indent)
+                                       result = 
xmlserialize_indent(DatumGetXmlP(value),
+ xexpr->xmloption);
+                               else
+                                       result = 
xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+
+                               *op->resvalue = PointerGetDatum(result);
                 *op->resnull = false;
         }

         break;





Re: [PATCH] Add pretty-printed XML output option

From
Tom Lane
Date:
Jim Jones <jim.jones@uni-muenster.de> writes:
> [ v22-0001-Add-pretty-printed-XML-output-option.patch ]

I poked at this for awhile and ran into a problem that I'm not sure
how to solve: it misbehaves for input with embedded DOCTYPE.

regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent);
 xmlserialize 
--------------
 <!DOCTYPE a>+
 <a></a>     +
 
(1 row)

regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent);
 xmlserialize 
--------------
 
(1 row)

The bad result for CONTENT is because xml_parse() decides to
parse_as_document, but xmlserialize_indent has no idea that happened
and tries to use the content_nodes list anyway.  I don't especially
care for the laissez faire "maybe we'll set *content_nodes and maybe
we won't" API you adopted for xml_parse, which seems to be contributing
to the mess.  We could pass back more info so that xmlserialize_indent
knows what really happened.  However, that won't fix the bogus output
for the DOCUMENT case.  Are we perhaps passing incorrect flags to
xmlSaveToBuffer?

            regards, tom lane



Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 14.03.23 18:40, Tom Lane wrote:
> Jim Jones <jim.jones@uni-muenster.de> writes:
>> [ v22-0001-Add-pretty-printed-XML-output-option.patch ]
> I poked at this for awhile and ran into a problem that I'm not sure
> how to solve: it misbehaves for input with embedded DOCTYPE.
>
> regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent);
>   xmlserialize
> --------------
>   <!DOCTYPE a>+
>   <a></a>     +
>   
> (1 row)

The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements 
to be serialized with start-end tag pairs. Removing it did the trick ...

postgres=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDENT);
  xmlserialize
--------------
  <!DOCTYPE a>+
  <a/>        +

(1 row)

... but as a side effect empty start-end tags will be now serialized as 
empty elements

postgres=# SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text 
INDENT);
  xmlserialize
--------------
  <foo>       +
    <bar/>    +
  </foo>
(1 row)

It seems to be the standard behavior of other xml indent tools 
(including Oracle)

> regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent);
>   xmlserialize
> --------------
>   
> (1 row)
>
> The bad result for CONTENT is because xml_parse() decides to
> parse_as_document, but xmlserialize_indent has no idea that happened
> and tries to use the content_nodes list anyway.  I don't especially
> care for the laissez faire "maybe we'll set *content_nodes and maybe
> we won't" API you adopted for xml_parse, which seems to be contributing
> to the mess.  We could pass back more info so that xmlserialize_indent
> knows what really happened.

I added a new (nullable) parameter to the xml_parse function that will 
return the actual XmlOptionType used to parse the xml data. Now 
xmlserialize_indent knows how the data was really parsed:

postgres=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDENT);
  xmlserialize
--------------
  <!DOCTYPE a>+
  <a/>        +

(1 row)

I added test cases for these queries.

v23 attached.

Thanks!

Best, Jim

Attachment

Re: [PATCH] Add pretty-printed XML output option

From
Tom Lane
Date:
Jim Jones <jim.jones@uni-muenster.de> writes:
> On 14.03.23 18:40, Tom Lane wrote:
>> I poked at this for awhile and ran into a problem that I'm not sure
>> how to solve: it misbehaves for input with embedded DOCTYPE.

> The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements 
> to be serialized with start-end tag pairs. Removing it did the trick ...
> ... but as a side effect empty start-end tags will be now serialized as 
> empty elements

> postgres=# SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text 
> INDENT);
>   xmlserialize
> --------------
>   <foo>       +
>     <bar/>    +
>   </foo>
> (1 row)

Huh, interesting.  That is a legitimate pretty-fication of the input,
I suppose, but some people might think it goes beyond the charter of
"indentation".  I'm okay with it personally; anyone want to object?

            regards, tom lane



Re: [PATCH] Add pretty-printed XML output option

From
Tom Lane
Date:
I wrote:
> Huh, interesting.  That is a legitimate pretty-fication of the input,
> I suppose, but some people might think it goes beyond the charter of
> "indentation".  I'm okay with it personally; anyone want to object?

Hearing no objections to that, I moved ahead with this.

It occurred to me to test v23 for memory leaks, and it had bad ones:

* the "newline" node used in the CONTENT case never got freed.
Making another one for each line wasn't helping, either.

* libxml, at least in the 2.9.7 version I have here, turns out to
leak memory if you pass a non-null encoding to xmlSaveToBuffer.
But AFAICS we don't really need to do that, because the last thing
we want is for libxml to try to do any encoding conversion.

After cleaning that up, I saw that we were indeed doing essentially
duplicative xml_parse calls for the DOCUMENT check and the indentation
work, so I refactored to allow just one call to serve.

Pushed with those changes and some other cosmetic cleanup.
Thanks for working so hard on this!

(Now to keep an eye on the buildfarm, to see if other versions of
libxml work like mine ...)

BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like

do $$
declare x xml; t text;
begin
x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
for i in 1..10000000 loop
  t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;

That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad.  That problem is not the fault of this patch,
I don't think.  I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?

            regards, tom lane



I wrote:
> BTW, the libxml leak problem seems to extend to other cases too.
> I tested with code like

> do $$
> declare x xml; t text;
> begin
> x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
> for i in 1..10000000 loop
>   t := xmlserialize(document x as text);
> end loop;
> raise notice 't = %', t;
> end;
> $$;

> That case is fine, but if you change the encoding spec to "latin1",
> it leaks like mad.  That problem is not the fault of this patch,
> I don't think.  I wonder if we need to do something to prevent
> libxml from seeing encoding declarations other than utf8?

After a bit of further testing: the leak is present in libxml2 2.9.7
which is what I have on this RHEL8 box, but it seems not to occur
in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora
isn't carrying any relevant local patch).

So maybe it's worth working around that, or maybe it isn't.

            regards, tom lane



Re: Memory leak in libxml2 (was Re: [PATCH] Add pretty-printed XML output option)

From
Daniel Gustafsson
Date:
> On 15 Mar 2023, at 22:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> After a bit of further testing: the leak is present in libxml2 2.9.7
> which is what I have on this RHEL8 box, but it seems not to occur
> in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora
> isn't carrying any relevant local patch).
>
> So maybe it's worth working around that, or maybe it isn't.

2.9.7 is from November 2017 and 2.10.3 is from October 2022, so depending on
when in that timespan the issue was fixed it might be in a release which will
be with us for quite some time.  The lack of reports (that I was able to find)
indicate that it might be rare in production though?

--
Daniel Gustafsson




Re: [PATCH] Add pretty-printed XML output option

From
Jim Jones
Date:
On 15.03.23 22:13, Tom Lane wrote:
> I wrote:
> It occurred to me to test v23 for memory leaks, and it had bad ones:
> * the "newline" node used in the CONTENT case never got freed.
> Making another one for each line wasn't helping, either.
Oh, I did really miss that one. Thanks!
> Pushed with those changes and some other cosmetic cleanup.
> Thanks for working so hard on this!
Great! Thank you, Peter and Andrey for the very nice reviews.
> BTW, the libxml leak problem seems to extend to other cases too.
> I tested with code like
>
> do $$
> declare x xml; t text;
> begin
> x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
> for i in 1..10000000 loop
>    t := xmlserialize(document x as text);
> end loop;
> raise notice 't = %', t;
> end;
> $$;
>
> That case is fine, but if you change the encoding spec to "latin1",
> it leaks like mad.  That problem is not the fault of this patch,
> I don't think.  I wonder if we need to do something to prevent
> libxml from seeing encoding declarations other than utf8?

In my environment (libxml2 v2.9.10 and Ubuntu 22.04) I couldn't 
reproduce this memory leak. It's been most likely fixed in further 
libxml2 versions. Unfortunately their gitlab page has no release notes 
from versions prior to 2.9.13 :(

Best, Jim