Thread: XMLDocument (SQL/XML X030)

XMLDocument (SQL/XML X030)

From

Jim Jones

Date:

04 December 2024, 19:18:54

Hi,

I'd like to propose the implementation of XMLDocument (SQL/XML X030).It 
basically returns an XML document from a given XML expression, e.g.

SELECT
   xmldocument(
     xmlelement(NAME foo,
       xmlattributes(42 AS att),
       xmlelement(NAME bar,
         xmlconcat('va', 'lue'))
     )
   );

              xmldocument
--------------------------------------
  <foo att="42"><bar>value</bar></foo>
(1 row)

XMLDocument doesn't do much. In fact, it might be reduced to a simple 
xmlparse() call as XMLOPTION_DOCUMENT...

xmlparse(data, XMLOPTION_DOCUMENT, true)

... to make sure that the given XML expression is a valid document - 
still need some more research there. One could argue that XMLDocument() 
is in most cases unnecessary, but I believe it would facilitate the 
migration of scripts from other database products.

Any thoughts?

Best, Jim

Re: XMLDocument (SQL/XML X030)

From

Andrew Dunstan

Date:

10 December 2024, 16:59:11

On 2024-12-10 Tu 2:48 AM, Jim Jones wrote:
> On 04.12.24 17:18, Jim Jones wrote:
>> I'd like to propose the implementation of XMLDocument (SQL/XML X030).
>> It basically returns an XML document from a given XML expression, e.g.
>>
>> SELECT
>>    xmldocument(
>>      xmlelement(NAME foo,
>>        xmlattributes(42 AS att),
>>        xmlelement(NAME bar,
>>          xmlconcat('va', 'lue'))
>>      )
>>    );
>>
>>               xmldocument
>> --------------------------------------
>>   <foo att="42"><bar>value</bar></foo>
>> (1 row)
> v1 attached attempts to implement XMLDocument() as described above.
>
> Feedback welcome.
>

LGTM at a first glance.


Please add this to the next CommitFest if you haven't done already.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: XMLDocument (SQL/XML X030)

From

Jim Jones

Date:

10 December 2024, 17:14:41

Hi Andrew

On 10.12.24 14:59, Andrew Dunstan wrote:
> LGTM at a first glance.
>
>
> Please add this to the next CommitFest if you haven't done already. 

Thanks!

This is the CF entry: https://commitfest.postgresql.org/51/5431/

Best, Jim

Re: XMLDocument (SQL/XML X030)

From

Pavel Stehule

Date:

25 January, 10:16:34

pá 24. 1. 2025 v 23:11 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

On 24.01.25 22:01, Chapman Flack wrote:
> It seems to me the key connection there is that the ISO SQL standard
> defines XMLDOCUMENT by equivalence to what `document { $expr }` means
> in the W3 XML Query standard.

It seems I missed one sentence. My bad.

Next try... :)

The <function>xmldocument</function> function returns the input argument
unchanged, or <literal>NULL</literal> if the argument is
<literal>NULL</literal>, and is provided for compatibility.

The SQL-standard <replaceable>XMLDocument</replaceable> function applied
to an XML value <literal>$EXPR</literal>, has effects equivalent to the
XML Query expression <replaceable>document { $EXPR }</replaceable>. It
replaces any document nodes in the input with their children and wraps
the whole result in a single <replaceable>document node</replaceable>.

In the XML Query standard, a <replaceable>document node</replaceable>
represents a relaxed version of an XML document structure. This
corresponds to what PostgreSQL's single XML type allows, meaning that
any valid non-null PostgreSQL XML value can be returned unchanged. Other
systems may support more permissive XML data types, such as
<literal>XML(SEQUENCE)</literal>, which allow values that do not conform
to this structure. In PostgreSQL, every valid non-null value of the XML
type already has that structure, making additional processing by this
function unnecessary.

v6 attached.

I think so doc is ok now

because the function does nothing, then it is useless to convert input to XML and force detosting

Maybe the body of the function should be just

{

#ifdef USE_LIBXML

PG_RETURN_DATUM(PG_GETARG_DATUM(0));

#else

NO_XML_SUPPORT();

return 0;

#endif

}

Regards

Pavel

Thanks.

Best regards, Jim

Re: XMLDocument (SQL/XML X030)

From

Pavel Stehule

Date:

25 January, 13:50:52

so 25. 1. 2025 v 9:10 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

On 25.01.25 08:16, Pavel Stehule wrote:
> because the function does nothing, then it is useless to convert input
> to XML and force detosting

Right. Fixed in v7 attached.

I don't see

+Datum
+xmldocument(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+ PG_RETURN_XML_P(PG_GETARG_XML_P(0));
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif /* not USE_LIBXML */
+}
+

you still forces detoasting (inside PG_GETARG_XML_P)

Thanks

Best, Jim

Re: XMLDocument (SQL/XML X030)

From

Pavel Stehule

Date:

25 January, 16:23:00

so 25. 1. 2025 v 12:45 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:

On 25.01.25 11:50, Pavel Stehule wrote:
> you still forces detoasting (inside PG_GETARG_XML_P)

Ah, ok .. I overlooked it. You meant _DATUM instead of _XML_P

Fixed in v8.

now it is ok.

I'll mark this patch as ready for committer

Regards

Pavel

Thanks!

Best, Jim

Re: XMLDocument (SQL/XML X030)

From

Chapman Flack

Date:

25 January, 17:10:06

On 01/25/25 02:16, Pavel Stehule wrote:
> because the function does nothing, then it is useless to convert input to
> XML and force detosting
> 
> Maybe the body of the function should be just
> ...
>   PG_RETURN_DATUM(PG_GETARG_DATUM(0));

That sort of motivated my question upthread about whether there is
already a function somewhere in the codebase that does exactly that
and could be named in the pg_proc entry for xmldocument, as an
alternative to supplying a brand-new one.

Maybe this is the only instance where it turns out that 'identity'
is the right behavior for a function. But if it could conceivably
happen again, a single C function (maybe even named identity) could
reduce code duplication and make quite clear what the behavior is
with a \sf.

A generic 'identity' function would be lacking the #ifdef USE_LIBXML
and the error message, but I'm not convinced those matter here
anyway. Without XML support, you'll already have raised that error
in any attempt to construct a non-null XML argument to pass, and if
you're passing NULL and the function is strict, you'll never see
the error message from here anyway.

Regards,
-Chap

Re: XMLDocument (SQL/XML X030)

From

Pavel Stehule

Date:

25 January, 18:05:10

so 25. 1. 2025 v 15:10 odesílatel Chapman Flack <jcflack@acm.org> napsal:

On 01/25/25 02:16, Pavel Stehule wrote:
> because the function does nothing, then it is useless to convert input to
> XML and force detosting
>
> Maybe the body of the function should be just
> ...
> PG_RETURN_DATUM(PG_GETARG_DATUM(0));

That sort of motivated my question upthread about whether there is
already a function somewhere in the codebase that does exactly that
and could be named in the pg_proc entry for xmldocument, as an
alternative to supplying a brand-new one.

this should not be a problem, because the already created function XMLDOCUMENT surely will not be in the pg_catalog schema.

Maybe this is the only instance where it turns out that 'identity'
is the right behavior for a function. But if it could conceivably
happen again, a single C function (maybe even named identity) could
reduce code duplication and make quite clear what the behavior is
with a \sf.

I didn't find any function like this.

Regards

Pavel

A generic 'identity' function would be lacking the #ifdef USE_LIBXML
and the error message, but I'm not convinced those matter here
anyway. Without XML support, you'll already have raised that error
in any attempt to construct a non-null XML argument to pass, and if
you're passing NULL and the function is strict, you'll never see
the error message from here anyway.

Regards,
-Chap

Re: XMLDocument (SQL/XML X030)

From

Jim Jones

Date:

26 January, 16:48:03

Hi Chapman & Pavel

On 25.01.25 16:05, Pavel Stehule wrote:
>
>
> so 25. 1. 2025 v 15:10 odesílatel Chapman Flack <jcflack@acm.org> napsal:
>
>     On 01/25/25 02:16, Pavel Stehule wrote:
>     > because the function does nothing, then it is useless to convert
>     input to
>     > XML and force detosting
>     >
>     > Maybe the body of the function should be just
>     > ...
>     >   PG_RETURN_DATUM(PG_GETARG_DATUM(0));
>
>     That sort of motivated my question upthread about whether there is
>     already a function somewhere in the codebase that does exactly that
>     and could be named in the pg_proc entry for xmldocument, as an
>     alternative to supplying a brand-new one.
>
>
> this should not be a problem, because the already created function
> XMLDOCUMENT surely will not be in the pg_catalog schema.
>  
>
>
>     Maybe this is the only instance where it turns out that 'identity'
>     is the right behavior for a function. But if it could conceivably
>     happen again, a single C function (maybe even named identity) could
>     reduce code duplication and make quite clear what the behavior is
>     with a \sf.
>
>
> I didn't find any function like this.


I also couldn't find any similar function.


>
> Regards
>
> Pavel
>
>
>     A generic 'identity' function would be lacking the #ifdef USE_LIBXML
>     and the error message, but I'm not convinced those matter here
>     anyway. Without XML support, you'll already have raised that error
>     in any attempt to construct a non-null XML argument to pass, and if
>     you're passing NULL and the function is strict, you'll never see
>     the error message from here anyway.
>
>

Do you envision something like this?

Datum
identity(PG_FUNCTION_ARGS)
{
PG_RETURN_DATUM(PG_GETARG_DATUM(0));
}

If so, where would be the right place to put it? Certainly not in xml.c

Generally speaking, reducing redundancy by reusing existing functions is
always a good thing. However, if we decide to create a generic function
for this purpose, it's important to ensure its existence is clearly
communicated to prevent others from writing their own -- something that,
given the simplicity of this function, seems like a likely scenario. :)
My point is: this function is so small that I’m not entirely sure it’s
worth the effort to make it generic. But I'd be willing to give it a try
if we agree on it.

Thanks!

Best regards, Jim

Re: XMLDocument (SQL/XML X030)

From

Robert Treat

Date:

28 January, 07:54:20

On Sat, Jan 25, 2025 at 6:45 AM Jim Jones <jim.jones@uni-muenster.de> wrote:
> On 25.01.25 11:50, Pavel Stehule wrote:
> > you still forces detoasting (inside PG_GETARG_XML_P)
>
>
> Ah, ok .. I overlooked it. You meant _DATUM instead of _XML_P
>
> Fixed in v8.
>

Was playing around with the patch and was thinking about this wording:
"The xmldocument function returns the input argument
 unchanged... and is provided for compatibility."

When I run an example similar to the db2 example you gave earlier:

pagila=# SELECT xmldocument(xmlforest(10 as X, 20 as Y));
    xmldocument
--------------------
 <x>10</x><y>20</y>

In the db2 case, this is preserved as UPPER (which is to say, db2 case
folds UPPER, and the input happens to match that), but we case fold
lower, because we do; presumably you'd get the opposite effect in db2
running the input with unquoted lower field names(?). In any case (no
pun intended), SQL folks probably don't care much about that
discrepancy, but given xml is case sensitive, maybe xml people do?

Robert Treat
https://xzilla.net

Re: XMLDocument (SQL/XML X030)

From

Jim Jones

Date:

28 January, 11:14:48

Hi Robert

On 28.01.25 05:54, Robert Treat wrote:
> Was playing around with the patch and was thinking about this wording:
> "The xmldocument function returns the input argument
>  unchanged... and is provided for compatibility."
>
> When I run an example similar to the db2 example you gave earlier:
>
> pagila=# SELECT xmldocument(xmlforest(10 as X, 20 as Y));
>     xmldocument
> --------------------
>  <x>10</x><y>20</y>
>
> In the db2 case, this is preserved as UPPER (which is to say, db2 case
> folds UPPER, and the input happens to match that), but we case fold
> lower, because we do; presumably you'd get the opposite effect in db2
> running the input with unquoted lower field names(?). 

Yes.

SELECT 42 AS foo FROM SYSIBM.SYSDUMMY1

FOO        
-----------
         42

  1 record(s) selected.

> In any case (no
> pun intended), SQL folks probably don't care much about that
> discrepancy, but given xml is case sensitive, maybe xml people do?

That's a good point. DB2 converts unquoted identifiers to uppercase by
default, which, if memory serves, aligns with the SQL standard. In the
case of this xmlforest example, my guess is that DB2 treats the elements
as identifiers and normalizes them to uppercase as well, as DB2 does not
handle XML documents as text like we do. To preserve case, you'd need to
explicitly quote the identifiers:

SELECT xmlforest(10 AS "x", 20 AS "y") FROM SYSIBM.SYSDUMMY1

--------------------
<x>10</x><y>20</y>

  1 record(s) selected.

Things look different when constructing the xml document directly from a
string:

SELECT xmlparse(DOCUMENT '<root><foo>bar</foo></root>') FROM
SYSIBM.SYSDUMMY1

----------------------------
<root><foo>bar</foo></root>

  1 record(s) selected.

I'd say the difference is due to how the two systems handle the XML data
type and unquoted identifiers in general, rather than a difference in
the behaviour of the function itself. Sticking to quoted identifiers in
both systems helps:

SELECT xmlforest(42 AS "foo", 73 AS "bar");
         xmlforest          
----------------------------

 <foo>42</foo><bar>73</bar>

Probably that's why most DB2 examples in their documentation use quoted
identifiers :)

Best regards, Jim

Re: XMLDocument (SQL/XML X030)

From

Chapman Flack

Date:

28 January, 17:54:51

On 01/28/25 03:14, Jim Jones wrote:
> I'd say the difference is due to how the two systems handle the XML data
> type and unquoted identifiers in general, rather than a difference in
> the behaviour of the function itself. 

I'd go further and say it's entirely down to how the two systems
handle unquoted identifiers. In neither case was there ever any
XML value created with XML names in one case and then changed to
the other. The SQL names were already in their (DB2- or PostgreSQL-
specific) folded form by the first moment any XML library code ever
saw them. The XML code handled them faithfully ever after, whether
in serialized or in node-tree form.

Presumably both DB2 and PostgreSQL users soon know in their sleep
what their respective systems do to unquoted identifiers, and know
that quoting is the way to control that when it matters.

Regards,
-Chap