Thread: XMLDocument (SQL/XML X030)
Hi, I'd like to propose the implementation of XMLDocument (SQL/XML X030).It basically returns an XML document from a given XML expression, e.g. SELECT xmldocument( xmlelement(NAME foo, xmlattributes(42 AS att), xmlelement(NAME bar, xmlconcat('va', 'lue')) ) ); xmldocument -------------------------------------- <foo att="42"><bar>value</bar></foo> (1 row) XMLDocument doesn't do much. In fact, it might be reduced to a simple xmlparse() call as XMLOPTION_DOCUMENT... xmlparse(data, XMLOPTION_DOCUMENT, true) ... to make sure that the given XML expression is a valid document - still need some more research there. One could argue that XMLDocument() is in most cases unnecessary, but I believe it would facilitate the migration of scripts from other database products. Any thoughts? Best, Jim
On 2024-12-10 Tu 2:48 AM, Jim Jones wrote: > On 04.12.24 17:18, Jim Jones wrote: >> I'd like to propose the implementation of XMLDocument (SQL/XML X030). >> It basically returns an XML document from a given XML expression, e.g. >> >> SELECT >> xmldocument( >> xmlelement(NAME foo, >> xmlattributes(42 AS att), >> xmlelement(NAME bar, >> xmlconcat('va', 'lue')) >> ) >> ); >> >> xmldocument >> -------------------------------------- >> <foo att="42"><bar>value</bar></foo> >> (1 row) > v1 attached attempts to implement XMLDocument() as described above. > > Feedback welcome. > LGTM at a first glance. Please add this to the next CommitFest if you haven't done already. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi Andrew On 10.12.24 14:59, Andrew Dunstan wrote: > LGTM at a first glance. > > > Please add this to the next CommitFest if you haven't done already. Thanks! This is the CF entry: https://commitfest.postgresql.org/51/5431/ Best, Jim
pá 24. 1. 2025 v 23:11 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
On 24.01.25 22:01, Chapman Flack wrote:
> It seems to me the key connection there is that the ISO SQL standard
> defines XMLDOCUMENT by equivalence to what `document { $expr }` means
> in the W3 XML Query standard.
It seems I missed one sentence. My bad.
Next try... :)
The <function>xmldocument</function> function returns the input argument
unchanged, or <literal>NULL</literal> if the argument is
<literal>NULL</literal>, and is provided for compatibility.
The SQL-standard <replaceable>XMLDocument</replaceable> function applied
to an XML value <literal>$EXPR</literal>, has effects equivalent to the
XML Query expression <replaceable>document { $EXPR }</replaceable>. It
replaces any document nodes in the input with their children and wraps
the whole result in a single <replaceable>document node</replaceable>.
In the XML Query standard, a <replaceable>document node</replaceable>
represents a relaxed version of an XML document structure. This
corresponds to what PostgreSQL's single XML type allows, meaning that
any valid non-null PostgreSQL XML value can be returned unchanged. Other
systems may support more permissive XML data types, such as
<literal>XML(SEQUENCE)</literal>, which allow values that do not conform
to this structure. In PostgreSQL, every valid non-null value of the XML
type already has that structure, making additional processing by this
function unnecessary.
v6 attached.
I think so doc is ok now
because the function does nothing, then it is useless to convert input to XML and force detosting
Maybe the body of the function should be just
{
#ifdef USE_LIBXML
PG_RETURN_DATUM(PG_GETARG_DATUM(0));
#else
NO_XML_SUPPORT();
return 0;
#endif
}
Regards
Pavel
Thanks.
Best regards, Jim
so 25. 1. 2025 v 9:10 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
On 25.01.25 08:16, Pavel Stehule wrote:
> because the function does nothing, then it is useless to convert input
> to XML and force detosting
Right. Fixed in v7 attached.
I don't see
+Datum
+xmldocument(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+ PG_RETURN_XML_P(PG_GETARG_XML_P(0));
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif /* not USE_LIBXML */
+}
+
+xmldocument(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+ PG_RETURN_XML_P(PG_GETARG_XML_P(0));
+#else
+ NO_XML_SUPPORT();
+ return 0;
+#endif /* not USE_LIBXML */
+}
+
you still forces detoasting (inside PG_GETARG_XML_P)
Thanks
Best, Jim
so 25. 1. 2025 v 12:45 odesílatel Jim Jones <jim.jones@uni-muenster.de> napsal:
On 25.01.25 11:50, Pavel Stehule wrote:
> you still forces detoasting (inside PG_GETARG_XML_P)
Ah, ok .. I overlooked it. You meant _DATUM instead of _XML_P
Fixed in v8.
now it is ok.
I'll mark this patch as ready for committer
Regards
Pavel
Thanks!
Best, Jim
On 01/25/25 02:16, Pavel Stehule wrote: > because the function does nothing, then it is useless to convert input to > XML and force detosting > > Maybe the body of the function should be just > ... > PG_RETURN_DATUM(PG_GETARG_DATUM(0)); That sort of motivated my question upthread about whether there is already a function somewhere in the codebase that does exactly that and could be named in the pg_proc entry for xmldocument, as an alternative to supplying a brand-new one. Maybe this is the only instance where it turns out that 'identity' is the right behavior for a function. But if it could conceivably happen again, a single C function (maybe even named identity) could reduce code duplication and make quite clear what the behavior is with a \sf. A generic 'identity' function would be lacking the #ifdef USE_LIBXML and the error message, but I'm not convinced those matter here anyway. Without XML support, you'll already have raised that error in any attempt to construct a non-null XML argument to pass, and if you're passing NULL and the function is strict, you'll never see the error message from here anyway. Regards, -Chap
so 25. 1. 2025 v 15:10 odesílatel Chapman Flack <jcflack@acm.org> napsal:
On 01/25/25 02:16, Pavel Stehule wrote:
> because the function does nothing, then it is useless to convert input to
> XML and force detosting
>
> Maybe the body of the function should be just
> ...
> PG_RETURN_DATUM(PG_GETARG_DATUM(0));
That sort of motivated my question upthread about whether there is
already a function somewhere in the codebase that does exactly that
and could be named in the pg_proc entry for xmldocument, as an
alternative to supplying a brand-new one.
this should not be a problem, because the already created function XMLDOCUMENT surely will not be in the pg_catalog schema.
Maybe this is the only instance where it turns out that 'identity'
is the right behavior for a function. But if it could conceivably
happen again, a single C function (maybe even named identity) could
reduce code duplication and make quite clear what the behavior is
with a \sf.
I didn't find any function like this.
Regards
Pavel
A generic 'identity' function would be lacking the #ifdef USE_LIBXML
and the error message, but I'm not convinced those matter here
anyway. Without XML support, you'll already have raised that error
in any attempt to construct a non-null XML argument to pass, and if
you're passing NULL and the function is strict, you'll never see
the error message from here anyway.
Regards,
-Chap
Hi Chapman & Pavel On 25.01.25 16:05, Pavel Stehule wrote: > > > so 25. 1. 2025 v 15:10 odesílatel Chapman Flack <jcflack@acm.org> napsal: > > On 01/25/25 02:16, Pavel Stehule wrote: > > because the function does nothing, then it is useless to convert > input to > > XML and force detosting > > > > Maybe the body of the function should be just > > ... > > PG_RETURN_DATUM(PG_GETARG_DATUM(0)); > > That sort of motivated my question upthread about whether there is > already a function somewhere in the codebase that does exactly that > and could be named in the pg_proc entry for xmldocument, as an > alternative to supplying a brand-new one. > > > this should not be a problem, because the already created function > XMLDOCUMENT surely will not be in the pg_catalog schema. > > > > Maybe this is the only instance where it turns out that 'identity' > is the right behavior for a function. But if it could conceivably > happen again, a single C function (maybe even named identity) could > reduce code duplication and make quite clear what the behavior is > with a \sf. > > > I didn't find any function like this. I also couldn't find any similar function. > > Regards > > Pavel > > > A generic 'identity' function would be lacking the #ifdef USE_LIBXML > and the error message, but I'm not convinced those matter here > anyway. Without XML support, you'll already have raised that error > in any attempt to construct a non-null XML argument to pass, and if > you're passing NULL and the function is strict, you'll never see > the error message from here anyway. > > Do you envision something like this? Datum identity(PG_FUNCTION_ARGS) { PG_RETURN_DATUM(PG_GETARG_DATUM(0)); } If so, where would be the right place to put it? Certainly not in xml.c Generally speaking, reducing redundancy by reusing existing functions is always a good thing. However, if we decide to create a generic function for this purpose, it's important to ensure its existence is clearly communicated to prevent others from writing their own -- something that, given the simplicity of this function, seems like a likely scenario. :) My point is: this function is so small that I’m not entirely sure it’s worth the effort to make it generic. But I'd be willing to give it a try if we agree on it. Thanks! Best regards, Jim
On Sat, Jan 25, 2025 at 6:45 AM Jim Jones <jim.jones@uni-muenster.de> wrote: > On 25.01.25 11:50, Pavel Stehule wrote: > > you still forces detoasting (inside PG_GETARG_XML_P) > > > Ah, ok .. I overlooked it. You meant _DATUM instead of _XML_P > > Fixed in v8. > Was playing around with the patch and was thinking about this wording: "The xmldocument function returns the input argument unchanged... and is provided for compatibility." When I run an example similar to the db2 example you gave earlier: pagila=# SELECT xmldocument(xmlforest(10 as X, 20 as Y)); xmldocument -------------------- <x>10</x><y>20</y> In the db2 case, this is preserved as UPPER (which is to say, db2 case folds UPPER, and the input happens to match that), but we case fold lower, because we do; presumably you'd get the opposite effect in db2 running the input with unquoted lower field names(?). In any case (no pun intended), SQL folks probably don't care much about that discrepancy, but given xml is case sensitive, maybe xml people do? Robert Treat https://xzilla.net
Hi Robert On 28.01.25 05:54, Robert Treat wrote: > Was playing around with the patch and was thinking about this wording: > "The xmldocument function returns the input argument > unchanged... and is provided for compatibility." > > When I run an example similar to the db2 example you gave earlier: > > pagila=# SELECT xmldocument(xmlforest(10 as X, 20 as Y)); > xmldocument > -------------------- > <x>10</x><y>20</y> > > In the db2 case, this is preserved as UPPER (which is to say, db2 case > folds UPPER, and the input happens to match that), but we case fold > lower, because we do; presumably you'd get the opposite effect in db2 > running the input with unquoted lower field names(?). Yes. SELECT 42 AS foo FROM SYSIBM.SYSDUMMY1 FOO ----------- 42 1 record(s) selected. > In any case (no > pun intended), SQL folks probably don't care much about that > discrepancy, but given xml is case sensitive, maybe xml people do? That's a good point. DB2 converts unquoted identifiers to uppercase by default, which, if memory serves, aligns with the SQL standard. In the case of this xmlforest example, my guess is that DB2 treats the elements as identifiers and normalizes them to uppercase as well, as DB2 does not handle XML documents as text like we do. To preserve case, you'd need to explicitly quote the identifiers: SELECT xmlforest(10 AS "x", 20 AS "y") FROM SYSIBM.SYSDUMMY1 -------------------- <x>10</x><y>20</y> 1 record(s) selected. Things look different when constructing the xml document directly from a string: SELECT xmlparse(DOCUMENT '<root><foo>bar</foo></root>') FROM SYSIBM.SYSDUMMY1 ---------------------------- <root><foo>bar</foo></root> 1 record(s) selected. I'd say the difference is due to how the two systems handle the XML data type and unquoted identifiers in general, rather than a difference in the behaviour of the function itself. Sticking to quoted identifiers in both systems helps: SELECT xmlforest(42 AS "foo", 73 AS "bar"); xmlforest ---------------------------- <foo>42</foo><bar>73</bar> Probably that's why most DB2 examples in their documentation use quoted identifiers :) Best regards, Jim
On 01/28/25 03:14, Jim Jones wrote: > I'd say the difference is due to how the two systems handle the XML data > type and unquoted identifiers in general, rather than a difference in > the behaviour of the function itself. I'd go further and say it's entirely down to how the two systems handle unquoted identifiers. In neither case was there ever any XML value created with XML names in one case and then changed to the other. The SQL names were already in their (DB2- or PostgreSQL- specific) folded form by the first moment any XML library code ever saw them. The XML code handled them faithfully ever after, whether in serialized or in node-tree form. Presumably both DB2 and PostgreSQL users soon know in their sleep what their respective systems do to unquoted identifiers, and know that quoting is the way to control that when it matters. Regards, -Chap