Thread: [PoC] XMLCast (SQL/XML X025)

[PoC] XMLCast (SQL/XML X025)

From
Jim Jones
Date:
Hi,

This is a PoC that implements XMLCast (SQL/XML X025), which enables
conversions between SQL and XML data type.

It basically does the following:

* When casting an XML value to a SQL data type, XML values containing
XSD literals will be converted to their equivalent SQL data type.
* When casting from a SQL data type to XML, the cast operand will be
translated to its corresponding XSD data type.

SELECT xmlcast(now() AS xml);
             xmlcast              
----------------------------------
 2024-07-02T17:03:11.189073+02:00
(1 row)

SELECT xmlcast('2024-07-02T17:03:11.189073+02:00'::xml AS timestamp with
time zone);
            xmlcast            
-------------------------------
 2024-07-02 17:03:11.189073+02
(1 row)

SELECT xmlcast('P1Y2M3DT4H5M6S'::xml AS interval);
            xmlcast            
-------------------------------
 1 year 2 mons 3 days 04:05:06
(1 row)

SELECT xmlcast('<foo&bar>'::xml AS text);
  xmlcast  
-----------
 <foo&bar>
(1 row)

SELECT xmlcast('1 year 2 months 3 days 4 hours 5 minutes 6
seconds'::interval AS xml) ;
    xmlcast     
----------------
 P1Y2M3DT4H5M6S
(1 row)

SELECT xmlcast('42.73'::xml AS numeric);
 xmlcast
---------
   42.73
(1 row)

SELECT xmlcast(42730102030405 AS xml);
    xmlcast     
----------------
 42730102030405
(1 row)


Is it starting in the right direction? Any feedback would be much
appreciated.

Best,
Jim
Attachment

Re: [PoC] XMLCast (SQL/XML X025)

From
Jim Jones
Date:
On 02.07.24 18:02, Jim Jones wrote:
> It basically does the following:
>
> * When casting an XML value to a SQL data type, XML values containing
> XSD literals will be converted to their equivalent SQL data type.
> * When casting from a SQL data type to XML, the cast operand will be
> translated to its corresponding XSD data type.
>
v2 attached adds missing return for NO_XML_SUPPORT control path in
unescape_xml

-- 
Jim

Attachment

Re: [PoC] XMLCast (SQL/XML X025)

From
Robert Haas
Date:
On Sun, Nov 10, 2024 at 1:14 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
> rebase.

Hmm, this patch has gotten no responses for 4 months. That's kind of
unfortunate. Sadly, there's not a whole lot that I can do to better
the situation, because I know very little either about XML-related
standards or about how people make use of XML in practice. It's not
that much code, so if it does a useful thing that we actually want, we
can probably figure out how to verify that the code is correct, or fix
it. But I don't know whether it's a useful thing that we actually
want. Syntactically, XMLCAST() looks a lot like CAST(), so one might
ask whether the things that it does can already be accomplished using
CAST(); or whether, perhaps, we have some other existing method for
performing such conversions.

The only thing I found during a quick perusal of the documentation was
XMLTABLE(), which seems a bit baroque if you just want to convert one
value. Is this intended to plug that gap? Is there any other current
way of doing it?

Do we need to ensure some kind of consistency between XMLTABLE() and
XMLCAST() in terms of how they behave? The documentation at
https://www.postgresql.org/docs/current/xml-limits-conformance.html#FUNCTIONS-XML-LIMITS-CASTS
says that "When PostgreSQL maps SQL data values to XML (as in
xmlelement), or XML to SQL (as in the output columns of xmltable),
except for a few cases treated specially, PostgreSQL simply assumes
that the XML data type's XPath 1.0 string form will be valid as the
text-input form of the SQL datatype, and conversely." Unfortunately,
it does not specify what those cases treated specially are, and the
commit that added that documentation text is not the one that added
the underlying code, so I don't actually know where that code is, but
one would expect this function to conform to that general rule.

I emphasize again that if there are people other than the submitter
who are interested in this patch, they should really chime in. This
can't progress in a vacuum.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [PoC] XMLCast (SQL/XML X025)

From
Jim Jones
Date:
Hi Robert
Thanks for taking a look at it.

On 11.11.24 19:15, Robert Haas wrote:
> Hmm, this patch has gotten no responses for 4 months. That's kind of
> unfortunate. Sadly, there's not a whole lot that I can do to better
> the situation, because I know very little either about XML-related
> standards or about how people make use of XML in practice. It's not
> that much code, so if it does a useful thing that we actually want, we
> can probably figure out how to verify that the code is correct, or fix
> it. But I don't know whether it's a useful thing that we actually
> want. Syntactically, XMLCAST() looks a lot like CAST(), so one might
> ask whether the things that it does can already be accomplished using
> CAST(); or whether, perhaps, we have some other existing method for
> performing such conversions.
It indeed has a huge overlap with CAST(), except for a few handy SQL <->
XML mappings, such as

SELECT xmlcast('foo & <"bar">'::xml AS text);

    xmlcast    
---------------
 foo & <"bar">
(1 row)

--

SELECT
  xmlcast('2024-05-29 12:04:10.703585+02'::timestamp without time zone
AS xml),
  xmlcast('2024-05-29T12:04:10.703585'::xml AS timestamp without time zone);
 
          xmlcast           |          xmlcast           
----------------------------+----------------------------
 2024-05-29T12:04:10.703585 | 2024-05-29 12:04:10.703585
(1 row)

--

SELECT
  xmlcast('P1Y2M3DT4H5M6S'::xml AS interval),
  xmlcast('1 year 2 mons 3 days 4 hours 5 minutes 6 seconds'::interval
AS xml);
 
            xmlcast            |    xmlcast     
-------------------------------+----------------
 1 year 2 mons 3 days 04:05:06 | P1Y2M3DT4H5M6S
(1 row)

--

SELECT CAST('42'::xml AS int);

ERROR:  cannot cast type xml to integer
LINE 1: SELECT CAST('42'::xml AS int);
               ^
--

SELECT XMLCAST('42'::xml AS int);
 xmlcast
---------
      42
(1 row)


> The only thing I found during a quick perusal of the documentation was
> XMLTABLE(), which seems a bit baroque if you just want to convert one
> value. Is this intended to plug that gap? Is there any other current
> way of doing it?
>
> Do we need to ensure some kind of consistency between XMLTABLE() and
> XMLCAST() in terms of how they behave? 

I haven't considered any compatibility to XMLTABLE(), as it has a
different spec (X300-X305), but I can take a look at it! To implement
this function I just followed the SQL/XML spec "ISO/IEC IWD 9075-14" -
and from time to time I also took a look on how other databases
implemented it.[1]

> The documentation at
> https://www.postgresql.org/docs/current/xml-limits-conformance.html#FUNCTIONS-XML-LIMITS-CASTS
> says that "When PostgreSQL maps SQL data values to XML (as in
> xmlelement), or XML to SQL (as in the output columns of xmltable),
> except for a few cases treated specially, PostgreSQL simply assumes
> that the XML data type's XPath 1.0 string form will be valid as the
> text-input form of the SQL datatype, and conversely." Unfortunately,
> it does not specify what those cases treated specially are, and the
> commit that added that documentation text is not the one that added
> the underlying code, so I don't actually know where that code is, but
> one would expect this function to conform to that general rule.

I agree. It would be nice to know which cases those are.
However, invalid inputs should normally return an error, e.g.

SELECT xmlcast('foo&bar'::xml AS text);

ERROR:  invalid XML content
LINE 1: SELECT xmlcast('foo&bar'::xml AS text);
                       ^
DETAIL:  line 1: EntityRef: expecting ';'
foo&bar
       ^
--

SELECT xmlcast('foo'::xml AS date);
ERROR:  invalid input syntax for type date: "foo"

--

.. but perhaps the text means something else?

Thanks!

Best, Jim

1 - https://dbfiddle.uk/ZSpsyIal



Re: [PoC] XMLCast (SQL/XML X025)

From
Robert Haas
Date:
Hi Jim,

On Mon, Nov 11, 2024 at 2:43 PM Jim Jones <jim.jones@uni-muenster.de> wrote:
> > The only thing I found during a quick perusal of the documentation was
> > XMLTABLE(), which seems a bit baroque if you just want to convert one
> > value. Is this intended to plug that gap? Is there any other current
> > way of doing it?
> >
> > Do we need to ensure some kind of consistency between XMLTABLE() and
> > XMLCAST() in terms of how they behave?
>
> I haven't considered any compatibility to XMLTABLE(), as it has a
> different spec (X300-X305), but I can take a look at it! To implement
> this function I just followed the SQL/XML spec "ISO/IEC IWD 9075-14" -
> and from time to time I also took a look on how other databases
> implemented it.[1]

Those are good things to check, but we also need to consider how it
interacts with features PostgreSQL itself already has. In particular,
I'm concerned about the special handling you seem to have for times
and intervals. That handling might be different from what, say,
XMLTABLE() does. In a perfect world, we'd probably like the features
to share code, unless there is some good reason to do otherwise. But
at the very least we want them to work in compatible ways. For
example, if the way you convert a date into the JSON-preferred format
happened to use slightly different time zone handling than the way
that some other existing feature does it, that would be extremely sad.
Or if the existing features don't have interval handling and you do,
perhaps we ought to add that capability to the existing features and
then have your new feature call the same code so that it works the
same way. I haven't researched what the exact situation is here too
and these examples I'm giving you here are strictly hypothetical --
they're just the kind of thing that needs to be sorted out before we
can think about committing anything.

There's still also the question of desirability. I take it for granted
that you want this feature and consider it valuable, but sometimes
people submit patches for a feature that only the submitter wants and
nobody else cares about it (or even, other people actively dislike
it). I am in a very poor position to assess how important this feature
is or to what extent it complies with the relevant specification. Vik,
who I see you copied, is probably in a much better position to
interpret the spec than I am, and may or may not also know something
about whether people want this. I continue to hope that we'll get some
comments from others as well.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [PoC] XMLCast (SQL/XML X025)

From
Jim Jones
Date:

On 12.11.24 15:59, Robert Haas wrote:
> Those are good things to check, but we also need to consider how it
> interacts with features PostgreSQL itself already has. 

I totally agree. It just didn't occur to me to check how XMLTABLE()
deals with these conversions :)

> In particular,
> I'm concerned about the special handling you seem to have for times
> and intervals. 

The spec dictates that SQL types should be converted to their xsd
equivalents, e.g.

6.7 <XML cast specification>: Syntax Rules
...
15 e)
 * i)   If the type designator of SQLT is DATE, then let XT be xs:date.
 * ii)  If the type designator of SQLT is TIME WITH TIME ZONE, then let
XT be xs:time.
 * iii) If the type designator of SQLT is TIME WITHOUT TIME ZONE, then
let XT be xs:time.
 * iv)  If the type designator of SQLT is TIMESTAMP WITH TIME ZONE, then
let XT be xs:dateTime.
 * v)   If the type designator of SQLT is TIMESTAMP WITHOUT TIME ZONE,
then let XT be xs:dateTime.

> That handling might be different from what, say,
> XMLTABLE() does. 

XMLTABLE() does seem to have a similar behaviour (also regarding
intervals and timestamps):

WITH j (val) AS (
 SELECT
  '<foo>
    <interval>P1Y2M3DT4H5M6S</interval>
    <timestamp>2002-05-30T09:30:10</timestamp>
    <integer>42</integer>
    <numeric>-42.73</numeric>
    <text>foo & <"bar"></text>
    <boolean>false</boolean>
  </foo>'::xml
)
SELECT a, b, c, d, e, f
FROM j,
  XMLTABLE(
    '/foo'
    PASSING val
    COLUMNS
      a interval PATH 'interval',
      b timestamp PATH 'timestamp',
      c integer PATH 'integer',
      d numeric PATH 'numeric',
      e text PATH 'text',
      f boolean PATH 'boolean');
               a               |          b          | c  |   d   
|       e       | f
-------------------------------+---------------------+----+--------+---------------+---
 1 year 2 mons 3 days 04:05:06 | 2002-05-30 09:30:10 | 42 | -42.73 | foo
& <"bar"> | f
(1 row)


> In a perfect world, we'd probably like the features
> to share code, unless there is some good reason to do otherwise. But
> at the very least we want them to work in compatible ways. For
> example, if the way you convert a date into the JSON-preferred format
> happened to use slightly different time zone handling than the way
> that some other existing feature does it, that would be extremely sad.
> Or if the existing features don't have interval handling and you do,
> perhaps we ought to add that capability to the existing features and
> then have your new feature call the same code so that it works the
> same way. 
At least XMLTABLE() does handle intervals in the same way. I'll do some
research to check if maybe other related XML features follow a different
path.
> I haven't researched what the exact situation is here too
> and these examples I'm giving you here are strictly hypothetical --
> they're just the kind of thing that needs to be sorted out before we
> can think about committing anything.
+1
> There's still also the question of desirability. I take it for granted
> that you want this feature and consider it valuable, but sometimes
> people submit patches for a feature that only the submitter wants and
> nobody else cares about it (or even, other people actively dislike
> it). 
I've been there a few times :)
> I am in a very poor position to assess how important this feature
> is or to what extent it complies with the relevant specification. Vik,
> who I see you copied, is probably in a much better position to
> interpret the spec than I am, and may or may not also know something
> about whether people want this. I continue to hope that we'll get some
> comments from others as well.

Thanks for taking a look at this patch. Much appreciated!

-- 
Jim