Re: BUG #14151: Xml special symbols are not unescaped when gettting value of Xml via xpath - Mailing list pgsql-bugs

From David G. Johnston
Subject Re: BUG #14151: Xml special symbols are not unescaped when gettting value of Xml via xpath
Date
Msg-id CAKFQuwauM5suzvEMzeGtA_VuhEhXDib8p9v-cp==a2gZaWs7DA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #14151: Xml special symbols are not unescaped when gettting value of Xml via xpath  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Thu, May 19, 2016 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> onic@live.fr writes:
> > SELECT unnest(xpath('/name/text()', xmlelement(name name, 'AT&T', null
> )))
> > This gives me 'AT&T' and
>
> AFAICS, that behavior is correct.  xpath returns a value of type xml
> (well, really xml[]) and 'AT&T' is not a legal value of that type;
> only 'AT&T' is.
>
> > I have NO WAY inside pgsql to get 'At&t' value
>
> I agree that there ought to be an "unescape" function that would convert
> this back to 'AT&T', but the lack of one seems like a missing feature not
> a bug.  I'd wonder for example what an unescape function ought to do with
> other markup such as '<name>'.
>
> The last concrete discussion we had on this seems to have been
> this thread:
>

>
> http://www.postgresql.org/message-id/flat/C71079E6-12D8-4048-B8C5-1836893=
6FD5D@phlo.org
>
> which petered out for lack of anyone finding the time to investigate
> the relevant standards.
>

This
2013
=E2=80=8B thread further discusses our=E2=80=8B existing problematic behavi=
or.

http://www.postgresql.org/message-id/flat/25508.1383590668@sss.pgh.pa.us#25=
508.1383590668@sss.pgh.pa.us

I don't recall anything discussed in depth since then - just a bug report
or two from users.

http://www.postgresql.org/message-id/CALr6pkhSe20gh5Hci1H=3DuT_7QE4av0m9h2e=
QMjqUX6D6AD9H1Q@mail.gmail.com

A basic entity decoder is fairly simply to write and doesn't require C code
- a single SQL function using replace will likely suffice.

The 2013 post I linked shows a potentially deeper consideration that we
need to take into account.

I'd challenge any community members who really want to see this get fixed
at least start things off by generating a set of queries and expected
outputs that can be commented upon and documented as being the desired
behavior with respect to both internal and external entities.  At least
then we'd have clearly defined what the current and expected behaviors
are.  With that in hand there is at least hope that someone less familiar
with xpath/xml generally, but familiar with PostgreSQL internals, would be
willing to take on the task of writing a patch that causes the tests to
pass.  The existing test queries and resultant .out files should be
reasonably accessible even for someone not familiar with the code.

I'm inclined to think, however, that we will end up wanting an
"xpath_text(...)" function that returns text instead of xml.  The
underlying problem here is that the xpath standard allows for different
kinds of outputs - which depend upon the xpath expression - while
PostgreSQL only allows for xml regardless of the underlying expression.
This dichotomy seems likely to be impossible to overcome without
introducing a new function.

Please, help us scratch your itch.

David J.

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #14151: Xml special symbols are not unescaped when gettting value of Xml via xpath
Next
From: vigneshwaran balaji
Date:
Subject: Postgres Help - Reg