Re: XPATH evaluation - Mailing list pgsql-hackers

From Radosław Smogura
Subject Re: XPATH evaluation
Msg-id 201106171743.04781.rsmogura@softperience.eu
In response to Re: XPATH evaluation  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> Friday 17 of June 2011 17:09:25
> On 06/17/2011 10:55 AM, Radosław Smogura wrote:
> > Andrew Dunstan <andrew@dunslane.net> Friday 17 of June 2011 15:47:04
> >
> >> On 06/17/2011 05:41 AM, Florian Pflug wrote:
> >>> On Jun 17, 2011, at 11:09, Radosław Smogura wrote:
> >>>> 1.
> >>>> SELECT (XPATH('/root/*', '<root xmlns:o="http://olacle.com/db"
> >>>> xmlns:p="http://postgresql.org/db"><o:db><a><b></b></a></o:db><p:db></p:db></root>'));
> >>>> Produces:
> >>>> "{"<o:db>
> >>>>    <a>
> >>>>      <b/>
> >>>>    </a>
> >>>> </o:db>",<p:db/>}"
> >>>> In the above, <b></b> was reduced to <b/>. This is a different infoset than
> >>>> the input, and the two notations are interpreted differently, e.g. by XML
> >>>> Binding & WebServices: the 1st one may be mapped to an empty
> >>>> string, and the 2nd one to null.
> >>>
> >>> Oh, joy :-(
> >>
> >> I thought these were basically supposed to be the same.
> >>
> >> The XML Information Set for example specifically excludes:
> >>      The difference between the two forms of an empty element: <foo/>
> >>      and <foo></foo>.
> >>
> >> See <http://www.w3.org/TR/2004/REC-xml-infoset-20040204/> Appendix D.
> >> Note that this implies that <foo></foo> does not have content of an
> >> empty string, but that it has no content.
> >>
> >>
> >> cheers
> >>
> >> andrew
> >
> > Indeed, the Infoset spec and the XML Canonicalization spec treat <foo></foo>
> > the same as <foo/>; my mistake. But XML canonicalization preserves
> > whitespace, if I remember correctly; I think there is an example.
> >
> > In any case, if I store an image in XML (I've seen this done), preservation
> > of whitespace and newlines is important.
>
> If you store images you should encode them anyway, in base64 or hex.
>
> More generally, data that needs that sort of preservation should
> possibly be in CDATA nodes.
>
> cheers
>
> andrew
I know, and that is the better solution. But in one project I created an
XSL-FO document with a whitespace-preserving attribute; if I extracted part of
such an XSL-FO document, I could break the output document.

But those use cases don't change the fact that XPATH output doesn't preserve
whitespace or newlines, and produces a different node than the one in the
original. It's as if a regexp match on a varchar trimmed the result without
giving you any control over it.

I emphasize this because it may cause problems with XML digest algorithms,
which are quite popular, and may even cause legal(!) problems when you try to
use Advanced Electronic Signatures in the European Union, as well as with
other applications.
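To make the digest concern concrete, here is a minimal Python sketch. It assumes Python 3.8+, where `xml.etree.ElementTree.canonicalize` implements C14N 2.0, and uses a plain SHA-256 over the canonical text as a stand-in for the digest step of a real XML signature. The helper `c14n_digest` is hypothetical, for illustration only.

```python
import hashlib
from xml.etree.ElementTree import canonicalize


def c14n_digest(xml_text: str) -> str:
    """Hypothetical helper: SHA-256 hex digest of the C14N 2.0 form of xml_text."""
    canonical = canonicalize(xml_data=xml_text)  # whitespace text is kept by default
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Fragment as written in the source document.
original = "<a><b></b></a>"
# Same fragment with the empty element self-closed: same infoset.
self_closed = "<a><b/></a>"
# Same fragment after pretty-printing, similar to the XPATH output quoted above.
reformatted = "<a>\n  <b/>\n</a>"

if __name__ == "__main__":
    # C14N writes empty elements as start/end tag pairs, so these two match.
    print(c14n_digest(original) == c14n_digest(self_closed))
    # The inserted newlines and indentation survive C14N, so this digest differs.
    print(c14n_digest(original) == c14n_digest(reformatted))
```

Canonicalization makes <b></b> and <b/> equivalent, but the whitespace added by reformatting is ordinary character data and survives canonicalization, so a digest computed over the extracted fragment no longer matches one computed over the original.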

With XML binding it's quite common to interpret <foo/> as null and <foo></foo>
as an empty string. In particular, the Infoset spec mentioned above doesn't
matter here.

I think not reformatting the output is a reasonable requirement for the XPATH
function.

Regards,
Radek.

