Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11) - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)
Date
Msg-id 20180129.175008.100739543.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)
List pgsql-hackers
Hello.

At Wed, 24 Jan 2018 10:30:39 +0100, Pavel Stehule <pavel.stehule@gmail.com> wrote in
<CAFj8pRBVUVvG1CXxgrs0UipTziUX6M788z-=L9gQvwAB4UGLeg@mail.gmail.com>
> Hi
> 
> 2018-01-23 8:13 GMT+01:00 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp
> > I have three comments on the behavior and one on documentation.
> >
> > 1. Lack of syntax handling.
> >
> > ["'" [^'] "'"] is also a string literal, but getXPathToken is
> > forgetting that and applying default namespace mistakenly to the
> > astring content.
> >
> 
> In this case, I am not sure if I understand well.
> 
> Within expressions, literal strings are delimited by single or double
> quotation marks, which are also used to delimit XML attributes. To avoid a
> quotation mark in an expression being interpreted by the XML processor as
> terminating the attribute value the quotation mark can be entered as a
> character reference (" or '). Alternatively, the expression can
> use single quotation marks if the XML attribute is delimited with double
> quotation marks or vice-versa.

I think it is correct understanding.

> So if I understand well, then XML string can looks like ' some " some ' or
> " some ' some ". I fixed it.

Thanks. It looks good.

> > 2. Additional comment might be good.
> >
> > It might be better having additional description about default
> > namespace in the comment starts from "Namespace mappings are
> > passed as text[]" in xpth_internal().
> >
> 
> fixed

Thanks.


> > 3. Inconsistent behavior from named namespace.
> >
> > | -     function context, aliases are <emphasis>local</emphasis>).
> > | +     function context, aliases are <emphasis>local</emphasis>). Default
> > namespace has
> > | +     empty name (empty string) and should be only one.
> >
> > This works as the description, on the other hand the same
> > namespace prefix can be defined twice or more in the array and
> > the last one is in effect. I don't see a reason for
> > differenciating the default namespace case.
> >
> 
> It looks like libxml2 bug. There is no sense to use more than one default
> namespace, and although it is inconsistent with other namespaces, I am
> thinking so it is correct. Is better to raise error early. In this case
> Postgres expects so libxml2 ensure all namespace checks and it is tolerant.
> Default namespace is implemented inside Postgres, and I don't see any
> advantage of tolerant behave. More default namespaces is disallowed for
> XMLTABLE - so some inconsistency there should be. In this case I prefer
> raise error to signalize ambiguous or badly formatted input clearly.

Ok. I'm fine with that.

> > 4. Comments on the documentation part.
> >
> > # Even though I'm not sutable for commenting on wording...
> >
> > | +     Inside predicate literals <literal>and</literal>,
> > <literal>or</literal>,
> > | +     <literal>div</literal> and <literal>mod</literal> are used as
> > keywords
> > | +     (XPath operators) every time and default namespace are not applied
> > there.
> >
> > *I*'d like to have a comma between the predicate and literals,
> > and have a 'a' before prediate. Or 'Literals .. inside a
> > predicate' might be better?
> >
> 
> fixed
> 
> >
> > 'are used as keywords' might be better being 'are identifed as
> > keywords'?
> >
> 
> fixed
> 
> 
> > Default namespace is applied to tag names except the listed
> > keywords even inside a predicate. So 'are not applied there'
> > might be better being 'are not applied to them'?  Or 'are not
> > applied in the case'?
> >
> > | +     If you would to use these literals like tag names, then the
> > default namespace
> > | +     should not be used, and these literals should be explicitly
> > | +     labeled.
> > | +    </para>
> >
> > Default namespace is not applied *only to* such keywords inside a
> > predicate. Even if an Xpath expression contains such a tag name,
> > default namespace still works for other tags. Does the following
> > make sense?
> >
> > + Use named namespace to qualify such tag names appear in an
> > + XPath predicate.
> >
> >
> fixed
> 
> I hope so some native speaker finalize doc. It is out of my knowledges
> .

I am also anxious for that.


> > ===
> > After the aboves are addressed (even or rejected), I think I
> > don't have no additional comment.
> >
> >
> > - This current patch applies safely (with small shifts) on the
> >   current master.
> >
> > - The code looks fine for me.
> >
> > - This patch translates the given XPath expression by prefixing
> >   unprefixed tag names with a special namespace prefix only in
> >   the case where default namespace is defined, so the existing
> >   behavior is not affected.
> >
> > - The syntax is existing but just not implemented so I don't
> >   think no arguemnts needed here.
> >
> > - It undocumentedly inhibits the usage of the namespace prefix
> >   "pgdefnamespace.pgsqlxml.internal" but I believe no one can
> >   notice that.
> >
> > - The default-ns translator (xpath_parser.c) seems working
> >   perfectly with some harmless exceptions.
> >
> >   (xpath specifications is here: https://www.w3.org/TR/1999/
> > REC-xpath-19991116/)
> >
> >   Related unused features (and not documented?):
> >     context variables ($n notations),
> >     user-defined functions (or function names prefixed by a namespace
> > prefix)
> 
> 
> I did some fast check - and these features are not supported by libxml2 -
> so it is question if it should be parsed by out xpath parser. So there are
> not possible to check it, test it :(

I'm fine with that. I don't think that test for them is needed
since PostgreSQL doesn't support them anyway. Sorry for the
confusing comment. (libxml2 complains about that in the following
way.)

 | =# select xpath('/a/text()=$', '<a>test</a>');
 | ERROR:  invalid XPath expression
!| DETAIL:  Expected $ for variable reference
 | CONTEXT:  SQL function "xpath" statement 1

 | =# select xpath('func(/a/text())', '<a>test</a>');
 | ERROR:  could not create XPath object
!| DETAIL:  Unregistered function
 | CONTEXT:  SQL function "xpath" statement 1

> >   Newly documented behavior:
> >     the default namespace isn't applied to and/or/div/mod.
> >
> > - Dodumentation looks enough.
> >
> > - Regression test doesn't cover the XPath syntax but it's not
> >   viable.  I am fine with the basic test cases added by the
> >   current patch.
> >
> > regards,
> >
> 
> I am sending updated version.
> 
> Very much thanks for very precious review

It's my pleasure. Sorry for my slow responses.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Jeevan Chalke
Date:
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Next
From: Konstantin Knizhnik
Date:
Subject: Re: Built-in connection pooling