Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11) - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)
Date
Msg-id 20171106.220007.07289832.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
Thank you for the new patch.

- The latest patch is missing xpath_parser.h at least since ns-3. That of the first (not-numbered) version was still
usable.

- c29c578 conflicts on doc/src/sgml/func.sgml


At Sun, 15 Oct 2017 12:06:11 +0200, Pavel Stehule <pavel.stehule@gmail.com> wrote in
<CAFj8pRCYBH+a6oJoEYUFDUpBQ1ySwtt2CfnFZxs2Ab9EfONbUQ@mail.gmail.com>
> 2017-10-02 12:22 GMT+02:00 Kyotaro HORIGUCHI <
> horiguchi.kyotaro@lab.ntt.co.jp>:
> 
> > Hi, thanks for the new patch.
> >
> > # The patch is missing xpath_parser.h. That of the first patch was usable.
> >
> > At Thu, 28 Sep 2017 07:59:41 +0200, Pavel Stehule <pavel.stehule@gmail.com>
> > wrote in <CAFj8pRBMQa07a=+qQAVMtz5M_hqkJBhiQSOP76+-BrFDj37pvg@
> > mail.gmail.com>
> > > Hi
> > >
> > > now xpath and xpath_exists supports default namespace too
> >
> > At Wed, 27 Sep 2017 22:41:52 +0200, Pavel Stehule <pavel.stehule@gmail.com>
> > wrote in <CAFj8pRCZ8oneG7g2vxs9ux71n8A9twwUO7zQpJiuz+7RGSpSuw@mail.
> > gmail.com>
> > > > 1. Uniformity among simliar features
> > > >
> > > >   As mentioned in the proposal, but it is lack of uniformity that
> > > >   the xpath transformer is applied only to xmltable and not for
> > > >   other xpath related functions.
> > > >
> > >
> > > I have to fix the XPath function. The SQL/XML function Xmlexists doesn't
> > > support namespaces/
> >
> > Sorry, I forgot to care about that. (And the definition of
> > namespace array is of course fabricated by me). I'd like to leave
> > this to committers.  Anyway it is working but the syntax (or
> > whether it is acceptable) is still arguable.
> >
> > SELECT xpath('/a/text()', '<my:a xmlns:my="http://example.com">
> > test</my:a>',
> >              ARRAY[ARRAY['', 'http://example.com']]);
> > |  xpath
> > | --------
> > |  {test}
> > | (1 row)
> >
> >
> > The internal name is properly rejected, but the current internal
> > name (pgdefnamespace.pgsqlxml.internal) seems a bit too long. We
> > are preserving some short names and reject them as
> > user-defined. Doesn't just 'pgsqlxml' work?
> >
> >
> > Default namespace correctly become to be applied on bare
> > attribute names.
> >
> > > updated doc,
> > > fixed all variants of expected result test file
> >
> > Sorry for one by one comment but I found another misbehavior.
> >
> > create table t1 (id int, doc xml);
> > insert into t1
> >    values
> >    (5, '<rows xmlns="http://x.y"><row><a hoge="haha">50</a></row></
> > rows>');
> > select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x),
> > '/x:rows/x:row' passing t1.doc columns data int PATH
> > 'child::x:a[1][attribute::hoge="haha"]') as x;
> > |  data
> > | ------
> > |    50
> >
> > but the following fails.
> >
> > select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> > '/rows/row' passing t1.doc columns data int PATH
> > 'child::a[1][attribute::hoge="haha"]') as x;
> > |  data
> > | ------
> > |
> > | (1 row)
> >
> > Perhaps child::a is not prefixed by the transformation.
> >
> 
> the problem was in unwanted attribute modification. The parser didn't
> detect "attribute::hoge" as attribute. Updated parser does it. I reduce
> duplicated code there more.

It worked as expected. But the comparison of "attribute" is
missing t1.length = 9 so the following expression wrongly passes.

child::a[1][attributeabcdefg::hoge="haha"

It is confusing that is_qual_name becomes true when t2 is not a
"qual name", and the way it treats a double-colon is hard to
understand.

It essentially does inserting the default namespace before
unqualified non-attribute name. I believe we can easily
look-ahead to detect a double colon and it would make things
simpler. Could you consider something like the attached patch?
(applies on top of ns-4 patch.)

> > XPath might be complex enough so that it's worth switching to
> > yacc/lex based transformer that is formally verifiable and won't
> > need a bunch of cryptic tests that finally cannot prove the
> > completeness. synchronous_standy_names is far simpler than XPath
> > but using yacc/lex parser.
> >
> >
> > Anyway the following is nitpicking of the current xpath_parser.c.
> >
> > - NODENAME_FIRSTCHAR allows '-' as the first char but it is
> >   excluded from NameStartChar (https://www.w3.org/TR/REC-
> > xml/#NT-NameStartChar)
> >   I think characters with high-bit set is okay.
> >   Also IS_NODENAME_CHAR should be changed.
> >
> 
> fixed
> 
> 
> > - NODENAME_FIRSTCHAR and IS_NODENAME_CHAR is in the same category
> >   but have different naming schemes. Can these are named in the same way?
> >
> 
> fixed
> 
> 
> > - The current transoformer seems to using up to one token stack
> >   depth. Maybe the stack is needless. (pushed token is always
> >   popped just after)
> >
> 
> fixed

Thank you.

I found another (and should be the last, so sorry..) functional
defect in this. This doesn't add default namespace if the tag
name in a predicate is 'and' or 'or'. It needs to be fixed, or
wrote in the documentation as a restriction. (seem hard to fix
it..)

create table t1 (id int, doc xml);
insert into t1 values (1, '<rows xmlns="http://x.y"><row><val>50</val><and>60</and></row></rows>');
select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x),
'/x:rows/x:row' passing t1.doc columns data int PATH
'x:val[../x:and = 60]') as x;data
------  50
(1 row)
select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
'/rows/row' passing t1.doc columns data int PATH
'val[../and = 60]') as x;data
------

(1 row)



Other comments are follows.

- Please add more comments. XPATH_TOKEN_NAME in _transformXPath in my patch has more

- Debug output might be needed.

# sorry now time's up. will continue tomorrow.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/adt/xpath_parser.c b/src/backend/utils/adt/xpath_parser.c
index f22ec76..1d8d93c 100644
--- a/src/backend/utils/adt/xpath_parser.c
+++ b/src/backend/utils/adt/xpath_parser.c
@@ -41,6 +41,8 @@ typedef enum    XPATH_TOKEN_NAME,    XPATH_TOKEN_STRING,    XPATH_TOKEN_NUMBER,
+    XPATH_TOKEN_COLON,
+    XPATH_TOKEN_DCOLON,    XPATH_TOKEN_OTHER}    XPathTokenType;
@@ -68,6 +70,7 @@ typedef struct ParserData#define IS_NODENAME_CHAR(c)        (IS_NODENAME_FIRSTCHAR(c) || (c) == '-'
||(c) == '.' || \                                 ((c) >= '0' && (c) <= '9'))
 
+#define TOKEN_IS_EMPTY(t)    ((t)->ttype == XPATH_TOKEN_NONE)/* * Returns next char after last char of token - XPath
lexer
@@ -112,6 +115,17 @@ getXPathToken(char *str, XPathTokenInfo * ti)            ti->ttype = XPATH_TOKEN_STRING;        }
+        else if (c == ':')
+        {
+            /* look ahead to detect a doulbe-colon */
+            if (*str == ':')
+            {
+                ti->ttype = XPATH_TOKEN_DCOLON;
+                str++;
+            }
+            else
+                ti->ttype = XPATH_TOKEN_COLON;
+        }        else            ti->ttype = XPATH_TOKEN_OTHER;
@@ -165,6 +179,7 @@ pushXPathToken(XPathParserData * parser, XPathTokenInfo * ti)    memcpy(&parser->buffer, ti,
sizeof(XPathTokenInfo));   parser->buffer_is_empty = false;
 
+    ti->ttype = XPATH_TOKEN_NONE;}/*
@@ -179,6 +194,9 @@ writeXPathToken(StringInfo str, XPathTokenInfo * ti)        appendBinaryStringInfo(str, ti->start,
ti->length);   else        appendStringInfoChar(str, *ti->start);
 
+
+    /* this token is comsumed */
+    ti->ttype = XPATH_TOKEN_NONE;}/*
@@ -192,7 +210,7 @@ _transformXPath(StringInfo str, XPathParserData * parser,{    XPathTokenInfo t1,
t2;
-    bool        token_is_tagname = false;
+    bool        tagname_needs_defnsp = false;    bool        token_is_tagattrib = false;    nextXPathToken(parser,
&t1);
@@ -203,7 +221,11 @@ _transformXPath(StringInfo str, XPathParserData * parser,        {            case
XPATH_TOKEN_NUMBER:           case XPATH_TOKEN_STRING:
 
-                token_is_tagname = false;
+                /*
+                 * string cannot be a tag name. write out it immediately and
+                 * go ahead
+                 */
+                tagname_needs_defnsp = false;                token_is_tagattrib = false;
writeXPathToken(str,&t1);
 
@@ -212,8 +234,6 @@ _transformXPath(StringInfo str, XPathParserData * parser,            case XPATH_TOKEN_NAME:
      {
 
-                    bool        is_qual_name = false;
-                    /* inside predicate ignore keywords "and" "or" */                    if (inside_predicate)
          {
 
@@ -226,53 +246,56 @@ _transformXPath(StringInfo str, XPathParserData * parser,                        }
   }
 
-                    token_is_tagname = true;
+                    /* look ahead what is following the name token */
+                    tagname_needs_defnsp = true;                    nextXPathToken(parser, &t2);
-                    if (t2.ttype == XPATH_TOKEN_OTHER)
+                    if (t2.ttype == XPATH_TOKEN_COLON)
+                    {
+                        /* t1 is a quilified node name. no need to add default one. */
+                        tagname_needs_defnsp = false;
+                        writeXPathToken(str, &t1);    /* namespace name */
+                        writeXPathToken(str, &t2);    /* colon */
+                        /* get node name */
+                        nextXPathToken(parser, &t1);
+                    }
+                    else if (t2.ttype == XPATH_TOKEN_DCOLON)
+                    {
+                        /* t1 is an axis name. write out as it is */
+                        if (strncmp(t1.start, "attribute", 9) == 0 && t1.length == 9)
+                            token_is_tagattrib = true;
+
+                        writeXPathToken(str, &t1);    /* axis name */
+                        writeXPathToken(str, &t2);    /* double colon */
+
+                        /*
+                         * The next token may be qualified tag name, process
+                         * it as a fresh token.
+                         */
+                        nextXPathToken(parser, &t1);
+                        break;
+                    }
+                    else if (t2.ttype == XPATH_TOKEN_OTHER)                    {
+                        /* function name doesn't require namespace */                        if (*t2.start == '(')
-                            token_is_tagname = false;
-                        else if (*t2.start == ':')
-                        {
-                            XPathTokenInfo t3;
-
-                            nextXPathToken(parser, &t3);
-                            if (t3.ttype == XPATH_TOKEN_OTHER && *t3.start == ':'
-                                     && strncmp(t1.start, "attribute", 9) == 0)
-                            {
-                                /* other syntax for attribute, where we should not apply def namespace */
-                                appendStringInfo(str, "attribute::");
-                                nextXPathToken(parser, &t1);
-                                token_is_tagattrib = true;
-                                break;
-                            }
-                            else
-                            {
-                                pushXPathToken(parser, &t3);
-                                is_qual_name = true;
-                            }
-                        }
+                            tagname_needs_defnsp = false;
+                        else
+                            pushXPathToken(parser, &t2);                    }
-                    if (token_is_tagname && !token_is_tagattrib
-                                 && !is_qual_name && def_namespace_name != NULL)
+                    if (tagname_needs_defnsp && !token_is_tagattrib &&
+                        def_namespace_name != NULL)                        appendStringInfo(str, "%s:",
def_namespace_name);                   token_is_tagattrib = false;
 
-                    writeXPathToken(str, &t1);
+                    /* write maybe-tagname if not consumed */
+                    if (!TOKEN_IS_EMPTY(&t1))
+                        writeXPathToken(str, &t1);
-                    if (is_qual_name)
-                    {
+                    /* output t2 if not consumed yet */
+                    if (!TOKEN_IS_EMPTY(&t2))                        writeXPathToken(str, &t2);
-                        nextXPathToken(parser, &t1);
-                        if (t1.ttype == XPATH_TOKEN_NAME)
-                            writeXPathToken(str, &t1);
-                        else
-                            pushXPathToken(parser, &t1);
-                    }
-                    else
-                        pushXPathToken(parser, &t2);                    nextXPathToken(parser, &t1);                }
@@ -283,7 +306,6 @@ _transformXPath(StringInfo str, XPathParserData * parser,                    char        c =
*t1.start;                   token_is_tagattrib = false;
 
-                    token_is_tagname = false;                    writeXPathToken(str, &t1);
@@ -307,10 +329,14 @@ _transformXPath(StringInfo str, XPathParserData * parser,                }                break;
+            case XPATH_TOKEN_COLON:
+            case XPATH_TOKEN_DCOLON:            case XPATH_TOKEN_NONE:                elog(ERROR, "should not be
here");       }    }
 
+
+    elog(LOG, "\"%s\"", str->data);}void

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: [HACKERS] path toward faster partition pruning
Next
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] pow support for pgbench