Re: patch: function xmltable - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: patch: function xmltable
Date
Msg-id CAMsr+YEXQJOHJQsZGLo-Y=G6m0adF3E=eeuHJK2ei=eUg+nw_Q@mail.gmail.com
Whole thread Raw
In response to Re: patch: function xmltable  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: patch: function xmltable
List pgsql-hackers
On 4 September 2016 at 16:06, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> minor update - using DefElem instead own private parser type

I'm really glad that you're doing this and I'll take a look at it for this CF.

It's quite a big patch so I expect this will take a few rounds of
review and updating.


Patch applies cleanly and builds cleanly on master both with and
without --with-xml .

Overall, I think this needs to be revised with appropriate comments.
Whitespace/formatting needs fixing since it's all over the place.
Documentation is insufficient (per notes below).

Re identifier naming, some of this code uses XmlTable naming patterns,
some uses TableExpr prefixes. Is that intended to indicate a bounary
between things re-usable for other structured data ingesting
functions? Do you expect a "JSONEXPR" or similar in future? That's
alluded to by

+/*----------
+ * TableExpr - used for XMLTABLE function
+ *
+ * This can be used for json_table, jsonb_table functions in future
+ *----------
+ */
+typedef struct TableExpr
+{
...

If so, should this really be two patches, one to add the table
expression infrastructure and another to add XMLTABLE that uses it?
Also, why in that case does so much of the TableExpr code call
directly into XmlTable code? It doesn't look very generic.

Overall I find identifier naming to be a bit inconsisent and think
it's necessary to make it clear that all the "TableExpr" stuff is for
XMLTABLE specifically, if that's the case, or make the delineation
clearer if not.

I'd also like to see tests that exercise the ruleutils get_rule_expr
parts of the code for the various XMLTABLE variants.

Similarly, since this seems to add a new xpath parser, that needs
comprehensive tests. Maybe re-purpose an existing xpath test data set?




More detailed comments:
====

Docs comments:
     The <function>xmltable</function> produces [a] table based on
[the] passed XML value.

The docs are pretty minimal and don't explain the various clauses of
XMLTABLE. What is "BY REF" ? Is PATH an xpath expression? If so, is
there a good cross reference link available? The PASSING clause? etc.

How does XMLTABLE decide what to iterate over, and how to iterate over it?

Presumably the FOR ORDINALITY clause makes a column emit a numeric counter.

What standard, if any, does this conform to? Does it resemble
implementations elsewhere? What limitations or unsupported features
does it have relative to those standards?



execEvalTableExpr seems to be defined twice, with a difference in
case. This is probably not going to fly:


+static Datum
+execEvalTableExpr(TableExprState *tstate,
+                        ExprContext *econtext,
+                        bool *isNull, ExprDoneCond *isDone)
+{

+static Datum
+ExecEvalTableExpr(TableExprState *tstate,
+                        ExprContext *econtext,
+                        bool *isNull, ExprDoneCond *isDone)
+{


It looks like you've split the function into a "guts" and "wrapper"
part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
That seems reasonable for readability, but the naming isn't.

A comment is needed to explain what ExecEvalTableExpr is / does. If
it's XMLTABLE specific (which it looks like based on the code), its
name should reflect that. This pattern is repeated elsewhere; e.g.
TableExprState is really the state for an XMLTABLE expression. But
PostgreSQL actually has TABLE statements, and in future we might want
to support table-expressions, so I don't think this naming is
appropriate. This is made worse by the lack of comments on things like
the definition of TableExprState. Please use something that makes it
clear it's for XMLTABLE and add appropriate comments.

Formatting of variables, arguments, function signatures etc is
random/haphazard and doesn't follow project convention. It's neither
aligned or unaligned in the normal way, I don't understand the random
spacing at all. Maybe you should try to run pgindent and then extract
just the changes related to your patch? Or run your IDE/editor's
indent function on your changes? Right now it's actually kind of hard
to read. Do you edit with tabstop set to 1 normally or something like
that?

There's a general lack of comments throughout the added code.

In execEvalTableExpr, why are we looping over namespaces? What's that
for? Comment would be nice.

Typo: Path caclulation => Path calculation

What does XmlTableSetRowPath() do? It seems to copy its argument.
Nothing further is done with the row_path argument after it's called
by execEvalTableExpr, so what context is that memory in and do we have
to worry about it if it's large?

execEvalTableExpr says it's doing "path calculation". What it actually
appears to do is evaluate the path expressions, if provided, and
otherwise use the column name as the implied path expression. (The
docs should mention that).

It's wasn't immediately obvious to me what the branch around
tstate->for_ordinality_col   is for and what the alternate path's
purpose is in terms of XMLTABLE's behaviour, until I read the parser
definition. That's largely because the behaviour of XMLTABLE is
underspecified in the docs, since once you know ORDINALITY columns
exist it's pretty obvious what it's doing.

Similarly, for the alternate branch   tstate->ncols   , the
XmlTableGetRowValue call there is meant to do what exactly, and
why/under what conditions? Is it for situations where the field type
is a whole-row value? a composite type? (I'm deliberately not studying
this too deeply, these are points I'd like to see commented so it can
be understood to some reasonable degree at a skim-read).

               /* result is one more columns every time */
"one or more"


               /* when typmod is not valid, refresh it */               if (te->typmod == -1)


Is this a cache? How is it valid or not valid and when? The comment
(thanks!) on TableExprGetTupleDesc says:

/** When we skip transform stage (in view), then TableExpr's* TupleDesc should not be valid. Refresh is necessary.*/

but I'm not really grasping what you're trying to explain here. What
transform stage? What view? This could well be my ignorance of this
part of the code; if it should be understandable by a reader who is
appropriately familiar with the executor that's fine, but if it's
specific to how XMLTABLE works some more explanation would be good.

Good that you've got all the required node copy/in/out funcs in place.

Please don't use the name "used_dns". Anyone reading that will read it
as "domain name service" and that's actually confusing with XML
because of XML schema lookups. Maybe used_defnamespace ?   used
def_ns?

I haven't looked closely at keyword/parser changes yet, but it doesn't
look like you added any reserved keywords, which is good. It does add
unreserved keywords PATH and COLUMNS ; I'm not sure what policy for
unreserved keywords is or the significance of that.

New ereport() calls specify ERRCODEs, which is good.

PostgreSQL already has XPATH support in the form of xmlexists(...)
etc. Why is getXPathToken() etc needed? What re-use is possible here?
There's no explanation in the patch header or comments. Should the new
xpath parser be re-used by the existing xpath stuff? Why can't we use
libxml's facilities? etc. This at least needs explaining in the
submission, and some kind of hint as to why we have two different ways
to do it is needed in the code. If we do need a new XML parser, should
it be bundled in adt/xml.c along with a lot of user-facing
functionality, or a separate file?


How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
this? It doesn't look like they're intended to be called directly by
the user, and they're not documented (or commented).

I don't understand this at all:



+/*
+ * There are different requests from XMLTABLE, JSON_TABLE functions
+ * on passed data than has CREATE TABLE command. It is reason for
+ * introduction special structure instead using ColumnDef.
+ */
+typedef struct TableExprRawCol
+{
+    NodeTag     type;
+    char       *colname;
+    TypeName   *typeName;
+    bool        for_ordinality;
+    bool        is_not_null;
+    Node       *path_expr;
+    Node       *default_expr;
+    int         location;
+} TableExprRawCol;



That's my first-pass commentary. I'll return to this once you've had a
chance to take a look at these and tell me all the places I got it
wrong ;)



-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Pavan Deolasee
Date:
Subject: Re: Refactoring of heapam code.
Next
From: Victor Wagner
Date:
Subject: Re: Patch: Implement failover on libpq connect level.