Re: patch: function xmltable - Mailing list pgsql-hackers
From | Craig Ringer |
---|---|
Subject | Re: patch: function xmltable |
Date | |
Msg-id | CAMsr+YEXQJOHJQsZGLo-Y=G6m0adF3E=eeuHJK2ei=eUg+nw_Q@mail.gmail.com Whole thread Raw |
In response to | Re: patch: function xmltable (Pavel Stehule <pavel.stehule@gmail.com>) |
Responses |
Re: patch: function xmltable
|
List | pgsql-hackers |
On 4 September 2016 at 16:06, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hi > > minor update - using DefElem instead own private parser type I'm really glad that you're doing this and I'll take a look at it for this CF. It's quite a big patch so I expect this will take a few rounds of review and updating. Patch applies cleanly and builds cleanly on master both with and without --with-xml . Overall, I think this needs to be revised with appropriate comments. Whitespace/formatting needs fixing since it's all over the place. Documentation is insufficient (per notes below). Re identifier naming, some of this code uses XmlTable naming patterns, some uses TableExpr prefixes. Is that intended to indicate a bounary between things re-usable for other structured data ingesting functions? Do you expect a "JSONEXPR" or similar in future? That's alluded to by +/*---------- + * TableExpr - used for XMLTABLE function + * + * This can be used for json_table, jsonb_table functions in future + *---------- + */ +typedef struct TableExpr +{ ... If so, should this really be two patches, one to add the table expression infrastructure and another to add XMLTABLE that uses it? Also, why in that case does so much of the TableExpr code call directly into XmlTable code? It doesn't look very generic. Overall I find identifier naming to be a bit inconsisent and think it's necessary to make it clear that all the "TableExpr" stuff is for XMLTABLE specifically, if that's the case, or make the delineation clearer if not. I'd also like to see tests that exercise the ruleutils get_rule_expr parts of the code for the various XMLTABLE variants. Similarly, since this seems to add a new xpath parser, that needs comprehensive tests. Maybe re-purpose an existing xpath test data set? More detailed comments: ==== Docs comments: The <function>xmltable</function> produces [a] table based on [the] passed XML value. The docs are pretty minimal and don't explain the various clauses of XMLTABLE. What is "BY REF" ? Is PATH an xpath expression? If so, is there a good cross reference link available? The PASSING clause? etc. How does XMLTABLE decide what to iterate over, and how to iterate over it? Presumably the FOR ORDINALITY clause makes a column emit a numeric counter. What standard, if any, does this conform to? Does it resemble implementations elsewhere? What limitations or unsupported features does it have relative to those standards? execEvalTableExpr seems to be defined twice, with a difference in case. This is probably not going to fly: +static Datum +execEvalTableExpr(TableExprState *tstate, + ExprContext *econtext, + bool *isNull, ExprDoneCond *isDone) +{ +static Datum +ExecEvalTableExpr(TableExprState *tstate, + ExprContext *econtext, + bool *isNull, ExprDoneCond *isDone) +{ It looks like you've split the function into a "guts" and "wrapper" part, with the error handling PG_TRY / PG_CATCH block in the wrapper. That seems reasonable for readability, but the naming isn't. A comment is needed to explain what ExecEvalTableExpr is / does. If it's XMLTABLE specific (which it looks like based on the code), its name should reflect that. This pattern is repeated elsewhere; e.g. TableExprState is really the state for an XMLTABLE expression. But PostgreSQL actually has TABLE statements, and in future we might want to support table-expressions, so I don't think this naming is appropriate. This is made worse by the lack of comments on things like the definition of TableExprState. Please use something that makes it clear it's for XMLTABLE and add appropriate comments. Formatting of variables, arguments, function signatures etc is random/haphazard and doesn't follow project convention. It's neither aligned or unaligned in the normal way, I don't understand the random spacing at all. Maybe you should try to run pgindent and then extract just the changes related to your patch? Or run your IDE/editor's indent function on your changes? Right now it's actually kind of hard to read. Do you edit with tabstop set to 1 normally or something like that? There's a general lack of comments throughout the added code. In execEvalTableExpr, why are we looping over namespaces? What's that for? Comment would be nice. Typo: Path caclulation => Path calculation What does XmlTableSetRowPath() do? It seems to copy its argument. Nothing further is done with the row_path argument after it's called by execEvalTableExpr, so what context is that memory in and do we have to worry about it if it's large? execEvalTableExpr says it's doing "path calculation". What it actually appears to do is evaluate the path expressions, if provided, and otherwise use the column name as the implied path expression. (The docs should mention that). It's wasn't immediately obvious to me what the branch around tstate->for_ordinality_col is for and what the alternate path's purpose is in terms of XMLTABLE's behaviour, until I read the parser definition. That's largely because the behaviour of XMLTABLE is underspecified in the docs, since once you know ORDINALITY columns exist it's pretty obvious what it's doing. Similarly, for the alternate branch tstate->ncols , the XmlTableGetRowValue call there is meant to do what exactly, and why/under what conditions? Is it for situations where the field type is a whole-row value? a composite type? (I'm deliberately not studying this too deeply, these are points I'd like to see commented so it can be understood to some reasonable degree at a skim-read). /* result is one more columns every time */ "one or more" /* when typmod is not valid, refresh it */ if (te->typmod == -1) Is this a cache? How is it valid or not valid and when? The comment (thanks!) on TableExprGetTupleDesc says: /** When we skip transform stage (in view), then TableExpr's* TupleDesc should not be valid. Refresh is necessary.*/ but I'm not really grasping what you're trying to explain here. What transform stage? What view? This could well be my ignorance of this part of the code; if it should be understandable by a reader who is appropriately familiar with the executor that's fine, but if it's specific to how XMLTABLE works some more explanation would be good. Good that you've got all the required node copy/in/out funcs in place. Please don't use the name "used_dns". Anyone reading that will read it as "domain name service" and that's actually confusing with XML because of XML schema lookups. Maybe used_defnamespace ? used def_ns? I haven't looked closely at keyword/parser changes yet, but it doesn't look like you added any reserved keywords, which is good. It does add unreserved keywords PATH and COLUMNS ; I'm not sure what policy for unreserved keywords is or the significance of that. New ereport() calls specify ERRCODEs, which is good. PostgreSQL already has XPATH support in the form of xmlexists(...) etc. Why is getXPathToken() etc needed? What re-use is possible here? There's no explanation in the patch header or comments. Should the new xpath parser be re-used by the existing xpath stuff? Why can't we use libxml's facilities? etc. This at least needs explaining in the submission, and some kind of hint as to why we have two different ways to do it is needed in the code. If we do need a new XML parser, should it be bundled in adt/xml.c along with a lot of user-facing functionality, or a separate file? How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to this? It doesn't look like they're intended to be called directly by the user, and they're not documented (or commented). I don't understand this at all: +/* + * There are different requests from XMLTABLE, JSON_TABLE functions + * on passed data than has CREATE TABLE command. It is reason for + * introduction special structure instead using ColumnDef. + */ +typedef struct TableExprRawCol +{ + NodeTag type; + char *colname; + TypeName *typeName; + bool for_ordinality; + bool is_not_null; + Node *path_expr; + Node *default_expr; + int location; +} TableExprRawCol; That's my first-pass commentary. I'll return to this once you've had a chance to take a look at these and tell me all the places I got it wrong ;) -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: