Thread: patch: function xmltable
Attachment
I invite any help with documentation and testing.HiI am sending implementation of xmltable function. The code should to have near to final quality and it is available for testing.
PavelRegards
Attachment
Hi2016-08-19 10:58 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:I invite any help with documentation and testing.HiI am sending implementation of xmltable function. The code should to have near to final quality and it is available for testing.new update - the work with nodes is much more correct now.
RegardsPavelPavelRegards
Attachment
Attachment
On 4 September 2016 at 16:06, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hi > > minor update - using DefElem instead own private parser type I'm really glad that you're doing this and I'll take a look at it for this CF. It's quite a big patch so I expect this will take a few rounds of review and updating. Patch applies cleanly and builds cleanly on master both with and without --with-xml . Overall, I think this needs to be revised with appropriate comments. Whitespace/formatting needs fixing since it's all over the place. Documentation is insufficient (per notes below). Re identifier naming, some of this code uses XmlTable naming patterns, some uses TableExpr prefixes. Is that intended to indicate a bounary between things re-usable for other structured data ingesting functions? Do you expect a "JSONEXPR" or similar in future? That's alluded to by +/*---------- + * TableExpr - used for XMLTABLE function + * + * This can be used for json_table, jsonb_table functions in future + *---------- + */ +typedef struct TableExpr +{ ... If so, should this really be two patches, one to add the table expression infrastructure and another to add XMLTABLE that uses it? Also, why in that case does so much of the TableExpr code call directly into XmlTable code? It doesn't look very generic. Overall I find identifier naming to be a bit inconsisent and think it's necessary to make it clear that all the "TableExpr" stuff is for XMLTABLE specifically, if that's the case, or make the delineation clearer if not. I'd also like to see tests that exercise the ruleutils get_rule_expr parts of the code for the various XMLTABLE variants. Similarly, since this seems to add a new xpath parser, that needs comprehensive tests. Maybe re-purpose an existing xpath test data set? More detailed comments: ==== Docs comments: The <function>xmltable</function> produces [a] table based on [the] passed XML value. The docs are pretty minimal and don't explain the various clauses of XMLTABLE. What is "BY REF" ? Is PATH an xpath expression? If so, is there a good cross reference link available? The PASSING clause? etc. How does XMLTABLE decide what to iterate over, and how to iterate over it? Presumably the FOR ORDINALITY clause makes a column emit a numeric counter. What standard, if any, does this conform to? Does it resemble implementations elsewhere? What limitations or unsupported features does it have relative to those standards? execEvalTableExpr seems to be defined twice, with a difference in case. This is probably not going to fly: +static Datum +execEvalTableExpr(TableExprState *tstate, + ExprContext *econtext, + bool *isNull, ExprDoneCond *isDone) +{ +static Datum +ExecEvalTableExpr(TableExprState *tstate, + ExprContext *econtext, + bool *isNull, ExprDoneCond *isDone) +{ It looks like you've split the function into a "guts" and "wrapper" part, with the error handling PG_TRY / PG_CATCH block in the wrapper. That seems reasonable for readability, but the naming isn't. A comment is needed to explain what ExecEvalTableExpr is / does. If it's XMLTABLE specific (which it looks like based on the code), its name should reflect that. This pattern is repeated elsewhere; e.g. TableExprState is really the state for an XMLTABLE expression. But PostgreSQL actually has TABLE statements, and in future we might want to support table-expressions, so I don't think this naming is appropriate. This is made worse by the lack of comments on things like the definition of TableExprState. Please use something that makes it clear it's for XMLTABLE and add appropriate comments. Formatting of variables, arguments, function signatures etc is random/haphazard and doesn't follow project convention. It's neither aligned or unaligned in the normal way, I don't understand the random spacing at all. Maybe you should try to run pgindent and then extract just the changes related to your patch? Or run your IDE/editor's indent function on your changes? Right now it's actually kind of hard to read. Do you edit with tabstop set to 1 normally or something like that? There's a general lack of comments throughout the added code. In execEvalTableExpr, why are we looping over namespaces? What's that for? Comment would be nice. Typo: Path caclulation => Path calculation What does XmlTableSetRowPath() do? It seems to copy its argument. Nothing further is done with the row_path argument after it's called by execEvalTableExpr, so what context is that memory in and do we have to worry about it if it's large? execEvalTableExpr says it's doing "path calculation". What it actually appears to do is evaluate the path expressions, if provided, and otherwise use the column name as the implied path expression. (The docs should mention that). It's wasn't immediately obvious to me what the branch around tstate->for_ordinality_col is for and what the alternate path's purpose is in terms of XMLTABLE's behaviour, until I read the parser definition. That's largely because the behaviour of XMLTABLE is underspecified in the docs, since once you know ORDINALITY columns exist it's pretty obvious what it's doing. Similarly, for the alternate branch tstate->ncols , the XmlTableGetRowValue call there is meant to do what exactly, and why/under what conditions? Is it for situations where the field type is a whole-row value? a composite type? (I'm deliberately not studying this too deeply, these are points I'd like to see commented so it can be understood to some reasonable degree at a skim-read). /* result is one more columns every time */ "one or more" /* when typmod is not valid, refresh it */ if (te->typmod == -1) Is this a cache? How is it valid or not valid and when? The comment (thanks!) on TableExprGetTupleDesc says: /** When we skip transform stage (in view), then TableExpr's* TupleDesc should not be valid. Refresh is necessary.*/ but I'm not really grasping what you're trying to explain here. What transform stage? What view? This could well be my ignorance of this part of the code; if it should be understandable by a reader who is appropriately familiar with the executor that's fine, but if it's specific to how XMLTABLE works some more explanation would be good. Good that you've got all the required node copy/in/out funcs in place. Please don't use the name "used_dns". Anyone reading that will read it as "domain name service" and that's actually confusing with XML because of XML schema lookups. Maybe used_defnamespace ? used def_ns? I haven't looked closely at keyword/parser changes yet, but it doesn't look like you added any reserved keywords, which is good. It does add unreserved keywords PATH and COLUMNS ; I'm not sure what policy for unreserved keywords is or the significance of that. New ereport() calls specify ERRCODEs, which is good. PostgreSQL already has XPATH support in the form of xmlexists(...) etc. Why is getXPathToken() etc needed? What re-use is possible here? There's no explanation in the patch header or comments. Should the new xpath parser be re-used by the existing xpath stuff? Why can't we use libxml's facilities? etc. This at least needs explaining in the submission, and some kind of hint as to why we have two different ways to do it is needed in the code. If we do need a new XML parser, should it be bundled in adt/xml.c along with a lot of user-facing functionality, or a separate file? How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to this? It doesn't look like they're intended to be called directly by the user, and they're not documented (or commented). I don't understand this at all: +/* + * There are different requests from XMLTABLE, JSON_TABLE functions + * on passed data than has CREATE TABLE command. It is reason for + * introduction special structure instead using ColumnDef. + */ +typedef struct TableExprRawCol +{ + NodeTag type; + char *colname; + TypeName *typeName; + bool for_ordinality; + bool is_not_null; + Node *path_expr; + Node *default_expr; + int location; +} TableExprRawCol; That's my first-pass commentary. I'll return to this once you've had a chance to take a look at these and tell me all the places I got it wrong ;) -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 4 September 2016 at 16:06, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> minor update - using DefElem instead own private parser type
I'm really glad that you're doing this and I'll take a look at it for this CF.
It's quite a big patch so I expect this will take a few rounds of
review and updating.
Patch applies cleanly and builds cleanly on master both with and
without --with-xml .
Overall, I think this needs to be revised with appropriate comments.
Whitespace/formatting needs fixing since it's all over the place.
Documentation is insufficient (per notes below).
Re identifier naming, some of this code uses XmlTable naming patterns,
some uses TableExpr prefixes. Is that intended to indicate a bounary
between things re-usable for other structured data ingesting
functions? Do you expect a "JSONEXPR" or similar in future? That's
alluded to by
+/*----------
+ * TableExpr - used for XMLTABLE function
+ *
+ * This can be used for json_table, jsonb_table functions in future
+ *----------
+ */
+typedef struct TableExpr
+{
...
If so, should this really be two patches, one to add the table
expression infrastructure and another to add XMLTABLE that uses it?
Also, why in that case does so much of the TableExpr code call
directly into XmlTable code? It doesn't look very generic.
Overall I find identifier naming to be a bit inconsisent and think
it's necessary to make it clear that all the "TableExpr" stuff is for
XMLTABLE specifically, if that's the case, or make the delineation
clearer if not.
I'd also like to see tests that exercise the ruleutils get_rule_expr
parts of the code for the various XMLTABLE variants.
Similarly, since this seems to add a new xpath parser, that needs
comprehensive tests. Maybe re-purpose an existing xpath test data set?
More detailed comments:
====
Docs comments:
The <function>xmltable</function> produces [a] table based on
[the] passed XML value.
The docs are pretty minimal and don't explain the various clauses of
XMLTABLE. What is "BY REF" ? Is PATH an xpath expression? If so, is
there a good cross reference link available? The PASSING clause? etc.
How does XMLTABLE decide what to iterate over, and how to iterate over it?
Presumably the FOR ORDINALITY clause makes a column emit a numeric counter.
What standard, if any, does this conform to? Does it resemble
implementations elsewhere? What limitations or unsupported features
does it have relative to those standards?
execEvalTableExpr seems to be defined twice, with a difference in
case. This is probably not going to fly:
+static Datum
+execEvalTableExpr(TableExprState *tstate,
+ ExprContext *econtext,
+ bool *isNull, ExprDoneCond *isDone)
+{
+static Datum
+ExecEvalTableExpr(TableExprState *tstate,
+ ExprContext *econtext,
+ bool *isNull, ExprDoneCond *isDone)
+{
It looks like you've split the function into a "guts" and "wrapper"
part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
That seems reasonable for readability, but the naming isn't.
A comment is needed to explain what ExecEvalTableExpr is / does. If
it's XMLTABLE specific (which it looks like based on the code), its
name should reflect that. This pattern is repeated elsewhere; e.g.
TableExprState is really the state for an XMLTABLE expression. But
PostgreSQL actually has TABLE statements, and in future we might want
to support table-expressions, so I don't think this naming is
appropriate. This is made worse by the lack of comments on things like
the definition of TableExprState. Please use something that makes it
clear it's for XMLTABLE and add appropriate comments.
Formatting of variables, arguments, function signatures etc is
random/haphazard and doesn't follow project convention. It's neither
aligned or unaligned in the normal way, I don't understand the random
spacing at all. Maybe you should try to run pgindent and then extract
just the changes related to your patch? Or run your IDE/editor's
indent function on your changes? Right now it's actually kind of hard
to read. Do you edit with tabstop set to 1 normally or something like
that?
There's a general lack of comments throughout the added code.
In execEvalTableExpr, why are we looping over namespaces? What's that
for? Comment would be nice.
Typo: Path caclulation => Path calculation
What does XmlTableSetRowPath() do? It seems to copy its argument.
Nothing further is done with the row_path argument after it's called
by execEvalTableExpr, so what context is that memory in and do we have
to worry about it if it's large?
execEvalTableExpr says it's doing "path calculation". What it actually
appears to do is evaluate the path expressions, if provided, and
otherwise use the column name as the implied path expression. (The
docs should mention that).
It's wasn't immediately obvious to me what the branch around
tstate->for_ordinality_col is for and what the alternate path's
purpose is in terms of XMLTABLE's behaviour, until I read the parser
definition. That's largely because the behaviour of XMLTABLE is
underspecified in the docs, since once you know ORDINALITY columns
exist it's pretty obvious what it's doing.
Similarly, for the alternate branch tstate->ncols , the
XmlTableGetRowValue call there is meant to do what exactly, and
why/under what conditions? Is it for situations where the field type
is a whole-row value? a composite type? (I'm deliberately not studying
this too deeply, these are points I'd like to see commented so it can
be understood to some reasonable degree at a skim-read).
/* result is one more columns every time */
"one or more"
/* when typmod is not valid, refresh it */
if (te->typmod == -1)
Is this a cache? How is it valid or not valid and when? The comment
(thanks!) on TableExprGetTupleDesc says:
/*
* When we skip transform stage (in view), then TableExpr's
* TupleDesc should not be valid. Refresh is necessary.
*/
but I'm not really grasping what you're trying to explain here. What
transform stage? What view? This could well be my ignorance of this
part of the code; if it should be understandable by a reader who is
appropriately familiar with the executor that's fine, but if it's
specific to how XMLTABLE works some more explanation would be good.
Good that you've got all the required node copy/in/out funcs in place.
Please don't use the name "used_dns". Anyone reading that will read it
as "domain name service" and that's actually confusing with XML
because of XML schema lookups. Maybe used_defnamespace ? used
def_ns?
I haven't looked closely at keyword/parser changes yet, but it doesn't
look like you added any reserved keywords, which is good. It does add
unreserved keywords PATH and COLUMNS ; I'm not sure what policy for
unreserved keywords is or the significance of that.
New ereport() calls specify ERRCODEs, which is good.
PostgreSQL already has XPATH support in the form of xmlexists(...)
etc. Why is getXPathToken() etc needed? What re-use is possible here?
There's no explanation in the patch header or comments. Should the new
xpath parser be re-used by the existing xpath stuff? Why can't we use
libxml's facilities? etc. This at least needs explaining in the
submission, and some kind of hint as to why we have two different ways
to do it is needed in the code. If we do need a new XML parser, should
it be bundled in adt/xml.c along with a lot of user-facing
functionality, or a separate file?
How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
this? It doesn't look like they're intended to be called directly by
the user, and they're not documented (or commented).
I don't understand this at all:
+/*
+ * There are different requests from XMLTABLE, JSON_TABLE functions
+ * on passed data than has CREATE TABLE command. It is reason for
+ * introduction special structure instead using ColumnDef.
+ */
+typedef struct TableExprRawCol
+{
+ NodeTag type;
+ char *colname;
+ TypeName *typeName;
+ bool for_ordinality;
+ bool is_not_null;
+ Node *path_expr;
+ Node *default_expr;
+ int location;
+} TableExprRawCol;
That's my first-pass commentary. I'll return to this once you've had a
chance to take a look at these and tell me all the places I got it
wrong ;)
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
libxml2 and our XPATH function doesn't support default namespace ( http://plasmasturm.org/log/259/ ). This is pretty useful feature - so I implemented. This is the mayor issue of libxml2 library. Another difference between XPATH function and XMLTABLE function is using two phase searching and implicit prefix "./" and suffix ("/text()") in XMLTABLE. XMLTABLE using two XPATH expressions - for row data cutting and next for column data cutting (from row data). The our XPATH functions is pretty simple mapped to libxml2 XPATH API. But it is not possible with XMLTABLE function - due design of this function in standard (it is more user friendly and doesn't require exactly correct xpath expressions).
On 7 September 2016 at 04:13, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Overall, I think this needs to be revised with appropriate comments. >> Whitespace/formatting needs fixing since it's all over the place. >> Documentation is insufficient (per notes below). > > > I am not able to write documentation in English language :( - This function > is pretty complex - so I hope so anybody with better language skills can > help with this. It respects standard and it respects little bit different > Oracle's behave too (different order of DEFAULT and PATH parts). OK, no problem. It can't be committed without more comprehensive docs though, especially for new and nontrivial functionality. Is there some reference material you can point to so someone else can help with docs? And can you describe what differences there are between your implementation and the reference? Alternately, if you document it in Czech, do you know of anyone who could assist in translating to English for the main documentation? >> Re identifier naming, some of this code uses XmlTable naming patterns, >> some uses TableExpr prefixes. Is that intended to indicate a bounary >> between things re-usable for other structured data ingesting >> functions? Do you expect a "JSONEXPR" or similar in future? That's >> alluded to by > > > This structure should be reused by JSON_TABLE function. Now, it is little > bit strange, because there is only XMLTABLE implementation - and I have to > choose between a) using two different names now, b) renaming some part in > future. OK. Are you planning on writing this JSON_TABLE or are you leaving room for future growth? Either way is fine, just curious. > And although XMLTABLE and JSON_TABLE functions are pretty similar - share > 90% of data (input value, path, columns definitions), these functions has > different syntax - so only middle level code should be shared. That makes sense. I think it would be best if you separated out the TableExpr infrastructure from the XMLTABLE implementation though, so we can review the first level infrastrcture separately and make this a 2-patch series. Most importantly, doing it that way will help you find places where TableExpr code calls directly into XMLTABLE code. If TableExpr is supposed to be reusable for json etc, it probably shouldn't be calling XmlTable stuff directly. That also means somewhat smaller simpler patches, which probably isn't bad. I don't necessarily think this needs to be fully pluggable with callbacks etc. It doesn't sound like you expect this to be used by extensions or to have a lot of users, right? So it probably just needs clearer separation of the infrastructure layer from the xmltable layer. I think splitting the patch will make that easier to see and make it easier to find problems. My biggest complaint at the moment is that execEvalTableExpr calls initXmlTableContext(...) directly, is aware of XML namespaces directly, calls XmlTableSetRowPath() directly, calls XmlTableFetchRow() directly, etc. It is in no way generic/reusable for some later JSONTABLE feature. That needs to be fixed by: * Renaming it so it's clearly only for XMLTABLE; or * Abstracting the init context, set row path, fetch row etc operations so json ones can be plugged in later > Currently the common part is not too big - just the Node related part - I am > not sure about necessity of two patches. The problem is that the common part is all mixed in with the XMLTABLE-specific part, so it's not at all clear it can be common with something else. > I am agree, there is missing some > TableExpBuilder, where can be better isolated the XML part. Yeah, that's sort of what I'm getting at. >> execEvalTableExpr seems to be defined twice, with a difference in >> case. This is probably not going to fly: >> >> >> +static Datum >> +execEvalTableExpr(TableExprState *tstate, >> + ExprContext *econtext, >> + bool *isNull, ExprDoneCond *isDone) >> +{ >> >> +static Datum >> +ExecEvalTableExpr(TableExprState *tstate, >> + ExprContext *econtext, >> + bool *isNull, ExprDoneCond *isDone) >> +{ >> >> >> It looks like you've split the function into a "guts" and "wrapper" >> part, with the error handling PG_TRY / PG_CATCH block in the wrapper. >> That seems reasonable for readability, but the naming isn't. > > > I invite any idea how these functions should be named. Definitely not how they are ;) . They really can't differ in a single character's case. I'm not sure if PostgreSQL has any formal convention for this. Some places use _impl e.g. pg_read_barrier_impl() but that's in the context of an interface-vs-implementation separation, which isn't the case here. Some places use _internal, like AlterObjectRename_internal(...), but that's where there's an associated public/external part, which isn't the case here. Some places use _guts e.g. pg_logical_slot_get_changes_guts(...), largely where there's common use by several callers. This is a fairly arbitrary function split for readability/length. Is it actually useful to split this function up at all? Anyone else have an opinion? >> A comment is needed to explain what ExecEvalTableExpr is / does. If >> it's XMLTABLE specific (which it looks like based on the code), its >> name should reflect that. This pattern is repeated elsewhere; e.g. >> TableExprState is really the state for an XMLTABLE expression. But >> PostgreSQL actually has TABLE statements, and in future we might want >> to support table-expressions, so I don't think this naming is >> appropriate. This is made worse by the lack of comments on things like >> the definition of TableExprState. Please use something that makes it >> clear it's for XMLTABLE and add appropriate comments. > > > I understand, so using TableExpr can be strange (for XMLTABLE function). But > when we will have JSON_TABLE function, then it will have a sense. It's pretty hard to review that as shared infrastructure when it's still tangled up in xmltable specifics, though. > "TableExprState" is consistent with "TableExpr". > > Any idea how it should be changed? I think if you want it to be shareable infrasructure, you need to write it so it can be used as shared infrastructure. Not just name it that way but then make it XMLTABLE specific in actual functionality. >> /* when typmod is not valid, refresh it */ >> if (te->typmod == -1) >> >> >> Is this a cache? How is it valid or not valid and when? The comment >> (thanks!) on TableExprGetTupleDesc says: >> >> /* >> * When we skip transform stage (in view), then TableExpr's >> * TupleDesc should not be valid. Refresh is necessary. >> */ >> >> but I'm not really grasping what you're trying to explain here. What >> transform stage? What view? This could well be my ignorance of this >> part of the code; if it should be understandable by a reader who is >> appropriately familiar with the executor that's fine, but if it's >> specific to how XMLTABLE works some more explanation would be good. > > > This is most difficult part of this patch, and I am not sure it it is fully > correctly implemented. I use TupleDesc cache. The TupleDesc is created in > parser/transform stage. When the XMLTABLE is used in some view, then the > transformed parser tree is materialized - and when the view is used in > query, then this tree is loaded and the parser/transform stage is "skipped". > I'll check this code against implementation of ROW constructor and I'll try > to do more comments there. Thanks. It would be good to highlight when this does and does not happen and why. Why is it necessary at all? What happens if XMLTABLE is used in a query directly, not part of a view? if XMLTABLE is used in a view? If XMLTABLE is used in a prepared statement / plpgsql statement / etc? What about a CTE term? Not necessarily list all these cases one by one, just explain what happens, when and why. Especially if it's complex, so other readers can understand it and don't have to study it in detail to understand what is going on. It does not need to be good public-facing documentation, and details of wording, grammar etc can be fixed up later, it's the ideas that matter. Is this similar to other logic elsewhere? If so, reference that other logic so readers know where to look. That way if they're changing/bugfixing/etc one place they know there's another place that might need changing. I don't know this area of the code well enough to give a solid review of the actual functionality, and I don't yet understand what it's trying to do so it's hard to review it by studying what it actually does vs what it claims to do. Maybe Peter E can help, he said he was thinking of looking at this patch too. But more information on what it's trying to do would be a big help. >> PostgreSQL already has XPATH support in the form of xmlexists(...) >> etc. Why is getXPathToken() etc needed? What re-use is possible here? >> There's no explanation in the patch header or comments. Should the new >> xpath parser be re-used by the existing xpath stuff? Why can't we use >> libxml's facilities? etc. This at least needs explaining in the >> submission, and some kind of hint as to why we have two different ways >> to do it is needed in the code. If we do need a new XML parser, should >> it be bundled in adt/xml.c along with a lot of user-facing >> functionality, or a separate file? > > libxml2 and our XPATH function doesn't support default namespace ( > http://plasmasturm.org/log/259/ ). This is pretty useful feature - so I > implemented. OK, that makes sense. For the purpose of getting this patch in, is it a _necessary_ feature? Can XMLTABLE be usefully implemented without it, and if so, can it be added in a subsequent patch? It would be nice to simplify this by using existing libxml2 functionality in the first version rather than adding a whole new xpath as well! > This is the mayor issue of libxml2 library. Another difference > between XPATH function and XMLTABLE function is using two phase searching > and implicit prefix "./" and suffix ("/text()") in XMLTABLE. XMLTABLE using > two XPATH expressions - for row data cutting and next for column data > cutting (from row data). The our XPATH functions is pretty simple mapped to > libxml2 XPATH API. But it is not possible with XMLTABLE function - due > design of this function in standard (it is more user friendly and doesn't > require exactly correct xpath expressions). So you can't use existing libxml2 xpath support to implement XMLTABLE, even without default namespaces? > I didn't find any API in libxml2 for a work with parsed xpath expressions - > I need some info about the first and last token of xpath expression - it is > base for decision about using prefix or suffix. > > This functionality (xpath expression parser) cannot be used for our XPATH > function now - maybe default namespace in future. So we'll have two different XPATH implementations for different places, with different limitations, different possible bugs, etc? What would be needed to make the new XPATH work for our built-in xpath functions too? >> How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to >> this? It doesn't look like they're intended to be called directly by >> the user, and they're not documented (or commented). > > > Probably I used wrong names. XMLTABLE function is running in two different > modes - with explicitly defined columns (XmlTableGetValue is used), and > without explicitly defined columns - so result is one XML column and only > one one step searching is used (there are not column related xpath > expressions) ( XmlTableGetRowValue is used). The function XmlTableGetValue > is used for getting one column value, the function XmlTableGetRowValue is > used for getting one value too, but in special case, when there are not any > other value. So both are internal implementation of the parser-level XMLTABLE(...) construct and are not intended to be called directly by users - right? Comments please! A short comment on the function saying this would be a big help. Regarding naming, do we already have a convention for functions that are internal implementation of something the user "spells" differently? Where it's transformed by the parser? I couldn't find one. I don't much care about the names so long as there are comments explaining what calls the functions and what the user-facing interface that matches the function is. Is it safe for users to call these directly? What happens if they do so incorrectly? Why are they not in pg_proc.h? Do they need to be? >> +/* >> + * There are different requests from XMLTABLE, JSON_TABLE functions >> + * on passed data than has CREATE TABLE command. It is reason for >> + * introduction special structure instead using ColumnDef. >> + */ >> +typedef struct TableExprRawCol >> +{ >> + NodeTag type; >> + char *colname; >> + TypeName *typeName; >> + bool for_ordinality; >> + bool is_not_null; >> + Node *path_expr; >> + Node *default_expr; >> + int location; >> +} TableExprRawCol; > > I am sorry. It is my fault. Now we have very similar node ColumnDef. This > node is designed for usage in utility commands - and it is not designed for > usage inside a query. Makes sense. > I had to decide between enhancing ColumnDef node or > introduction new special node. Because there are more special attributes and > it is hard to serialize current ColumnDef, I decided to use new node. Seems reasonable. The summary is "this is the parse node for a column of an XMLTABLE expression". Suggested comment: /** This is the parsenode for a column definition in a table-expression like XMLTABLE.** We can't re-use ColumnDef here; the utility command column definition has all the* wrong attributes for use in table-expressions and just doesn't make sense here.*/ typedef struct TableExprColumn { ... }; ? Why "RawCol" ? What does it become when it's not "raw" anymore? Is that a reference to ColumnDef's raw_default and cooked_default for untransformed vs transformed parse-trees? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 7 September 2016 at 04:13, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Overall, I think this needs to be revised with appropriate comments.
>> Whitespace/formatting needs fixing since it's all over the place.
>> Documentation is insufficient (per notes below).
>
>
> I am not able to write documentation in English language :( - This function
> is pretty complex - so I hope so anybody with better language skills can
> help with this. It respects standard and it respects little bit different
> Oracle's behave too (different order of DEFAULT and PATH parts).
OK, no problem. It can't be committed without more comprehensive docs
though, especially for new and nontrivial functionality.
Is there some reference material you can point to so someone else can
help with docs? And can you describe what differences there are
between your implementation and the reference?
Alternately, if you document it in Czech, do you know of anyone who
could assist in translating to English for the main documentation?
>> Re identifier naming, some of this code uses XmlTable naming patterns,
>> some uses TableExpr prefixes. Is that intended to indicate a bounary
>> between things re-usable for other structured data ingesting
>> functions? Do you expect a "JSONEXPR" or similar in future? That's
>> alluded to by
>
>
> This structure should be reused by JSON_TABLE function. Now, it is little
> bit strange, because there is only XMLTABLE implementation - and I have to
> choose between a) using two different names now, b) renaming some part in
> future.
OK. Are you planning on writing this JSON_TABLE or are you leaving
room for future growth? Either way is fine, just curious.
> And although XMLTABLE and JSON_TABLE functions are pretty similar - share
> 90% of data (input value, path, columns definitions), these functions has
> different syntax - so only middle level code should be shared.
That makes sense.
I think it would be best if you separated out the TableExpr
infrastructure from the XMLTABLE implementation though, so we can
review the first level infrastrcture separately and make this a
2-patch series. Most importantly, doing it that way will help you find
places where TableExpr code calls directly into XMLTABLE code. If
TableExpr is supposed to be reusable for json etc, it probably
shouldn't be calling XmlTable stuff directly.
That also means somewhat smaller simpler patches, which probably isn't bad.
I don't necessarily think this needs to be fully pluggable with
callbacks etc. It doesn't sound like you expect this to be used by
extensions or to have a lot of users, right? So it probably just needs
clearer separation of the infrastructure layer from the xmltable
layer. I think splitting the patch will make that easier to see and
make it easier to find problems.
My biggest complaint at the moment is that execEvalTableExpr calls
initXmlTableContext(...) directly, is aware of XML namespaces
directly, calls XmlTableSetRowPath() directly, calls
XmlTableFetchRow() directly, etc. It is in no way generic/reusable for
some later JSONTABLE feature. That needs to be fixed by:
* Renaming it so it's clearly only for XMLTABLE; or
* Abstracting the init context, set row path, fetch row etc operations
so json ones can be plugged in later
> Currently the common part is not too big - just the Node related part - I am
> not sure about necessity of two patches.
The problem is that the common part is all mixed in with the
XMLTABLE-specific part, so it's not at all clear it can be common with
something else.
> I am agree, there is missing some
> TableExpBuilder, where can be better isolated the XML part.
Yeah, that's sort of what I'm getting at.
>> execEvalTableExpr seems to be defined twice, with a difference in
>> case. This is probably not going to fly:
>>
>>
>> +static Datum
>> +execEvalTableExpr(TableExprState *tstate,
>> + ExprContext *econtext,
>> + bool *isNull, ExprDoneCond *isDone)
>> +{
>>
>> +static Datum
>> +ExecEvalTableExpr(TableExprState *tstate,
>> + ExprContext *econtext,
>> + bool *isNull, ExprDoneCond *isDone)
>> +{
>>
>>
>> It looks like you've split the function into a "guts" and "wrapper"
>> part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
>> That seems reasonable for readability, but the naming isn't.
>
>
> I invite any idea how these functions should be named.
Definitely not how they are ;) . They really can't differ in a single
character's case.
I'm not sure if PostgreSQL has any formal convention for this. Some
places use _impl e.g. pg_read_barrier_impl() but that's in the
context of an interface-vs-implementation separation, which isn't the
case here.
Some places use _internal, like AlterObjectRename_internal(...), but
that's where there's an associated public/external part, which isn't
the case here.
Some places use _guts e.g. pg_logical_slot_get_changes_guts(...),
largely where there's common use by several callers.
This is a fairly arbitrary function split for readability/length. Is
it actually useful to split this function up at all?
Anyone else have an opinion?
>> A comment is needed to explain what ExecEvalTableExpr is / does. If
>> it's XMLTABLE specific (which it looks like based on the code), its
>> name should reflect that. This pattern is repeated elsewhere; e.g.
>> TableExprState is really the state for an XMLTABLE expression. But
>> PostgreSQL actually has TABLE statements, and in future we might want
>> to support table-expressions, so I don't think this naming is
>> appropriate. This is made worse by the lack of comments on things like
>> the definition of TableExprState. Please use something that makes it
>> clear it's for XMLTABLE and add appropriate comments.
>
>
> I understand, so using TableExpr can be strange (for XMLTABLE function). But
> when we will have JSON_TABLE function, then it will have a sense.
It's pretty hard to review that as shared infrastructure when it's
still tangled up in xmltable specifics, though.
> "TableExprState" is consistent with "TableExpr".
>
> Any idea how it should be changed?
I think if you want it to be shareable infrasructure, you need to
write it so it can be used as shared infrastructure. Not just name it
that way but then make it XMLTABLE specific in actual functionality.
>> /* when typmod is not valid, refresh it */
>> if (te->typmod == -1)
>>
>>
>> Is this a cache? How is it valid or not valid and when? The comment
>> (thanks!) on TableExprGetTupleDesc says:
>>
>> /*
>> * When we skip transform stage (in view), then TableExpr's
>> * TupleDesc should not be valid. Refresh is necessary.
>> */
>>
>> but I'm not really grasping what you're trying to explain here. What
>> transform stage? What view? This could well be my ignorance of this
>> part of the code; if it should be understandable by a reader who is
>> appropriately familiar with the executor that's fine, but if it's
>> specific to how XMLTABLE works some more explanation would be good.
>
>
> This is most difficult part of this patch, and I am not sure it it is fully
> correctly implemented. I use TupleDesc cache. The TupleDesc is created in
> parser/transform stage. When the XMLTABLE is used in some view, then the
> transformed parser tree is materialized - and when the view is used in
> query, then this tree is loaded and the parser/transform stage is "skipped".
> I'll check this code against implementation of ROW constructor and I'll try
> to do more comments there.
Thanks. It would be good to highlight when this does and does not
happen and why. Why is it necessary at all?
What happens if XMLTABLE is used in a query directly, not part of a
view? if XMLTABLE is used in a view? If XMLTABLE is used in a prepared
statement / plpgsql statement / etc? What about a CTE term?
Not necessarily list all these cases one by one, just explain what
happens, when and why. Especially if it's complex, so other readers
can understand it and don't have to study it in detail to understand
what is going on. It does not need to be good public-facing
documentation, and details of wording, grammar etc can be fixed up
later, it's the ideas that matter.
Is this similar to other logic elsewhere? If so, reference that other
logic so readers know where to look. That way if they're
changing/bugfixing/etc one place they know there's another place that
might need changing.
I don't know this area of the code well enough to give a solid review
of the actual functionality, and I don't yet understand what it's
trying to do so it's hard to review it by studying what it actually
does vs what it claims to do. Maybe Peter E can help, he said he was
thinking of looking at this patch too. But more information on what
it's trying to do would be a big help.
>> PostgreSQL already has XPATH support in the form of xmlexists(...)
>> etc. Why is getXPathToken() etc needed? What re-use is possible here?
>> There's no explanation in the patch header or comments. Should the new
>> xpath parser be re-used by the existing xpath stuff? Why can't we use
>> libxml's facilities? etc. This at least needs explaining in the
>> submission, and some kind of hint as to why we have two different ways
>> to do it is needed in the code. If we do need a new XML parser, should
>> it be bundled in adt/xml.c along with a lot of user-facing
>> functionality, or a separate file?
>
> libxml2 and our XPATH function doesn't support default namespace (
> http://plasmasturm.org/log/259/ ). This is pretty useful feature - so I
> implemented.
OK, that makes sense.
For the purpose of getting this patch in, is it a _necessary_ feature?
Can XMLTABLE be usefully implemented without it, and if so, can it be
added in a subsequent patch? It would be nice to simplify this by
using existing libxml2 functionality in the first version rather than
adding a whole new xpath as well!
> This is the mayor issue of libxml2 library. Another difference
> between XPATH function and XMLTABLE function is using two phase searching
> and implicit prefix "./" and suffix ("/text()") in XMLTABLE. XMLTABLE using
> two XPATH expressions - for row data cutting and next for column data
> cutting (from row data). The our XPATH functions is pretty simple mapped to
> libxml2 XPATH API. But it is not possible with XMLTABLE function - due
> design of this function in standard (it is more user friendly and doesn't
> require exactly correct xpath expressions).
So you can't use existing libxml2 xpath support to implement XMLTABLE,
even without default namespaces?
> I didn't find any API in libxml2 for a work with parsed xpath expressions -
> I need some info about the first and last token of xpath expression - it is
> base for decision about using prefix or suffix.
>
> This functionality (xpath expression parser) cannot be used for our XPATH
> function now - maybe default namespace in future.
So we'll have two different XPATH implementations for different
places, with different limitations, different possible bugs, etc?
What would be needed to make the new XPATH work for our built-in xpath
functions too?
>> How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
>> this? It doesn't look like they're intended to be called directly by
>> the user, and they're not documented (or commented).
>
>
> Probably I used wrong names. XMLTABLE function is running in two different
> modes - with explicitly defined columns (XmlTableGetValue is used), and
> without explicitly defined columns - so result is one XML column and only
> one one step searching is used (there are not column related xpath
> expressions) ( XmlTableGetRowValue is used). The function XmlTableGetValue
> is used for getting one column value, the function XmlTableGetRowValue is
> used for getting one value too, but in special case, when there are not any
> other value.
So both are internal implementation of the parser-level XMLTABLE(...)
construct and are not intended to be called directly by users - right?
Comments please! A short comment on the function saying this would be
a big help.
Regarding naming, do we already have a convention for functions that
are internal implementation of something the user "spells"
differently? Where it's transformed by the parser? I couldn't find
one. I don't much care about the names so long as there are comments
explaining what calls the functions and what the user-facing interface
that matches the function is.
Is it safe for users to call these directly? What happens if they do
so incorrectly?
Why are they not in pg_proc.h? Do they need to be?
>> +/*
>> + * There are different requests from XMLTABLE, JSON_TABLE functions
>> + * on passed data than has CREATE TABLE command. It is reason for
>> + * introduction special structure instead using ColumnDef.
>> + */
>> +typedef struct TableExprRawCol
>> +{
>> + NodeTag type;
>> + char *colname;
>> + TypeName *typeName;
>> + bool for_ordinality;
>> + bool is_not_null;
>> + Node *path_expr;
>> + Node *default_expr;
>> + int location;
>> +} TableExprRawCol;
>
> I am sorry. It is my fault. Now we have very similar node ColumnDef. This
> node is designed for usage in utility commands - and it is not designed for
> usage inside a query.
Makes sense.
> I had to decide between enhancing ColumnDef node or
> introduction new special node. Because there are more special attributes and
> it is hard to serialize current ColumnDef, I decided to use new node.
Seems reasonable. The summary is "this is the parse node for a column
of an XMLTABLE expression".
Suggested comment:
/*
* This is the parsenode for a column definition in a table-expression
like XMLTABLE.
*
* We can't re-use ColumnDef here; the utility command column
definition has all the
* wrong attributes for use in table-expressions and just doesn't make
sense here.
*/
typedef struct TableExprColumn
{
...
};
?
Why "RawCol" ? What does it become when it's not "raw" anymore? Is
that a reference to ColumnDef's raw_default and cooked_default for
untransformed vs transformed parse-trees?
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Suggested comment:
/*
* This is the parsenode for a column definition in a table-expression
like XMLTABLE.
*
* We can't re-use ColumnDef here; the utility command column
definition has all the
* wrong attributes for use in table-expressions and just doesn't make
sense here.
*/
typedef struct TableExprColumn
{
...
};
?
Why "RawCol" ? What does it become when it's not "raw" anymore? Is
that a reference to ColumnDef's raw_default and cooked_default for
untransformed vs transformed parse-trees?
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 7 September 2016 at 14:44, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> >> Suggested comment: >> >> /* >> * This is the parsenode for a column definition in a table-expression >> like XMLTABLE. >> * >> * We can't re-use ColumnDef here; the utility command column >> definition has all the >> * wrong attributes for use in table-expressions and just doesn't make >> sense here. >> */ >> typedef struct TableExprColumn >> { >> ... >> }; >> >> ? >> >> Why "RawCol" ? What does it become when it's not "raw" anymore? Is >> that a reference to ColumnDef's raw_default and cooked_default for >> untransformed vs transformed parse-trees? > > > My previous reply was wrong - it is used by parser only and holds TypeName > field. The analogy with ColumnDef raw_default is perfect. Cool, lets just comment that then. I'll wait on an updated patch per discussion to date. Hopefully somebody else with more of a clue than me can offer better review of the executor/view/caching part you specifically called out as complex. Otherwise maybe it'll be clearer in a revised version. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Hi,I am sending new version of this patch1. now generic TableExpr is better separated from a real content generation2. I removed cached typmod - using row type cache everywhere - it is consistent with other few places in Pg where dynamic types are used - the result tupdesc is generated few times more - but it is not on critical path.3. More comments, few more lines in doc.4. Reformated by pgindent
RegardsPavel
Attachment
On 9 September 2016 at 21:44, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > 2016-09-09 10:35 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>: >> >> Hi, >> >> I am sending new version of this patch >> >> 1. now generic TableExpr is better separated from a real content >> generation >> 2. I removed cached typmod - using row type cache everywhere - it is >> consistent with other few places in Pg where dynamic types are used - the >> result tupdesc is generated few times more - but it is not on critical path. >> 3. More comments, few more lines in doc. >> 4. Reformated by pgindent Thanks. I applied this on top of the same base as your prior patch so I could compare changes. The new docs look good. Thanks for that, I know it's a pain. It'll need to cover ORDINAL too, but that's not hard. I'll try to find some time to help with the docs per the references you sent offlist. Out of interest, should the syntax allow room for future expansion to permit reading from file rather than just string literal / column reference? It'd be ideal to avoid reading big documents wholly into memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't suggest adding that to this patch, just making sure adding it later would not cause problems. I see you added a builder context abstraction as discussed, so there's no longer any direct reference to XMLTABLE specifics from TableExpr code. Good, thanks for that. It'll make things much less messy when adding other table expression types as you expressed the desire to do, and means the TableExpr code now makes more sense as generic infrastructure. ExecEvalTableExprProtected and ExecEvalTableExpr are OK with me, or better than execEvalTableExpr and ExecEvalTableExpr were anyway. Eventual committer will probably have opinions here. Mild nitpick: since you can have multiple namespaces, shouldn't builder->SetNS be builder->AddNS ? Added comments are helpful, thanks. On first read-through this is a big improvement and addresses all the concerns I raised. Documentation is much much better, thanks, I know that's a pain. I'll take a closer read-through shortly. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
> I'll take a closer read-through shortly. Missing file. You omitted executor/tableexpr.h from the patch, so I can't compile. I've expanded and copy-edited the docs. Some of it is guesswork based on the references you sent and a glance at the code. Please check my changes carefully. I found a few surprises, like the fact that DEFAULT isn't a normal literal, it's an xpath expression evaluated at the same time as the rowexpression. Updated patch attached as XMLTABLE-v3 includes the docs changes. Note that it's missing tableexpr.h. For convenient review or to apply to your working tree I also attach a diff of just my docs changes as proposed-docs-changes.diff. Docs: - Can you send the sample data used to generate the example output? I'd like to include at least a cut down part of it in the docs to make it clear how the input correlates with output, and preferably put the whole thing in an appendix. - How does it decide what subset of the document to iterate over? That's presumably rowexpr, which is xpath in postgresql? (I added this to docs). - xmlnamespaces clause in docs needs an example for a non-default namespace. - What effect does xmlnamespaces clause have? Does supplying it allow you to reference qualified names in xpath? What happens if you don't specify it for a document that has namespaces or don't define all the namespaces? What if you reference an undefined namespace in xpath? What about if an undefined namespace isn't referenced by xpath, but is inside a node selected by an xpath expression? - What are the rules for converting the matched XML node into a value? If the matched node is not a simple text node or lacks a text node as its single child, what happens? - What happens if the matched has multiple text node children? This can happen if, for example, you have something like <matchedNode> some text <!-- comment splits up text node --> other text </matchedNode> - Is there a way to get an attribute as a value? If so, an example should show this because it's going to be a common need. Presumably you want node/@attrname ? - What happens if COLUMNS is not specified at all? It looks like it returns a single column result set with the matched entries as 'xml' type, so added to docs, please verify. - PASSING clause isn't really defined. You can specify one PASSING entry as a literal/colref/expression, and it's the argument xml document, right? The external docs you referred to say that PASSING may have a BY VALUE keyword, alias its argument with AS, and may have expressions, e.g. PASSING BY VALUE '<x/>' AS a, '<y/>' AS b Neither aliases nor multiple entries are supported by the code or grammar. Should this be documented as a restriction? Do you know if that's an extension by the other implementation or if it's SQL/XML standard? (I've drafted a docs update to cover this in the updated patch). - What does BY REF mean? Should this just be mentioned with a "see xmlexists(...)" since it seems to be compatibility noise? Is there a corresponding BY VALUE or similar? - The parser definitions re-use xmlexists_argument . I don't mind that, but it's worth noting here in case others do. - Why do the expression arguments take c_expr (anything allowed in a_expr or b_expr), not b_expr (restricted expression) ? - Column definitions are underdocumented. The grammar says they can be NOT NULL, for example, but I don't see that in any of the references you mailed me nor in the docs. What behaviour is expected for a NOT NULL column? I've documented my best guess (not checked closely against the code, please verify). - Test suggestions: - Coverage of multiple text() node children of an element, where split up by comment or similar - Coverage of xpath that matches a node with child element nodes More to come. Please review my docs changes in the mean time. I'm spending a lot more time on this than I expected so I might have to get onto other things for a while too. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 12 September 2016 at 12:28, Craig Ringer <craig@2ndquadrant.com> wrote: >> I'll take a closer read-through shortly. >DEFAULT > isn't a normal literal, it's an xpath expression evaluated at the same > time as the rowexpression. Sorry for the spam, but turns out that's not the case as implemented here. The docs you referenced say it should be an xpath expression, but the implementation here is of a literal value, and examples elsewhere on the Internet show a literal value. Unclear if the referenced docs are wrong or what and I don't have anything to test with. Feel free to fix/trim the DEFAULT related changes in above docs patch as needed. Also, tests/docs should probably cover what happens when PATH matches more than one element, i.e. produces a list of more than one match. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 12 September 2016 at 12:28, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I'll take a closer read-through shortly.
>DEFAULT
> isn't a normal literal, it's an xpath expression evaluated at the same
> time as the rowexpression.
Sorry for the spam, but turns out that's not the case as implemented
here. The docs you referenced say it should be an xpath expression,
but the implementation here is of a literal value, and examples
elsewhere on the Internet show a literal value. Unclear if the
referenced docs are wrong or what and I don't have anything to test
with.
Feel free to fix/trim the DEFAULT related changes in above docs patch as needed.
Also, tests/docs should probably cover what happens when PATH matches
more than one element, i.e. produces a list of more than one match.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 9 September 2016 at 21:44, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
> 2016-09-09 10:35 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
>>
>> Hi,
>>
>> I am sending new version of this patch
>>
>> 1. now generic TableExpr is better separated from a real content
>> generation
>> 2. I removed cached typmod - using row type cache everywhere - it is
>> consistent with other few places in Pg where dynamic types are used - the
>> result tupdesc is generated few times more - but it is not on critical path.
>> 3. More comments, few more lines in doc.
>> 4. Reformated by pgindent
Thanks.
I applied this on top of the same base as your prior patch so I could
compare changes.
The new docs look good. Thanks for that, I know it's a pain. It'll
need to cover ORDINAL too, but that's not hard. I'll try to find some
time to help with the docs per the references you sent offlist.
Out of interest, should the syntax allow room for future expansion to
permit reading from file rather than just string literal / column
reference? It'd be ideal to avoid reading big documents wholly into
memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
suggest adding that to this patch, just making sure adding it later
would not cause problems.
I see you added a builder context abstraction as discussed, so there's
no longer any direct reference to XMLTABLE specifics from TableExpr
code. Good, thanks for that. It'll make things much less messy when
adding other table expression types as you expressed the desire to do,
and means the TableExpr code now makes more sense as generic
infrastructure.
ExecEvalTableExprProtected and ExecEvalTableExpr are OK with me, or
better than execEvalTableExpr and ExecEvalTableExpr were anyway.
Eventual committer will probably have opinions here.
Mild nitpick: since you can have multiple namespaces, shouldn't
builder->SetNS be builder->AddNS ?
Added comments are helpful, thanks.
On first read-through this is a big improvement and addresses all the
concerns I raised. Documentation is much much better, thanks, I know
that's a pain.
I'll take a closer read-through shortly.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
> I'll take a closer read-through shortly.
Missing file. You omitted executor/tableexpr.h from the patch, so I
can't compile.
I've expanded and copy-edited the docs. Some of it is guesswork based
on the references you sent and a glance at the code. Please check my
changes carefully. I found a few surprises, like the fact that DEFAULT
isn't a normal literal, it's an xpath expression evaluated at the same
time as the rowexpression.
Updated patch attached as XMLTABLE-v3 includes the docs changes. Note
that it's missing tableexpr.h. For convenient review or to apply to
your working tree I also attach a diff of just my docs changes as
proposed-docs-changes.diff.
Docs:
- Can you send the sample data used to generate the example output?
I'd like to include at least a cut down part of it in the docs to make
it clear how the input correlates with output, and preferably put the
whole thing in an appendix.
- How does it decide what subset of the document to iterate over?
That's presumably rowexpr, which is xpath in postgresql? (I added this
to docs).
- xmlnamespaces clause in docs needs an example for a non-default namespace.
- What effect does xmlnamespaces clause have? Does supplying it allow
you to reference qualified names in xpath? What happens if you don't
specify it for a document that has namespaces or don't define all the
namespaces? What if you reference an undefined namespace in xpath?
What about if an undefined namespace isn't referenced by xpath, but is
inside a node selected by an xpath expression?
- What are the rules for converting the matched XML node into a value?
If the matched node is not a simple text node or lacks a text node as
its single child, what happens?
- What happens if the matched has multiple text node children? This
can happen if, for example, you have something like
<matchedNode>
some text <!-- comment splits up text node --> other text
</matchedNode>
- Is there a way to get an attribute as a value? If so, an example
should show this because it's going to be a common need. Presumably
you want node/@attrname ?
- What happens if COLUMNS is not specified at all? It looks like it
returns a single column result set with the matched entries as 'xml'
type, so added to docs, please verify.
- PASSING clause isn't really defined. You can specify one PASSING
entry as a literal/colref/expression, and it's the argument xml
document, right? The external docs you referred to say that PASSING
may have a BY VALUE keyword, alias its argument with AS, and may have
expressions, e.g.
PASSING BY VALUE '<x/>' AS a, '<y/>' AS b
Neither aliases nor multiple entries are supported by the code or
grammar. Should this be documented as a restriction? Do you know if
that's an extension by the other implementation or if it's SQL/XML
standard? (I've drafted a docs update to cover this in the updated
patch).
- What does BY REF mean? Should this just be mentioned with a "see
xmlexists(...)" since it seems to be compatibility noise? Is there a
corresponding BY VALUE or similar?
- The parser definitions re-use xmlexists_argument . I don't mind
that, but it's worth noting here in case others do.
- Why do the expression arguments take c_expr (anything allowed in
a_expr or b_expr), not b_expr (restricted expression) ?
- Column definitions are underdocumented. The grammar says they can be
NOT NULL, for example, but I don't see that in any of the references
you mailed me nor in the docs. What behaviour is expected for a NOT
NULL column? I've documented my best guess (not checked closely
against the code, please verify).
-
Test suggestions:
- Coverage of multiple text() node children of an element, where split
up by comment or similar
- Coverage of xpath that matches a node with child element nodes
More to come. Please review my docs changes in the mean time. I'm
spending a lot more time on this than I expected so I might have to
get onto other things for a while too.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12 September 2016 at 12:28, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I'll take a closer read-through shortly.
>DEFAULT
> isn't a normal literal, it's an xpath expression evaluated at the same
> time as the rowexpression.
Sorry for the spam, but turns out that's not the case as implemented
here. The docs you referenced say it should be an xpath expression,
but the implementation here is of a literal value, and examples
elsewhere on the Internet show a literal value. Unclear if the
referenced docs are wrong or what and I don't have anything to test
with.
Feel free to fix/trim the DEFAULT related changes in above docs patch as needed.
Also, tests/docs should probably cover what happens when PATH matches
more than one element, i.e. produces a list of more than one match.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12 September 2016 at 13:07, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Out of interest, should the syntax allow room for future expansion to >> permit reading from file rather than just string literal / column >> reference? It'd be ideal to avoid reading big documents wholly into >> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't >> suggest adding that to this patch, just making sure adding it later >> would not cause problems. > > > this is little bit different question - it is server side function, so first > question is - how to push usually client side content to server? Next > question is how to get this content to a executor. Now only COPY statement > is able to do. Probably start with support for server-side files. When people are dealing with really big files they'll be more willing to copy files to the server or bind them into the server file system over the network. The v3 protocol doesn't really allow any way for client-to-server streaming during a query, I think that's hopeless until we have a protocol bump. > updated patch attached - with your documentation. Will take a look and a better read of the code. Likely tomorrow, I've got work to do as well. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 12 September 2016 at 14:02, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hi > > There is some opened questions - the standard (and some other databases) > requires entering XPath expression as string literal. > > I am thinking so it is too strong not necessary limit - (it enforces dynamic > query in more cases), so I allowed the expressions there. I agree. There's no reason not to permit expressions there, and there are many other places where we have similar extensions. > Another questions is when these expressions should be evaluated. There are > two possibilities - once per query, once per input row. I selected "once per > input row mode" - it is simpler to implement it, and it is consistent with > other "similar" generators - see the behave and related discussion to > "array_to_string" and evaluation of separator argument. The switch to "once > per query" should not be hard - but it can be strange for users, because > some his volatile expression should be stable. I would've expected once per query. There's no way the expressions can reference the row data, so there's no reason to evaluate them each time. The only use case I see for evaluating them each time is - maybe - DEFAULT. Where maybe there's a use for nextval() or other volatile functions. But honestly, I think that's better done explicitly in a post-pass, i.e. select uuid_generate_v4(), x.* from ( xmltable(.....) x ); in cases where that's what the user actually wants. There's no other case I can think of where expressions as arguments to set-returning functions are evaluated once per output row. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 12 September 2016 at 13:07, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Out of interest, should the syntax allow room for future expansion to >> permit reading from file rather than just string literal / column >> reference? It'd be ideal to avoid reading big documents wholly into >> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't >> suggest adding that to this patch, just making sure adding it later >> would not cause problems. > > > this is little bit different question - it is server side function, so first > question is - how to push usually client side content to server? Next > question is how to get this content to a executor. Now only COPY statement > is able to do. Probably start with support for server-side files. When people are dealing with really big files they'll be more willing to copy files to the server or bind them into the server file system over the network. The v3 protocol doesn't really allow any way for client-to-server streaming during a query, I think that's hopeless until we have a protocol bump. > updated patch attached - with your documentation. >> - Can you send the sample data used to generate the example output? >> I'd like to include at least a cut down part of it in the docs to make >> it clear how the input correlates with output, and preferably put the >> whole thing in an appendix. > > > it is in regress tests. Makes sense. It's not that verbose (for XML) and I wonder if it's just worth including it in-line in the docs along with the XMLTABLE example. It'd be much easier to understand how XMLTABLE works and what it does then. >> - What effect does xmlnamespaces clause have? Does supplying it allow >> you to reference qualified names in xpath? What happens if you don't >> specify it for a document that has namespaces or don't define all the >> namespaces? What if you reference an undefined namespace in xpath? >> What about if an undefined namespace isn't referenced by xpath, but is >> inside a node selected by an xpath expression? > > > All this is under libxml2 control - when you use undefined namespace, then > libxml2 raises a error. The namespaces in document and in XPath queries are > absolutely independent - the relation is a URI. When you use bad URI > (referenced by name), then the result will be empty set. When you use > undefined name, then you will get a error. OK, makes sense. >> - What are the rules for converting the matched XML node into a value? >> If the matched node is not a simple text node or lacks a text node as >> its single child, what happens? > > This process is described and controlled by "XML SQL mapping". The Postgres > has minimalistic implementation without possibility of external control and > without schema support. The my implementation is simple. When user doesn't > specify result target like explicit using of text() function, then the > text() function is used implicitly when target type is not XML. Then I dump > result to string and I enforce related input functions for target types. OK, so a subset of the full spec functionality is provided because of limitations in Pg and libxml2. Makes sense. My only big concern here is that use of text() is a common mistake in XSLT, and I think the same thing will happen here. Users expect comments to be ignored, but in fact a comment inserts a comment node into the XML DOM, so a comment between two pieces of text produces a tree of element text() comment() text() If you match element/text(), you get a 2-node result and will presumably ERROR. There is no good way to tell this from element text() element text() when you use an xpath expression like element/text() . So you can't safely solve it just by concatenating all resulting text() nodes without surprising behaviour. >> - What happens if the matched has multiple text node children? This >> can happen if, for example, you have something like >> >> <matchedNode> >> some text <!-- comment splits up text node --> other text >> </matchedNode> > > > depends on target type - it is allowed in XML, and it is disallowed for > other types. I though about support of a arrays - but the patch will be much > more complex - there can be recursion - so I disallowed it. When the user > have to solve this issue, then he can use nested XMLTABLE functions and > nested function is working with XML type. I don't really understand how that'd work. Do you know how other implementations handle this? I think users are going to be VERY surprised when comments in text break their XML. >> - Is there a way to get an attribute as a value? If so, an example >> should show this because it's going to be a common need. Presumably >> you want node/@attrname ? > > > you can use reference to current node "." - so "./@attname" should to work - > a example is in regress tests cool, just needs mention in docs then. >> - PASSING clause isn't really defined. You can specify one PASSING >> entry as a literal/colref/expression, and it's the argument xml >> document, right? The external docs you referred to say that PASSING >> may have a BY VALUE keyword, alias its argument with AS, and may have >> expressions, e.g. >> >> PASSING BY VALUE '<x/>' AS a, '<y/>' AS b >> >> Neither aliases nor multiple entries are supported by the code or >> grammar. Should this be documented as a restriction? > > > The ANSI allows to pass more documents - and then do complex queries with > XQuery. Passing more than one document has not sense in libxml2 based > implementation, so I didn't supported it. The referenced names can be > implemented later - but it needs to changes in XPATH function too. OK, so my docs addition that just says they're not supported should be fine. >> - What does BY REF mean? Should this just be mentioned with a "see >> xmlexists(...)" since it seems to be compatibility noise? Is there a >> corresponding BY VALUE or similar? > > When the XML document is stored as serialized DOM, then by ref means link on > this DOM. It has not sense in Postgres - because we store XML documents by > value only. Right. And since there's already precent for xmlexists there's no point worrying about whether we lose opportunities to implement it later. >> - Why do the expression arguments take c_expr (anything allowed in >> a_expr or b_expr), not b_expr (restricted expression) ? > > > I don't know - I expect the problems with parser - because PASSING is > restricted keyword in ANSI/SQL and unreserved keyword in Postgres. I mean for the rowpath argument, not the parts within xmlexists_argument. If the rowpath doesn't need to be c_expr presumably it should be a b_expr or even, if it doesn't cause parsing ambiguities, an a_expr ? There doesn't seem to be the same issue here as we have with BETWEEN etc. >> - Column definitions are underdocumented. The grammar says they can be >> NOT NULL, for example, but I don't see that in any of the references >> you mailed me nor in the docs. What behaviour is expected for a NOT >> NULL column? I've documented my best guess (not checked closely >> against the code, please verify). >> > > yes - some other databases allows it - I am thinking so it is useful. Sure. Sounds like my docs additions are probably right then, except for incorrect description of DEFAULT. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 12 September 2016 at 13:07, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Out of interest, should the syntax allow room for future expansion to
>> permit reading from file rather than just string literal / column
>> reference? It'd be ideal to avoid reading big documents wholly into
>> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
>> suggest adding that to this patch, just making sure adding it later
>> would not cause problems.
>
>
> this is little bit different question - it is server side function, so first
> question is - how to push usually client side content to server? Next
> question is how to get this content to a executor. Now only COPY statement
> is able to do.
Probably start with support for server-side files. When people are
dealing with really big files they'll be more willing to copy files to
the server or bind them into the server file system over the network.
The v3 protocol doesn't really allow any way for client-to-server
streaming during a query, I think that's hopeless until we have a
protocol bump.
> updated patch attached - with your documentation.
>> - Can you send the sample data used to generate the example output?
>> I'd like to include at least a cut down part of it in the docs to make
>> it clear how the input correlates with output, and preferably put the
>> whole thing in an appendix.
>
>
> it is in regress tests.
Makes sense.
It's not that verbose (for XML) and I wonder if it's just worth
including it in-line in the docs along with the XMLTABLE example. It'd
be much easier to understand how XMLTABLE works and what it does then.
>> - What effect does xmlnamespaces clause have? Does supplying it allow
>> you to reference qualified names in xpath? What happens if you don't
>> specify it for a document that has namespaces or don't define all the
>> namespaces? What if you reference an undefined namespace in xpath?
>> What about if an undefined namespace isn't referenced by xpath, but is
>> inside a node selected by an xpath expression?
>
>
> All this is under libxml2 control - when you use undefined namespace, then
> libxml2 raises a error. The namespaces in document and in XPath queries are
> absolutely independent - the relation is a URI. When you use bad URI
> (referenced by name), then the result will be empty set. When you use
> undefined name, then you will get a error.
OK, makes sense.
>> - What are the rules for converting the matched XML node into a value?
>> If the matched node is not a simple text node or lacks a text node as
>> its single child, what happens?
>
> This process is described and controlled by "XML SQL mapping". The Postgres
> has minimalistic implementation without possibility of external control and
> without schema support. The my implementation is simple. When user doesn't
> specify result target like explicit using of text() function, then the
> text() function is used implicitly when target type is not XML. Then I dump
> result to string and I enforce related input functions for target types.
OK, so a subset of the full spec functionality is provided because of
limitations in Pg and libxml2. Makes sense.
My only big concern here is that use of text() is a common mistake in
XSLT, and I think the same thing will happen here. Users expect
comments to be ignored, but in fact a comment inserts a comment node
into the XML DOM, so a comment between two pieces of text produces a
tree of
element
text()
comment()
text()
If you match element/text(), you get a 2-node result and will presumably ERROR.
There is no good way to tell this from
element
text()
element
text()
when you use an xpath expression like element/text() . So you can't
safely solve it just by concatenating all resulting text() nodes
without surprising behaviour.
>> - What happens if the matched has multiple text node children? This
>> can happen if, for example, you have something like
>>
>> <matchedNode>
>> some text <!-- comment splits up text node --> other text
>> </matchedNode>
>
>
> depends on target type - it is allowed in XML, and it is disallowed for
> other types. I though about support of a arrays - but the patch will be much
> more complex - there can be recursion - so I disallowed it. When the user
> have to solve this issue, then he can use nested XMLTABLE functions and
> nested function is working with XML type.
I don't really understand how that'd work.
Do you know how other implementations handle this?
I think users are going to be VERY surprised when comments in text
break their XML.
>> - Is there a way to get an attribute as a value? If so, an example
>> should show this because it's going to be a common need. Presumably
>> you want node/@attrname ?
>
>
> you can use reference to current node "." - so "./@attname" should to work -
> a example is in regress tests
cool, just needs mention in docs then.
>> - PASSING clause isn't really defined. You can specify one PASSING
>> entry as a literal/colref/expression, and it's the argument xml
>> document, right? The external docs you referred to say that PASSING
>> may have a BY VALUE keyword, alias its argument with AS, and may have
>> expressions, e.g.
>>
>> PASSING BY VALUE '<x/>' AS a, '<y/>' AS b
>>
>> Neither aliases nor multiple entries are supported by the code or
>> grammar. Should this be documented as a restriction?
>
>
> The ANSI allows to pass more documents - and then do complex queries with
> XQuery. Passing more than one document has not sense in libxml2 based
> implementation, so I didn't supported it. The referenced names can be
> implemented later - but it needs to changes in XPATH function too.
OK, so my docs addition that just says they're not supported should be fine.
>> - What does BY REF mean? Should this just be mentioned with a "see
>> xmlexists(...)" since it seems to be compatibility noise? Is there a
>> corresponding BY VALUE or similar?
>
> When the XML document is stored as serialized DOM, then by ref means link on
> this DOM. It has not sense in Postgres - because we store XML documents by
> value only.
Right. And since there's already precent for xmlexists there's no
point worrying about whether we lose opportunities to implement it
later.
>> - Why do the expression arguments take c_expr (anything allowed in
>> a_expr or b_expr), not b_expr (restricted expression) ?
>
>
> I don't know - I expect the problems with parser - because PASSING is
> restricted keyword in ANSI/SQL and unreserved keyword in Postgres.
I mean for the rowpath argument, not the parts within
xmlexists_argument. If the rowpath doesn't need to be c_expr
presumably it should be a b_expr or even, if it doesn't cause parsing
ambiguities, an a_expr ? There doesn't seem to be the same issue here
as we have with BETWEEN etc.
>> - Column definitions are underdocumented. The grammar says they can be
>> NOT NULL, for example, but I don't see that in any of the references
>> you mailed me nor in the docs. What behaviour is expected for a NOT
>> NULL column? I've documented my best guess (not checked closely
>> against the code, please verify).
>>
>
> yes - some other databases allows it - I am thinking so it is useful.
Sure. Sounds like my docs additions are probably right then, except
for incorrect description of DEFAULT.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 15 September 2016 at 19:31, Pavel Stehule <pavel.stehule@gmail.com> wrote: > b_expr enforces shift/reduce conflict :( No problem then. I just thought it'd be worth allowing more if it worked to do so. > I found other opened question - how we can translate empty tag to SQL value? > The Oracle should not to solve this question, but PostgreSQL does. Some > databases returns empty string. Oracle doesn't solve the problem? it ERRORs? > I prefer return a empty string - not null in this case. I agree, and that's consistent with how most XML is interpreted. XSLT for example considers <x></x> and <x/> to be pretty much the same thing. > The reason is simple > - Empty string is some information - and NULL is less information. When it > is necessary I can transform empty string to NULL - different direction is > not unique. Yep, I definitely agree. The only issue is if people want a DEFAULT to be applied for empty tags. But that's something they can do in a post-process pass easily enough, since XMLTABLE is callable as a subquery / WITH expression / etc. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 9/12/16 1:14 AM, Craig Ringer wrote: > I would've expected once per query. There's no way the expressions can > reference the row data, so there's no reason to evaluate them each > time. > > The only use case I see for evaluating them each time is - maybe - > DEFAULT. Where maybe there's a use for nextval() or other volatile > functions. But honestly, I think that's better done explicitly in a > post-pass, i.e. > > select uuid_generate_v4(), x.* > from ( > xmltable(.....) x > ); > > in cases where that's what the user actually wants. > > There's no other case I can think of where expressions as arguments to > set-returning functions are evaluated once per output row. The SQL standard appears to show what the behavior ought to be: <XML table> is equivalent to LATERAL ( XNDC SELECT SLI1 AS CN1, SLI2 AS CN2, ..., SLINC AS CNNC FROM XMLITERATE ( XMLQUERY ( XTRP XQAL RETURNING SEQUENCE BY REF EMPTY ON EMPTY ) ) AS I ( V, N ) ) AS CORR DCLP and SLIj is CASE WHEN XEj THEN XMLCAST( XQCj AS DTj CPMj ) ELSE DEFj END where DEFj is the default expression. So simplified it is LATERAL ( SELECT CASE WHEN ... ELSE DEFj END, ... FROM something ) which indicates that the default expression is evaluated for every row. If we're not sure about all this, it might be worth restricting the default expressions to stable or immutable expressions for the time being. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 15 September 2016 at 19:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> b_expr enforces shift/reduce conflict :(
No problem then. I just thought it'd be worth allowing more if it
worked to do so.
> I found other opened question - how we can translate empty tag to SQL value?
> The Oracle should not to solve this question, but PostgreSQL does. Some
> databases returns empty string.
Oracle doesn't solve the problem? it ERRORs?
> I prefer return a empty string - not null in this case.
I agree, and that's consistent with how most XML is interpreted. XSLT
for example considers <x></x> and <x/> to be pretty much the same
thing.
> The reason is simple
> - Empty string is some information - and NULL is less information. When it
> is necessary I can transform empty string to NULL - different direction is
> not unique.
Yep, I definitely agree. The only issue is if people want a DEFAULT to
be applied for empty tags. But that's something they can do in a
post-process pass easily enough, since XMLTABLE is callable as a
subquery / WITH expression / etc.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
* call forgotten check_srf_call_placement* few more regress tests* doc is moved to better place - xml processing functionsHinew update:
PavelRegards
Attachment
On 22 September 2016 at 02:31, Pavel Stehule <pavel.stehule@gmail.com> wrote: > another small update - fix XMLPath parser - support multibytes characters I'm returning for another round of review. The code doesn't handle the 5 XML built-in entities correctly in text-typed output. It processes ' and " but not &, < or > . See added test. I have not fixed this, but I think it's clearly broken: + -- XML builtin entities + SELECT * FROM xmltable('/x/a' PASSING '<x><a><ent>'</ent></a><a><ent>"</ent></a><a><ent>&</ent></a><a><ent><</ent></a><a><ent>></ent></a></x>' COLUMNS ent text); + ent + ------- + ' + " + & + < + > + (5 rows) so I've adjusted the docs to claim that they're expanded. The code needs fixing to avoid entity-escaping when the output column type is not 'xml'. ' and " entities in xml-typed output are expanded, not preserved. I don't know if this is intended but I suspect it is: + SELECT * FROM xmltable('/x/a' PASSING '<x><a><ent>'</ent></a><a><ent>"</ent></a><a><ent>&</ent></a><a><ent><</ent></a><a><ent>></ent></a></x>' COLUMNS ent xml); + ent + ------------------ + <ent>'</ent> + <ent>"</ent> + <ent>&</ent> + <ent><</ent> + <ent>></ent> + (5 rows) For the docs changes relevant to the above search for "The five predefined XML entities". Adjust that bit of docs if I guessed wrong about the intended behaviour. The tests don't cover CDATA or PCDATA . I didn't try to add that, but they should. Did some docs copy-editing and integrated some examples. Explained how nested elements work, that multiple top level elements is an error, etc. Explained the time-of-evaluation stuff. Pointed out that you can refer to prior output columns in PATH and DEFAULT, since that's weird and unusual compared to normal SQL. Documented handling of multiple node matches, including the surprising results of somepath/text() on <somepath>x<!--blah-->y</somepath>. Documented handling of nested elements. Documented that xmltable works only on XML documents, not fragments/forests. Regarding evaluation time, it struck me that evaluating path expressions once per row means the xpath must be parsed and processed once per row. Isn't it desirable to store and re-use the preparsed xpath? I don't think this is a major problem, since we can later detect stable/immutable expressions including constants, evaluate only once in that case, and cache. It's just worth thinking about. The docs and tests don't seem to cover XML entities. What's the behaviour there? Core XML only defines one entity, but if a schema defines more how are they processed? The tests need to cover the predefined entities " & ' < and > at least. I have no idea whether the current code can fetch a DTD and use any <!ENTITY > declarations to expand entities, but I'm guessing not? If not, external DTDs, and internal DTDs with external entities should be documented as unsupported. It doesn't seem to cope with internal DTDs at all (libxml2 limitation?): SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" standalone="yes" ?> <!DOCTYPE foo [ <!ELEMENT foo (#PCDATA)> <!ENTITY pg "PostgreSQL"> ]> <foo>Hello &pg;.</foo> $XML$ COLUMNS foo text); + ERROR: invalid XML content + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ... + ^ + DETAIL: line 2: StartTag: invalid element name + <!DOCTYPE foo [ + ^ + line 3: StartTag: invalid element name + <!ELEMENT foo (#PCDATA)> + ^ + line 4: StartTag: invalid element name + <!ENTITY pg "PostgreSQL"> + ^ + line 6: Entity 'pg' not defined + <foo>Hello &pg;.</foo> + ^ libxml seems to support documents with internal DTDs: $ xmllint --valid /tmp/x <?xml version="1.0" standalone="yes"?> <!DOCTYPE foo [ <!ELEMENT foo (#PCDATA)> <!ENTITY pg "PostgreSQL"> ]> <foo>Hello &pg;.</foo> so presumably the issue lies in the xpath stuff? Note that it's not even ignoring the DTD and choking on the undefined entity, it's choking on the DTD its self. OK, code comments: In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP instead of a PG_TRY() / PG_CATCH() block? I think the new way you handle the type stuff is much, much better, and with comments to explain too. Thanks very much. There's an oversight in tableexpr vs xmltable separation here: + case T_TableExpr: + *name = "xmltable"; + return 2; presumably you need to look at the node and decide what kind of table expression it is or just use a generic "tableexpr". Same problem here: + case T_TableExpr: + { + TableExpr *te = (TableExpr *) node; + + /* c_expr shoud be closed in brackets */ + appendStringInfoString(buf, "XMLTABLE("); I don't have the libxml knowledge or remaining brain to usefully evaluate the xpath and xml specifics in xpath.c today. It does strike me that the new xpath parser should probably live in its own file, though. I think this is all a big improvement. Barring the notes above and my lack of review of the guts of the xml.c parts of it, I'm pretty happy with what I see now. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
> Did some docs copy-editing and integrated some examples. Whoops, forgot to attach. Rather than sending a whole new copy of the patch, here's a diff against your patched tree of my changes so you can see what I've done and apply the parts you want. Note that I didn't updated the expected files. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 22 September 2016 at 02:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> another small update - fix XMLPath parser - support multibytes characters
I'm returning for another round of review.
The code doesn't handle the 5 XML built-in entities correctly in
text-typed output. It processes ' and " but not &, < or
> . See added test. I have not fixed this, but I think it's clearly
broken:
+ -- XML builtin entities
+ SELECT * FROM xmltable('/x/a' PASSING
'<x><a><ent>'</ent></a><a><ent>"</ent></a><a>< ent>&</ent></a><a><ent>& lt;</ent></a><a><ent>></ ent></a></x>'
COLUMNS ent text);
+ ent
+ -------
+ '
+ "
+ &
+ <
+ >
+ (5 rows)
so I've adjusted the docs to claim that they're expanded. The code
needs fixing to avoid entity-escaping when the output column type is
not 'xml'.
' and " entities in xml-typed output are expanded, not
preserved. I don't know if this is intended but I suspect it is:
+ SELECT * FROM xmltable('/x/a' PASSING
'<x><a><ent>'</ent></a><a><ent>"</ent></a><a>< ent>&</ent></a><a><ent>& lt;</ent></a><a><ent>></ ent></a></x>'
COLUMNS ent xml);
+ ent
+ ------------------
+ <ent>'</ent>
+ <ent>"</ent>
+ <ent>&</ent>
+ <ent><</ent>
+ <ent>></ent>
+ (5 rows)
For the docs changes relevant to the above search for "The five
predefined XML entities". Adjust that bit of docs if I guessed wrong
about the intended behaviour.
The tests don't cover CDATA or PCDATA . I didn't try to add that, but
they should.
Did some docs copy-editing and integrated some examples. Explained how
nested elements work, that multiple top level elements is an error,
etc. Explained the time-of-evaluation stuff. Pointed out that you can
refer to prior output columns in PATH and DEFAULT, since that's weird
and unusual compared to normal SQL. Documented handling of multiple
node matches, including the surprising results of somepath/text() on
<somepath>x<!--blah-->y</somepath>. Documented handling of nested
elements. Documented that xmltable works only on XML documents, not
fragments/forests.
Regarding evaluation time, it struck me that evaluating path
expressions once per row means the xpath must be parsed and processed
once per row. Isn't it desirable to store and re-use the preparsed
xpath? I don't think this is a major problem, since we can later
detect stable/immutable expressions including constants, evaluate only
once in that case, and cache. It's just worth thinking about.
The docs and tests don't seem to cover XML entities. What's the
behaviour there? Core XML only defines one entity, but if a schema
defines more how are they processed? The tests need to cover the
predefined entities " & ' < and > at least.
I have no idea whether the current code can fetch a DTD and use any
<!ENTITY > declarations to expand entities, but I'm guessing not? If
not, external DTDs, and internal DTDs with external entities should be
documented as unsupported.
It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>
$XML$ COLUMNS foo text);
+ ERROR: invalid XML content
+ LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
+ ^
+ DETAIL: line 2: StartTag: invalid element name
+ <!DOCTYPE foo [
+ ^
+ line 3: StartTag: invalid element name
+ <!ELEMENT foo (#PCDATA)>
+ ^
+ line 4: StartTag: invalid element name
+ <!ENTITY pg "PostgreSQL">
+ ^
+ line 6: Entity 'pg' not defined
+ <foo>Hello &pg;.</foo>
+ ^
libxml seems to support documents with internal DTDs:
$ xmllint --valid /tmp/x
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>
so presumably the issue lies in the xpath stuff? Note that it's not
even ignoring the DTD and choking on the undefined entity, it's
choking on the DTD its self.
OK, code comments:
In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
instead of a PG_TRY() / PG_CATCH() block?
I think the new way you handle the type stuff is much, much better,
and with comments to explain too. Thanks very much.
There's an oversight in tableexpr vs xmltable separation here:
+ case T_TableExpr:
+ *name = "xmltable";
+ return 2;
presumably you need to look at the node and decide what kind of table
expression it is or just use a generic "tableexpr".
Same problem here:
+ case T_TableExpr:
+ {
+ TableExpr *te = (TableExpr *) node;
+
+ /* c_expr shoud be closed in brackets */
+ appendStringInfoString(buf, "XMLTABLE(");
I don't have the libxml knowledge or remaining brain to usefully
evaluate the xpath and xml specifics in xpath.c today. It does strike
me that the new xpath parser should probably live in its own file,
though.
I think this is all a big improvement. Barring the notes above and my
lack of review of the guts of the xml.c parts of it, I'm pretty happy
with what I see now.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> Did some docs copy-editing and integrated some examples.
Whoops, forgot to attach.
Rather than sending a whole new copy of the patch, here's a diff
against your patched tree of my changes so you can see what I've done
and apply the parts you want.
Note that I didn't updated the expected files.
to a simple value before calling the function. <literal>PATH</>
+ expressions are normally evaluated <emphasis>exactly once per result row ## per input row
+ </emphasis>,
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 22 September 2016 at 02:31, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> another small update - fix XMLPath parser - support multibytes characters
I'm returning for another round of review.
The code doesn't handle the 5 XML built-in entities correctly in
text-typed output. It processes ' and " but not &, < or
> . See added test. I have not fixed this, but I think it's clearly
broken:
+ -- XML builtin entities
+ SELECT * FROM xmltable('/x/a' PASSING
'<x><a><ent>'</ent></a><a><ent>"</ent></a><a>< ent>&</ent></a><a><ent>& lt;</ent></a><a><ent>></ ent></a></x>'
COLUMNS ent text);
+ ent
+ -------
+ '
+ "
+ &
+ <
+ >
+ (5 rows)
so I've adjusted the docs to claim that they're expanded. The code
needs fixing to avoid entity-escaping when the output column type is
not 'xml'.
' and " entities in xml-typed output are expanded, not
preserved. I don't know if this is intended but I suspect it is:
+ SELECT * FROM xmltable('/x/a' PASSING
'<x><a><ent>'</ent></a><a><ent>"</ent></a><a>< ent>&</ent></a><a><ent>& lt;</ent></a><a><ent>></ ent></a></x>'
COLUMNS ent xml);
+ ent
+ ------------------
+ <ent>'</ent>
+ <ent>"</ent>
+ <ent>&</ent>
+ <ent><</ent>
+ <ent>></ent>
+ (5 rows)
For the docs changes relevant to the above search for "The five
predefined XML entities". Adjust that bit of docs if I guessed wrong
about the intended behaviour.
The tests don't cover CDATA or PCDATA . I didn't try to add that, but
they should.
Did some docs copy-editing and integrated some examples. Explained how
nested elements work, that multiple top level elements is an error,
etc. Explained the time-of-evaluation stuff. Pointed out that you can
refer to prior output columns in PATH and DEFAULT, since that's weird
and unusual compared to normal SQL. Documented handling of multiple
node matches, including the surprising results of somepath/text() on
<somepath>x<!--blah-->y</somepath>. Documented handling of nested
elements. Documented that xmltable works only on XML documents, not
fragments/forests.
Regarding evaluation time, it struck me that evaluating path
expressions once per row means the xpath must be parsed and processed
once per row. Isn't it desirable to store and re-use the preparsed
xpath? I don't think this is a major problem, since we can later
detect stable/immutable expressions including constants, evaluate only
once in that case, and cache. It's just worth thinking about.
The docs and tests don't seem to cover XML entities. What's the
behaviour there? Core XML only defines one entity, but if a schema
defines more how are they processed? The tests need to cover the
predefined entities " & ' < and > at least.
I have no idea whether the current code can fetch a DTD and use any
<!ENTITY > declarations to expand entities, but I'm guessing not? If
not, external DTDs, and internal DTDs with external entities should be
documented as unsupported.
It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>
$XML$ COLUMNS foo text);
+ ERROR: invalid XML content
+ LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
+ ^
+ DETAIL: line 2: StartTag: invalid element name
+ <!DOCTYPE foo [
+ ^
+ line 3: StartTag: invalid element name
+ <!ELEMENT foo (#PCDATA)>
+ ^
+ line 4: StartTag: invalid element name
+ <!ENTITY pg "PostgreSQL">
+ ^
+ line 6: Entity 'pg' not defined
+ <foo>Hello &pg;.</foo>
+ ^
postgres=# select $XML$<?xml version="1.0" standalone="yes" ?>
postgres$# <!DOCTYPE foo [
postgres$# <!ELEMENT foo (#PCDATA)>
postgres$# <!ENTITY pg "PostgreSQL">
postgres$# ]>
postgres$# <foo>Hello &pg;.</foo>
postgres$# $XML$::xml;
ERROR: invalid XML content
LINE 1: select $XML$<?xml version="1.0" standalone="yes" ?>
^
DETAIL: line 2: StartTag: invalid element name
<!DOCTYPE foo [
^
line 3: StartTag: invalid element name
<!ELEMENT foo (#PCDATA)>
^
line 4: StartTag: invalid element name
<!ENTITY pg "PostgreSQL">
^
line 6: Entity 'pg' not defined
<foo>Hello &pg;.</foo>
^
libxml seems to support documents with internal DTDs:
$ xmllint --valid /tmp/x
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE foo [
<!ELEMENT foo (#PCDATA)>
<!ENTITY pg "PostgreSQL">
]>
<foo>Hello &pg;.</foo>
I removed this tests - it is not related to XMLTABLE function, but to generic XML processing/validation.
so presumably the issue lies in the xpath stuff? Note that it's not
even ignoring the DTD and choking on the undefined entity, it's
choking on the DTD its self.
OK, code comments:
In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
instead of a PG_TRY() / PG_CATCH() block?
I think the new way you handle the type stuff is much, much better,
and with comments to explain too. Thanks very much.
There's an oversight in tableexpr vs xmltable separation here:
+ case T_TableExpr:
+ *name = "xmltable";
+ return 2;
presumably you need to look at the node and decide what kind of table
expression it is or just use a generic "tableexpr".
Same problem here:
+ case T_TableExpr:
+ {
+ TableExpr *te = (TableExpr *) node;
+
+ /* c_expr shoud be closed in brackets */
+ appendStringInfoString(buf, "XMLTABLE(");
I don't have the libxml knowledge or remaining brain to usefully
evaluate the xpath and xml specifics in xpath.c today. It does strike
me that the new xpath parser should probably live in its own file,
though.
I think this is all a big improvement. Barring the notes above and my
lack of review of the guts of the xml.c parts of it, I'm pretty happy
with what I see now.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 24 September 2016 at 14:01, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Did some docs copy-editing and integrated some examples. Explained how >> nested elements work, that multiple top level elements is an error, >> etc. Explained the time-of-evaluation stuff. Pointed out that you can >> refer to prior output columns in PATH and DEFAULT, since that's weird >> and unusual compared to normal SQL. Documented handling of multiple >> node matches, including the surprising results of somepath/text() on >> <somepath>x<!--blah-->y</somepath>. Documented handling of nested >> elements. Documented that xmltable works only on XML documents, not >> fragments/forests. > > > I don't understand to this sentence: "It is possible for a PATH expression > to reference output columns that appear before it in the column-list, so > paths may be dynamically constructed based on other parts of the XML > document:" >> The docs and tests don't seem to cover XML entities. What's the >> behaviour there? Core XML only defines one entity, but if a schema >> defines more how are they processed? The tests need to cover the >> predefined entities " & ' < and > at least. > > > I don't understand, what you are propose here. ?? Please, can you send some > examples. Per below - handling of DTD <!ENTITY> declarations, and the builtin entity tests I already added tests for. >> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?): >> >> SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" >> standalone="yes" ?> >> <!DOCTYPE foo [ >> <!ELEMENT foo (#PCDATA)> >> <!ENTITY pg "PostgreSQL"> >> ]> >> <foo>Hello &pg;.</foo> >> $XML$ COLUMNS foo text); >> >> + ERROR: invalid XML content >> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ... >> + ^ >> + DETAIL: line 2: StartTag: invalid element name >> + <!DOCTYPE foo [ >> + ^ >> + line 3: StartTag: invalid element name >> + <!ELEMENT foo (#PCDATA)> >> + ^ >> + line 4: StartTag: invalid element name >> + <!ENTITY pg "PostgreSQL"> >> + ^ >> + line 6: Entity 'pg' not defined >> + <foo>Hello &pg;.</foo> >> + ^ >> > > It is rejected before XMLTABLE function call > > postgres=# select $XML$<?xml version="1.0" standalone="yes" ?> > postgres$# <!DOCTYPE foo [ > postgres$# <!ELEMENT foo (#PCDATA)> > postgres$# <!ENTITY pg "PostgreSQL"> > postgres$# ]> > postgres$# <foo>Hello &pg;.</foo> > postgres$# $XML$::xml; > ERROR: invalid XML content > LINE 1: select $XML$<?xml version="1.0" standalone="yes" ?> > ^ > DETAIL: line 2: StartTag: invalid element name > <!DOCTYPE foo [ [snip] > It is disabled by default in libxml2. I found a function > xmlSubstituteEntitiesDefault http://www.xmlsoft.org/entities.html > http://www.xmlsoft.org/html/libxml-parser.html#xmlSubstituteEntitiesDefault > > The default behave should be common for all PostgreSQL's libxml2 based > function - and then it is different topic - maybe part for PostgreSQL ToDo? > But I don't remember any user requests related to this issue. OK, so it's not xmltable specific. Fine by me. Somebody who cares can deal with it. There's clearly nobody breaking down the walls wanting the feature. > I removed this tests - it is not related to XMLTABLE function, but to > generic XML processing/validation. Good plan. >> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP >> instead of a PG_TRY() / PG_CATCH() block? > > > If I understand to doc, the PG_ENSURE_ERROR_CLEANUP should be used, when you > want to catch FATAL errors (and when you want to clean shared memory). > XMLTABLE doesn't use shared memory, and doesn't need to catch fatal errors. Ok, makes sense. >> I don't have the libxml knowledge or remaining brain to usefully >> evaluate the xpath and xml specifics in xpath.c today. It does strike >> me that the new xpath parser should probably live in its own file, >> though. > > moved Thanks. > new version is attached Great. I'm marking this ready for committer at this point. I think the XML parser likely needs a more close reading, so I'll ping Peter E to see if he'll have a chance to check that bit out. But by and large I think the issues have been ironed out - in terms of functionality, structure and clarity I think it's looking solid. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 24 September 2016 at 14:01, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Did some docs copy-editing and integrated some examples. Explained how
>> nested elements work, that multiple top level elements is an error,
>> etc. Explained the time-of-evaluation stuff. Pointed out that you can
>> refer to prior output columns in PATH and DEFAULT, since that's weird
>> and unusual compared to normal SQL. Documented handling of multiple
>> node matches, including the surprising results of somepath/text() on
>> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
>> elements. Documented that xmltable works only on XML documents, not
>> fragments/forests.
>
>
> I don't understand to this sentence: "It is possible for a PATH expression
> to reference output columns that appear before it in the column-list, so
> paths may be dynamically constructed based on other parts of the XML
> document:"
>> The docs and tests don't seem to cover XML entities. What's the
>> behaviour there? Core XML only defines one entity, but if a schema
>> defines more how are they processed? The tests need to cover the
>> predefined entities " & ' < and > at least.
>
>
> I don't understand, what you are propose here. ?? Please, can you send some
> examples.
Per below - handling of DTD <!ENTITY> declarations, and the builtin
entity tests I already added tests for.[snip]
>> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
>>
>> SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0"
>> standalone="yes" ?>
>> <!DOCTYPE foo [
>> <!ELEMENT foo (#PCDATA)>
>> <!ENTITY pg "PostgreSQL">
>> ]>
>> <foo>Hello &pg;.</foo>
>> $XML$ COLUMNS foo text);
>>
>> + ERROR: invalid XML content
>> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
>> + ^
>> + DETAIL: line 2: StartTag: invalid element name
>> + <!DOCTYPE foo [
>> + ^
>> + line 3: StartTag: invalid element name
>> + <!ELEMENT foo (#PCDATA)>
>> + ^
>> + line 4: StartTag: invalid element name
>> + <!ENTITY pg "PostgreSQL">
>> + ^
>> + line 6: Entity 'pg' not defined
>> + <foo>Hello &pg;.</foo>
>> + ^
>>
>
> It is rejected before XMLTABLE function call
>
> postgres=# select $XML$<?xml version="1.0" standalone="yes" ?>
> postgres$# <!DOCTYPE foo [
> postgres$# <!ELEMENT foo (#PCDATA)>
> postgres$# <!ENTITY pg "PostgreSQL">
> postgres$# ]>
> postgres$# <foo>Hello &pg;.</foo>
> postgres$# $XML$::xml;
> ERROR: invalid XML content
> LINE 1: select $XML$<?xml version="1.0" standalone="yes" ?>
> ^
> DETAIL: line 2: StartTag: invalid element name
> <!DOCTYPE foo [
> It is disabled by default in libxml2. I found a function
> xmlSubstituteEntitiesDefault http://www.xmlsoft.org/entities.html
> http://www.xmlsoft.org/html/libxml-parser.html# xmlSubstituteEntitiesDefault
>
> The default behave should be common for all PostgreSQL's libxml2 based
> function - and then it is different topic - maybe part for PostgreSQL ToDo?
> But I don't remember any user requests related to this issue.
OK, so it's not xmltable specific. Fine by me.
Somebody who cares can deal with it. There's clearly nobody breaking
down the walls wanting the feature.
> I removed this tests - it is not related to XMLTABLE function, but to
> generic XML processing/validation.
Good plan.
>> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
>> instead of a PG_TRY() / PG_CATCH() block?
>
>
> If I understand to doc, the PG_ENSURE_ERROR_CLEANUP should be used, when you
> want to catch FATAL errors (and when you want to clean shared memory).
> XMLTABLE doesn't use shared memory, and doesn't need to catch fatal errors.
Ok, makes sense.
>> I don't have the libxml knowledge or remaining brain to usefully
>> evaluate the xpath and xml specifics in xpath.c today. It does strike
>> me that the new xpath parser should probably live in its own file,
>> though.
>
> moved
Thanks.
> new version is attached
Great.
I'm marking this ready for committer at this point.
I think the XML parser likely needs a more close reading, so I'll ping
Peter E to see if he'll have a chance to check that bit out. But by
and large I think the issues have been ironed out - in terms of
functionality, structure and clarity I think it's looking solid.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 24 September 2016 at 14:01, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Did some docs copy-editing and integrated some examples. Explained how >> nested elements work, that multiple top level elements is an error, >> etc. Explained the time-of-evaluation stuff. Pointed out that you can >> refer to prior output columns in PATH and DEFAULT, since that's weird >> and unusual compared to normal SQL. Documented handling of multiple >> node matches, including the surprising results of somepath/text() on >> <somepath>x<!--blah-->y</somepath>. Documented handling of nested >> elements. Documented that xmltable works only on XML documents, not >> fragments/forests. > > > I don't understand to this sentence: "It is possible for a PATH expression > to reference output columns that appear before it in the column-list, so > paths may be dynamically constructed based on other parts of the XML > document:" This was based on a misunderstanding of something you said earlier. I thought the idea was to allow this to work: SELECT * FROM xmltable('/x' PASSING '<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text, extractedValue text PATH elemName); ... but it doesn't: SELECT * FROM xmltable('/x' PASSING '<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text, extractedValue text PATH elemName); ERROR: column "elemname" does not exist LINE 1: ...' COLUMNS elemName text, extractedValue text PATH elemName); ... so please delete that text. I thought I'd tested it but the state of my tests dir says I just got distracted by another task at the wrong time. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 24 September 2016 at 14:01, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Did some docs copy-editing and integrated some examples. Explained how
>> nested elements work, that multiple top level elements is an error,
>> etc. Explained the time-of-evaluation stuff. Pointed out that you can
>> refer to prior output columns in PATH and DEFAULT, since that's weird
>> and unusual compared to normal SQL. Documented handling of multiple
>> node matches, including the surprising results of somepath/text() on
>> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
>> elements. Documented that xmltable works only on XML documents, not
>> fragments/forests.
>
>
> I don't understand to this sentence: "It is possible for a PATH expression
> to reference output columns that appear before it in the column-list, so
> paths may be dynamically constructed based on other parts of the XML
> document:"
This was based on a misunderstanding of something you said earlier. I
thought the idea was to allow this to work:
SELECT * FROM xmltable('/x' PASSING
'<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text,
extractedValue text PATH elemName);
... but it doesn't:
SELECT * FROM xmltable('/x' PASSING
'<x><elemName>a</elemName><a>value</a></x>' COLUMNS elemName text,
extractedValue text PATH elemName);
ERROR: column "elemname" does not exist
LINE 1: ...' COLUMNS elemName text, extractedValue text PATH elemName);
... so please delete that text. I thought I'd tested it but the state
of my tests dir says I just got distracted by another task at the
wrong time.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Tue, Sep 27, 2016 at 2:29 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2016-09-27 5:53 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>: >> >> [...] >> ... so please delete that text. I thought I'd tested it but the state >> of my tests dir says I just got distracted by another task at the >> wrong time. Moved patch to next CF with same status: ready for committer. -- Michael
+ <para>
+ Only XPath query language is supported. PostgreSQL doesn't support XQuery
+ language. Then the syntax of <function>xmltable</function> doesn't
+ allow to use XQuery related functionality - the name of xml expression
+ (clause <literal>AS</literal>) is not allowed, and only one xml expression
+ should be passed to <function>xmltable</function> function as parameter.
+ </para>
Attachment
I've been going over this patch. I think it'd be better to restructure the <sect2> before adding the docs for this new function; I already split it out, so don't do anything about this. Next, looking at struct TableExprBuilder I noticed that the comments are already obsolete, as they talk about function params that do not exist (missing_columns) and they fail to mention the ones that do exist. Also, function member SetContent is not documented at all. Overall, these comments do not convey a lot -- apparently, whoever reads them is already supposed to know how it works: "xyz sets a row generating filter" doesn't tell me anything. Since this is API documentation, it needs to be much clearer. ExecEvalTableExpr and ExecEvalTableExprProtected have no comments whatsoever. Needs fixed. I wonder if it'd be a good idea to install TableExpr first without the implementing XMLTABLE, so that it's clearer what is API and what is implementation. The number of new keywords in this patch is depressing. I suppose there's no way around that -- as I understand, this is caused by the SQL standard's definition of the syntax for this feature. Have to go now for a bit -- will continue looking afterwards. Please submit delta patches on top of your latest v12 to fix the comments I mentioned. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I've been going over this patch. I think it'd be better to restructure
the <sect2> before adding the docs for this new function; I already
split it out, so don't do anything about this.
Next, looking at struct TableExprBuilder I noticed that the comments are
already obsolete, as they talk about function params that do not exist
(missing_columns) and they fail to mention the ones that do exist.
Also, function member SetContent is not documented at all. Overall,
these comments do not convey a lot -- apparently, whoever reads them is
already supposed to know how it works: "xyz sets a row generating
filter" doesn't tell me anything. Since this is API documentation, it
needs to be much clearer.
ExecEvalTableExpr and ExecEvalTableExprProtected have no comments
whatsoever. Needs fixed.
I wonder if it'd be a good idea to install TableExpr first without the
implementing XMLTABLE, so that it's clearer what is API and what is
implementation.
The number of new keywords in this patch is depressing. I suppose
there's no way around that -- as I understand, this is caused by the SQL
standard's definition of the syntax for this feature.
Have to go now for a bit -- will continue looking afterwards. Please
submit delta patches on top of your latest v12 to fix the comments I
mentioned.
--
Álvaro Herrera https://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Pavel Stehule wrote: > 2016-11-17 19:22 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>: > > > Next, looking at struct TableExprBuilder I noticed that the comments are > > already obsolete, as they talk about function params that do not exist > > (missing_columns) and they fail to mention the ones that do exist. > > Also, function member SetContent is not documented at all. Overall, > > these comments do not convey a lot -- apparently, whoever reads them is > > already supposed to know how it works: "xyz sets a row generating > > filter" doesn't tell me anything. Since this is API documentation, it > > needs to be much clearer. > > > > ExecEvalTableExpr and ExecEvalTableExprProtected have no comments > > whatsoever. Needs fixed. > > I am sending the patch with more comments - but it needs a care someone > with good English skills. Thanks, I can help with that. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
The SQL standard seems to require a comma after the XMLNAMESPACES clause: <XML table> ::= XMLTABLE <left paren> [ <XML namespace declaration> <comma> ] <XML table row pattern> [ <XML table argumentlist> ] COLUMNS <XML table column definitions> <right paren> I don't understand the reason for that, but I have added it: | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument ')' { TableExpr *n = makeNode(TableExpr); n->row_path = $8; n->expr = $9; n->cols= NIL; n->namespaces = $5; n->location = @1; $$ = (Node *)n; } | XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument COLUMNS TableExprColList')' { TableExpr *n = makeNode(TableExpr); n->row_path = $8; n->expr = $9; n->cols = $11; n->namespaces = $5; n->location = @1; $$ = (Node *)n; } ; Another thing I did was remove the TableExprColOptionsOpt production; in its place I added a third rule in TableExprCol for "ColId Typename IsNotNull" (i.e. no options). This seems to reduce the size of the generated gram.c a good dozen kB. I didn't like much the use of c_expr in all these productions. As I understand it, c_expr is mostly an implementation artifact and we should be using a_expr or b_expr almost everywhere. I see that XMLEXISTS already expanded the very limited use of c_expr there was; I would prefer to fix that one too rather than replicate it here. TBH I'm not sure I like that XMLTABLE is re-using xmlexists_argument. Actually, is the existing XMLEXISTS production correct? What I see in the standard is <XML table row pattern> ::= <character string literal> <XML table argument list> ::= PASSING <XML table argument passing mechanism> <XML query argument> [ { <comma> <XML queryargument> }... ] <XML table argument passing mechanism> ::= <XML passing mechanism> <XML table column definitions> ::= <XML table column definition> [ { <comma> <XML table column definition> }... ] <XML table column definition> ::= <XML table ordinality column definition> | <XML table regular column definition> <XML table ordinality column definition> ::= <column name> FOR ORDINALITY <XML table regular column definition> ::= <column name> <data type> [ <XML passing mechanism> ] [ <default clause> ] [PATH <XML table column pattern> ] <XML table column pattern> ::= <character string literal> so I think this resolves "PASSING BY {REF,VALUE} <XML query argument>", but what we have in gram.y is: /* We allow several variants for SQL and other compatibility. */ xmlexists_argument: PASSING c_expr { $$ = $2; } | PASSING c_expr BY REF { $$ = $2; } | PASSING BY REF c_expr { $$ = $4; } | PASSING BY REF c_expr BY REF { $$ = $4; } ; I'm not sure why we allow "PASSING c_expr" at all. Maybe if BY VALUE/BY REF is not specified, we should just not have PASSING at all? If we extended this for XMLEXISTS for compatibility with some other product, perhaps we should look into what that product supports for XMLTABLE; maybe XMLTABLE does not need all the same options as XMLEXISTS. The fourth option seems very dubious to me. This was added by commit 641459f26, submitted here: https://www.postgresql.org/message-id/4C0F6DBF.9000001@mlfowler.com ... Hm, actually some perusal of the XMLEXISTS predicate in the standard shows that it's quite a different thing from XMLTABLE. Maybe we shouldn't reuse xmlexists_argument here at all. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
The SQL standard seems to require a comma after the XMLNAMESPACES
clause:
<XML table> ::=
XMLTABLE <left paren>
[ <XML namespace declaration> <comma> ]
<XML table row pattern>
[ <XML table argument list> ]
COLUMNS <XML table column definitions> <right paren>
I don't understand the reason for that, but I have added it:
| XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument ')'
{
TableExpr *n = makeNode(TableExpr);
n->row_path = $8;
n->expr = $9;
n->cols = NIL;
n->namespaces = $5;
n->location = @1;
$$ = (Node *)n;
}
| XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument COLUMNS TableExprColList ')'
{
TableExpr *n = makeNode(TableExpr);
n->row_path = $8;
n->expr = $9;
n->cols = $11;
n->namespaces = $5;
n->location = @1;
$$ = (Node *)n;
}
;
Another thing I did was remove the TableExprColOptionsOpt production; in
its place I added a third rule in TableExprCol for "ColId Typename
IsNotNull" (i.e. no options). This seems to reduce the size of the
generated gram.c a good dozen kB.
I didn't like much the use of c_expr in all these productions. As I
understand it, c_expr is mostly an implementation artifact and we should
be using a_expr or b_expr almost everywhere. I see that XMLEXISTS
already expanded the very limited use of c_expr there was; I would
prefer to fix that one too rather than replicate it here. TBH I'm not
sure I like that XMLTABLE is re-using xmlexists_argument.
Actually, is the existing XMLEXISTS production correct? What I see in
the standard is
<XML table row pattern> ::= <character string literal>
<XML table argument list> ::=
PASSING <XML table argument passing mechanism> <XML query argument>
[ { <comma> <XML query argument> }... ]
<XML table argument passing mechanism> ::= <XML passing mechanism>
<XML table column definitions> ::= <XML table column definition> [ { <comma> <XML table column definition> }... ]
<XML table column definition> ::=
<XML table ordinality column definition>
| <XML table regular column definition>
<XML table ordinality column definition> ::= <column name> FOR ORDINALITY
<XML table regular column definition> ::=
<column name> <data type> [ <XML passing mechanism> ]
[ <default clause> ]
[ PATH <XML table column pattern> ]
<XML table column pattern> ::= <character string literal>
so I think this resolves "PASSING BY {REF,VALUE} <XML query argument>", but what
we have in gram.y is:
/* We allow several variants for SQL and other compatibility. */
xmlexists_argument:
PASSING c_expr
{
$$ = $2;
}
| PASSING c_expr BY REF
{
$$ = $2;
}
| PASSING BY REF c_expr
{
$$ = $4;
}
| PASSING BY REF c_expr BY REF
{
$$ = $4;
}
;
I'm not sure why we allow "PASSING c_expr" at all. Maybe if BY VALUE/BY
REF is not specified, we should just not have PASSING at all?
If we extended this for XMLEXISTS for compatibility with some other
product, perhaps we should look into what that product supports for
XMLTABLE; maybe XMLTABLE does not need all the same options as
XMLEXISTS.
The fourth option seems very dubious to me. This was added by commit
641459f26, submitted here:
https://www.postgresql.org/message-id/4C0F6DBF.9000001@ mlfowler.com
... Hm, actually some perusal of the XMLEXISTS predicate in the standard
shows that it's quite a different thing from XMLTABLE. Maybe we
shouldn't reuse xmlexists_argument here at all.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2016-11-19 0:42 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:The SQL standard seems to require a comma after the XMLNAMESPACES
clause:
<XML table> ::=
XMLTABLE <left paren>
[ <XML namespace declaration> <comma> ]
<XML table row pattern>
[ <XML table argument list> ]
COLUMNS <XML table column definitions> <right paren>
I don't understand the reason for that, but I have added it:
| XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument ')'
{
TableExpr *n = makeNode(TableExpr);
n->row_path = $8;
n->expr = $9;
n->cols = NIL;
n->namespaces = $5;
n->location = @1;
$$ = (Node *)n;
}
| XMLTABLE '(' XMLNAMESPACES '(' XmlNamespaceList ')' ',' c_expr xmlexists_argument COLUMNS TableExprColList ')'
{
TableExpr *n = makeNode(TableExpr);
n->row_path = $8;
n->expr = $9;
n->cols = $11;
n->namespaces = $5;
n->location = @1;
$$ = (Node *)n;
}
;yes, looks my oversight - it is betterAnother thing I did was remove the TableExprColOptionsOpt production; in
its place I added a third rule in TableExprCol for "ColId Typename
IsNotNull" (i.e. no options). This seems to reduce the size of the
generated gram.c a good dozen kB.If I remember well - this was required by better compatibility with OracleANSI SQL: colname type DEFAULT PATHOracle: colname PATH DEFAULTMy implementation allows both combinations - there are two reasons: 1. one less issue when people does port from Oracle, 2. almost all examples of XMLTABLE on a net are from Oracle - it can be unfriendly, when these examples would not work on PG - there was discussion about this issue in this mailing list
I didn't like much the use of c_expr in all these productions. As I
understand it, c_expr is mostly an implementation artifact and we should
be using a_expr or b_expr almost everywhere. I see that XMLEXISTS
already expanded the very limited use of c_expr there was; I would
prefer to fix that one too rather than replicate it here. TBH I'm not
sure I like that XMLTABLE is re-using xmlexists_argument.There are two situations: c_expr as document content, and c_expr after DEFAULT and PATH keywords. First probably can be fixed, second not, because "PATH" is unreserved keyword only.
Actually, is the existing XMLEXISTS production correct? What I see in
the standard is
<XML table row pattern> ::= <character string literal>
<XML table argument list> ::=
PASSING <XML table argument passing mechanism> <XML query argument>
[ { <comma> <XML query argument> }... ]
<XML table argument passing mechanism> ::= <XML passing mechanism>
<XML table column definitions> ::= <XML table column definition> [ { <comma> <XML table column definition> }... ]
<XML table column definition> ::=
<XML table ordinality column definition>
| <XML table regular column definition>
<XML table ordinality column definition> ::= <column name> FOR ORDINALITY
<XML table regular column definition> ::=
<column name> <data type> [ <XML passing mechanism> ]
[ <default clause> ]
[ PATH <XML table column pattern> ]
<XML table column pattern> ::= <character string literal>
so I think this resolves "PASSING BY {REF,VALUE} <XML query argument>", but what
we have in gram.y is:
/* We allow several variants for SQL and other compatibility. */
xmlexists_argument:
PASSING c_expr
{
$$ = $2;
}
| PASSING c_expr BY REF
{
$$ = $2;
}
| PASSING BY REF c_expr
{
$$ = $4;
}
| PASSING BY REF c_expr BY REF
{
$$ = $4;
}
;
I'm not sure why we allow "PASSING c_expr" at all. Maybe if BY VALUE/BY
REF is not specified, we should just not have PASSING at all?
If we extended this for XMLEXISTS for compatibility with some other
product, perhaps we should look into what that product supports for
XMLTABLE; maybe XMLTABLE does not need all the same options as
XMLEXISTS.The reason is a compatibility with other products - DB2. XMLTABLE uses same options like XMLEXISTS. These options has zero value for Postgres, but its are important - compatibility, and workable examples.The fourth option seems very dubious to me. This was added by commit
641459f26, submitted here:
https://www.postgresql.org/message-id/4C0F6DBF.9000001@mlfow ler.com
... Hm, actually some perusal of the XMLEXISTS predicate in the standard
shows that it's quite a different thing from XMLTABLE. Maybe we
shouldn't reuse xmlexists_argument here at all.not sure If I understandRegardsPavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Something I just noticed is that transformTableExpr takes a TableExpr node and returns another TableExpr node. That's unlike what we do in other places, where the node returned is of a different type than the input node. I'm not real clear what happens if you try to re-transform a node that was already transformed, but it seems worth thinking about. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Something I just noticed is that transformTableExpr takes a TableExpr > node and returns another TableExpr node. That's unlike what we do in > other places, where the node returned is of a different type than the > input node. I'm not real clear what happens if you try to re-transform > a node that was already transformed, but it seems worth thinking about. We're not 100% consistent on that --- there are cases such as RowExpr and CaseExpr where the same struct type is used for pre-parse-analysis and post-parse-analysis nodes. I think it's okay as long as the information content isn't markedly different, ie the transformation just consists of transforming all the sub-nodes. Being able to behave sanely on a re-transformation used to be an issue, but we no longer expect transformExpr to support that. regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Something I just noticed is that transformTableExpr takes a TableExpr
> node and returns another TableExpr node. That's unlike what we do in
> other places, where the node returned is of a different type than the
> input node. I'm not real clear what happens if you try to re-transform
> a node that was already transformed, but it seems worth thinking about.
We're not 100% consistent on that --- there are cases such as RowExpr
and CaseExpr where the same struct type is used for pre-parse-analysis
and post-parse-analysis nodes. I think it's okay as long as the
information content isn't markedly different, ie the transformation
just consists of transforming all the sub-nodes.
Being able to behave sanely on a re-transformation used to be an
issue, but we no longer expect transformExpr to support that.
regards, tom lane
I found the whole TableExprGetTupleDesc() function a bit odd in nodeFuncs.c, so I renamed it to ExecTypeFromTableExpr() and moved it to execTuples.c -- but only because that's where ExecTypeFromTL and others already live. I would have liked to move it to tupdesc.c instead, but it requires knowledge of executor nodes, which is probably the reason that ExecTypeFromTL is in execTuples. I think we'd eat that bit of ugliness only because we're not the first. But anyway I quickly ran into another problem. I noticed that ExecTypeFromTableExpr is being called from the transform phase, which is much earlier than the executor. I noticed because of the warning that the above movement added to nodeFuncs.c, src/backend/nodes/nodeFuncs.c:509:5: warning: implicit declaration of function 'ExecTypeFromTableExpr' [-Wimplicit-function-declaration] so I thought, hm, is it okay to have parse analysis run an executor function? (I suppose this is the reason you put it in nodeFuncs in the first place). For fun, I tried this query under GDB, with a breakpoint on exprTypmod(): SELECT X.* FROM emp, XMLTABLE ('//depts/dept/employee' passing doc COLUMNS empID INTEGER PATH '@id', firstname int PATH 'name/first', lastname VARCHAR(25) PATH 'name/last') AS X; and sure enough, the type is resolved during parse analysis: Breakpoint 1, exprTypmod (expr=expr@entry=0x1d23ad8) at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283 283 if (!expr) (gdb) print *expr $2 = {type = T_TableExpr} (gdb) bt #0 exprTypmod (expr=expr@entry=0x1d23ad8) at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283 #1 0x000000000080c500 in get_expr_result_type (expr=0x1d23ad8, resultTypeId=0x7ffd482bfdb4, resultTupleDesc=0x7ffd482bfdb8) at /pgsql/source/master/src/backend/utils/fmgr/funcapi.c:247 #2 0x000000000056de1b in expandRTE (rte=rte@entry=0x1d6b800, rtindex=2, sublevels_up=0, location=location@entry=7, include_dropped=include_dropped@entry=0 '\000', colnames=colnames@entry=0x7ffd482bfe10, colvars=0x7ffd482bfe18) at/pgsql/source/master/src/backend/parser/parse_relation.c:2052 #3 0x000000000056e131 in expandRelAttrs (pstate=pstate@entry=0x1d238a8, rte=rte@entry=0x1d6b800, rtindex=<optimized out>, sublevels_up=<optimized out>, location=location@entry=7) at /pgsql/source/master/src/backend/parser/parse_relation.c:2435 #4 0x000000000056fa64 in ExpandSingleTable (pstate=pstate@entry=0x1d238a8, rte=rte@entry=0x1d6b800, location=7, make_target_entry=make_target_entry@entry=1'\001') at /pgsql/source/master/src/backend/parser/parse_target.c:1266 #5 0x000000000057135b in ExpandColumnRefStar (pstate=pstate@entry=0x1d238a8, cref=0x1d22720, make_target_entry=make_target_entry@entry=1'\001') at /pgsql/source/master/src/backend/parser/parse_target.c:1158 #6 0x00000000005716f9 in transformTargetList (pstate=0x1d238a8, targetlist=<optimized out>, exprKind=EXPR_KIND_SELECT_TARGET) This seems fine I guess, and it seems to say that we ought to move the code that generates the tupdesc to back parse analysis rather than executor. Okay, fine. But let's find a better place than nodeFuncs. But if I move the XMLTABLE() call to the target list instead, the type is resolved at planner time: SELECT XMLTABLE ('/dept/employee' passing $$<dept bldg="114"> <employee id="903"> <name> <first>Mary</first> <last>Jones</last> </name> <office>415</office> <phone>905-403-6112</phone> <phone>647-504-4546</phone> <salary currency="USD">64000</salary> </employee> </dept>$$ COLUMNS empID INTEGER PATH '@id', firstname varchar(4) PATH 'name/first', lastname VARCHAR(25) PATH 'name/last') AS X; Breakpoint 1, exprTypmod (expr=expr@entry=0x1d6bed8) at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283 283 if (!expr) (gdb) bt #0 exprTypmod (expr=expr@entry=0x1d6bed8) at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283 #1 0x0000000000654058 in set_pathtarget_cost_width (root=0x1d6bc68, target=0x1d6c728) at /pgsql/source/master/src/backend/optimizer/path/costsize.c:4729 #2 0x000000000066c197 in grouping_planner (root=0x1d6bc68, inheritance_update=40 '(', inheritance_update@entry=0 '\000', tuple_fraction=0.01, tuple_fraction@entry=0) at /pgsql/source/master/src/backend/optimizer/plan/planner.c:1745 #3 0x000000000066ef64 in subquery_planner (glob=glob@entry=0x1d6bbd0, parse=parse@entry=0x1d23818, parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=0 '\000', tuple_fraction=tuple_fraction@entry=0) at /pgsql/source/master/src/backend/optimizer/plan/planner.c:795 #4 0x000000000066fe5e in standard_planner (parse=0x1d23818, cursorOptions=256, boundParams=<optimized out>) at /pgsql/source/master/src/backend/optimizer/plan/planner.c:307 This is surprising, but I'm not sure it's wrong. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I found the whole TableExprGetTupleDesc() function a bit odd in
nodeFuncs.c, so I renamed it to ExecTypeFromTableExpr() and moved it to
execTuples.c -- but only because that's where ExecTypeFromTL and others
already live. I would have liked to move it to tupdesc.c instead, but
it requires knowledge of executor nodes, which is probably the reason
that ExecTypeFromTL is in execTuples. I think we'd eat that bit of
ugliness only because we're not the first. But anyway I quickly ran
into another problem.
I noticed that ExecTypeFromTableExpr is being called from the transform
phase, which is much earlier than the executor. I noticed because of
the warning that the above movement added to nodeFuncs.c,
src/backend/nodes/nodeFuncs.c:509:5: warning: implicit declaration of function 'ExecTypeFromTableExpr' [-Wimplicit-function- declaration]
so I thought, hm, is it okay to have parse analysis run an executor
function? (I suppose this is the reason you put it in nodeFuncs in the
first place). For fun, I tried this query under GDB, with a breakpoint
on exprTypmod():
SELECT X.*
FROM emp,
XMLTABLE ('//depts/dept/employee' passing doc
COLUMNS
empID INTEGER PATH '@id',
firstname int PATH 'name/first',
lastname VARCHAR(25) PATH 'name/last') AS X;
and sure enough, the type is resolved during parse analysis:
Breakpoint 1, exprTypmod (expr=expr@entry=0x1d23ad8)
at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
283 if (!expr)
(gdb) print *expr
$2 = {type = T_TableExpr}
(gdb) bt
#0 exprTypmod (expr=expr@entry=0x1d23ad8)
at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
#1 0x000000000080c500 in get_expr_result_type (expr=0x1d23ad8,
resultTypeId=0x7ffd482bfdb4, resultTupleDesc=0x7ffd482bfdb8)
at /pgsql/source/master/src/backend/utils/fmgr/funcapi.c: 247
#2 0x000000000056de1b in expandRTE (rte=rte@entry=0x1d6b800, rtindex=2,
sublevels_up=0, location=location@entry=7,
include_dropped=include_dropped@entry=0 '\000',
colnames=colnames@entry=0x7ffd482bfe10, colvars=0x7ffd482bfe18)
at /pgsql/source/master/src/backend/parser/parse_relation. c:2052
#3 0x000000000056e131 in expandRelAttrs (pstate=pstate@entry=0x1d238a8,
rte=rte@entry=0x1d6b800, rtindex=<optimized out>,
sublevels_up=<optimized out>, location=location@entry=7)
at /pgsql/source/master/src/backend/parser/parse_relation. c:2435
#4 0x000000000056fa64 in ExpandSingleTable (pstate=pstate@entry=0x1d238a8,
rte=rte@entry=0x1d6b800, location=7,
make_target_entry=make_target_entry@entry=1 '\001')
at /pgsql/source/master/src/backend/parser/parse_target.c: 1266
#5 0x000000000057135b in ExpandColumnRefStar (pstate=pstate@entry=0x1d238a8,
cref=0x1d22720, make_target_entry=make_target_entry@entry=1 '\001')
at /pgsql/source/master/src/backend/parser/parse_target.c: 1158
#6 0x00000000005716f9 in transformTargetList (pstate=0x1d238a8,
targetlist=<optimized out>, exprKind=EXPR_KIND_SELECT_TARGET)
This seems fine I guess, and it seems to say that we ought to move the
code that generates the tupdesc to back parse analysis rather than
executor. Okay, fine. But let's find a better place than nodeFuncs.
But if I move the XMLTABLE() call to the target list instead, the type
is resolved at planner time:
SELECT
XMLTABLE ('/dept/employee' passing $$<dept bldg="114">
<employee id="903">
<name>
<first>Mary</first>
<last>Jones</last>
</name>
<office>415</office>
<phone>905-403-6112</phone>
<phone>647-504-4546</phone>
<salary currency="USD">64000</salary>
</employee>
</dept>$$
COLUMNS
empID INTEGER PATH '@id',
firstname varchar(4) PATH 'name/first',
lastname VARCHAR(25) PATH 'name/last') AS X;
Breakpoint 1, exprTypmod (expr=expr@entry=0x1d6bed8)
at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
283 if (!expr)
(gdb) bt
#0 exprTypmod (expr=expr@entry=0x1d6bed8)
at /pgsql/source/master/src/backend/nodes/nodeFuncs.c:283
#1 0x0000000000654058 in set_pathtarget_cost_width (root=0x1d6bc68,
target=0x1d6c728)
at /pgsql/source/master/src/backend/optimizer/path/ costsize.c:4729
#2 0x000000000066c197 in grouping_planner (root=0x1d6bc68,
inheritance_update=40 '(', inheritance_update@entry=0 '\000',
tuple_fraction=0.01, tuple_fraction@entry=0)
at /pgsql/source/master/src/backend/optimizer/plan/ planner.c:1745
#3 0x000000000066ef64 in subquery_planner (glob=glob@entry=0x1d6bbd0,
parse=parse@entry=0x1d23818, parent_root=parent_root@entry=0x0,
hasRecursion=hasRecursion@entry=0 '\000',
tuple_fraction=tuple_fraction@entry=0)
at /pgsql/source/master/src/backend/optimizer/plan/ planner.c:795
#4 0x000000000066fe5e in standard_planner (parse=0x1d23818,
cursorOptions=256, boundParams=<optimized out>)
at /pgsql/source/master/src/backend/optimizer/plan/ planner.c:307
This is surprising, but I'm not sure it's wrong.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I tried to see if a following RTE was able to "see" the entries created by XMLTABLE, and sure it can: SELECT X.*, generate_series FROM emp, XMLTABLE ('//depts/dept/employee' passing doc COLUMNS empID INTEGER PATH '@id', firstname varchar(25) PATH 'name/first', lastname VARCHAR(25) PATH 'name/last') AS X, generate_series(900, empid); empid │ firstname │ lastname │ generate_series ───────┼───────────┼──────────┼───────────────── 901 │ John │ Doe │ 900 901 │ John │ Doe │ 901 902 │ Peter │ Pan │ 900 902 │ Peter │ Pan │ 901 902 │ Peter │ Pan │ 902 903 │ Mary │ Jones │ 900 903 │ Mary │ Jones │ 901 903 │ Mary │ Jones │ 902 903 │ Mary │ Jones │ 903 (9 filas) Cool. I'm still wondering how this works. I'll continue to explore the patch in order to figure out. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote: > I'm still wondering how this works. I'll continue to explore the patch > in order to figure out. Ah, so it's already parsed as a "range function". That part looks good to me. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Oh my, I just noticed we have a new xpath preprocessor in this patch too. Where did this code come from -- did you write it all from scratch? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Here's another version. Not there yet: need to move back the function to create the tupdesc, as discussed. Not clear what's the best place, however. I modified the grammar a bit (added the missing comma, removed PATH as an unreserved keyword and just used IDENT, removed the "Opt" version for column options), and reworked the comments in the transform phase (I tweaked the code here and there mostly to move things to nicer places, but it's pretty much the same code). In the new xpath_parser.c file I think we should tidy things up a bit. First, it needs more commentary on what the entry function actually does, in detail. Also, IMO that function should be at the top of the file, not at the bottom, followed by all its helpers. I would like some more clarity on the provenance of all this code, just to assess the probability of bugs; mostly as it's completely undocumented. I don't like the docs either. I think we should have a complete reference to the syntax, followed by examples, rather than letting the examples drive the whole thing. I fixed the synopsis so that it's not one very long line. If you use "PATH '/'" for a column, you get the text for all the entries in the whole XML, rather than the text for the particular row being processed. Isn't that rather weird, or to put it differently, completely wrong? I didn't find a way to obtain the whole XML row when you have the COLUMNS option (which is what I was hoping for with the "PATH '/'"). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Sorry, here's the patch. Power loss distracted me here. By the way, the pgindent you did is slightly confused because you failed to add the new struct types you define to typedefs.list. I have not looked at the new xml.c code at all, yet. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 11/23/2016 06:31 PM, Alvaro Herrera wrote: > Sorry, here's the patch. Power loss distracted me here. > > By the way, the pgindent you did is slightly confused because you failed > to add the new struct types you define to typedefs.list. Tips on how to use pgindent for developers: <http://adpgtech.blogspot.com/2015/05/running-pgindent-on-non-core-code-or.html> cheers andrew
Alvaro Herrera wrote: > If you use "PATH '/'" for a column, you get the text for all the entries > in the whole XML, rather than the text for the particular row being > processed. Isn't that rather weird, or to put it differently, completely > wrong? I didn't find a way to obtain the whole XML row when you have > the COLUMNS option (which is what I was hoping for with the "PATH '/'"). Ah, apparently you need to use type XML for that column in order for this to happen. Example: insert into emp values ($$ <depts ><dept bldg="102"> <employee id="905"> <name> <first>John</first> <last>Doew</last> </name> <office>344</office> <salary currency="USD">55000</salary> </employee> <employee id="908"> <name> <first>Peter</first> <last>Panw</last> </name> <office>216</office> <phone>905-416-5004</phone> </employee></dept> <dept bldg="115"> <employee id="909"> <name> <first>Mary</first> <last>Jonesw</last> </name> <office>415</office> <phone>905-403-6112</phone> <phone>647-504-4546</phone> <salarycurrency="USD">64000</salary> </employee></dept> </depts> $$); Note the weird salary_amount value here: SELECT x.* FROM emp, XMLTABLE ('//depts/dept/employee' passing doc COLUMNS i for ordinality, empID int PATH '@id', firstname varchar(25) PATH 'name/first' default 'FOOBAR', lastname VARCHAR(25) PATH 'name/last', salary xml path 'concat(salary/text(), salary/@currency)' default 'DONT KNOW', salary_amount xml path '/') WITH ORDINALITY AS X (i, a, b, c) limit 1;i │ a │ b │ c │ salary │ salary_amount │ ordinality ───┼─────┼──────┼──────┼──────────┼───────────────────────┼────────────1 │ 905 │ John │ Doew │ 55000USD │ ↵│ 1 │ │ │ │ │ ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ John ↵│ │ │ │ │ │ Doew ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ 344 ↵│ │ │ │ │ │ 55000 ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ Peter ↵│ │ │ │ │ │ Panw ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ 216 ↵│ │ │ │ │ │ 905-416-5004↵│ │ │ │ │ │ ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ Mary ↵│ │ │ │ │ │ Jonesw ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ 415 ↵│ │ │ │ │ │ 905-403-6112↵│ │ │ │ │ │ 647-504-4546↵│ │ │ │ │ │ 64000 ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ ↵│ │ │ │ │ │ │ (1 fila) If you declare salary_amount to be text instead, it doesn't happen anymore. Apparently if you put it in a namespace, it doesn't hapen either. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Oh my, I just noticed we have a new xpath preprocessor in this patch
too. Where did this code come from -- did you write it all from
scratch?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Here's another version. Not there yet: need to move back the functionthe COLUMNS option (which is what I was hoping for with the "PATH '/'").
to create the tupdesc, as discussed. Not clear what's the best place,
however. I modified the grammar a bit (added the missing comma, removed
PATH as an unreserved keyword and just used IDENT, removed the "Opt"
version for column options), and reworked the comments in the transform
phase (I tweaked the code here and there mostly to move things to nicer
places, but it's pretty much the same code).
In the new xpath_parser.c file I think we should tidy things up a bit.
First, it needs more commentary on what the entry function actually
does, in detail. Also, IMO that function should be at the top of the
file, not at the bottom, followed by all its helpers. I would like some
more clarity on the provenance of all this code, just to assess the
probability of bugs; mostly as it's completely undocumented.
I don't like the docs either. I think we should have a complete
reference to the syntax, followed by examples, rather than letting the
examples drive the whole thing. I fixed the synopsis so that it's not
one very long line.
If you use "PATH '/'" for a column, you get the text for all the entries
in the whole XML, rather than the text for the particular row being
processed. Isn't that rather weird, or to put it differently, completely
wrong? I didn't find a way to obtain the whole XML row when you have
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:Oh my, I just noticed we have a new xpath preprocessor in this patch
too. Where did this code come from -- did you write it all from
scratch?I wrote it from scratch - libxml2 has not any API for iteration over XPath expression (different than iteration over XPath expression result), and what I have info, there will not be any new API in libxml2.There are two purposes:Safe manipulation with XPath expression prefixes - ANSI SQL design implicitly expects some prefix, but it can be used manually. The prefix should not be used twice and in some situations, when it can breaks the expression.
Second goal is support default namespaces - when we needed parser for first task, then the enhancing for this task was not too much lines more.This parser can be used for enhancing current XPath function - default namespaces are pretty nice, when you have to use namespaces.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote: > Hi > > 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>: > > > Oh my, I just noticed we have a new xpath preprocessor in this patch > > too. Where did this code come from -- did you write it all from > > scratch? > > I wrote it from scratch - libxml2 has not any API for iteration over XPath > expression (different than iteration over XPath expression result), and > what I have info, there will not be any new API in libxml2. Okay, I agree that the default namespace stuff looks worthwhile in the long run. But I don't have enough time to review the xpath parser stuff in the current commitfest, and I think it needs at the very least a lot of additional code commentary. However I think the rest of it can reasonably go in -- I mean the SQL parse of it, analysis, executor. Let me propose this: you split the patch, leaving the xpath_parser.c stuff out and XMLNAMESPACES DEFAULT, and we introduce just the TableExpr stuff plus the XMLTABLE function. I can commit that part in the current commitfest, and we leave the xpath_parser plus associated features for the upcoming commitfest. Deal? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote:
> Hi
>
> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
>
> > Oh my, I just noticed we have a new xpath preprocessor in this patch
> > too. Where did this code come from -- did you write it all from
> > scratch?
>
> I wrote it from scratch - libxml2 has not any API for iteration over XPath
> expression (different than iteration over XPath expression result), and
> what I have info, there will not be any new API in libxml2.
Okay, I agree that the default namespace stuff looks worthwhile in the
long run. But I don't have enough time to review the xpath parser stuff
in the current commitfest, and I think it needs at the very least a lot
of additional code commentary.
However I think the rest of it can reasonably go in -- I mean the SQL
parse of it, analysis, executor. Let me propose this: you split the
patch, leaving the xpath_parser.c stuff out and XMLNAMESPACES DEFAULT,
and we introduce just the TableExpr stuff plus the XMLTABLE function. I
can commit that part in the current commitfest, and we leave the
xpath_parser plus associated features for the upcoming commitfest.
Deal?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote: > can me send your last work? Sure, it's in the archives -- https://www.postgresql.org/message-id/20161123233130.oqf7jl6czehy5fiw@alvherre.pgsql -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Pavel Stehule wrote: >> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>: >>> Oh my, I just noticed we have a new xpath preprocessor in this patch >>> too. Where did this code come from -- did you write it all from >>> scratch? >> I wrote it from scratch - libxml2 has not any API for iteration over XPath >> expression (different than iteration over XPath expression result), and >> what I have info, there will not be any new API in libxml2. > Okay, I agree that the default namespace stuff looks worthwhile in the > long run. But I don't have enough time to review the xpath parser stuff > in the current commitfest, and I think it needs at the very least a lot > of additional code commentary. contrib/xml2 has always relied on libxslt for xpath functionality. Can we do that here instead of writing, debugging, and documenting a pile of new code? regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Pavel Stehule wrote:
>> 2016-11-24 0:13 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
>>> Oh my, I just noticed we have a new xpath preprocessor in this patch
>>> too. Where did this code come from -- did you write it all from
>>> scratch?
>> I wrote it from scratch - libxml2 has not any API for iteration over XPath
>> expression (different than iteration over XPath expression result), and
>> what I have info, there will not be any new API in libxml2.
> Okay, I agree that the default namespace stuff looks worthwhile in the
> long run. But I don't have enough time to review the xpath parser stuff
> in the current commitfest, and I think it needs at the very least a lot
> of additional code commentary.
contrib/xml2 has always relied on libxslt for xpath functionality.
Can we do that here instead of writing, debugging, and documenting
a pile of new code?
regards, tom lane
Pavel Stehule wrote:
> can me send your last work?
Sure, it's in the archives --
https://www.postgresql.org/message-id/20161123233130. oqf7jl6czehy5fiw@alvherre. pgsql
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Fri, Nov 25, 2016 at 3:31 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2016-11-24 18:51 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>: >> contrib/xml2 has always relied on libxslt for xpath functionality. >> Can we do that here instead of writing, debugging, and documenting >> a pile of new code? > > I am sorry - I don't see it. There is nothing complex manipulation with > XPath expressions. You are missing the point here, which is to make the implementation footprint as light as possible, especially if the added functionality is already present in a dependency that Postgres can be linked to. OK, libxslt can only be linked with contrib/xml2/ now, but it would be at least worth looking at how much the current patch can be simplified for things like transformTableExpr or XmlTableGetValue by relying on some existing routines. Nit: I did not look at the patch in details, but I find the size of the latest version sent, 167kB, scary as it complicates review and increases the likeliness of bugs. -- Michael
Michael Paquier wrote: > Nit: I did not look at the patch in details, > but I find the size of the latest version sent, 167kB, scary as it > complicates review and increases the likeliness of bugs. Here's the stat. Note that removing the functionality as discussed would remove all of xpath_parser.c but I think the rest of it remains pretty much unchanged. So it's clearly a large patch, but there are large docs and tests too, not just code. doc/src/sgml/func.sgml | 376 ++++++++++++++++++---src/backend/executor/execQual.c | 335 +++++++++++++++++++src/backend/executor/execTuples.c | 42 +++src/backend/nodes/copyfuncs.c | 66 ++++src/backend/nodes/equalfuncs.c | 51 +++src/backend/nodes/nodeFuncs.c | 100 ++++++src/backend/nodes/outfuncs.c | 51 +++src/backend/nodes/readfuncs.c | 42 +++src/backend/optimizer/util/clauses.c| 33 ++src/backend/parser/gram.y | 181 ++++++++++-src/backend/parser/parse_coerce.c | 33 +-src/backend/parser/parse_expr.c | 182 +++++++++++src/backend/parser/parse_target.c | 7 +src/backend/utils/adt/Makefile | 2 +-src/backend/utils/adt/ruleutils.c | 100 ++++++src/backend/utils/adt/xml.c | 610 +++++++++++++++++++++++++++++++++++src/backend/utils/adt/xpath_parser.c| 337 +++++++++++++++++++src/backend/utils/fmgr/funcapi.c | 13 +src/include/executor/executor.h | 1 +src/include/executor/tableexpr.h | 69 ++++src/include/funcapi.h | 1 -src/include/nodes/execnodes.h | 31 ++src/include/nodes/nodes.h | 4 +src/include/nodes/parsenodes.h | 21 ++src/include/nodes/primnodes.h | 40 +++src/include/parser/kwlist.h | 3 +src/include/parser/parse_coerce.h | 4 +src/include/utils/xml.h | 2 +src/include/utils/xpath_parser.h | 24 ++src/test/regress/expected/xml.out | 415 ++++++++++++++++++++++++src/test/regress/expected/xml_1.out | 323 +++++++++++++++++++src/test/regress/expected/xml_2.out | 414 ++++++++++++++++++++++++src/test/regress/sql/xml.sql | 170 ++++++++++33 files changed, 4019 insertions(+), 64 deletions(-) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Michael Paquier wrote:
> Nit: I did not look at the patch in details,
> but I find the size of the latest version sent, 167kB, scary as it
> complicates review and increases the likeliness of bugs.
Here's the stat. Note that removing the functionality as discussed
would remove all of xpath_parser.c but I think the rest of it remains
pretty much unchanged. So it's clearly a large patch, but there are
large docs and tests too, not just code.
doc/src/sgml/func.sgml | 376 ++++++++++++++++++---
src/backend/executor/execQual.c | 335 +++++++++++++++++++
src/backend/executor/execTuples.c | 42 +++
src/backend/nodes/copyfuncs.c | 66 ++++
src/backend/nodes/equalfuncs.c | 51 +++
src/backend/nodes/nodeFuncs.c | 100 ++++++
src/backend/nodes/outfuncs.c | 51 +++
src/backend/nodes/readfuncs.c | 42 +++
src/backend/optimizer/util/clauses.c | 33 ++
src/backend/parser/gram.y | 181 ++++++++++-
src/backend/parser/parse_coerce.c | 33 +-
src/backend/parser/parse_expr.c | 182 +++++++++++
src/backend/parser/parse_target.c | 7 +
src/backend/utils/adt/Makefile | 2 +-
src/backend/utils/adt/ruleutils.c | 100 ++++++
src/backend/utils/adt/xml.c | 610 +++++++++++++++++++++++++++++++++++
src/backend/utils/adt/xpath_parser.c | 337 +++++++++++++++++++
src/backend/utils/fmgr/funcapi.c | 13 +
src/include/executor/executor.h | 1 +
src/include/executor/tableexpr.h | 69 ++++
src/include/funcapi.h | 1 -
src/include/nodes/execnodes.h | 31 ++
src/include/nodes/nodes.h | 4 +
src/include/nodes/parsenodes.h | 21 ++
src/include/nodes/primnodes.h | 40 +++
src/include/parser/kwlist.h | 3 +
src/include/parser/parse_coerce.h | 4 +
src/include/utils/xml.h | 2 +
src/include/utils/xpath_parser.h | 24 ++
src/test/regress/expected/xml.out | 415 ++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 323 +++++++++++++++++++
src/test/regress/expected/xml_2.out | 414 ++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 170 ++++++++++
33 files changed, 4019 insertions(+), 64 deletions(-)
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2016-11-25 3:31 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:Michael Paquier wrote:
> Nit: I did not look at the patch in details,
> but I find the size of the latest version sent, 167kB, scary as it
> complicates review and increases the likeliness of bugs.
Here's the stat. Note that removing the functionality as discussed
would remove all of xpath_parser.c but I think the rest of it remains
pretty much unchanged. So it's clearly a large patch, but there are
large docs and tests too, not just code.yes, lot of is regress tests (expected part is 3x) - and XMLTABLE function is not trivial.
lot of code is mechanical - nodes related. The really complex part is only in xml.c. There is one longer function only - the complexity is based on mapping libxml2 result to PostgreSQL result (with catching exceptions due releasing libxml2 sources).The all changes are well isolated - there is less risk to break some other.
doc/src/sgml/func.sgml | 376 ++++++++++++++++++---
src/backend/executor/execQual.c | 335 +++++++++++++++++++
src/backend/executor/execTuples.c | 42 +++
src/backend/nodes/copyfuncs.c | 66 ++++
src/backend/nodes/equalfuncs.c | 51 +++
src/backend/nodes/nodeFuncs.c | 100 ++++++
src/backend/nodes/outfuncs.c | 51 +++
src/backend/nodes/readfuncs.c | 42 +++
src/backend/optimizer/util/clauses.c | 33 ++
src/backend/parser/gram.y | 181 ++++++++++-
src/backend/parser/parse_coerce.c | 33 +-
src/backend/parser/parse_expr.c | 182 +++++++++++
src/backend/parser/parse_target.c | 7 +
src/backend/utils/adt/Makefile | 2 +-
src/backend/utils/adt/ruleutils.c | 100 ++++++
src/backend/utils/adt/xml.c | 610 +++++++++++++++++++++++++++++++++++
src/backend/utils/adt/xpath_parser.c | 337 +++++++++++++++++++
src/backend/utils/fmgr/funcapi.c | 13 +
src/include/executor/executor.h | 1 +
src/include/executor/tableexpr.h | 69 ++++
src/include/funcapi.h | 1 -
src/include/nodes/execnodes.h | 31 ++
src/include/nodes/nodes.h | 4 +
src/include/nodes/parsenodes.h | 21 ++
src/include/nodes/primnodes.h | 40 +++
src/include/parser/kwlist.h | 3 +
src/include/parser/parse_coerce.h | 4 +
src/include/utils/xml.h | 2 +
src/include/utils/xpath_parser.h | 24 ++
src/test/regress/expected/xml.out | 415 ++++++++++++++++++++++++
src/test/regress/expected/xml_1.out | 323 +++++++++++++++++++
src/test/regress/expected/xml_2.out | 414 ++++++++++++++++++++++++
src/test/regress/sql/xml.sql | 170 ++++++++++
33 files changed, 4019 insertions(+), 64 deletions(-)
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote: > Here is updated patch without default namespace support (and without XPath > expression transformation). > > Due last changes in parser > https://github.com/postgres/postgres/commit/906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd > I had to use c_expr on other positions ( xmlnamespace definition). > > I don't think it is limit - in 99% there will be const literal. Argh. I can't avoid the feeling that I'm missing some parser trickery here. We have the XMLNAMESPACES keyword and the clause-terminating comma to protect these clauses, there must be a way to define this piece of the grammar so that there's no conflict, without losing the freedom in the expressions. But I don't see how. Now I agree that xml namespace definitions are going to be string literals in almost all cases (or in extra sophisticated cases, column names) ... it's probably better to spend the bison-fu in the document expression or the column options, or better yet the xmlexists_argument stuff. But I don't see possibility of improvements in any of those places, so let's put it aside -- we can improve later, if need arises. In any case, it looks like we can change c_expr to b_expr in a few places, which is good because then operators work (in particular, unless I misread the grammar, foo||bar doesn't work with c_expr and does work with b_expr, which seems the most useful in this case). Also, it makes no sense to support (in the namespaces clause) DEFAULT a_expr if the IDENT case uses only b_expr, so let's reduce both to just b_expr. While I'm looking at node definitions, I see a few things that could use some naming improvement. For example, "expr" for TableExpr is a bit unexpressive. We could use "document_expr" there, perhaps. "row_path" seems fixated on the XML case and the expression be path; let's use "row_expr" there. And "cols" could be "column_exprs" perhaps. (All those renames cause fall-out in various node-related files, so let's think carefully to avoid renaming them multiple times.) In primnodes, you kept the comment that says "xpath". Please update that to not-just-XML reality. Please fix the comment in XmlTableAddNs; NULL is no longer a valid value. parse_expr.c has two unused variables; please remove them. This test in ExecEvalTableExprProtected looks weird:if (i != tstate->for_ordinality_col - 1) please change to comparing "i + 1" (convert array index into attribute number), and invert the boolean expression, leaving the for_ordinality case on top and the rest in the "else". That seems easier to read. Also, we customarily use post-increment (rownum++) instead of pre-incr. In execQual.c I think it's neater to have ExecEvalTableExpr go before its subroutine. Actually, I wonder whether it is really necessary to have a subroutine in the first place; you could just move the entire contents of that subroutine to within the PG_TRY block instead. The only thing you lose is one indentation level. I'm not sure about this one, but it's worth considering. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote:
> Here is updated patch without default namespace support (and without XPath
> expression transformation).
>
> Due last changes in parser
> https://github.com/postgres/postgres/commit/ 906bfcad7ba7cb3863fe0e2a7810be 8e3cd84fbd
> I had to use c_expr on other positions ( xmlnamespace definition).
>
> I don't think it is limit - in 99% there will be const literal.
Argh. I can't avoid the feeling that I'm missing some parser trickery
here. We have the XMLNAMESPACES keyword and the clause-terminating
comma to protect these clauses, there must be a way to define this piece
of the grammar so that there's no conflict, without losing the freedom
in the expressions. But I don't see how. Now I agree that xml
namespace definitions are going to be string literals in almost all
cases (or in extra sophisticated cases, column names) ... it's probably
better to spend the bison-fu in the document expression or the column
options, or better yet the xmlexists_argument stuff. But I don't see
possibility of improvements in any of those places, so let's put it
aside -- we can improve later, if need arises.
The problem is in unreserved keyword "PASSING" probably.
In any case, it looks like we can change c_expr to b_expr in a few
places, which is good because then operators work (in particular, unless
I misread the grammar, foo||bar doesn't work with c_expr and does work
with b_expr, which seems the most useful in this case). Also, it makes
no sense to support (in the namespaces clause) DEFAULT a_expr if the
IDENT case uses only b_expr, so let's reduce both to just b_expr.
While I'm looking at node definitions, I see a few things that could use
some naming improvement. For example, "expr" for TableExpr is a bit
unexpressive. We could use "document_expr" there, perhaps. "row_path"
seems fixated on the XML case and the expression be path; let's use
"row_expr" there. And "cols" could be "column_exprs" perhaps. (All
those renames cause fall-out in various node-related files, so let's
think carefully to avoid renaming them multiple times.)
In primnodes, you kept the comment that says "xpath". Please update
that to not-just-XML reality.
Please fix the comment in XmlTableAddNs; NULL is no longer a valid value.
parse_expr.c has two unused variables; please remove them.
This test in ExecEvalTableExprProtected looks weird:
if (i != tstate->for_ordinality_col - 1)
please change to comparing "i + 1" (convert array index into attribute
number), and invert the boolean expression, leaving the for_ordinality
case on top and the rest in the "else". That seems easier to read.
Also, we customarily use post-increment (rownum++) instead of pre-incr.
In execQual.c I think it's neater to have ExecEvalTableExpr go before
its subroutine. Actually, I wonder whether it is really necessary to
have a subroutine in the first place; you could just move the entire
contents of that subroutine to within the PG_TRY block instead. The
only thing you lose is one indentation level. I'm not sure about this
one, but it's worth considering.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 30 November 2016 at 05:36, Pavel Stehule <pavel.stehule@gmail.com> wrote: > The problem is in unreserved keyword "PASSING" probably. Yeah, I think that's what I hit when trying to change it. Can't you just parenthesize the expression to use operators like || etc? If so, not a big deal. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 30 November 2016 at 05:36, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> The problem is in unreserved keyword "PASSING" probably.
Yeah, I think that's what I hit when trying to change it.
Can't you just parenthesize the expression to use operators like ||
etc? If so, not a big deal.
???
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Pavel Stehule wrote: > 2016-11-30 2:40 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>: > > > On 30 November 2016 at 05:36, Pavel Stehule <pavel.stehule@gmail.com> > > wrote: > > > > > The problem is in unreserved keyword "PASSING" probably. > > > > Yeah, I think that's what I hit when trying to change it. > > > > Can't you just parenthesize the expression to use operators like || > > etc? If so, not a big deal. > > > ??? "'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users to manually add parens around any expressions that they want to use. That's not necessary most of the time since we've been able to use b_expr in most places. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote:
> 2016-11-30 2:40 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>:
>
> > On 30 November 2016 at 05:36, Pavel Stehule <pavel.stehule@gmail.com>
> > wrote:
> >
> > > The problem is in unreserved keyword "PASSING" probably.
> >
> > Yeah, I think that's what I hit when trying to change it.
> >
> > Can't you just parenthesize the expression to use operators like ||
> > etc? If so, not a big deal.
> >
> ???
"'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users
to manually add parens around any expressions that they want to use.
That's not necessary most of the time since we've been able to use
b_expr in most places.
--
Álvaro Herrera https://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<p dir="ltr"><p dir="ltr">Dne 30. 11. 2016 14:53 napsal uživatel "Pavel Stehule" <<a href="mailto:pavel.stehule@gmail.com">pavel.stehule@gmail.com</a>>:<br/> ><br /> ><br /> ><br /> > 2016-11-3013:38 GMT+01:00 Alvaro Herrera <<a href="mailto:alvherre@2ndquadrant.com">alvherre@2ndquadrant.com</a>>:<br/> >><br /> >> Pavel Stehule wrote:<br/> >> > 2016-11-30 2:40 GMT+01:00 Craig Ringer <<a href="mailto:craig@2ndquadrant.com">craig@2ndquadrant.com</a>>:<br/> >> ><br /> >> > > On 30 November2016 at 05:36, Pavel Stehule <<a href="mailto:pavel.stehule@gmail.com">pavel.stehule@gmail.com</a>><br /> >>> > wrote:<br /> >> > ><br /> >> > > > The problem is in unreserved keyword "PASSING"probably.<br /> >> > ><br /> >> > > Yeah, I think that's what I hit when trying to changeit.<br /> >> > ><br /> >> > > Can't you just parenthesize the expression to use operators like||<br /> >> > > etc? If so, not a big deal.<br /> >> > ><br /> >> > ???<br /> >><br/> >> "'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users<br /> >> to manuallyadd parens around any expressions that they want to use.<br /> >> That's not necessary most of the time sincewe've been able to use<br /> >> b_expr in most places.<br /> ><p dir="ltr">still there are one c_expr, butwithout new reserved word there are not change to reduce it.<br /><p dir="ltr">><br /> > Now I understand<br />><br /> > Regards<br /> ><br /> > Pavel<br /> > <br /> >><br /> >><br /> >> --<br />>> Álvaro Herrera <a href="https://www.2ndQuadrant.com/">https://www.2ndQuadrant.com/</a><br /> >>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services<br /> ><br /> >
Dne 30. 11. 2016 14:53 napsal uživatel "Pavel Stehule" <pavel.stehule@gmail.com>:
>
>
>
> 2016-11-30 13:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
>>
>> Pavel Stehule wrote:
>> > 2016-11-30 2:40 GMT+01:00 Craig Ringer <craig@2ndquadrant.com>:
>> >
>> > > On 30 November 2016 at 05:36, Pavel Stehule <pavel.stehule@gmail.com>
>> > > wrote:
>> > >
>> > > > The problem is in unreserved keyword "PASSING" probably.
>> > >
>> > > Yeah, I think that's what I hit when trying to change it.
>> > >
>> > > Can't you just parenthesize the expression to use operators like ||
>> > > etc? If so, not a big deal.
>> > >
>> > ???
>>
>> "'(' a_expr ')'" is a c_expr; Craig suggests that we can just tell users
>> to manually add parens around any expressions that they want to use.
>> That's not necessary most of the time since we've been able to use
>> b_expr in most places.
>still there are one c_expr, but without new reserved word there are not change to reduce it.
Hm, you omitted tableexpr.h from the v15 patch ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hm, you omitted tableexpr.h from the v15 patch ...
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Here's version 17. I have made significant changes here. 1. Restructure the execQual code. Instead of a PG_TRY wrapper, I have split this code in three pieces; there's the main code with the PG_TRY wrappers and is mainly in charge of the builderCxt pointer. In the previous coding there was a shim that examined builderCxt but was not responsible for setting it up, which was ugly. The second part is the "initializer" which sets the row and column filters and does namespace processing. The third part is the "FetchRow" logic. It seems to me much cleaner this way. 2. rename the "builder" stuff to use the "routine" terminology. This is in line with what we do for other function-pointer-filled structs, such as FdwRoutine, IndexAmRoutine etc. I also cleaned up the names a bit more. 3. Added a magic number to the table builder context struct, so that we can barf appropriately. This is in line with PgXmlErrorContext -- mostly for future-proofing. I didn't test this too hard. Also, moved the XmlTableContext struct declaration nearer the top of the file, as is customary. (We don't really need it that way, since the functions are all declared taking void *, but it seems cleaner to me anyway). 4. I added, edited, and fixed a large number of code comments. This is looking much better now, but it still needs at least the following changes. First, we need to fix is the per_rowset_memcxt thingy. I think the way it's currently being used is rather ugly; it looks to me like the memory context does not belong into the XmlTableContext struct at all. Instead, the executor code should keep the memcxt pointer in a state struct of its own, and it should be the executor's responsibility to change to the appropriate context before calling the table builder functions. In particular, this means that the table context can no longer be a void * pointer; it needs to be a struct that's defined by the executor (probably a primnodes.h one). The void * pointer is stashed inside that struct. Also, the "routine" pointer should not be part of the void * struct, but of the executor's struct. So the execQual code can switch to the memory context, and destroy it appropriately. Second, we should make gram.y set a new "function type" value in the TableExpr it creates, so that the downstream code (transformTableExpr, ExecInitExpr, ruleutils.c) really knows that the given function is XmlTableExpr, instead of guessing just because it's the only implemented case. Probably this "function type" is an enum (currently with a single value TableExprTypeXml or something like that) in primnodes. Finally, there's the pending task of renaming and moving ExecTypeFromTableExpr to some better place. Not sure that moving it back to nodeFuncs is a nice idea. Looks to me like calling it from ExprTypmod is a rather ugly idea. Hmm, ruleutils ... not sure what to think of this one. The typedefs.list changes are just used to pgindent the affected code correctly. It's not for commit. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Here's version 17. I have made significant changes here.
1. Restructure the execQual code. Instead of a PG_TRY wrapper, I have
split this code in three pieces; there's the main code with the PG_TRY
wrappers and is mainly in charge of the builderCxt pointer. In the
previous coding there was a shim that examined builderCxt but was not
responsible for setting it up, which was ugly. The second part is the
"initializer" which sets the row and column filters and does namespace
processing. The third part is the "FetchRow" logic. It seems to me
much cleaner this way.
2. rename the "builder" stuff to use the "routine" terminology. This is
in line with what we do for other function-pointer-filled structs, such
as FdwRoutine, IndexAmRoutine etc. I also cleaned up the names a bit
more.
3. Added a magic number to the table builder context struct, so that we
can barf appropriately. This is in line with PgXmlErrorContext --
mostly for future-proofing. I didn't test this too hard. Also, moved
the XmlTableContext struct declaration nearer the top of the file, as is
customary. (We don't really need it that way, since the functions are
all declared taking void *, but it seems cleaner to me anyway).
4. I added, edited, and fixed a large number of code comments.
This is looking much better now, but it still needs at least the
following changes.
First, we need to fix is the per_rowset_memcxt thingy. I think the way
it's currently being used is rather ugly; it looks to me like the memory
context does not belong into the XmlTableContext struct at all.
Instead, the executor code should keep the memcxt pointer in a state
struct of its own, and it should be the executor's responsibility to
change to the appropriate context before calling the table builder
functions. In particular, this means that the table context can no
longer be a void * pointer; it needs to be a struct that's defined by
the executor (probably a primnodes.h one). The void * pointer is
stashed inside that struct. Also, the "routine" pointer should not be
part of the void * struct, but of the executor's struct. So the
execQual code can switch to the memory context, and destroy it
appropriately.
Second, we should make gram.y set a new "function type" value in the
TableExpr it creates, so that the downstream code (transformTableExpr,
ExecInitExpr, ruleutils.c) really knows that the given function is
XmlTableExpr, instead of guessing just because it's the only implemented
case. Probably this "function type" is an enum (currently with a single
value TableExprTypeXml or something like that) in primnodes.
Finally, there's the pending task of renaming and moving
ExecTypeFromTableExpr to some better place. Not sure that moving it
back to nodeFuncs is a nice idea. Looks to me like calling it from
ExprTypmod is a rather ugly idea.
Hmm, ruleutils ... not sure what to think of this one.
The typedefs.list changes are just used to pgindent the affected code
correctly. It's not for commit.
+ /* XXX OK to do this? looks a bit out of place ... */
+ assign_record_type_typmod(typeInfo);
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote: > 2016-12-02 23:25 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>: > > This is looking much better now, but it still needs at least the > > following changes. > > > > First, we need to fix is the per_rowset_memcxt thingy. I think the way > > it's currently being used is rather ugly; it looks to me like the memory > > context does not belong into the XmlTableContext struct at all. > > Instead, the executor code should keep the memcxt pointer in a state > > struct of its own, and it should be the executor's responsibility to > > change to the appropriate context before calling the table builder > > functions. In particular, this means that the table context can no > > longer be a void * pointer; it needs to be a struct that's defined by > > the executor (probably a primnodes.h one). The void * pointer is > > stashed inside that struct. Also, the "routine" pointer should not be > > part of the void * struct, but of the executor's struct. So the > > execQual code can switch to the memory context, and destroy it > > appropriately. > > > > Second, we should make gram.y set a new "function type" value in the > > TableExpr it creates, so that the downstream code (transformTableExpr, > > ExecInitExpr, ruleutils.c) really knows that the given function is > > XmlTableExpr, instead of guessing just because it's the only implemented > > case. Probably this "function type" is an enum (currently with a single > > value TableExprTypeXml or something like that) in primnodes. > > It has sense - I was not sure about it - because currently it is only one > value, you mentioned it. True. This is a minor point. Are you able to do the memory context change I describe? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote:
> 2016-12-02 23:25 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> > This is looking much better now, but it still needs at least the
> > following changes.
> >
> > First, we need to fix is the per_rowset_memcxt thingy. I think the way
> > it's currently being used is rather ugly; it looks to me like the memory
> > context does not belong into the XmlTableContext struct at all.
> > Instead, the executor code should keep the memcxt pointer in a state
> > struct of its own, and it should be the executor's responsibility to
> > change to the appropriate context before calling the table builder
> > functions. In particular, this means that the table context can no
> > longer be a void * pointer; it needs to be a struct that's defined by
> > the executor (probably a primnodes.h one). The void * pointer is
> > stashed inside that struct. Also, the "routine" pointer should not be
> > part of the void * struct, but of the executor's struct. So the
> > execQual code can switch to the memory context, and destroy it
> > appropriately.
> >
> > Second, we should make gram.y set a new "function type" value in the
> > TableExpr it creates, so that the downstream code (transformTableExpr,
> > ExecInitExpr, ruleutils.c) really knows that the given function is
> > XmlTableExpr, instead of guessing just because it's the only implemented
> > case. Probably this "function type" is an enum (currently with a single
> > value TableExprTypeXml or something like that) in primnodes.
>
> It has sense - I was not sure about it - because currently it is only one
> value, you mentioned it.
True. This is a minor point.
Are you able to do the memory context change I describe?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
2016-12-03 16:03 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:Pavel Stehule wrote:
> 2016-12-02 23:25 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> > This is looking much better now, but it still needs at least the
> > following changes.
> >
> > First, we need to fix is the per_rowset_memcxt thingy. I think the way
> > it's currently being used is rather ugly; it looks to me like the memory
> > context does not belong into the XmlTableContext struct at all.
> > Instead, the executor code should keep the memcxt pointer in a state
> > struct of its own, and it should be the executor's responsibility to
> > change to the appropriate context before calling the table builder
> > functions. In particular, this means that the table context can no
> > longer be a void * pointer; it needs to be a struct that's defined by
> > the executor (probably a primnodes.h one). The void * pointer is
> > stashed inside that struct. Also, the "routine" pointer should not be
> > part of the void * struct, but of the executor's struct. So the
> > execQual code can switch to the memory context, and destroy it
> > appropriately.
> >
> > Second, we should make gram.y set a new "function type" value in the
> > TableExpr it creates, so that the downstream code (transformTableExpr,
> > ExecInitExpr, ruleutils.c) really knows that the given function is
> > XmlTableExpr, instead of guessing just because it's the only implemented
> > case. Probably this "function type" is an enum (currently with a single
> > value TableExprTypeXml or something like that) in primnodes.
>
> It has sense - I was not sure about it - because currently it is only one
> value, you mentioned it.
True. This is a minor point.
Are you able to do the memory context change I describe?I am not sure if I understand well to your ideas - please, check attached patch.
RegardsPavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Pavel Stehule wrote: > 2016-12-04 23:00 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>: > > I am not sure if I understand well to your ideas - please, check attached > > patch. Thanks, that's what I meant, but I think you went a bit overboard creating new functions in execQual -- seems to me it would work just fine to have the memory switches in the same function, rather than having a number of separate functions just to change the context then call the method. Please remove these shim functions. Also, you forgot to remove the now-unused per_rowset_memcxt struct member. Also, please rename "rc" to something more meaningful -- maybe "rowcount" is good enough. And "doc" would perhaps be better as "document". I'm not completely sure the structs are really sensible yet. I may do some more changes tomorrow. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote:
> 2016-12-04 23:00 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
> > I am not sure if I understand well to your ideas - please, check attached
> > patch.
Thanks, that's what I meant, but I think you went a bit overboard
creating new functions in execQual -- seems to me it would work just
fine to have the memory switches in the same function, rather than
having a number of separate functions just to change the context then
call the method. Please remove these shim functions.
Also, you forgot to remove the now-unused per_rowset_memcxt struct member.
Also, please rename "rc" to something more meaningful -- maybe
"rowcount" is good enough. And "doc" would perhaps be better as
"document".
I'm not completely sure the structs are really sensible yet. I may do
some more changes tomorrow.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Here's v21. * I changed the grammar by moving the NOT NULL to the column options, and removing the IsNotNull production. It wasn't nice that "NOT NULL DEFAULT 0" was not accepted, which it is with the new representation. * The tuple that's returned is natively a TupleTableSlot inside the table builder, not directly a HeapTuple. That stuff was ugly and wasn't using the proper abstraction anyway. * I changed the signatures of the methods so that they receive TableExprState, and restructured the "opaque" data to be inside TableExprState. Now we don't need to have things such as the tupdesc or the input functions be repeated in the opaque struct. Instead they belong to the TableExprState and the methods can read them from there. I managed to break the case with no COLUMNS. Probably related to the tupdesc changes. It now crashes the regression test. Too tired to debug now; care to take a look? The other stuff seems to run fine, though of course the regression test crashes in the middle, so perhaps there are other problems. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Here's v21.
* I changed the grammar by moving the NOT NULL to the column options,
and removing the IsNotNull production. It wasn't nice that "NOT NULL
DEFAULT 0" was not accepted, which it is with the new representation.
* The tuple that's returned is natively a TupleTableSlot inside the
table builder, not directly a HeapTuple. That stuff was ugly and wasn't
using the proper abstraction anyway.
* I changed the signatures of the methods so that they receive
TableExprState, and restructured the "opaque" data to be inside
TableExprState. Now we don't need to have things such as the tupdesc or
the input functions be repeated in the opaque struct. Instead they
belong to the TableExprState and the methods can read them from there.
I managed to break the case with no COLUMNS. Probably related to the
tupdesc changes. It now crashes the regression test. Too tired to
debug now; care to take a look? The other stuff seems to run fine,
though of course the regression test crashes in the middle, so perhaps
there are other problems.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Pavel Stehule wrote: > I fixed two issues. > > 2. there was reverse setting in NOT NULL flag Ah-hah, that was silly, thanks. > 1. there are not columns data when there are not any explicit column - fixed Hmm. Now that I see how this works, by having the GetValue "guess" what is going on and have a special case for it, I actually don't like it very much. It seems way too magical. I think we should do away with the "if column is NULL" case in GetValue, and instead inject a column during transformTableExpr if columns is NIL. This has implications on ExecInitExpr too, which currently checks for an empty column list -- it would no longer have to do so. Maybe this means we need an additional method, which would request "the expr that returns the whole row", so that transformExpr can work for XmlTable (which I think would be something like "./") and the future JsonTable stuff (I don't know how that one would work, but I assume it's not necessarily the same thing). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote:
> I fixed two issues.
>
> 2. there was reverse setting in NOT NULL flag
Ah-hah, that was silly, thanks.
> 1. there are not columns data when there are not any explicit column - fixed
Hmm. Now that I see how this works, by having the GetValue "guess" what
is going on and have a special case for it, I actually don't like it
very much. It seems way too magical. I think we should do away with
the "if column is NULL" case in GetValue, and instead inject a column
during transformTableExpr if columns is NIL. This has implications on
ExecInitExpr too, which currently checks for an empty column list -- it
would no longer have to do so.
Maybe this means we need an additional method, which would request "the
expr that returns the whole row", so that transformExpr can work for
XmlTable (which I think would be something like "./") and the future
JsonTable stuff (I don't know how that one would work, but I assume it's
not necessarily the same thing).
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Pavel Stehule wrote: > 2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>: > > Hmm. Now that I see how this works, by having the GetValue "guess" what > > is going on and have a special case for it, I actually don't like it > > very much. It seems way too magical. I think we should do away with > > the "if column is NULL" case in GetValue, and instead inject a column > > during transformTableExpr if columns is NIL. This has implications on > > ExecInitExpr too, which currently checks for an empty column list -- it > > would no longer have to do so. > > I prefer this way against second described. The implementation should be in > table builder routines, not in executor. Well, given the way you have implemented it, I would prefer the original too. But your v23 is not what I meant. Essentially what you do in v23 is to communicate the lack of COLUMNS clause in a different way -- previously it was "ncolumns = 0", now it's "is_auto_col=true". It's still "magic". It's not an improvement. What I want to happen is that there is no magic at all; it's up to transformExpr to make sure that when COLUMNS is empty, one column appears and it must not be a magic column that makes the xml.c code act differently, but rather to xml.c it should appear that this is just a normal column that happens to return the entire row. If I say "COLUMNS foo PATH '/'" I should be able to obtain a similar behavior (except that in the current code, if I ask for "COLUMNS foo XML PATH '/'" I don't get XML at all but rather weird text where all tags have been stripped out, which is very strange. I would expect the tags to be preserved if the output type is XML. Maybe the tag-stripping behavior should occur if the output type is some type of text.) I still have to figure out how to fix the tupledesc thing. What we have now is not good. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote:
> 2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> > Hmm. Now that I see how this works, by having the GetValue "guess" what
> > is going on and have a special case for it, I actually don't like it
> > very much. It seems way too magical. I think we should do away with
> > the "if column is NULL" case in GetValue, and instead inject a column
> > during transformTableExpr if columns is NIL. This has implications on
> > ExecInitExpr too, which currently checks for an empty column list -- it
> > would no longer have to do so.
>
> I prefer this way against second described. The implementation should be in
> table builder routines, not in executor.
Well, given the way you have implemented it, I would prefer the original
too. But your v23 is not what I meant. Essentially what you do in v23
is to communicate the lack of COLUMNS clause in a different way --
previously it was "ncolumns = 0", now it's "is_auto_col=true". It's
still "magic". It's not an improvement.
What I want to happen is that there is no magic at all; it's up to
transformExpr to make sure that when COLUMNS is empty, one column
appears and it must not be a magic column that makes the xml.c code act
differently, but rather to xml.c it should appear that this is just a
normal column that happens to return the entire row. If I say "COLUMNS
foo PATH '/'" I should be able to obtain a similar behavior (except that
in the current code, if I ask for "COLUMNS foo XML PATH '/'" I don't get
XML at all but rather weird text where all tags have been stripped out,
which is very strange. I would expect the tags to be preserved if the
output type is XML. Maybe the tag-stripping behavior should occur if
the output type is some type of text.)
I still have to figure out how to fix the tupledesc thing. What we have
now is not good.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:Pavel Stehule wrote:
> I fixed two issues.
>
> 2. there was reverse setting in NOT NULL flag
Ah-hah, that was silly, thanks.
> 1. there are not columns data when there are not any explicit column - fixed
Hmm. Now that I see how this works, by having the GetValue "guess" what
is going on and have a special case for it, I actually don't like it
very much. It seems way too magical. I think we should do away with
the "if column is NULL" case in GetValue, and instead inject a column
during transformTableExpr if columns is NIL. This has implications on
ExecInitExpr too, which currently checks for an empty column list -- it
would no longer have to do so.I prefer this way against second described. The implementation should be in table builder routines, not in executor.sending new update
RegardsPavel
Maybe this means we need an additional method, which would request "the
expr that returns the whole row", so that transformExpr can work for
XmlTable (which I think would be something like "./") and the future
JsonTable stuff (I don't know how that one would work, but I assume it's
not necessarily the same thing).
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
2016-12-07 20:37 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:2016-12-07 18:34 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:Pavel Stehule wrote:
> I fixed two issues.
>
> 2. there was reverse setting in NOT NULL flag
Ah-hah, that was silly, thanks.
> 1. there are not columns data when there are not any explicit column - fixed
Hmm. Now that I see how this works, by having the GetValue "guess" what
is going on and have a special case for it, I actually don't like it
very much. It seems way too magical. I think we should do away with
the "if column is NULL" case in GetValue, and instead inject a column
during transformTableExpr if columns is NIL. This has implications on
ExecInitExpr too, which currently checks for an empty column list -- it
would no longer have to do so.I prefer this way against second described. The implementation should be in table builder routines, not in executor.sending new updatenew update - no functional changes, just unbreaking after last changes in master.
RegardsPavelRegardsPavel
Maybe this means we need an additional method, which would request "the
expr that returns the whole row", so that transformExpr can work for
XmlTable (which I think would be something like "./") and the future
JsonTable stuff (I don't know how that one would work, but I assume it's
not necessarily the same thing).
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Pavel Stehule wrote: > another update - lot of cleaning Ah, the tupledesc stuff in this one appears much more reasonable to me. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote: > another update - lot of cleaning Thanks. The more I look at this, the less I like using NameArgExpr for namespaces. It looks all wrong to me, and it causes ugly code all over. Maybe I just need to look at it a bit longer. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote: > The more I look at this, the less I like using NameArgExpr for > namespaces. It looks all wrong to me, and it causes ugly code all over. > Maybe I just need to look at it a bit longer. I think it'd be cleaner to use ResTarget for the namespaces, like xml_attribute_el does, and split the names from actual exprs in the same way. So things like ExecInitExpr become simpler because you just recurse to initialize the list, without having to examine each element individually. tabexprInitialize can just do forboth(). The main reason I don't like abusing NamedArgExpr is that the whole comment that explains it becomes a lie. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers