Thread: Re: [HACKERS] patch: function xmltable
Hi
2017-01-16 23:51 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Given
https://www.postgresql.org/message-id/20170116210019.a3glfws pg5lnfrnm@alap3.anarazel.de
which is going to heavily change how the executor works in this area, I
am returning this patch to you again. I would like a few rather minor
changes:
1. to_xmlstr can be replaced with calls to xmlCharStrdup.
I checked this idea, and it doesn't look well - xmlCharStrdup created xml string in own memory - and it should be explicitly released with xmlFree(). In this case is more practical using PostgreSQL memory context - because this memory is released safely in exception. I can rename this function to more conventional pg_xmlCharStrndup. This function can be used more time in current code.
2. don't need xml_xmlnodetostr either -- just use xml_xmlnodetoxmltype
(which returns text*) and extract the cstring from the varlena. It's
a bit more wasteful in terms of cycles, but I don't think we care.
If we do care, change the function so that it returns cstring, and
have the callers that want text wrap it in cstring_to_text.
done - it is related to not too often use case, and possible slowdown is minimal
3. have a new perValueCxt memcxt in TableExprState, child of buildercxt,
and switch to it just before GetValue() (reset it just before
switching). Then, don't worry about leaks in GetValue. This way,
the text* conversions et al don't matter.
done
After that I think we're going to need to get this working on top of
Andres' changes. Which I'm afraid is going to be rather major surgery,
but I haven't looked.
I am waiting on new commits in this area. This moment I have not idea what will be broken.
attached updated patch with cleaned xml part
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi
New update - rebase after yesterday changes.
What you want to change?
Regards
Pavel
Attachment
Pavel Stehule wrote: > Hi > > New update - rebase after yesterday changes. > > What you want to change? I think the problem might come from the still pending patch on that series, which Andres posted in https://www.postgresql.org/message-id/20170118221154.aldebi7yyjvds5qa@alap3.anarazel.de As far as I understand, minor details of that patch might change before commit, but it is pretty much in definitive form. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> Hi
>
> New update - rebase after yesterday changes.
>
> What you want to change?
I think the problem might come from the still pending patch on that
series, which Andres posted in
https://www.postgresql.org/message-id/20170118221154. aldebi7yyjvds5qa@alap3. anarazel.de
As far as I understand, minor details of that patch might change before
commit, but it is pretty much in definitive form.
ok, we have to wait - please, check XML part if it is good for you
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> Hi
>
> New update - rebase after yesterday changes.
>
> What you want to change?
I think the problem might come from the still pending patch on that
series, which Andres posted in
https://www.postgresql.org/message-id/20170118221154. aldebi7yyjvds5qa@alap3. anarazel.de
As far as I understand, minor details of that patch might change before
commit, but it is pretty much in definitive form.
new rebased update after these changes
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
2017-01-21 9:30 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:Pavel Stehule wrote:
> Hi
>
> New update - rebase after yesterday changes.
>
> What you want to change?
I think the problem might come from the still pending patch on that
series, which Andres posted in
https://www.postgresql.org/message-id/20170118221154.aldebi7 yyjvds5qa@alap3.anarazel.de
As far as I understand, minor details of that patch might change before
commit, but it is pretty much in definitive form.new rebased update after these changes
fix white spaces
pavel
RegardsPavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi
2017-01-21 10:31 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2017-01-21 9:30 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:Hi2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:Pavel Stehule wrote:
> Hi
>
> New update - rebase after yesterday changes.
>
> What you want to change?
I think the problem might come from the still pending patch on that
series, which Andres posted in
https://www.postgresql.org/message-id/20170118221154.aldebi7 yyjvds5qa@alap3.anarazel.de
As far as I understand, minor details of that patch might change before
commit, but it is pretty much in definitive form.new rebased update after these changesfix white spaces
few fixes:
* SELECT (xmltable(..)).* + regress tests
* compilation and regress tests without --with-libxml
Regards
Pavel
pavelRegardsPavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Pavel Stehule wrote: > * SELECT (xmltable(..)).* + regress tests > * compilation and regress tests without --with-libxml Thanks. I just realized that this is doing more work than necessary -- I think it would be simpler to have tableexpr fill a tuplestore with the results, instead of just expecting function execution to apply ExecEvalExpr over and over to obtain the results. So evaluating a tableexpr returns just the tuplestore, which function evaluation can return as-is. That code doesn't use the value-per-call interface anyway. I also realized that the expr context callback is not called if there's an error, which leaves us without shutting down libxml properly. I added PG_TRY around the fetchrow calls, but I'm not sure that's correct either, because there could be an error raised in other parts of the code, after we've already emitted a few rows (for example out of memory). I think the right way is to have PG_TRY around the execution of the whole thing rather than just row at a time; and the tuplestore mechanism helps us with that. I think it would be good to have a more complex test case in regress -- let's say there is a table with some simple XML values, then we use XMLFOREST (or maybe one of the table_to_xml functions) to generate a large document, and then XMLTABLE uses that document as input document. Please fix. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi, On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote: > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext, > + bool *isnull); > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext, > + bool *isNull); > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext, > + bool *isNull); > +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext, > + Datum doc); > +static void ShutdownTableExpr(Datum arg); To me this (and a lot of the other code) hints quite strongly that expression evalution is the wrong approach to implementing this. What you're essentially doing is building a vulcano style scan node. Even if we can this, we shouldn't double down on the bad decision to have these magic expressions that return multiple rows. There's historical reason for tSRFs, but we shouldn't add more weirdness like this. Andres
Andres Freund wrote: > Hi, > > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote: > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext, > > + bool *isnull); > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext, > > + bool *isNull); > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext, > > + bool *isNull); > > +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext, > > + Datum doc); > > +static void ShutdownTableExpr(Datum arg); > > To me this (and a lot of the other code) hints quite strongly that > expression evalution is the wrong approach to implementing this. What > you're essentially doing is building a vulcano style scan node. Even if > we can this, we shouldn't double down on the bad decision to have these > magic expressions that return multiple rows. There's historical reason > for tSRFs, but we shouldn't add more weirdness like this. Thanks for giving it a look. I have long thought that this patch would be at odds with your overall executor work. XMLTABLE is specified by the standard to return multiple rows ... but then as far as my reading goes, it is only supposed to be supported in the range table (FROM clause) not in the target list. I wonder if this would end up better if we only tried to support it in RT. I asked Pavel to implement it like that a few weeks ago, but ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > Hi, > > > > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote: > > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext, > > > + bool *isnull); > > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext, > > > + bool *isNull); > > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext, > > > + bool *isNull); > > > +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext, > > > + Datum doc); > > > +static void ShutdownTableExpr(Datum arg); > > > > To me this (and a lot of the other code) hints quite strongly that > > expression evalution is the wrong approach to implementing this. What > > you're essentially doing is building a vulcano style scan node. Even if > > we can this, we shouldn't double down on the bad decision to have these > > magic expressions that return multiple rows. There's historical reason > > for tSRFs, but we shouldn't add more weirdness like this. > > Thanks for giving it a look. I have long thought that this patch would > be at odds with your overall executor work. Not fundamentally, but it makes it harder. > XMLTABLE is specified by the standard to return multiple rows ... but > then as far as my reading goes, it is only supposed to be supported in > the range table (FROM clause) not in the target list. I wonder if > this would end up better if we only tried to support it in RT. I asked > Pavel to implement it like that a few weeks ago, but ... Right - it makes sense in the FROM list - but then it should be an executor node, instead of some expression thingy. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: >> XMLTABLE is specified by the standard to return multiple rows ... but >> then as far as my reading goes, it is only supposed to be supported in >> the range table (FROM clause) not in the target list. I wonder if >> this would end up better if we only tried to support it in RT. I asked >> Pavel to implement it like that a few weeks ago, but ... > Right - it makes sense in the FROM list - but then it should be an > executor node, instead of some expression thingy. +1 --- we're out of the business of having simple expressions that return rowsets. regards, tom lane
Hi
2017-01-25 1:35 GMT+01:00 Andres Freund <andres@anarazel.de>:
On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi,
> >
> > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext,
> > > + bool *isnull);
> > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext,
> > > + bool *isNull);
> > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext,
> > > + bool *isNull);
> > > +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext,
> > > + Datum doc);
> > > +static void ShutdownTableExpr(Datum arg);
> >
> > To me this (and a lot of the other code) hints quite strongly that
> > expression evalution is the wrong approach to implementing this. What
> > you're essentially doing is building a vulcano style scan node. Even if
> > we can this, we shouldn't double down on the bad decision to have these
> > magic expressions that return multiple rows. There's historical reason
> > for tSRFs, but we shouldn't add more weirdness like this.
>
> Thanks for giving it a look. I have long thought that this patch would
> be at odds with your overall executor work.
Not fundamentally, but it makes it harder.
If you plan to hold support SRFin target list, then nothing is different. In last patch is executed under nodeProjectSet.
> XMLTABLE is specified by the standard to return multiple rows ... but
> then as far as my reading goes, it is only supposed to be supported in
> the range table (FROM clause) not in the target list. I wonder if
> this would end up better if we only tried to support it in RT. I asked
> Pavel to implement it like that a few weeks ago, but ...
Right - it makes sense in the FROM list - but then it should be an
executor node, instead of some expression thingy.
The XMLTABLE function is from user perspective, from implementation perspective a form of SRF function. I use own executor node, because fcinfo is complex already and not too enough to hold all information about result columns.
The implementation as RT doesn't reduce code - it is just moving to different file.
I'll try to explain my motivation. Please, check it and correct me if I am wrong. I don't keep on my implementation - just try to implement XMLTABLE be consistent with another behave and be used all time without any surprise.
1. Any function that produces a content can be used in target list. We support SRF in target list and in FROM part. Why XMLTABLE should be a exception?
2. In standard the XMLTABLE is placed only on FROM part - but standard doesn't need to solve my question - there are not SRF functions allowed in targetlist.
If there be a common decision so this inconsistency (in behave of this kind of functions) is expected, required - then I have not a problem to remove this support from XMLTABLE.
In this moment I don't see a technical reason for this step - with last Andres changes the support of XMLTABLE in target list needs less than 40 lines and there is not any special path for XMLTABLE only. Andres write support for SRF functions and SRF operator. TableExpr is third category.
Regards
Pavel
Greetings,
Andres Freund
2017-01-25 5:45 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Andres Freund <andres@anarazel.de> writes:
> On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
>> XMLTABLE is specified by the standard to return multiple rows ... but
>> then as far as my reading goes, it is only supposed to be supported in
>> the range table (FROM clause) not in the target list. I wonder if
>> this would end up better if we only tried to support it in RT. I asked
>> Pavel to implement it like that a few weeks ago, but ...
> Right - it makes sense in the FROM list - but then it should be an
> executor node, instead of some expression thingy.
+1 --- we're out of the business of having simple expressions that
return rowsets.
If we do decision so this kind of function will have different behave than other SRF functions, then I remove support for this.
There are not technical reasons (maybe I don't see it) - last Andres changes do well support for my code.
Regards
Pavel
regards, tom lane
2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> * SELECT (xmltable(..)).* + regress tests
> * compilation and regress tests without --with-libxml
Thanks. I just realized that this is doing more work than necessary --
?? I don't understand?
I think it would be simpler to have tableexpr fill a tuplestore with the
results, instead of just expecting function execution to apply
ExecEvalExpr over and over to obtain the results. So evaluating a
tableexpr returns just the tuplestore, which function evaluation can
return as-is. That code doesn't use the value-per-call interface
anyway.
ok
I also realized that the expr context callback is not called if there's
an error, which leaves us without shutting down libxml properly. I
added PG_TRY around the fetchrow calls, but I'm not sure that's correct
either, because there could be an error raised in other parts of the
code, after we've already emitted a few rows (for example out of
memory). I think the right way is to have PG_TRY around the execution
of the whole thing rather than just row at a time; and the tuplestore
mechanism helps us with that.
ok.
I think it would be good to have a more complex test case in regress --
let's say there is a table with some simple XML values, then we use
XMLFOREST (or maybe one of the table_to_xml functions) to generate a
large document, and then XMLTABLE uses that document as input document.
Please fix.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: > >> XMLTABLE is specified by the standard to return multiple rows ... but > >> then as far as my reading goes, it is only supposed to be supported in > >> the range table (FROM clause) not in the target list. I wonder if > >> this would end up better if we only tried to support it in RT. I asked > >> Pavel to implement it like that a few weeks ago, but ... > > > Right - it makes sense in the FROM list - but then it should be an > > executor node, instead of some expression thingy. > > +1 --- we're out of the business of having simple expressions that > return rowsets. Well, that's it. I'm not committing this patch against two other committers' opinion, plus I was already on the fence about the implementation anyway. I think you should just go with the flow and implement this by creating nodeTableexprscan.c. It's not even difficult. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2017-01-25 05:45:24 +0100, Pavel Stehule wrote: > 2017-01-25 1:35 GMT+01:00 Andres Freund <andres@anarazel.de>: > > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: > > > Andres Freund wrote: > > > > Hi, > > > > > > > > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote: > > > > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext > > *econtext, > > > > > + bool *isnull); > > > > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, > > ExprContext *econtext, > > > > > + bool *isNull); > > > > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext > > *econtext, > > > > > + bool *isNull); > > > > > +static void tabexprInitialize(TableExprState *tstate, ExprContext > > *econtext, > > > > > + Datum doc); > > > > > +static void ShutdownTableExpr(Datum arg); > > > > > > > > To me this (and a lot of the other code) hints quite strongly that > > > > expression evalution is the wrong approach to implementing this. What > > > > you're essentially doing is building a vulcano style scan node. Even > > if > > > > we can this, we shouldn't double down on the bad decision to have these > > > > magic expressions that return multiple rows. There's historical reason > > > > for tSRFs, but we shouldn't add more weirdness like this. > > > > > > Thanks for giving it a look. I have long thought that this patch would > > > be at odds with your overall executor work. > > > > Not fundamentally, but it makes it harder. > > > > If you plan to hold support SRFin target list, then nothing is different. > In last patch is executed under nodeProjectSet. It is, because we suddenly need to call different functions - and I'm revamping most of execQual to have an opcode dispatch based execution model (which then also can be JITed). > > > XMLTABLE is specified by the standard to return multiple rows ... but > > > then as far as my reading goes, it is only supposed to be supported in > > > the range table (FROM clause) not in the target list. I wonder if > > > this would end up better if we only tried to support it in RT. I asked > > > Pavel to implement it like that a few weeks ago, but ... > > > > Right - it makes sense in the FROM list - but then it should be an > > executor node, instead of some expression thingy. > > > > The XMLTABLE function is from user perspective, from implementation > perspective a form of SRF function. I use own executor node, because fcinfo > is complex already and not too enough to hold all information about result > columns. > The implementation as RT doesn't reduce code - it is just moving to > different file. You're introducing a wholly separate callback system (TableExprRoutine) for the new functionality. And that stuff is excruciatingly close to stuff that the normal executor already knows how to do. > I'll try to explain my motivation. Please, check it and correct me if I am > wrong. I don't keep on my implementation - just try to implement XMLTABLE > be consistent with another behave and be used all time without any > surprise. > > 1. Any function that produces a content can be used in target list. We > support SRF in target list and in FROM part. Why XMLTABLE should be a > exception? targetlist SRFs were a big mistake. They cause a fair number of problems code-wise. They permeated for a long while into bits of both planner and executor, where they really shouldn't belong. Even after the recent changes there's a fair amount of uglyness associated with them. We can't remove tSRFs for backward compatibility reasons, but that's not true for XMLTABLE Greetings, Andres Freund
It is, because we suddenly need to call different functions - and I'm
> >
>
> If you plan to hold support SRFin target list, then nothing is different.
> In last patch is executed under nodeProjectSet.
revamping most of execQual to have an opcode dispatch based execution
model (which then also can be JITed).
> > > XMLTABLE is specified by the standard to return multiple rows ... but
> > > then as far as my reading goes, it is only supposed to be supported in
> > > the range table (FROM clause) not in the target list. I wonder if
> > > this would end up better if we only tried to support it in RT. I asked
> > > Pavel to implement it like that a few weeks ago, but ...
> >
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
> >
>
> The XMLTABLE function is from user perspective, from implementation
> perspective a form of SRF function. I use own executor node, because fcinfo
> is complex already and not too enough to hold all information about result
> columns.
> The implementation as RT doesn't reduce code - it is just moving to
> different file.
You're introducing a wholly separate callback system (TableExprRoutine)
for the new functionality. And that stuff is excruciatingly close to
stuff that the normal executor already knows how to do.
These callbacks are related to isolation TableExpr infrastructure and TableExpr implementation - This design is prepared for reusing for JSON_TABLE function.
Any placing of TableExpr code should not impact this callback system (Or I am absolutely out and executor is able do some work what is hidden to me).
> I'll try to explain my motivation. Please, check it and correct me if I am
> wrong. I don't keep on my implementation - just try to implement XMLTABLE
> be consistent with another behave and be used all time without any
> surprise.
>
> 1. Any function that produces a content can be used in target list. We
> support SRF in target list and in FROM part. Why XMLTABLE should be a
> exception?
targetlist SRFs were a big mistake. They cause a fair number of problems
code-wise. They permeated for a long while into bits of both planner and
executor, where they really shouldn't belong. Even after the recent
changes there's a fair amount of uglyness associated with them. We
can't remove tSRFs for backward compatibility reasons, but that's not
true for XMLTABLE
ok
I afraid when I cannot to reuse a SRF infrastructure, I have to reimplement it partially :( - mainly for usage in "ROWS FROM ()"
Greetings,
Andres Freund
Hi, > > > I'll try to explain my motivation. Please, check it and correct me if I > > am > > > wrong. I don't keep on my implementation - just try to implement XMLTABLE > > > be consistent with another behave and be used all time without any > > > surprise. > > > > > > 1. Any function that produces a content can be used in target list. We > > > support SRF in target list and in FROM part. Why XMLTABLE should be a > > > exception? > > > > targetlist SRFs were a big mistake. They cause a fair number of problems > > code-wise. They permeated for a long while into bits of both planner and > > executor, where they really shouldn't belong. Even after the recent > > changes there's a fair amount of uglyness associated with them. We > > can't remove tSRFs for backward compatibility reasons, but that's not > > true for XMLTABLE > > > > > > > ok > > I afraid when I cannot to reuse a SRF infrastructure, I have to reimplement > it partially :( - mainly for usage in "ROWS FROM ()" Huh? Greetings, Andres Freund
2017-01-25 22:40 GMT+01:00 Andres Freund <andres@anarazel.de>:
Hi,
> > > I'll try to explain my motivation. Please, check it and correct me if I
> > am
> > > wrong. I don't keep on my implementation - just try to implement XMLTABLE
> > > be consistent with another behave and be used all time without any
> > > surprise.
> > >
> > > 1. Any function that produces a content can be used in target list. We
> > > support SRF in target list and in FROM part. Why XMLTABLE should be a
> > > exception?
> >
> > targetlist SRFs were a big mistake. They cause a fair number of problems
> > code-wise. They permeated for a long while into bits of both planner and
> > executor, where they really shouldn't belong. Even after the recent
> > changes there's a fair amount of uglyness associated with them. We
> > can't remove tSRFs for backward compatibility reasons, but that's not
> > true for XMLTABLE
> >
> >
> >
> ok
>
> I afraid when I cannot to reuse a SRF infrastructure, I have to reimplement
> it partially :( - mainly for usage in "ROWS FROM ()"
The TableExpr implementation is based on SRF now. You and Alvaro propose independent implementation like generic executor node. I am sceptic so FunctionScan supports reading from generic executor node.
Regards
Pavel
Huh?
Greetings,
Andres Freund
On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote: > 2017-01-25 22:40 GMT+01:00 Andres Freund <andres@anarazel.de>: > > > I afraid when I cannot to reuse a SRF infrastructure, I have to > > reimplement > > > it partially :( - mainly for usage in "ROWS FROM ()" > > > > The TableExpr implementation is based on SRF now. You and Alvaro propose > independent implementation like generic executor node. I am sceptic so > FunctionScan supports reading from generic executor node. Why would it need to?
2017-01-25 23:33 GMT+01:00 Andres Freund <andres@anarazel.de>:
On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote:
> 2017-01-25 22:40 GMT+01:00 Andres Freund <andres@anarazel.de>:
> > > I afraid when I cannot to reuse a SRF infrastructure, I have to
> > reimplement
> > > it partially :( - mainly for usage in "ROWS FROM ()"
> >
>
> The TableExpr implementation is based on SRF now. You and Alvaro propose
> independent implementation like generic executor node. I am sceptic so
> FunctionScan supports reading from generic executor node.
Why would it need to?
Simply - due consistency with any other functions that can returns rows.
Maybe I don't understand to Alvaro proposal well - I have a XMLTABLE function - TableExpr that looks like SRF function, has similar behave - returns more rows, but should be significantly different implemented, and should to have different limits - should not be used there and there ... It is hard to see consistency there for me.
Regards
Pavel
2017-01-25 15:07 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> >> XMLTABLE is specified by the standard to return multiple rows ... but
> >> then as far as my reading goes, it is only supposed to be supported in
> >> the range table (FROM clause) not in the target list. I wonder if
> >> this would end up better if we only tried to support it in RT. I asked
> >> Pavel to implement it like that a few weeks ago, but ...
>
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
>
> +1 --- we're out of the business of having simple expressions that
> return rowsets.
Well, that's it. I'm not committing this patch against two other
committers' opinion, plus I was already on the fence about the
implementation anyway. I think you should just go with the flow and
implement this by creating nodeTableexprscan.c. It's not even
difficult.
I am playing with this and the patch looks about 15kB longer - just due implementation basic scan functionality - and I didn't touch a planner.
I am not happy from this - still I have a feeling so I try to reimplement reduced SRF.
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
2017-01-25 15:07 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> >> XMLTABLE is specified by the standard to return multiple rows ... but
> >> then as far as my reading goes, it is only supposed to be supported in
> >> the range table (FROM clause) not in the target list. I wonder if
> >> this would end up better if we only tried to support it in RT. I asked
> >> Pavel to implement it like that a few weeks ago, but ...
>
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
>
> +1 --- we're out of the business of having simple expressions that
> return rowsets.
Well, that's it. I'm not committing this patch against two other
committers' opinion, plus I was already on the fence about the
implementation anyway. I think you should just go with the flow and
implement this by creating nodeTableexprscan.c. It's not even
difficult.
I am sending new version - it is based on own executor scan node and tuplestore.
Some now obsolete regress tests removed, some new added.
The executor code (memory context usage) should be cleaned little bit - but other code should be good.
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Pavel Stehule wrote: > I am sending new version - it is based on own executor scan node and > tuplestore. > > Some now obsolete regress tests removed, some new added. > > The executor code (memory context usage) should be cleaned little bit - but > other code should be good. I think you forgot nodeTableFuncscan.c. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
2017-01-30 20:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> I am sending new version - it is based on own executor scan node and
> tuplestore.
>
> Some now obsolete regress tests removed, some new added.
>
> The executor code (memory context usage) should be cleaned little bit - but
> other code should be good.
I think you forgot nodeTableFuncscan.c.
true, I am sorry
attached
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Jan 31, 2017 at 5:18 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > true, I am sorry Last status is a new patch and no reviews. On top of that this thread is quite active. So moved to next CF. Pavel, please be careful about the status of the patch on the CF app, it was set to "waiting on author"... -- Michael
2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> * SELECT (xmltable(..)).* + regress tests
> * compilation and regress tests without --with-libxml
Thanks. I just realized that this is doing more work than necessary --
I think it would be simpler to have tableexpr fill a tuplestore with the
results, instead of just expecting function execution to apply
ExecEvalExpr over and over to obtain the results. So evaluating a
tableexpr returns just the tuplestore, which function evaluation can
return as-is. That code doesn't use the value-per-call interface
anyway.
I also realized that the expr context callback is not called if there's
an error, which leaves us without shutting down libxml properly. I
added PG_TRY around the fetchrow calls, but I'm not sure that's correct
either, because there could be an error raised in other parts of the
code, after we've already emitted a few rows (for example out of
memory). I think the right way is to have PG_TRY around the execution
of the whole thing rather than just row at a time; and the tuplestore
mechanism helps us with that.
I think it would be good to have a more complex test case in regress --
let's say there is a table with some simple XML values, then we use
XMLFOREST (or maybe one of the table_to_xml functions) to generate a
large document, and then XMLTABLE uses that document as input document.
I have a 16K lines long real XML 6.MB. Probably we would not to append it to regress tests.
It is really fast - original customer implementation 20min, nested our xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms
I have a plan to create tests based on pg_proc and CTE - if all works, then the query must be empty
with x as (select proname, proowner, procost, pronargs, array_to_string(proargnames,',') as proargnames, array_to_string(proargtypes,',') as proargtypes from pg_proc), y as (select xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs, proargnames, proargtypes)) as proc from x), z as (select xmltable.* from y, lateral xmltable('/proc' passing proc columns proname name, proowner oid, procost float, pronargs int, proargnames text, proargtypes text)) select * from z except select * from x;
Please fix.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote: > 2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>: > > I think it would be good to have a more complex test case in regress -- > > let's say there is a table with some simple XML values, then we use > > XMLFOREST (or maybe one of the table_to_xml functions) to generate a > > large document, and then XMLTABLE uses that document as input document. > > I have a 16K lines long real XML 6.MB. Probably we would not to append it > to regress tests. > > It is really fast - original customer implementation 20min, nested our > xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms That's great numbers, kudos for the hard work here. That will make for a nice headline in pg10 PR materials. But what I was getting at is that I would like to exercise a bit more of the expression handling in xmltable execution, to make sure it doesn't handle just string literals. > I have a plan to create tests based on pg_proc and CTE - if all works, then > the query must be empty > > with x as (select proname, proowner, procost, pronargs, > array_to_string(proargnames,',') as proargnames, > array_to_string(proargtypes,',') as proargtypes from pg_proc), y as (select > xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs, > proargnames, proargtypes)) as proc from x), z as (select xmltable.* from y, > lateral xmltable('/proc' passing proc columns proname name, proowner oid, > procost float, pronargs int, proargnames text, proargtypes text)) select * > from z except select * from x; Nice one :-) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-01-31 14:57 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> > I think it would be good to have a more complex test case in regress --
> > let's say there is a table with some simple XML values, then we use
> > XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> > large document, and then XMLTABLE uses that document as input document.
>
> I have a 16K lines long real XML 6.MB. Probably we would not to append it
> to regress tests.
>
> It is really fast - original customer implementation 20min, nested our
> xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms
That's great numbers, kudos for the hard work here. That will make for
a nice headline in pg10 PR materials. But what I was getting at is that
I would like to exercise a bit more of the expression handling in
xmltable execution, to make sure it doesn't handle just string literals.
I'll try to write some more dynamic examples.
> I have a plan to create tests based on pg_proc and CTE - if all works, then
> the query must be empty
>
> with x as (select proname, proowner, procost, pronargs,
> array_to_string(proargnames,',') as proargnames,
> array_to_string(proargtypes,',') as proargtypes from pg_proc), y as (select
> xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
> proargnames, proargtypes)) as proc from x), z as (select xmltable.* from y,
> lateral xmltable('/proc' passing proc columns proname name, proowner oid,
> procost float, pronargs int, proargnames text, proargtypes text)) select *
> from z except select * from x;
Nice one :-)
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
2017-01-31 14:57 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> > I think it would be good to have a more complex test case in regress --
> > let's say there is a table with some simple XML values, then we use
> > XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> > large document, and then XMLTABLE uses that document as input document.
>
> I have a 16K lines long real XML 6.MB. Probably we would not to append it
> to regress tests.
>
> It is really fast - original customer implementation 20min, nested our
> xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms
That's great numbers, kudos for the hard work here. That will make for
a nice headline in pg10 PR materials. But what I was getting at is that
I would like to exercise a bit more of the expression handling in
xmltable execution, to make sure it doesn't handle just string literals.
done
> I have a plan to create tests based on pg_proc and CTE - if all works, then
> the query must be empty
>
> with x as (select proname, proowner, procost, pronargs,
> array_to_string(proargnames,',') as proargnames,
> array_to_string(proargtypes,',') as proargtypes from pg_proc), y as (select
> xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
> proargnames, proargtypes)) as proc from x), z as (select xmltable.* from y,
> lateral xmltable('/proc' passing proc columns proname name, proowner oid,
> procost float, pronargs int, proargnames text, proargtypes text)) select *
> from z except select * from x;
Nice one :-)
please see attached patch
* enhanced regress tests
* clean memory context work
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi
please see attached patch* enhanced regress tests* clean memory context work
new update
fix a bug in string compare
fix some typo and obsolete comments
Regards
Pavel
RegardsPavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Hi
2017-02-16 6:38 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hiplease see attached patch* enhanced regress tests* clean memory context worknew updatefix a bug in string comparefix some typo and obsolete commentsRegards
some minor but interesting fix.
I found so some xml values imported via recv function can have inconsistency between header encoding and used encoding. Internally the header encoding is removed - see xml_out function.
So now, when I have to prepare data for libxml2, I don't do direct cast, but I use xml_out_internal instead. Maybe this technique should be used elsewhere? Same issue I see on xpath function.
Solved issue is not too often probably - the some different encoding than utf8 should be used in XML document and XML document should be loaded with recv function.
Regards
Pavel
PavelRegardsPavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
I've been giving this a look. I started by tweaking the docs once again, and while verifying that the example works as expected, I replayed what I have in sgml: ... begin SGML paste ... <para> For example, given the following XML document: <screen><![CDATA[ <ROWS> <ROW id="1"> <COUNTRY_ID>AU</COUNTRY_ID> <COUNTRY_NAME>Australia</COUNTRY_NAME> </ROW> <ROW id="5"> <COUNTRY_ID>JP</COUNTRY_ID> <COUNTRY_NAME>Japan</COUNTRY_NAME> <PREMIER_NAME>Sinzo Abe</PREMIER_NAME> </ROW> <ROW id="6"> <COUNTRY_ID>SG</COUNTRY_ID> <COUNTRY_NAME>Singapore</COUNTRY_NAME> <SIZE unit="km">791</SIZE> </ROW> </ROWS> ]]></screen> the following query produces the result shown below: <screen><![CDATA[ SELECT xmltable.* FROM (SELECT data FROM xmldata) x, LATERAL xmltable('//ROWS/ROW' PASSINGdata COLUMNS id int PATH '@id', ordinality FOR ORDINALITY, country_name text PATH 'COUNTRY_NAME', country_id text PATH'COUNTRY_ID', size float PATH 'SIZE[@unit = "km"]/text()', unit text PATH 'SIZE/@unit', premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified'); ... end SGML paste ... But the query doesn't actually return a table, but instead it fails with this error: ERROR: invalid input syntax for type double precision: "" This is because of the "size" column (if I remove SIZE from the COLUMNS clause, the query returns correctly). Apparently, for the rows where SIZE is not given, we try to inssert an empty string instead of a NULL value, which is what I expected. I'm using your v44 code, but trimmed both the XML document used in SGML as well as modified the query slightly to show additional features. But those changes should not cause the above error ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
2017-03-02 1:12 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
It is related to older variants of this patch, where I explicitly mapped empty strings to NULL.
I've been giving this a look. I started by tweaking the docs once
again, and while verifying that the example works as expected, I
replayed what I have in sgml:
... begin SGML paste ...
<para>
For example, given the following XML document:
<screen><![CDATA[
<ROWS>
<ROW id="1">
<COUNTRY_ID>AU</COUNTRY_ID>
<COUNTRY_NAME>Australia</COUNTRY_NAME>
</ROW>
<ROW id="5">
<COUNTRY_ID>JP</COUNTRY_ID>
<COUNTRY_NAME>Japan</COUNTRY_NAME>
<PREMIER_NAME>Sinzo Abe</PREMIER_NAME>
</ROW>
<ROW id="6">
<COUNTRY_ID>SG</COUNTRY_ID>
<COUNTRY_NAME>Singapore</COUNTRY_NAME>
<SIZE unit="km">791</SIZE>
</ROW>
</ROWS>
]]></screen>
the following query produces the result shown below:
<screen><![CDATA[
SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL xmltable('//ROWS/ROW'
PASSING data
COLUMNS id int PATH '@id',
ordinality FOR ORDINALITY,
country_name text PATH 'COUNTRY_NAME',
country_id text PATH 'COUNTRY_ID',
size float PATH 'SIZE[@unit = "km"]/text()',
unit text PATH 'SIZE/@unit',
premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');
... end SGML paste ...
But the query doesn't actually return a table, but instead it fails with
this error:
ERROR: invalid input syntax for type double precision: ""
This is because of the "size" column (if I remove SIZE from the COLUMNS
clause, the query returns correctly). Apparently, for the rows where
SIZE is not given, we try to inssert an empty string instead of a NULL
value, which is what I expected.
I'm using your v44 code, but trimmed both the XML document used in SGML
as well as modified the query slightly to show additional features. But
those changes should not cause the above error ...
The example in doc is obsolete. Following example works without problems.
SELECT xmltable.*
FROM (SELECT data FROM xmldata) x, LATERAL xmltable('//ROWS/ROW' PASSING data COLUMNS id int PATH '@id', ordinality FOR ORDINALITY, country_name text PATH 'COUNTRY_NAME', country_id text PATH 'COUNTRY_ID', size float PATH 'SIZE[@unit = "km"]', unit text PATH 'SIZE/@unit', premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');
It is related to older variants of this patch, where I explicitly mapped empty strings to NULL.
Now, I don't do it - I use libxml2 result with following mapping
No tag ... NULL
empty tag ... empty string
Important question is about mapping empty tags to Postgres. I prefer current behave, because I have a possibility to differ between these states on application level. If we returns NULL for empty tag, then there will not be possible detect if XML has tag (although empty) or not. The change is simple - just one row - but I am thinking so current behave is better. There is possible risk of using /text() somewhere - it enforce a empty tag with all negative impacts.
I prefer to fix doc in conformance with regress tests and append note about mapping these corner cases from XML to relations.
What do you think about it?
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-03-02 8:04 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
"If the <literal>PATH</> matches an empty tag the result is an empty string"
Hi2017-03-02 1:12 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
I've been giving this a look. I started by tweaking the docs once
again, and while verifying that the example works as expected, I
replayed what I have in sgml:
... begin SGML paste ...
<para>
For example, given the following XML document:
<screen><![CDATA[
<ROWS>
<ROW id="1">
<COUNTRY_ID>AU</COUNTRY_ID>
<COUNTRY_NAME>Australia</COUNTRY_NAME>
</ROW>
<ROW id="5">
<COUNTRY_ID>JP</COUNTRY_ID>
<COUNTRY_NAME>Japan</COUNTRY_NAME>
<PREMIER_NAME>Sinzo Abe</PREMIER_NAME>
</ROW>
<ROW id="6">
<COUNTRY_ID>SG</COUNTRY_ID>
<COUNTRY_NAME>Singapore</COUNTRY_NAME>
<SIZE unit="km">791</SIZE>
</ROW>
</ROWS>
]]></screen>
the following query produces the result shown below:
<screen><![CDATA[
SELECT xmltable.*
FROM (SELECT data FROM xmldata) x,
LATERAL xmltable('//ROWS/ROW'
PASSING data
COLUMNS id int PATH '@id',
ordinality FOR ORDINALITY,
country_name text PATH 'COUNTRY_NAME',
country_id text PATH 'COUNTRY_ID',
size float PATH 'SIZE[@unit = "km"]/text()',
unit text PATH 'SIZE/@unit',
premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');
... end SGML paste ...
But the query doesn't actually return a table, but instead it fails with
this error:
ERROR: invalid input syntax for type double precision: ""
This is because of the "size" column (if I remove SIZE from the COLUMNS
clause, the query returns correctly). Apparently, for the rows where
SIZE is not given, we try to inssert an empty string instead of a NULL
value, which is what I expected.
I'm using your v44 code, but trimmed both the XML document used in SGML
as well as modified the query slightly to show additional features. But
those changes should not cause the above error ...The example in doc is obsolete. Following example works without problems.SELECT xmltable.*FROM (SELECT data FROM xmldata) x, LATERAL xmltable('//ROWS/ROW' PASSING data COLUMNS id int PATH '@id', ordinality FOR ORDINALITY, country_name text PATH 'COUNTRY_NAME', country_id text PATH 'COUNTRY_ID', size float PATH 'SIZE[@unit = "km"]', unit text PATH 'SIZE/@unit', premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');
It is related to older variants of this patch, where I explicitly mapped empty strings to NULL.Now, I don't do it - I use libxml2 result with following mappingNo tag ... NULLempty tag ... empty stringImportant question is about mapping empty tags to Postgres. I prefer current behave, because I have a possibility to differ between these states on application level. If we returns NULL for empty tag, then there will not be possible detect if XML has tag (although empty) or not. The change is simple - just one row - but I am thinking so current behave is better. There is possible risk of using /text() somewhere - it enforce a empty tag with all negative impacts.I prefer to fix doc in conformance with regress tests and append note about mapping these corner cases from XML to relations.What do you think about it?
It is documented already
Attached new patch
cleaned documentation
regress tests is more robust
appended comment in src related to generating empty string for empty tag
Regards
Pavel
RegardsPavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Pavel Stehule wrote: > It is documented already > > "If the <literal>PATH</> matches an empty tag the result is an empty string" Hmm, okay. But what we have here is not an empty tag, but a tag that is completely missing. I don't think those two cases should be treated in the same way ... > Attached new patch > > cleaned documentation > regress tests is more robust > appended comment in src related to generating empty string for empty tag Thanks, I incorporated those changes. Here's v46. I rewrote the documentation, and fixed a couple of incorrectly copied&pasted comments in the new executor code; I think that one looks good. In the future we could rewrite it to avoid the need for a tuplestore, but I think the current approach is good enough for a pg10 implementation. Barring serious problems, I intend to commit this later today. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Dne 2. 3. 2017 18:14 napsal uživatel "Alvaro Herrera" <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:Hmm, okay. But what we have here is not an empty tag, but a tag that is
> It is documented already
>
> "If the <literal>PATH</> matches an empty tag the result is an empty string"
completely missing. I don't think those two cases should be treated in
the same way ...
this information is not propagated from libxml2.
Thanks, I incorporated those changes. Here's v46. I rewrote the
> Attached new patch
>
> cleaned documentation
> regress tests is more robust
> appended comment in src related to generating empty string for empty tag
documentation, and fixed a couple of incorrectly copied&pasted comments
in the new executor code; I think that one looks good. In the future we
could rewrite it to avoid the need for a tuplestore, but I think the
current approach is good enough for a pg10 implementation.
Barring serious problems, I intend to commit this later today.
thank you very much
regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
So in the old (non-executor-node) implementation, you could attach WITH ORDINALITY to the xmltable expression and it would count the output rows, regardless of which XML document it comes from. With the new implementation, the grammar no longer accepts it. To count output rows, you still need to use row_number(). Maybe this is okay. This is the example from the docs, and I add another XML document with two more rows for xmltable. Look at the three numbering columns ... CREATE TABLE xmldata AS SELECT xml $$ <ROWS> <ROW id="1"> <COUNTRY_ID>AU</COUNTRY_ID> <COUNTRY_NAME>Australia</COUNTRY_NAME> </ROW> <ROW id="5"> <COUNTRY_ID>JP</COUNTRY_ID> <COUNTRY_NAME>Japan</COUNTRY_NAME> <PREMIER_NAME>Shinzo Abe</PREMIER_NAME> <SIZE unit="sq_mi">145935</SIZE></ROW> <ROW id="6"> <COUNTRY_ID>SG</COUNTRY_ID> <COUNTRY_NAME>Singapore</COUNTRY_NAME> <SIZEunit="sq_km">697</SIZE> </ROW> </ROWS> $$ AS data; insert into xmldata values ($$<ROWS><ROW id="2"><COUNTRY_ID>CL</COUNTRY_ID><COUNTRY_NAME>Chile</COUNTRY_NAME></ROW><ROW id="3"><COUNTRY_ID>AR</COUNTRY_ID><COUNTRY_NAME>Argentina</COUNTRY_NAME></ROW></ROWS>$$); SELECT ROW_NUMBER() OVER (), xmltable.* FROM xmldata, XMLTABLE('//ROWS/ROW' PASSING data COLUMNS id int PATH '@id', ordinality FOR ORDINALITY, "COUNTRY_NAME"text, country_id text PATH 'COUNTRY_ID', size_sq_km float PATH'SIZE[@unit = "sq_km"]', size_other text PATH 'concat(SIZE[@unit!="sq_km"]," ", SIZE[@unit!="sq_km"]/@unit)', premier_name text PATH 'PREMIER_NAME'DEFAULT 'not specified') ; row_number │ id │ ordinality │ COUNTRY_NAME │ country_id │ size_sq_km │ size_other │ premier_name ────────────┼────┼────────────┼──────────────┼────────────┼────────────┼──────────────┼─────────────── 1 │ 1 │ 1 │ Australia │ AU │ │ │ not specified 2 │ 5 │ 2 │ Japan │ JP │ │ 145935 sq_mi │ Shinzo Abe 3 │ 6 │ 3 │ Singapore │ SG │ 697 │ │ not specified 4 │ 2 │ 1 │ Chile │ CL │ │ │ not specified 5 │ 3 │ 2 │ Argentina │ AR │ │ │ not specified -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-03-02 19:32 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
So in the old (non-executor-node) implementation, you could attach WITH
ORDINALITY to the xmltable expression and it would count the output
rows, regardless of which XML document it comes from. With the new
implementation, the grammar no longer accepts it. To count output rows,
you still need to use row_number(). Maybe this is okay. This is the
example from the docs, and I add another XML document with two more rows
for xmltable. Look at the three numbering columns ...
It is expected - now tablefunc are not special case of SRF, so it lost all SRF functionality. It is not critical lost - it supports internally FOR ORDINALITY column, and classic ROW_NUMBER can be used. It can be enhanced to support WITH ORDINALITY in future, but I have not any use case for it.
Regards
Pavel
CREATE TABLE xmldata AS SELECT
xml $$
<ROWS>
<ROW id="1">
<COUNTRY_ID>AU</COUNTRY_ID>
<COUNTRY_NAME>Australia</COUNTRY_NAME>
</ROW>
<ROW id="5">
<COUNTRY_ID>JP</COUNTRY_ID>
<COUNTRY_NAME>Japan</COUNTRY_NAME>
<PREMIER_NAME>Shinzo Abe</PREMIER_NAME>
<SIZE unit="sq_mi">145935</SIZE>
</ROW>
<ROW id="6">
<COUNTRY_ID>SG</COUNTRY_ID>
<COUNTRY_NAME>Singapore</COUNTRY_NAME>
<SIZE unit="sq_km">697</SIZE>
</ROW>
</ROWS>
$$ AS data;
insert into xmldata values ($$
<ROWS><ROW id="2"><COUNTRY_ID>CL</COUNTRY_ID><COUNTRY_NAME> Chile</COUNTRY_NAME></ROW>
<ROW id="3"><COUNTRY_ID>AR</COUNTRY_ID><COUNTRY_NAME> Argentina</COUNTRY_NAME></ROW> </ROWS>$$);
SELECT ROW_NUMBER() OVER (), xmltable.*
FROM xmldata,
XMLTABLE('//ROWS/ROW'
PASSING data
COLUMNS id int PATH '@id',
ordinality FOR ORDINALITY,
"COUNTRY_NAME" text,
country_id text PATH 'COUNTRY_ID',
size_sq_km float PATH 'SIZE[@unit = "sq_km"]',
size_other text PATH
'concat(SIZE[@unit!="sq_km"], " ", SIZE[@unit!="sq_km"]/@unit)',
premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified')
;
row_number │ id │ ordinality │ COUNTRY_NAME │ country_id │ size_sq_km │ size_other │ premier_name
────────────┼────┼────────────┼──────────────┼────────────┼─ ───────────┼──────────────┼─── ────────────
1 │ 1 │ 1 │ Australia │ AU │ │ │ not specified
2 │ 5 │ 2 │ Japan │ JP │ │ 145935 sq_mi │ Shinzo Abe
3 │ 6 │ 3 │ Singapore │ SG │ 697 │ │ not specified
4 │ 2 │ 1 │ Chile │ CL │ │ │ not specified
5 │ 3 │ 2 │ Argentina │ AR │ │ │ not specified
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote: > 2017-03-02 19:32 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>: > > > So in the old (non-executor-node) implementation, you could attach WITH > > ORDINALITY to the xmltable expression and it would count the output > > rows, regardless of which XML document it comes from. With the new > > implementation, the grammar no longer accepts it. To count output rows, > > you still need to use row_number(). Maybe this is okay. This is the > > example from the docs, and I add another XML document with two more rows > > for xmltable. Look at the three numbering columns ... > > > > It is expected - now tablefunc are not special case of SRF, so it lost all > SRF functionality. It is not critical lost - it supports internally FOR > ORDINALITY column, and classic ROW_NUMBER can be used. It can be enhanced > to support WITH ORDINALITY in future, but I have not any use case for it. Fine. After looking at the new executor code a bit, I noticed that we don't need the resultSlot anymore; we can use the ss_ScanTupleSlot instead. Because resultSlot was being used in the xml.c code (which already appeared a bit dubious to me), I changed the interface so that instead the things that it read from it are passed as parameters -- namely, in InitBuilder we pass natts, and in GetValue we pass typid and typmod. Secondly, I noticed we have the FetchRow routine produce a minimal tuple, put it in a slot; then its caller takes the slot and put the tuple in the tuplestore. This is pointless; we can just have FetchRow put the tuple in the tuplestore directly and not bother with any slot manipulations there. This simplifies the code a bit. Here's v47 with those changes. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
2017-03-02 22:35 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2017-03-02 19:32 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
>
> > So in the old (non-executor-node) implementation, you could attach WITH
> > ORDINALITY to the xmltable expression and it would count the output
> > rows, regardless of which XML document it comes from. With the new
> > implementation, the grammar no longer accepts it. To count output rows,
> > you still need to use row_number(). Maybe this is okay. This is the
> > example from the docs, and I add another XML document with two more rows
> > for xmltable. Look at the three numbering columns ...
> >
>
> It is expected - now tablefunc are not special case of SRF, so it lost all
> SRF functionality. It is not critical lost - it supports internally FOR
> ORDINALITY column, and classic ROW_NUMBER can be used. It can be enhanced
> to support WITH ORDINALITY in future, but I have not any use case for it.
Fine.
After looking at the new executor code a bit, I noticed that we don't
need the resultSlot anymore; we can use the ss_ScanTupleSlot instead.
Because resultSlot was being used in the xml.c code (which already
appeared a bit dubious to me), I changed the interface so that instead
the things that it read from it are passed as parameters -- namely, in
InitBuilder we pass natts, and in GetValue we pass typid and typmod.
I had similar feeling
Secondly, I noticed we have the FetchRow routine produce a minimal
tuple, put it in a slot; then its caller takes the slot and put the
tuple in the tuplestore. This is pointless; we can just have FetchRow
put the tuple in the tuplestore directly and not bother with any slot
manipulations there. This simplifies the code a bit.
has sense
attached update with fixed tests
Regards
Pavel
Here's v47 with those changes.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Pavel Stehule wrote: > attached update with fixed tests Heh, I noticed that you removed the libxml "context" lines that differentiate xml.out from xml_2.out when doing this. My implementation emits those lines, so it was failing for me. I restored them. I also changed a few things to avoid copying into TableFuncScanState things that come from the TableFunc itself, since the executor state node can grab them from the plan node. Let's do that. So instead of "evalcols" the code now checks that the column list is empty; and also, read the ordinality column number from the plan node. I have to bounce this back to you one more time, hopefully the last one I hope. Two things: 1. Please verify that pg_stat_statements behaves correctly. The patch doesn't have changes to contrib/ so without testing I'm guessing that it doesn't work. I think something very simple should do. 2. As I've complained many times, I find the way we manage an empty COLUMNS clause pretty bad. The standard doesn't require that syntax (COLUMNS is required), and I don't like the implementation, so why not provide the feature in a different way? My proposal is to change the column options in gram.y to be something like this: xmltable_column_option_el: IDENT b_expr { $$ = makeDefElem($1, $2, @1); } | DEFAULT b_expr { $$ = makeDefElem("default", $2, @1); } | FULL VALUE_P { $$ = makeDefElem("full_value", NULL, @1); } | NOT NULL_P { $$ = makeDefElem("is_not_null", (Node *) makeInteger(true), @1); } | NULL_P { $$ = makeDefElem("is_not_null", (Node *) makeInteger(false), @1); } ; Note the FULL VALUE. Then we can process it like else if (strcmp(defel->defname, "full_value") == 0) { if (fc->colexpr != NULL) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("FULL ROW may not be specified together with PATH"), parser_errposition(defel->location))); fc->full_row = true; } So if you want the full XML value of the row, you have to specify it, .. XMLTABLE ( ... COLUMNS ..., whole_row xml FULL VALUE, ... ) This has the extra feature that you can add, say, an ORDINALITY column together with the XML value, something that you cannot do with the current implementation. It doesn't have to be FULL VALUE, but I couldn't think of anything better. (I didn't want to add any new keywords for this.) If you have a better idea, let's discuss. Code-wise, this completely removes the "else" block in transformRangeTableFunc which I marked with an XXX comment. That's a good thing -- let's get rid of that. Also, it should remove the need for the separate "if !columns" case in tfuncLoadRows. All those cases would become part of the normal code path instead of special cases. I think XmlTableSetColumnFilter doesn't need any change (we just don't call if for the FULL VALUE row); and XmlTableGetValue needs a special case that if the column filter is NULL (i.e. SetColumnFilter wasn't called for that column) then return the whole row. Of course, this opens an implementation issue: how do you annotate things from parse analysis till execution? The current TableFunc structure doesn't help, because there are only lists of column names and expressions; and we can't use the case of a NULL colexpr, because that case is already used by the column filter being the column name (a feature required by the standard). A simple way would be to have a new "colno" struct member, to store a column number for the column marked FULL VALUE (just like ordinalitycol). This means you can't have more than one of those FULL VALUE columns, but that seems okay. (Of course, this means that the two cases that have no COLUMNS in the "xmltable" production in gram.y should go away). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> attached update with fixed tests
Heh, I noticed that you removed the libxml "context" lines that
differentiate xml.out from xml_2.out when doing this. My implementation
emits those lines, so it was failing for me. I restored them.
I also changed a few things to avoid copying into TableFuncScanState
things that come from the TableFunc itself, since the executor state
node can grab them from the plan node. Let's do that. So instead of
"evalcols" the code now checks that the column list is empty; and also,
read the ordinality column number from the plan node.
I have to bounce this back to you one more time, hopefully the last one
I hope. Two things:
1. Please verify that pg_stat_statements behaves correctly. The patch
doesn't have changes to contrib/ so without testing I'm guessing that it
doesn't work. I think something very simple should do.
2. As I've complained many times, I find the way we manage an empty
COLUMNS clause pretty bad. The standard doesn't require that syntax
(COLUMNS is required), and I don't like the implementation, so why not
provide the feature in a different way? My proposal is to change the
column options in gram.y to be something like this:
The clause COLUMNS is optional on Oracle and DB2
So I prefer a Oracle, DB2 design. If you are strongly against it, then we can remove it to be ANSI/SQL only.
I am don't see an good idea to introduce third syntax.
xmltable_column_option_el:
IDENT b_expr
{ $$ = makeDefElem($1, $2, @1); }
| DEFAULT b_expr
{ $$ = makeDefElem("default", $2, @1); }
| FULL VALUE_P
{ $$ = makeDefElem("full_value", NULL, @1); }
| NOT NULL_P
{ $$ = makeDefElem("is_not_null", (Node *) makeInteger(true), @1); }
| NULL_P
{ $$ = makeDefElem("is_not_null", (Node *) makeInteger(false), @1); }
;
Note the FULL VALUE. Then we can process it like
else if (strcmp(defel->defname, "full_value") == 0)
{
if (fc->colexpr != NULL)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("FULL ROW may not be specified together with PATH"),
parser_errposition(defel->location)));
fc->full_row = true;
}
So if you want the full XML value of the row, you have to specify it,
.. XMLTABLE ( ... COLUMNS ..., whole_row xml FULL VALUE, ... )
This has the extra feature that you can add, say, an ORDINALITY column
together with the XML value, something that you cannot do with the
current implementation.
It doesn't have to be FULL VALUE, but I couldn't think of anything
better. (I didn't want to add any new keywords for this.) If you have
a better idea, let's discuss.
I don't see a introduction own syntax as necessary solution here - use Oracle, DB2 compatible syntax, or ANSI.
It is partially corner case - the benefit of this case is almost bigger compatibility with mentioned databases.
Code-wise, this completely removes the "else" block in transformRangeTableFunc
which I marked with an XXX comment. That's a good thing -- let's get
rid of that. Also, it should remove the need for the separate "if
!columns" case in tfuncLoadRows. All those cases would become part of
the normal code path instead of special cases. I think
XmlTableSetColumnFilter doesn't need any change (we just don't call if
for the FULL VALUE row); and XmlTableGetValue needs a special case that
if the column filter is NULL (i.e. SetColumnFilter wasn't called for
that column) then return the whole row.
Of course, this opens an implementation issue: how do you annotate
things from parse analysis till execution? The current TableFunc
structure doesn't help, because there are only lists of column names and
expressions; and we can't use the case of a NULL colexpr, because that
case is already used by the column filter being the column name (a
feature required by the standard). A simple way would be to have a new
"colno" struct member, to store a column number for the column marked
FULL VALUE (just like ordinalitycol). This means you can't have more
than one of those FULL VALUE columns, but that seems okay.
(Of course, this means that the two cases that have no COLUMNS in the
"xmltable" production in gram.y should go away).
You are commiter, and you should to decide - as first I prefer current state, as second a remove this part - it should be good for you too, because code that you don't like will be left.
I dislike introduce new syntax - this case is not too important for this.
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-03-03 19:42 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:Pavel Stehule wrote:
> attached update with fixed tests
Heh, I noticed that you removed the libxml "context" lines that
differentiate xml.out from xml_2.out when doing this. My implementation
emits those lines, so it was failing for me. I restored them.
I also changed a few things to avoid copying into TableFuncScanState
things that come from the TableFunc itself, since the executor state
node can grab them from the plan node. Let's do that. So instead of
"evalcols" the code now checks that the column list is empty; and also,
read the ordinality column number from the plan node.
I have to bounce this back to you one more time, hopefully the last one
I hope. Two things:
1. Please verify that pg_stat_statements behaves correctly. The patch
doesn't have changes to contrib/ so without testing I'm guessing that it
doesn't work. I think something very simple should do.
2. As I've complained many times, I find the way we manage an empty
COLUMNS clause pretty bad. The standard doesn't require that syntax
(COLUMNS is required), and I don't like the implementation, so why not
provide the feature in a different way? My proposal is to change the
column options in gram.y to be something like this:The clause COLUMNS is optional on Oracle and DB2So I prefer a Oracle, DB2 design. If you are strongly against it, then we can remove it to be ANSI/SQL only.I am don't see an good idea to introduce third syntax.
xmltable_column_option_el:
IDENT b_expr
{ $$ = makeDefElem($1, $2, @1); }
| DEFAULT b_expr
{ $$ = makeDefElem("default", $2, @1); }
| FULL VALUE_P
{ $$ = makeDefElem("full_value", NULL, @1); }
| NOT NULL_P
{ $$ = makeDefElem("is_not_null", (Node *) makeInteger(true), @1); }
| NULL_P
{ $$ = makeDefElem("is_not_null", (Node *) makeInteger(false), @1); }
;
Note the FULL VALUE. Then we can process it like
else if (strcmp(defel->defname, "full_value") == 0)
{
if (fc->colexpr != NULL)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("FULL ROW may not be specified together with PATH"),
parser_errposition(defel->location)));
fc->full_row = true;
}
So if you want the full XML value of the row, you have to specify it,
.. XMLTABLE ( ... COLUMNS ..., whole_row xml FULL VALUE, ... )
This has the extra feature that you can add, say, an ORDINALITY column
together with the XML value, something that you cannot do with the
current implementation.
It doesn't have to be FULL VALUE, but I couldn't think of anything
better. (I didn't want to add any new keywords for this.) If you have
a better idea, let's discuss.I don't see a introduction own syntax as necessary solution here - use Oracle, DB2 compatible syntax, or ANSI.It is partially corner case - the benefit of this case is almost bigger compatibility with mentioned databases.
Code-wise, this completely removes the "else" block in transformRangeTableFunc
which I marked with an XXX comment. That's a good thing -- let's get
rid of that. Also, it should remove the need for the separate "if
!columns" case in tfuncLoadRows. All those cases would become part of
the normal code path instead of special cases. I think
XmlTableSetColumnFilter doesn't need any change (we just don't call if
for the FULL VALUE row); and XmlTableGetValue needs a special case that
if the column filter is NULL (i.e. SetColumnFilter wasn't called for
that column) then return the whole row.
Of course, this opens an implementation issue: how do you annotate
things from parse analysis till execution? The current TableFunc
structure doesn't help, because there are only lists of column names and
expressions; and we can't use the case of a NULL colexpr, because that
case is already used by the column filter being the column name (a
feature required by the standard). A simple way would be to have a new
"colno" struct member, to store a column number for the column marked
FULL VALUE (just like ordinalitycol). This means you can't have more
than one of those FULL VALUE columns, but that seems okay.
(Of course, this means that the two cases that have no COLUMNS in the
"xmltable" production in gram.y should go away).You are commiter, and you should to decide - as first I prefer current state, as second a remove this part - it should be good for you too, because code that you don't like will be left.I dislike introduce new syntax - this case is not too important for this.
I am able to prepare reduced version if we do a agreement
Regards
Pavel
RegardsPavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote: > 2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>: > > 2. As I've complained many times, I find the way we manage an empty > > COLUMNS clause pretty bad. The standard doesn't require that syntax > > (COLUMNS is required), and I don't like the implementation, so why not > > provide the feature in a different way? My proposal is to change the > > column options in gram.y to be something like this: > > The clause COLUMNS is optional on Oracle and DB2 > > So I prefer a Oracle, DB2 design. If you are strongly against it, then we > can remove it to be ANSI/SQL only. > > I am don't see an good idea to introduce third syntax. OK. I think trying to be syntax compatible with DB2 or Oracle is a lost cause, because the syntax used in the XPath expressions seems different -- I think Oracle uses XQuery (which we don't support) and DB2 uses ... not sure what it is, but it doesn't work in our implementation (stuff like '$d/employees/emp' in the row expression.) In existing applications using those Oracle/DB2, is it common to omit the COLUMNS clause? I searched for "xmltable oracle" and had a look at the first few hits outside of the oracle docs: http://viralpatel.net/blogs/oracle-xmltable-tutorial/ http://www.dba-oracle.com/t_xmltable.htm http://stackoverflow.com/questions/12690868/how-to-use-xmltable-in-oracle https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9533111800346252295 http://stackoverflow.com/questions/1222570/what-is-an-xmltable https://community.oracle.com/thread/3955198 Not a single one of these omit the COLUMNS clause (though the second one mentions that the clause can be omitted). I also looked at a few samples with DB2 -- same thing; it is possible, but is it common? Anyway, I noticed that "xml PATH '.'" can be used to obtain the full XML of the row, which I think is the feature I wanted, so I think we're covered and we can omit the case with no COLUMNS, since we already have the feature in another way. No need to implement anything further, and we can rip out the special case I don't like. Example: CREATE TABLE EMPLOYEES ( id integer, data XML ); INSERT INTO EMPLOYEES VALUES (1, '<Employees> <Employee emplid="1111" type="admin"> <firstname>John</firstname> <lastname>Watson</lastname> <age>30</age> <email>johnwatson@sh.com</email> </Employee> <Employee emplid="2222" type="admin"> <firstname>Sherlock</firstname> <lastname>Homes</lastname> <age>32</age> <email>sherlock@sh.com</email> </Employee> <Employee emplid="3333"type="user"> <firstname>Jim</firstname> <lastname>Moriarty</lastname> <age>52</age> <email>jim@sh.com</email> </Employee> <Employee emplid="4444" type="user"> <firstname>Mycroft</firstname> <lastname>Holmes</lastname> <age>41</age> <email>mycroft@sh.com</email> </Employee> </Employees>'); This is with COLUMNS omitted: alvherre=# select xmltable.* from employees, xmltable('/Employees/Employee' passing data); xmltable ──────────────────────────────────────────<Employee emplid="1111" type="admin"> ↵ <firstname>John</firstname> ↵ <lastname>Watson</lastname> ↵ <age>30</age> ↵ <email>johnwatson@sh.com</email>↵ </Employee><Employee emplid="2222" type="admin"> ↵ <firstname>Sherlock</firstname>↵ <lastname>Homes</lastname> ↵ <age>32</age> ↵ <email>sherlock@sh.com</email> ↵ </Employee><Employee emplid="3333" type="user"> ↵ <firstname>Jim</firstname> ↵ <lastname>Moriarty</lastname> ↵ <age>52</age> ↵ <email>jim@sh.com</email> ↵ </Employee><Employee emplid="4444" type="user"> ↵ <firstname>Mycroft</firstname> ↵ <lastname>Holmes</lastname> ↵ <age>41</age> ↵ <email>mycroft@sh.com</email> ↵ </Employee> and this is what you get with "xml PATH '.'" (I threw in ORDINALITY just for fun): alvherre=# select xmltable.* from employees, xmltable('/Employees/Employee' passing data columns row_number for ordinality,emp xml path '.');row_number │ emp ────────────┼────────────────────────────────────────── 1 │ <Employee emplid="1111" type="admin"> ↵ │ <firstname>John</firstname> ↵ │ <lastname>Watson</lastname> ↵ │ <age>30</age> ↵ │ <email>johnwatson@sh.com</email>↵ │ </Employee> 2 │ <Employee emplid="2222" type="admin"> ↵ │ <firstname>Sherlock</firstname> ↵ │ <lastname>Homes</lastname> ↵ │ <age>32</age> ↵ │ <email>sherlock@sh.com</email> ↵ │ </Employee> 3 │ <Employee emplid="3333" type="user"> ↵ │ <firstname>Jim</firstname> ↵ │ <lastname>Moriarty</lastname> ↵ │ <age>52</age> ↵ │ <email>jim@sh.com</email> ↵ │ </Employee> 4 │ <Employee emplid="4444" type="user"> ↵ │ <firstname>Mycroft</firstname> ↵ │ <lastname>Holmes</lastname> ↵ │ <age>41</age> ↵ │ <email>mycroft@sh.com</email> ↵ │ </Employee> -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-03-03 21:04 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> > 2. As I've complained many times, I find the way we manage an empty
> > COLUMNS clause pretty bad. The standard doesn't require that syntax
> > (COLUMNS is required), and I don't like the implementation, so why not
> > provide the feature in a different way? My proposal is to change the
> > column options in gram.y to be something like this:
>
> The clause COLUMNS is optional on Oracle and DB2
>
> So I prefer a Oracle, DB2 design. If you are strongly against it, then we
> can remove it to be ANSI/SQL only.
>
> I am don't see an good idea to introduce third syntax.
OK. I think trying to be syntax compatible with DB2 or Oracle is a lost
cause, because the syntax used in the XPath expressions seems different
-- I think Oracle uses XQuery (which we don't support) and DB2 uses ...
not sure what it is, but it doesn't work in our implementation
(stuff like '$d/employees/emp' in the row expression.)
100% compatibility is not possible - but XPath is subset of XQuery and in reality - the full XQuery examples of XMLTABLE is not often.
Almost all examples of usage XMLTABLE, what I found in blogs, uses XPath only
In existing applications using those Oracle/DB2, is it common to omit
the COLUMNS clause? I searched for "xmltable oracle" and had a look at
the first few hits outside of the oracle docs:
http://viralpatel.net/blogs/oracle-xmltable-tutorial/
http://www.dba-oracle.com/t_xmltable.htm
http://stackoverflow.com/questions/12690868/how-to-use- xmltable-in-oracle
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_ QUESTION_ID: 9533111800346252295
http://stackoverflow.com/questions/1222570/what-is-an- xmltable
https://community.oracle.com/thread/3955198
Not a single one of these omit the COLUMNS clause (though the second one
mentions that the clause can be omitted).
I also looked at a few samples with DB2 -- same thing; it is possible,
but is it common?
I don't think so it is common - it is corner case - and I can live without it well
Anyway, I noticed that "xml PATH '.'" can be used to obtain the full XML
of the row, which I think is the feature I wanted, so I think we're
covered and we can omit the case with no COLUMNS, since we already have
the feature in another way. No need to implement anything further, and
we can rip out the special case I don't like. Example:
yes,
CREATE TABLE EMPLOYEES
(
id integer,
data XML
);
INSERT INTO EMPLOYEES
VALUES (1, '<Employees>
<Employee emplid="1111" type="admin">
<firstname>John</firstname>
<lastname>Watson</lastname>
<age>30</age>
<email>johnwatson@sh.com</email>
</Employee>
<Employee emplid="2222" type="admin">
<firstname>Sherlock</firstname>
<lastname>Homes</lastname>
<age>32</age>
<email>sherlock@sh.com</email>
</Employee>
<Employee emplid="3333" type="user">
<firstname>Jim</firstname>
<lastname>Moriarty</lastname>
<age>52</age>
<email>jim@sh.com</email>
</Employee>
<Employee emplid="4444" type="user">
<firstname>Mycroft</firstname>
<lastname>Holmes</lastname>
<age>41</age>
<email>mycroft@sh.com</email>
</Employee>
</Employees>');
This is with COLUMNS omitted:
alvherre=# select xmltable.* from employees, xmltable('/Employees/Employee' passing data);
xmltable
──────────────────────────────────────────
<Employee emplid="1111" type="admin"> ↵
<firstname>John</firstname> ↵
<lastname>Watson</lastname> ↵
<age>30</age> ↵
<email>johnwatson@sh.com</email>↵
</Employee>
<Employee emplid="2222" type="admin"> ↵
<firstname>Sherlock</firstname> ↵
<lastname>Homes</lastname> ↵
<age>32</age> ↵
<email>sherlock@sh.com</email> ↵
</Employee>
<Employee emplid="3333" type="user"> ↵
<firstname>Jim</firstname> ↵
<lastname>Moriarty</lastname> ↵
<age>52</age> ↵
<email>jim@sh.com</email> ↵
</Employee>
<Employee emplid="4444" type="user"> ↵
<firstname>Mycroft</firstname> ↵
<lastname>Holmes</lastname> ↵
<age>41</age> ↵
<email>mycroft@sh.com</email> ↵
</Employee>
and this is what you get with "xml PATH '.'" (I threw in ORDINALITY just
for fun):
alvherre=# select xmltable.* from employees, xmltable('/Employees/Employee' passing data columns row_number for ordinality, emp xml path '.');
row_number │ emp
────────────┼──────────────────────────────────────────
1 │ <Employee emplid="1111" type="admin"> ↵
│ <firstname>John</firstname> ↵
│ <lastname>Watson</lastname> ↵
│ <age>30</age> ↵
│ <email>johnwatson@sh.com</email>↵
│ </Employee>
2 │ <Employee emplid="2222" type="admin"> ↵
│ <firstname>Sherlock</firstname> ↵
│ <lastname>Homes</lastname> ↵
│ <age>32</age> ↵
│ <email>sherlock@sh.com</email> ↵
│ </Employee>
3 │ <Employee emplid="3333" type="user"> ↵
│ <firstname>Jim</firstname> ↵
│ <lastname>Moriarty</lastname> ↵
│ <age>52</age> ↵
│ <email>jim@sh.com</email> ↵
│ </Employee>
4 │ <Employee emplid="4444" type="user"> ↵
│ <firstname>Mycroft</firstname> ↵
│ <lastname>Holmes</lastname> ↵
│ <age>41</age> ↵
│ <email>mycroft@sh.com</email> ↵
│ </Employee>
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi
I used your idea about special columns when COLUMNS are not explicitly defined.All lines that you are dislike removed.
Now, almost all code, related to this behave, is in next few lines.
+ /*
+ * Use implicit column when it is necessary. The COLUMNS clause is optional
+ * on Oracle and DB2. In this case a result is complete row of XML type.
+ */
+ if (rtf->columns == NIL)
+ {
+ RangeTableFuncCol *fc = makeNode(RangeTableFuncCol);
+ A_Const *n = makeNode(A_Const);
+
+ fc->colname = "xmltable";
+ fc->typeName = makeTypeNameFromOid(XMLOID, -1);
+ n->val.type = T_String;
+ n->val.val.str = ".";
+ n->location = -1;
+
+ fc->colexpr = (Node *) n;
+ rtf->columns = list_make1(fc);
+ }
Regards
Pavel
Attachment
Pavel Stehule wrote: > Hi > > I used your idea about special columns when COLUMNS are not explicitly > defined. > > All lines that you are dislike removed. I just pushed XMLTABLE, after some additional changes. Please test it thoroughly and report any problems. I didn't add the change you proposed here to keep COLUMNS optional; instead, I just made COLUMNS mandatory. I think what you propose here is not entirely out of the question, but you left out ruleutils.c support for it, so I decided to leave it aside for now so that I could get this patch out of my plate once and for all. If you really want that feature, you can submit another patch for it and discuss with the RMT whether it belongs in PG10 or not. Some changes I made: * I added some pg_stat_statements support. It works fine for simple tests, but deeper testing of it would be appreciated. * I removed the "buildercxt" memory context. It seemed mostly pointless, and I was disturbed by the MemoryContextResetOnly(). Per-value memory still uses the per-value memory context, but the rest of the stuff is in the per-query context, which should be pretty much the same. * Desultory stylistic changes -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-03-08 17:01 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> Hi
>
> I used your idea about special columns when COLUMNS are not explicitly
> defined.
>
> All lines that you are dislike removed.
I just pushed XMLTABLE, after some additional changes. Please test it
thoroughly and report any problems.
Thank you
I didn't add the change you proposed here to keep COLUMNS optional;
instead, I just made COLUMNS mandatory. I think what you propose here
is not entirely out of the question, but you left out ruleutils.c
support for it, so I decided to leave it aside for now so that I could
get this patch out of my plate once and for all. If you really want
that feature, you can submit another patch for it and discuss with the
RMT whether it belongs in PG10 or not.
It is interesting feature - because it replaces XPATH function, but not important enough.
For daily work the default schema support is much more interesting.
Some changes I made:
* I added some pg_stat_statements support. It works fine for simple
tests, but deeper testing of it would be appreciated.
* I removed the "buildercxt" memory context. It seemed mostly
pointless, and I was disturbed by the MemoryContextResetOnly().
Per-value memory still uses the per-value memory context, but the rest
of the stuff is in the per-query context, which should be pretty much
the same.
* Desultory stylistic changes
ok
Regards
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote: > 2017-03-08 17:01 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>: > > I didn't add the change you proposed here to keep COLUMNS optional; > > instead, I just made COLUMNS mandatory. I think what you propose here > > is not entirely out of the question, but you left out ruleutils.c > > support for it, so I decided to leave it aside for now so that I could > > get this patch out of my plate once and for all. If you really want > > that feature, you can submit another patch for it and discuss with the > > RMT whether it belongs in PG10 or not. > > It is interesting feature - because it replaces XPATH function, but not > important enough. OK. > For daily work the default schema support is much more interesting. Let's see that one, then. It was part of the original submission so depending on how the patch we looks can still cram it in. But other patches have priority for me now. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2017-03-08 17:32 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2017-03-08 17:01 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> > I didn't add the change you proposed here to keep COLUMNS optional;
> > instead, I just made COLUMNS mandatory. I think what you propose here
> > is not entirely out of the question, but you left out ruleutils.c
> > support for it, so I decided to leave it aside for now so that I could
> > get this patch out of my plate once and for all. If you really want
> > that feature, you can submit another patch for it and discuss with the
> > RMT whether it belongs in PG10 or not.
>
> It is interesting feature - because it replaces XPATH function, but not
> important enough.
OK.
> For daily work the default schema support is much more interesting.
Let's see that one, then. It was part of the original submission so
depending on how the patch we looks can still cram it in. But other
patches have priority for me now.
It is theme for 11
Thank you very much
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Pavel Stehule wrote: > 2017-03-08 17:32 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>: > > > For daily work the default schema support is much more interesting. > > > > Let's see that one, then. It was part of the original submission so > > depending on how the patch we looks can still cram it in. But other > > patches have priority for me now. > > It is theme for 11 Ah, great. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services