Thread: Re: [HACKERS] patch: function xmltable

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi

2017-01-16 23:51 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Given
https://www.postgresql.org/message-id/20170116210019.a3glfwspg5lnfrnm@alap3.anarazel.de
which is going to heavily change how the executor works in this area, I
am returning this patch to you again.  I would like a few rather minor
changes:

1. to_xmlstr can be replaced with calls to xmlCharStrdup.

I checked this idea, and it doesn't look well - xmlCharStrdup created xml string in own memory - and it should be explicitly released with xmlFree(). In this case is more practical using PostgreSQL memory context - because this memory is released safely in exception. I can rename this function to more conventional pg_xmlCharStrndup. This function can be used more time in current code.
 
2. don't need xml_xmlnodetostr either -- just use xml_xmlnodetoxmltype
   (which returns text*) and extract the cstring from the varlena.  It's
   a bit more wasteful in terms of cycles, but I don't think we care.
   If we do care, change the function so that it returns cstring, and
   have the callers that want text wrap it in cstring_to_text.

done - it is related to not too often use case, and possible slowdown is minimal
 
3. have a new perValueCxt memcxt in TableExprState, child of buildercxt,
   and switch to it just before GetValue() (reset it just before
   switching).  Then, don't worry about leaks in GetValue.  This way,
   the text* conversions et al don't matter.

done
 

After that I think we're going to need to get this working on top of
Andres' changes.  Which I'm afraid is going to be rather major surgery,
but I haven't looked.

I am waiting on new commits in this area. This moment I have not idea what will be broken.

attached updated patch with cleaned xml part

Regards

Pavel 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi

New update - rebase after yesterday changes.

What you want to change?

Regards

Pavel
Attachment

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Pavel Stehule wrote:
> Hi
> 
> New update - rebase after yesterday changes.
> 
> What you want to change?

I think the problem might come from the still pending patch on that
series, which Andres posted in
https://www.postgresql.org/message-id/20170118221154.aldebi7yyjvds5qa@alap3.anarazel.de
As far as I understand, minor details of that patch might change before
commit, but it is pretty much in definitive form.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> Hi
>
> New update - rebase after yesterday changes.
>
> What you want to change?

I think the problem might come from the still pending patch on that
series, which Andres posted in
https://www.postgresql.org/message-id/20170118221154.aldebi7yyjvds5qa@alap3.anarazel.de
As far as I understand, minor details of that patch might change before
commit, but it is pretty much in definitive form.

ok, we have to wait - please, check XML part if it is good for you

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi

2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> Hi
>
> New update - rebase after yesterday changes.
>
> What you want to change?

I think the problem might come from the still pending patch on that
series, which Andres posted in
https://www.postgresql.org/message-id/20170118221154.aldebi7yyjvds5qa@alap3.anarazel.de
As far as I understand, minor details of that patch might change before
commit, but it is pretty much in definitive form.

new rebased update after these changes

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-01-21 9:30 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> Hi
>
> New update - rebase after yesterday changes.
>
> What you want to change?

I think the problem might come from the still pending patch on that
series, which Andres posted in
https://www.postgresql.org/message-id/20170118221154.aldebi7yyjvds5qa@alap3.anarazel.de
As far as I understand, minor details of that patch might change before
commit, but it is pretty much in definitive form.

new rebased update after these changes

fix white spaces

pavel
 

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi

2017-01-21 10:31 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-01-21 9:30 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

2017-01-19 13:35 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> Hi
>
> New update - rebase after yesterday changes.
>
> What you want to change?

I think the problem might come from the still pending patch on that
series, which Andres posted in
https://www.postgresql.org/message-id/20170118221154.aldebi7yyjvds5qa@alap3.anarazel.de
As far as I understand, minor details of that patch might change before
commit, but it is pretty much in definitive form.

new rebased update after these changes

fix white spaces

few fixes:

* SELECT (xmltable(..)).* + regress tests
* compilation and regress tests without --with-libxml

Regards

Pavel
 

pavel
 

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Attachment

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Pavel Stehule wrote:

> * SELECT (xmltable(..)).* + regress tests
> * compilation and regress tests without --with-libxml

Thanks.  I just realized that this is doing more work than necessary --
I think it would be simpler to have tableexpr fill a tuplestore with the
results, instead of just expecting function execution to apply
ExecEvalExpr over and over to obtain the results.  So evaluating a
tableexpr returns just the tuplestore, which function evaluation can
return as-is.  That code doesn't use the value-per-call interface
anyway.

I also realized that the expr context callback is not called if there's
an error, which leaves us without shutting down libxml properly.  I
added PG_TRY around the fetchrow calls, but I'm not sure that's correct
either, because there could be an error raised in other parts of the
code, after we've already emitted a few rows (for example out of
memory).  I think the right way is to have PG_TRY around the execution
of the whole thing rather than just row at a time; and the tuplestore
mechanism helps us with that.

I think it would be good to have a more complex test case in regress --
let's say there is a table with some simple XML values, then we use
XMLFOREST (or maybe one of the table_to_xml functions) to generate a
large document, and then XMLTABLE uses that document as input document.

Please fix.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] patch: function xmltable

From
Andres Freund
Date:
Hi,

On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext,
> +                  bool *isnull);
> +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext,
> +                      bool *isNull);
> +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext,
> +                bool *isNull);
> +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext,
> +                  Datum doc);
> +static void ShutdownTableExpr(Datum arg);

To me this (and a lot of the other code) hints quite strongly that
expression evalution is the wrong approach to implementing this.  What
you're essentially doing is building a vulcano style scan node.  Even if
we can this, we shouldn't double down on the bad decision to have these
magic expressions that return multiple rows.  There's historical reason
for tSRFs, but we shouldn't add more weirdness like this.

Andres



Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Andres Freund wrote:
> Hi,
> 
> On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext,
> > +                  bool *isnull);
> > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext,
> > +                      bool *isNull);
> > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext,
> > +                bool *isNull);
> > +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext,
> > +                  Datum doc);
> > +static void ShutdownTableExpr(Datum arg);
> 
> To me this (and a lot of the other code) hints quite strongly that
> expression evalution is the wrong approach to implementing this.  What
> you're essentially doing is building a vulcano style scan node.  Even if
> we can this, we shouldn't double down on the bad decision to have these
> magic expressions that return multiple rows.  There's historical reason
> for tSRFs, but we shouldn't add more weirdness like this.

Thanks for giving it a look.  I have long thought that this patch would
be at odds with your overall executor work.

XMLTABLE is specified by the standard to return multiple rows ... but
then as far as my reading goes, it is only supposed to be supported in
the range table (FROM clause) not in the target list.  I wonder if
this would end up better if we only tried to support it in RT.  I asked
Pavel to implement it like that a few weeks ago, but ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

From
Andres Freund
Date:
On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi,
> > 
> > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext,
> > > +                  bool *isnull);
> > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext,
> > > +                      bool *isNull);
> > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext,
> > > +                bool *isNull);
> > > +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext,
> > > +                  Datum doc);
> > > +static void ShutdownTableExpr(Datum arg);
> > 
> > To me this (and a lot of the other code) hints quite strongly that
> > expression evalution is the wrong approach to implementing this.  What
> > you're essentially doing is building a vulcano style scan node.  Even if
> > we can this, we shouldn't double down on the bad decision to have these
> > magic expressions that return multiple rows.  There's historical reason
> > for tSRFs, but we shouldn't add more weirdness like this.
> 
> Thanks for giving it a look.  I have long thought that this patch would
> be at odds with your overall executor work.

Not fundamentally, but it makes it harder.


> XMLTABLE is specified by the standard to return multiple rows ... but
> then as far as my reading goes, it is only supposed to be supported in
> the range table (FROM clause) not in the target list.  I wonder if
> this would end up better if we only tried to support it in RT.  I asked
> Pavel to implement it like that a few weeks ago, but ...

Right - it makes sense in the FROM list - but then it should be an
executor node, instead of some expression thingy.

Greetings,

Andres Freund



Re: [HACKERS] patch: function xmltable

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
>> XMLTABLE is specified by the standard to return multiple rows ... but
>> then as far as my reading goes, it is only supposed to be supported in
>> the range table (FROM clause) not in the target list.  I wonder if
>> this would end up better if we only tried to support it in RT.  I asked
>> Pavel to implement it like that a few weeks ago, but ...

> Right - it makes sense in the FROM list - but then it should be an
> executor node, instead of some expression thingy.

+1 --- we're out of the business of having simple expressions that
return rowsets.
        regards, tom lane



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi

2017-01-25 1:35 GMT+01:00 Andres Freund <andres@anarazel.de>:
On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi,
> >
> > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext,
> > > +                           bool *isnull);
> > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext *econtext,
> > > +                                   bool *isNull);
> > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext,
> > > +                         bool *isNull);
> > > +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext,
> > > +                           Datum doc);
> > > +static void ShutdownTableExpr(Datum arg);
> >
> > To me this (and a lot of the other code) hints quite strongly that
> > expression evalution is the wrong approach to implementing this.  What
> > you're essentially doing is building a vulcano style scan node.  Even if
> > we can this, we shouldn't double down on the bad decision to have these
> > magic expressions that return multiple rows.  There's historical reason
> > for tSRFs, but we shouldn't add more weirdness like this.
>
> Thanks for giving it a look.  I have long thought that this patch would
> be at odds with your overall executor work.

Not fundamentally, but it makes it harder.

If you plan to hold support SRFin target list, then nothing is different. In last patch is executed under nodeProjectSet.
 


> XMLTABLE is specified by the standard to return multiple rows ... but
> then as far as my reading goes, it is only supposed to be supported in
> the range table (FROM clause) not in the target list.  I wonder if
> this would end up better if we only tried to support it in RT.  I asked
> Pavel to implement it like that a few weeks ago, but ...

Right - it makes sense in the FROM list - but then it should be an
executor node, instead of some expression thingy.

The XMLTABLE function is from user perspective, from implementation perspective a form of SRF function. I use own executor node, because fcinfo is complex already and not too enough to hold all information about result columns. 

The implementation as RT doesn't reduce code - it is just moving to different file. 

I'll try to explain my motivation. Please, check it and correct me if I am wrong. I don't keep on my implementation - just try to implement XMLTABLE be consistent with another behave and be used all time without any surprise. 

1. Any function that produces a content can be used in target list. We support SRF in target list and in FROM part. Why XMLTABLE should be a exception?

2. In standard the XMLTABLE is placed only on FROM part - but standard doesn't need to solve my question - there are not SRF functions allowed in targetlist.

If there be a common decision so this inconsistency (in behave of this kind of functions) is expected, required - then I have not a problem to remove this support from XMLTABLE.

In this moment I don't see a technical reason for this step - with last Andres changes the support of XMLTABLE in target list needs less than 40 lines and there is not any special path for XMLTABLE only. Andres write support for SRF functions and SRF operator. TableExpr is third category.

Regards

Pavel


Greetings,

Andres Freund

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-01-25 5:45 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Andres Freund <andres@anarazel.de> writes:
> On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
>> XMLTABLE is specified by the standard to return multiple rows ... but
>> then as far as my reading goes, it is only supposed to be supported in
>> the range table (FROM clause) not in the target list.  I wonder if
>> this would end up better if we only tried to support it in RT.  I asked
>> Pavel to implement it like that a few weeks ago, but ...

> Right - it makes sense in the FROM list - but then it should be an
> executor node, instead of some expression thingy.

+1 --- we're out of the business of having simple expressions that
return rowsets.

If we do decision so this kind of function will have different behave than other SRF functions, then I remove support for this.

There are not technical reasons (maybe I don't see it) - last Andres changes do well support for my code.

Regards

Pavel
 

                        regards, tom lane

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> * SELECT (xmltable(..)).* + regress tests
> * compilation and regress tests without --with-libxml

Thanks.  I just realized that this is doing more work than necessary --

?? I don't understand?
 
I think it would be simpler to have tableexpr fill a tuplestore with the
results, instead of just expecting function execution to apply
ExecEvalExpr over and over to obtain the results.  So evaluating a
tableexpr returns just the tuplestore, which function evaluation can
return as-is.  That code doesn't use the value-per-call interface
anyway.

ok


I also realized that the expr context callback is not called if there's
an error, which leaves us without shutting down libxml properly.  I
added PG_TRY around the fetchrow calls, but I'm not sure that's correct
either, because there could be an error raised in other parts of the
code, after we've already emitted a few rows (for example out of
memory).  I think the right way is to have PG_TRY around the execution
of the whole thing rather than just row at a time; and the tuplestore
mechanism helps us with that.


ok.

 

I think it would be good to have a more complex test case in regress --
let's say there is a table with some simple XML values, then we use
XMLFOREST (or maybe one of the table_to_xml functions) to generate a
large document, and then XMLTABLE uses that document as input document.

Please fix.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> >> XMLTABLE is specified by the standard to return multiple rows ... but
> >> then as far as my reading goes, it is only supposed to be supported in
> >> the range table (FROM clause) not in the target list.  I wonder if
> >> this would end up better if we only tried to support it in RT.  I asked
> >> Pavel to implement it like that a few weeks ago, but ...
> 
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
> 
> +1 --- we're out of the business of having simple expressions that
> return rowsets.

Well, that's it.  I'm not committing this patch against two other
committers' opinion, plus I was already on the fence about the
implementation anyway.  I think you should just go with the flow and
implement this by creating nodeTableexprscan.c.  It's not even
difficult.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

From
Andres Freund
Date:
Hi,

On 2017-01-25 05:45:24 +0100, Pavel Stehule wrote:
> 2017-01-25 1:35 GMT+01:00 Andres Freund <andres@anarazel.de>:
> 
> > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> > > Andres Freund wrote:
> > > > Hi,
> > > >
> > > > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > > > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext
> > *econtext,
> > > > > +                           bool *isnull);
> > > > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate,
> > ExprContext *econtext,
> > > > > +                                   bool *isNull);
> > > > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext
> > *econtext,
> > > > > +                         bool *isNull);
> > > > > +static void tabexprInitialize(TableExprState *tstate, ExprContext
> > *econtext,
> > > > > +                           Datum doc);
> > > > > +static void ShutdownTableExpr(Datum arg);
> > > >
> > > > To me this (and a lot of the other code) hints quite strongly that
> > > > expression evalution is the wrong approach to implementing this.  What
> > > > you're essentially doing is building a vulcano style scan node.  Even
> > if
> > > > we can this, we shouldn't double down on the bad decision to have these
> > > > magic expressions that return multiple rows.  There's historical reason
> > > > for tSRFs, but we shouldn't add more weirdness like this.
> > >
> > > Thanks for giving it a look.  I have long thought that this patch would
> > > be at odds with your overall executor work.
> >
> > Not fundamentally, but it makes it harder.
> >
> 
> If you plan to hold support SRFin target list, then nothing is different.
> In last patch is executed under nodeProjectSet.

It is, because we suddenly need to call different functions - and I'm
revamping most of execQual to have an opcode dispatch based execution
model (which then also can be JITed).


> > > XMLTABLE is specified by the standard to return multiple rows ... but
> > > then as far as my reading goes, it is only supposed to be supported in
> > > the range table (FROM clause) not in the target list.  I wonder if
> > > this would end up better if we only tried to support it in RT.  I asked
> > > Pavel to implement it like that a few weeks ago, but ...
> >
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
> >
> 
> The XMLTABLE function is from user perspective, from implementation
> perspective a form of SRF function. I use own executor node, because fcinfo
> is complex already and not too enough to hold all information about result
> columns.


> The implementation as RT doesn't reduce code - it is just moving to
> different file.

You're introducing a wholly separate callback system (TableExprRoutine)
for the new functionality.  And that stuff is excruciatingly close to
stuff that the normal executor already knows how to do.



> I'll try to explain my motivation. Please, check it and correct me if I am
> wrong. I don't keep on my implementation - just try to implement XMLTABLE
> be consistent with another behave and be used all time without any
> surprise.
> 
> 1. Any function that produces a content can be used in target list. We
> support SRF in target list and in FROM part. Why XMLTABLE should be a
> exception?

targetlist SRFs were a big mistake. They cause a fair number of problems
code-wise. They permeated for a long while into bits of both planner and
executor, where they really shouldn't belong. Even after the recent
changes there's a fair amount of uglyness associated with them.  We
can't remove tSRFs for backward compatibility reasons, but that's not
true for XMLTABLE


Greetings,

Andres Freund



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:



> >
>
> If you plan to hold support SRFin target list, then nothing is different.
> In last patch is executed under nodeProjectSet.

It is, because we suddenly need to call different functions - and I'm
revamping most of execQual to have an opcode dispatch based execution
model (which then also can be JITed). 

> > > XMLTABLE is specified by the standard to return multiple rows ... but
> > > then as far as my reading goes, it is only supposed to be supported in
> > > the range table (FROM clause) not in the target list.  I wonder if
> > > this would end up better if we only tried to support it in RT.  I asked
> > > Pavel to implement it like that a few weeks ago, but ...
> >
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
> >
>
> The XMLTABLE function is from user perspective, from implementation
> perspective a form of SRF function. I use own executor node, because fcinfo
> is complex already and not too enough to hold all information about result
> columns.


> The implementation as RT doesn't reduce code - it is just moving to
> different file.

You're introducing a wholly separate callback system (TableExprRoutine)
for the new functionality.  And that stuff is excruciatingly close to
stuff that the normal executor already knows how to do.

These callbacks are related to isolation TableExpr infrastructure and TableExpr implementation - This design is prepared for reusing for JSON_TABLE function.

Any placing of TableExpr code should not impact this callback system (Or I am absolutely out and executor is able do some work what is hidden to me). 
 



> I'll try to explain my motivation. Please, check it and correct me if I am
> wrong. I don't keep on my implementation - just try to implement XMLTABLE
> be consistent with another behave and be used all time without any
> surprise.
>
> 1. Any function that produces a content can be used in target list. We
> support SRF in target list and in FROM part. Why XMLTABLE should be a
> exception?

targetlist SRFs were a big mistake. They cause a fair number of problems
code-wise. They permeated for a long while into bits of both planner and
executor, where they really shouldn't belong. Even after the recent
changes there's a fair amount of uglyness associated with them.  We
can't remove tSRFs for backward compatibility reasons, but that's not
true for XMLTABLE



ok

I afraid when I cannot to reuse a SRF infrastructure, I have to reimplement it partially :( - mainly for usage in "ROWS FROM ()" 
 


Greetings,

Andres Freund

Re: [HACKERS] patch: function xmltable

From
Andres Freund
Date:
Hi,

> > > I'll try to explain my motivation. Please, check it and correct me if I
> > am
> > > wrong. I don't keep on my implementation - just try to implement XMLTABLE
> > > be consistent with another behave and be used all time without any
> > > surprise.
> > >
> > > 1. Any function that produces a content can be used in target list. We
> > > support SRF in target list and in FROM part. Why XMLTABLE should be a
> > > exception?
> >
> > targetlist SRFs were a big mistake. They cause a fair number of problems
> > code-wise. They permeated for a long while into bits of both planner and
> > executor, where they really shouldn't belong. Even after the recent
> > changes there's a fair amount of uglyness associated with them.  We
> > can't remove tSRFs for backward compatibility reasons, but that's not
> > true for XMLTABLE
> >
> >
> >
> ok
> 
> I afraid when I cannot to reuse a SRF infrastructure, I have to reimplement
> it partially :( - mainly for usage in "ROWS FROM ()"

Huh?

Greetings,

Andres Freund



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-01-25 22:40 GMT+01:00 Andres Freund <andres@anarazel.de>:
Hi,

> > > I'll try to explain my motivation. Please, check it and correct me if I
> > am
> > > wrong. I don't keep on my implementation - just try to implement XMLTABLE
> > > be consistent with another behave and be used all time without any
> > > surprise.
> > >
> > > 1. Any function that produces a content can be used in target list. We
> > > support SRF in target list and in FROM part. Why XMLTABLE should be a
> > > exception?
> >
> > targetlist SRFs were a big mistake. They cause a fair number of problems
> > code-wise. They permeated for a long while into bits of both planner and
> > executor, where they really shouldn't belong. Even after the recent
> > changes there's a fair amount of uglyness associated with them.  We
> > can't remove tSRFs for backward compatibility reasons, but that's not
> > true for XMLTABLE
> >
> >
> >
> ok
>
> I afraid when I cannot to reuse a SRF infrastructure, I have to reimplement
> it partially :( - mainly for usage in "ROWS FROM ()"

The TableExpr implementation is based on SRF now. You and Alvaro propose independent implementation like generic executor node. I am sceptic so FunctionScan supports reading from generic executor node.

Regards

Pavel


Huh?

Greetings,

Andres Freund

Re: [HACKERS] patch: function xmltable

From
Andres Freund
Date:
On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote:
> 2017-01-25 22:40 GMT+01:00 Andres Freund <andres@anarazel.de>:
> > > I afraid when I cannot to reuse a SRF infrastructure, I have to
> > reimplement
> > > it partially :( - mainly for usage in "ROWS FROM ()"
> >
> 
> The TableExpr implementation is based on SRF now. You and Alvaro propose
> independent implementation like generic executor node. I am sceptic so
> FunctionScan supports reading from generic executor node.

Why would it need to?



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-01-25 23:33 GMT+01:00 Andres Freund <andres@anarazel.de>:
On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote:
> 2017-01-25 22:40 GMT+01:00 Andres Freund <andres@anarazel.de>:
> > > I afraid when I cannot to reuse a SRF infrastructure, I have to
> > reimplement
> > > it partially :( - mainly for usage in "ROWS FROM ()"
> >
>
> The TableExpr implementation is based on SRF now. You and Alvaro propose
> independent implementation like generic executor node. I am sceptic so
> FunctionScan supports reading from generic executor node.

Why would it need to?

Simply - due consistency with any other functions that can returns rows.

Maybe I don't understand to Alvaro proposal well - I have a XMLTABLE function - TableExpr that looks like SRF function, has similar behave - returns more rows, but should be significantly different implemented, and should to have different limits - should not be used there and there ... It is hard to see consistency there for me.

Regards

Pavel


Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-01-25 15:07 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> >> XMLTABLE is specified by the standard to return multiple rows ... but
> >> then as far as my reading goes, it is only supposed to be supported in
> >> the range table (FROM clause) not in the target list.  I wonder if
> >> this would end up better if we only tried to support it in RT.  I asked
> >> Pavel to implement it like that a few weeks ago, but ...
>
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
>
> +1 --- we're out of the business of having simple expressions that
> return rowsets.

Well, that's it.  I'm not committing this patch against two other
committers' opinion, plus I was already on the fence about the
implementation anyway.  I think you should just go with the flow and
implement this by creating nodeTableexprscan.c.  It's not even
difficult.

I am playing with this and the patch looks about 15kB longer - just due implementation basic scan functionality - and I didn't touch a planner. 

I am not happy from this - still I have a feeling so I try to reimplement reduced SRF.  

Regards

Pavel

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi

2017-01-25 15:07 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> >> XMLTABLE is specified by the standard to return multiple rows ... but
> >> then as far as my reading goes, it is only supposed to be supported in
> >> the range table (FROM clause) not in the target list.  I wonder if
> >> this would end up better if we only tried to support it in RT.  I asked
> >> Pavel to implement it like that a few weeks ago, but ...
>
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
>
> +1 --- we're out of the business of having simple expressions that
> return rowsets.

Well, that's it.  I'm not committing this patch against two other
committers' opinion, plus I was already on the fence about the
implementation anyway.  I think you should just go with the flow and
implement this by creating nodeTableexprscan.c.  It's not even
difficult.

I am sending new version - it is based on own executor scan node and tuplestore.

Some now obsolete regress tests removed, some new added. 

The executor code (memory context usage) should be cleaned little bit - but other code should be good.

Regards

Pavel


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Pavel Stehule wrote:

> I am sending new version - it is based on own executor scan node and
> tuplestore.
> 
> Some now obsolete regress tests removed, some new added.
> 
> The executor code (memory context usage) should be cleaned little bit - but
> other code should be good.

I think you forgot nodeTableFuncscan.c.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi

2017-01-30 20:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> I am sending new version - it is based on own executor scan node and
> tuplestore.
>
> Some now obsolete regress tests removed, some new added.
>
> The executor code (memory context usage) should be cleaned little bit - but
> other code should be good.

I think you forgot nodeTableFuncscan.c.

true, I am sorry

attached 

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] patch: function xmltable

From
Michael Paquier
Date:
On Tue, Jan 31, 2017 at 5:18 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> true, I am sorry

Last status is a new patch and no reviews. On top of that this thread
is quite active. So moved to next CF. Pavel, please be careful about
the status of the patch on the CF app, it was set to "waiting on
author"...
-- 
Michael



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> * SELECT (xmltable(..)).* + regress tests
> * compilation and regress tests without --with-libxml

Thanks.  I just realized that this is doing more work than necessary --
I think it would be simpler to have tableexpr fill a tuplestore with the
results, instead of just expecting function execution to apply
ExecEvalExpr over and over to obtain the results.  So evaluating a
tableexpr returns just the tuplestore, which function evaluation can
return as-is.  That code doesn't use the value-per-call interface
anyway.

I also realized that the expr context callback is not called if there's
an error, which leaves us without shutting down libxml properly.  I
added PG_TRY around the fetchrow calls, but I'm not sure that's correct
either, because there could be an error raised in other parts of the
code, after we've already emitted a few rows (for example out of
memory).  I think the right way is to have PG_TRY around the execution
of the whole thing rather than just row at a time; and the tuplestore
mechanism helps us with that.

I think it would be good to have a more complex test case in regress --
let's say there is a table with some simple XML values, then we use
XMLFOREST (or maybe one of the table_to_xml functions) to generate a
large document, and then XMLTABLE uses that document as input document.

I have a 16K lines long real XML 6.MB. Probably we would not to append it to regress tests.

It is really fast - original customer implementation 20min, nested our xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms

I have a plan to create tests based on pg_proc and CTE - if all works, then the query must be empty

with x as (select proname, proowner, procost, pronargs, array_to_string(proargnames,',') as proargnames, array_to_string(proargtypes,',') as proargtypes from pg_proc), y as (select xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs, proargnames, proargtypes)) as proc from x), z as (select xmltable.* from y, lateral xmltable('/proc' passing proc columns proname name, proowner oid, procost float, pronargs int, proargnames text, proargtypes text)) select * from z except select * from x;

 

Please fix.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Pavel Stehule wrote:
> 2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > I think it would be good to have a more complex test case in regress --
> > let's say there is a table with some simple XML values, then we use
> > XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> > large document, and then XMLTABLE uses that document as input document.
> 
> I have a 16K lines long real XML 6.MB. Probably we would not to append it
> to regress tests.
> 
> It is really fast - original customer implementation 20min, nested our
> xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms

That's great numbers, kudos for the hard work here.  That will make for
a nice headline in pg10 PR materials.  But what I was getting at is that
I would like to exercise a bit more of the expression handling in
xmltable execution, to make sure it doesn't handle just string literals.

> I have a plan to create tests based on pg_proc and CTE - if all works, then
> the query must be empty
> 
> with x as (select proname, proowner, procost, pronargs,
> array_to_string(proargnames,',') as proargnames,
> array_to_string(proargtypes,',') as proargtypes from pg_proc), y as (select
> xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
> proargnames, proargtypes)) as proc from x), z as (select xmltable.* from y,
> lateral xmltable('/proc' passing proc columns proname name, proowner oid,
> procost float, pronargs int, proargnames text, proargtypes text)) select *
> from z except select * from x;

Nice one :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-01-31 14:57 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > I think it would be good to have a more complex test case in regress --
> > let's say there is a table with some simple XML values, then we use
> > XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> > large document, and then XMLTABLE uses that document as input document.
>
> I have a 16K lines long real XML 6.MB. Probably we would not to append it
> to regress tests.
>
> It is really fast - original customer implementation 20min, nested our
> xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms

That's great numbers, kudos for the hard work here.  That will make for
a nice headline in pg10 PR materials.  But what I was getting at is that
I would like to exercise a bit more of the expression handling in
xmltable execution, to make sure it doesn't handle just string literals.

I'll try to write some more dynamic examples.
 

> I have a plan to create tests based on pg_proc and CTE - if all works, then
> the query must be empty
>
> with x as (select proname, proowner, procost, pronargs,
> array_to_string(proargnames,',') as proargnames,
> array_to_string(proargtypes,',') as proargtypes from pg_proc), y as (select
> xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
> proargnames, proargtypes)) as proc from x), z as (select xmltable.* from y,
> lateral xmltable('/proc' passing proc columns proname name, proowner oid,
> procost float, pronargs int, proargnames text, proargtypes text)) select *
> from z except select * from x;

Nice one :-)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi

2017-01-31 14:57 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2017-01-24 21:38 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > I think it would be good to have a more complex test case in regress --
> > let's say there is a table with some simple XML values, then we use
> > XMLFOREST (or maybe one of the table_to_xml functions) to generate a
> > large document, and then XMLTABLE uses that document as input document.
>
> I have a 16K lines long real XML 6.MB. Probably we would not to append it
> to regress tests.
>
> It is really fast - original customer implementation 20min, nested our
> xpath implementation 10 sec, PLPython xml reader 5 sec, xmltable 400ms

That's great numbers, kudos for the hard work here.  That will make for
a nice headline in pg10 PR materials.  But what I was getting at is that
I would like to exercise a bit more of the expression handling in
xmltable execution, to make sure it doesn't handle just string literals.

done
 

> I have a plan to create tests based on pg_proc and CTE - if all works, then
> the query must be empty
>
> with x as (select proname, proowner, procost, pronargs,
> array_to_string(proargnames,',') as proargnames,
> array_to_string(proargtypes,',') as proargtypes from pg_proc), y as (select
> xmlelement(name proc, xmlforest(proname, proowner, procost, pronargs,
> proargnames, proargtypes)) as proc from x), z as (select xmltable.* from y,
> lateral xmltable('/proc' passing proc columns proname name, proowner oid,
> procost float, pronargs int, proargnames text, proargtypes text)) select *
> from z except select * from x;

Nice one :-)

please see attached patch

* enhanced regress tests
* clean memory context work

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi


please see attached patch

* enhanced regress tests
* clean memory context work

new update 

fix a bug in string compare
fix some typo and obsolete comments

Regards

Pavel
 

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi

2017-02-16 6:38 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi


please see attached patch

* enhanced regress tests
* clean memory context work

new update 

fix a bug in string compare
fix some typo and obsolete comments

Regards

some minor but interesting fix.

I found so some xml values imported via recv function can have inconsistency between header encoding and used encoding. Internally the header encoding is removed - see xml_out function. 

So now, when I have to prepare data for libxml2, I don't do direct cast, but I use xml_out_internal instead. Maybe this technique should be used elsewhere? Same issue I see on xpath function.

Solved issue is not too often probably - the some different encoding than utf8 should be used in XML document and XML document should be loaded with recv function.

Regards

Pavel

 

Pavel
 

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Attachment

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
I've been giving this a look.  I started by tweaking the docs once
again, and while verifying that the example works as expected, I
replayed what I have in sgml:

... begin SGML paste ...   <para>    For example, given the following XML document: <screen><![CDATA[
<ROWS> <ROW id="1">   <COUNTRY_ID>AU</COUNTRY_ID>   <COUNTRY_NAME>Australia</COUNTRY_NAME> </ROW> <ROW id="5">
<COUNTRY_ID>JP</COUNTRY_ID>  <COUNTRY_NAME>Japan</COUNTRY_NAME>   <PREMIER_NAME>Sinzo Abe</PREMIER_NAME> </ROW> <ROW
id="6">  <COUNTRY_ID>SG</COUNTRY_ID>   <COUNTRY_NAME>Singapore</COUNTRY_NAME>   <SIZE unit="km">791</SIZE> </ROW>
 
</ROWS>
]]></screen>
    the following query produces the result shown below:

<screen><![CDATA[
SELECT  xmltable.* FROM (SELECT data FROM xmldata) x,       LATERAL xmltable('//ROWS/ROW'
PASSINGdata                        COLUMNS id int PATH '@id',                                ordinality FOR ORDINALITY,
                              country_name text PATH 'COUNTRY_NAME',                                country_id text
PATH'COUNTRY_ID',                                size float PATH 'SIZE[@unit = "km"]/text()',
    unit text PATH 'SIZE/@unit',                                premier_name text PATH 'PREMIER_NAME' DEFAULT 'not
specified');
... end SGML paste ...


But the query doesn't actually return a table, but instead it fails with
this error: ERROR:  invalid input syntax for type double precision: ""
This is because of the "size" column (if I remove SIZE from the COLUMNS
clause, the query returns correctly).  Apparently, for the rows where
SIZE is not given, we try to inssert an empty string instead of a NULL
value, which is what I expected.

I'm using your v44 code, but trimmed both the XML document used in SGML
as well as modified the query slightly to show additional features.  But
those changes should not cause the above error ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi

2017-03-02 1:12 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

I've been giving this a look.  I started by tweaking the docs once
again, and while verifying that the example works as expected, I
replayed what I have in sgml:

... begin SGML paste ...
    <para>
     For example, given the following XML document:
  <screen><![CDATA[
<ROWS>
  <ROW id="1">
    <COUNTRY_ID>AU</COUNTRY_ID>
    <COUNTRY_NAME>Australia</COUNTRY_NAME>
  </ROW>
  <ROW id="5">
    <COUNTRY_ID>JP</COUNTRY_ID>
    <COUNTRY_NAME>Japan</COUNTRY_NAME>
    <PREMIER_NAME>Sinzo Abe</PREMIER_NAME>
  </ROW>
  <ROW id="6">
    <COUNTRY_ID>SG</COUNTRY_ID>
    <COUNTRY_NAME>Singapore</COUNTRY_NAME>
    <SIZE unit="km">791</SIZE>
  </ROW>
</ROWS>
]]></screen>

     the following query produces the result shown below:

<screen><![CDATA[
SELECT  xmltable.*
  FROM (SELECT data FROM xmldata) x,
        LATERAL xmltable('//ROWS/ROW'
                         PASSING data
                         COLUMNS id int PATH '@id',
                                 ordinality FOR ORDINALITY,
                                 country_name text PATH 'COUNTRY_NAME',
                                 country_id text PATH 'COUNTRY_ID',
                                 size float PATH 'SIZE[@unit = "km"]/text()',
                                 unit text PATH 'SIZE/@unit',
                                 premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');
... end SGML paste ...


But the query doesn't actually return a table, but instead it fails with
this error:
  ERROR:  invalid input syntax for type double precision: ""
This is because of the "size" column (if I remove SIZE from the COLUMNS
clause, the query returns correctly).  Apparently, for the rows where
SIZE is not given, we try to inssert an empty string instead of a NULL
value, which is what I expected.

I'm using your v44 code, but trimmed both the XML document used in SGML
as well as modified the query slightly to show additional features.  But
those changes should not cause the above error ...

The example in doc is obsolete. Following example works without problems.

 SELECT xmltable.*
  FROM (SELECT data FROM xmldata) x,       LATERAL xmltable('//ROWS/ROW'                        PASSING data                        COLUMNS id int PATH '@id',                                ordinality FOR ORDINALITY,                                country_name text PATH 'COUNTRY_NAME',                                country_id text PATH 'COUNTRY_ID',                                size float PATH 'SIZE[@unit = "km"]',                                unit text PATH 'SIZE/@unit',                                premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');

It is related to older variants of this patch, where I explicitly mapped empty strings to NULL. 

Now, I don't do it - I use libxml2 result with following mapping

No tag ... NULL
empty tag ... empty string

Important question is about mapping empty tags to Postgres. I prefer current behave, because I have a possibility to differ between these states on application level. If we returns NULL for empty tag, then there will not be possible detect if XML has tag (although empty) or not. The change is simple - just one row - but I am thinking so current behave is better. There is possible risk of using /text() somewhere - it enforce a empty tag with all negative impacts.

I prefer to fix doc in conformance with regress tests and append note about mapping these corner cases from XML to relations.

What do you think about it?

Regards

Pavel









--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-03-02 8:04 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

2017-03-02 1:12 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

I've been giving this a look.  I started by tweaking the docs once
again, and while verifying that the example works as expected, I
replayed what I have in sgml:

... begin SGML paste ...
    <para>
     For example, given the following XML document:
  <screen><![CDATA[
<ROWS>
  <ROW id="1">
    <COUNTRY_ID>AU</COUNTRY_ID>
    <COUNTRY_NAME>Australia</COUNTRY_NAME>
  </ROW>
  <ROW id="5">
    <COUNTRY_ID>JP</COUNTRY_ID>
    <COUNTRY_NAME>Japan</COUNTRY_NAME>
    <PREMIER_NAME>Sinzo Abe</PREMIER_NAME>
  </ROW>
  <ROW id="6">
    <COUNTRY_ID>SG</COUNTRY_ID>
    <COUNTRY_NAME>Singapore</COUNTRY_NAME>
    <SIZE unit="km">791</SIZE>
  </ROW>
</ROWS>
]]></screen>

     the following query produces the result shown below:

<screen><![CDATA[
SELECT  xmltable.*
  FROM (SELECT data FROM xmldata) x,
        LATERAL xmltable('//ROWS/ROW'
                         PASSING data
                         COLUMNS id int PATH '@id',
                                 ordinality FOR ORDINALITY,
                                 country_name text PATH 'COUNTRY_NAME',
                                 country_id text PATH 'COUNTRY_ID',
                                 size float PATH 'SIZE[@unit = "km"]/text()',
                                 unit text PATH 'SIZE/@unit',
                                 premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');
... end SGML paste ...


But the query doesn't actually return a table, but instead it fails with
this error:
  ERROR:  invalid input syntax for type double precision: ""
This is because of the "size" column (if I remove SIZE from the COLUMNS
clause, the query returns correctly).  Apparently, for the rows where
SIZE is not given, we try to inssert an empty string instead of a NULL
value, which is what I expected.

I'm using your v44 code, but trimmed both the XML document used in SGML
as well as modified the query slightly to show additional features.  But
those changes should not cause the above error ...

The example in doc is obsolete. Following example works without problems.

 SELECT xmltable.*
  FROM (SELECT data FROM xmldata) x,       LATERAL xmltable('//ROWS/ROW'                        PASSING data                        COLUMNS id int PATH '@id',                                ordinality FOR ORDINALITY,                                country_name text PATH 'COUNTRY_NAME',                                country_id text PATH 'COUNTRY_ID',                                size float PATH 'SIZE[@unit = "km"]',                                unit text PATH 'SIZE/@unit',                                premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified');

It is related to older variants of this patch, where I explicitly mapped empty strings to NULL. 

Now, I don't do it - I use libxml2 result with following mapping

No tag ... NULL
empty tag ... empty string

Important question is about mapping empty tags to Postgres. I prefer current behave, because I have a possibility to differ between these states on application level. If we returns NULL for empty tag, then there will not be possible detect if XML has tag (although empty) or not. The change is simple - just one row - but I am thinking so current behave is better. There is possible risk of using /text() somewhere - it enforce a empty tag with all negative impacts.

I prefer to fix doc in conformance with regress tests and append note about mapping these corner cases from XML to relations.

What do you think about it?

It is documented already

"If the <literal>PATH</> matches an empty tag the result is an empty string"

Attached new patch

cleaned documentation
regress tests is more robust
appended comment in src related to generating empty string for empty tag

Regards

Pavel
 

Regards

Pavel









--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Pavel Stehule wrote:

> It is documented already
> 
> "If the <literal>PATH</> matches an empty tag the result is an empty string"

Hmm, okay.  But what we have here is not an empty tag, but a tag that is
completely missing.  I don't think those two cases should be treated in
the same way ...

> Attached new patch
> 
> cleaned documentation
> regress tests is more robust
> appended comment in src related to generating empty string for empty tag

Thanks, I incorporated those changes.  Here's v46.  I rewrote the
documentation, and fixed a couple of incorrectly copied&pasted comments
in the new executor code; I think that one looks good.  In the future we
could rewrite it to avoid the need for a tuplestore, but I think the
current approach is good enough for a pg10 implementation.

Barring serious problems, I intend to commit this later today.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


Dne 2. 3. 2017 18:14 napsal uživatel "Alvaro Herrera" <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> It is documented already
>
> "If the <literal>PATH</> matches an empty tag the result is an empty string"

Hmm, okay.  But what we have here is not an empty tag, but a tag that is
completely missing.  I don't think those two cases should be treated in
the same way ...

this information is not propagated from libxml2.


> Attached new patch
>
> cleaned documentation
> regress tests is more robust
> appended comment in src related to generating empty string for empty tag

Thanks, I incorporated those changes.  Here's v46.  I rewrote the
documentation, and fixed a couple of incorrectly copied&pasted comments
in the new executor code; I think that one looks good.  In the future we
could rewrite it to avoid the need for a tuplestore, but I think the
current approach is good enough for a pg10 implementation.

Barring serious problems, I intend to commit this later today.


thank you very much

regards

Pavel

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
So in the old (non-executor-node) implementation, you could attach WITH
ORDINALITY to the xmltable expression and it would count the output
rows, regardless of which XML document it comes from.  With the new
implementation, the grammar no longer accepts it.  To count output rows,
you still need to use row_number().  Maybe this is okay.  This is the
example from the docs, and I add another XML document with two more rows
for xmltable.  Look at the three numbering columns ...

CREATE TABLE xmldata AS SELECT
xml $$
<ROWS>                       <ROW id="1">   <COUNTRY_ID>AU</COUNTRY_ID>   <COUNTRY_NAME>Australia</COUNTRY_NAME> </ROW>
                                   <ROW id="5">
<COUNTRY_ID>JP</COUNTRY_ID>  <COUNTRY_NAME>Japan</COUNTRY_NAME>   <PREMIER_NAME>Shinzo Abe</PREMIER_NAME>   <SIZE
unit="sq_mi">145935</SIZE></ROW> <ROW id="6">   <COUNTRY_ID>SG</COUNTRY_ID>   <COUNTRY_NAME>Singapore</COUNTRY_NAME>
<SIZEunit="sq_km">697</SIZE> </ROW>
 
</ROWS>
$$ AS data;
insert into xmldata values ($$<ROWS><ROW id="2"><COUNTRY_ID>CL</COUNTRY_ID><COUNTRY_NAME>Chile</COUNTRY_NAME></ROW><ROW
id="3"><COUNTRY_ID>AR</COUNTRY_ID><COUNTRY_NAME>Argentina</COUNTRY_NAME></ROW></ROWS>$$);

SELECT ROW_NUMBER() OVER (), xmltable.* FROM xmldata,
        XMLTABLE('//ROWS/ROW'                                                                                 PASSING
data              COLUMNS id int PATH '@id',                       ordinality FOR ORDINALITY,
"COUNTRY_NAME"text,                       country_id text PATH 'COUNTRY_ID',                       size_sq_km float
PATH'SIZE[@unit = "sq_km"]',                       size_other text PATH
'concat(SIZE[@unit!="sq_km"]," ", SIZE[@unit!="sq_km"]/@unit)',                       premier_name text PATH
'PREMIER_NAME'DEFAULT 'not specified')
 
;
row_number │ id │ ordinality │ COUNTRY_NAME │ country_id │ size_sq_km │  size_other  │ premier_name  
────────────┼────┼────────────┼──────────────┼────────────┼────────────┼──────────────┼───────────────         1 │  1 │
        1 │ Australia    │ AU         │            │              │ not specified         2 │  5 │          2 │ Japan
    │ JP         │            │ 145935 sq_mi │ Shinzo Abe         3 │  6 │          3 │ Singapore    │ SG         │
  697 │              │ not specified         4 │  2 │          1 │ Chile        │ CL         │            │
│ not specified         5 │  3 │          2 │ Argentina    │ AR         │            │              │ not specified
 


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-03-02 19:32 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
So in the old (non-executor-node) implementation, you could attach WITH
ORDINALITY to the xmltable expression and it would count the output
rows, regardless of which XML document it comes from.  With the new
implementation, the grammar no longer accepts it.  To count output rows,
you still need to use row_number().  Maybe this is okay.  This is the
example from the docs, and I add another XML document with two more rows
for xmltable.  Look at the three numbering columns ...

It is expected - now tablefunc are not special case of SRF, so it lost all SRF functionality. It is not critical lost - it supports internally FOR ORDINALITY column, and classic ROW_NUMBER can be used. It can be enhanced to support WITH ORDINALITY in future, but I have not any use case for it.

Regards

Pavel

 

CREATE TABLE xmldata AS SELECT
xml $$
<ROWS>
  <ROW id="1">
    <COUNTRY_ID>AU</COUNTRY_ID>
    <COUNTRY_NAME>Australia</COUNTRY_NAME>
  </ROW>
  <ROW id="5">
    <COUNTRY_ID>JP</COUNTRY_ID>
    <COUNTRY_NAME>Japan</COUNTRY_NAME>
    <PREMIER_NAME>Shinzo Abe</PREMIER_NAME>
    <SIZE unit="sq_mi">145935</SIZE>
  </ROW>
  <ROW id="6">
    <COUNTRY_ID>SG</COUNTRY_ID>
    <COUNTRY_NAME>Singapore</COUNTRY_NAME>
    <SIZE unit="sq_km">697</SIZE>
  </ROW>
</ROWS>
$$ AS data;

 insert into xmldata values ($$
 <ROWS><ROW id="2"><COUNTRY_ID>CL</COUNTRY_ID><COUNTRY_NAME>Chile</COUNTRY_NAME></ROW>
 <ROW id="3"><COUNTRY_ID>AR</COUNTRY_ID><COUNTRY_NAME>Argentina</COUNTRY_NAME></ROW></ROWS>$$);

SELECT ROW_NUMBER() OVER (), xmltable.*
  FROM xmldata,
       XMLTABLE('//ROWS/ROW'
                PASSING data
                COLUMNS id int PATH '@id',
                        ordinality FOR ORDINALITY,
                        "COUNTRY_NAME" text,
                        country_id text PATH 'COUNTRY_ID',
                        size_sq_km float PATH 'SIZE[@unit = "sq_km"]',
                        size_other text PATH
                             'concat(SIZE[@unit!="sq_km"], " ", SIZE[@unit!="sq_km"]/@unit)',
                        premier_name text PATH 'PREMIER_NAME' DEFAULT 'not specified')
;

 row_number │ id │ ordinality │ COUNTRY_NAME │ country_id │ size_sq_km │  size_other  │ premier_name
────────────┼────┼────────────┼──────────────┼────────────┼────────────┼──────────────┼───────────────
          1 │  1 │          1 │ Australia    │ AU         │            │              │ not specified
          2 │  5 │          2 │ Japan        │ JP         │            │ 145935 sq_mi │ Shinzo Abe
          3 │  6 │          3 │ Singapore    │ SG         │        697 │              │ not specified
          4 │  2 │          1 │ Chile        │ CL         │            │              │ not specified
          5 │  3 │          2 │ Argentina    │ AR         │            │              │ not specified


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Pavel Stehule wrote:
> 2017-03-02 19:32 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
> 
> > So in the old (non-executor-node) implementation, you could attach WITH
> > ORDINALITY to the xmltable expression and it would count the output
> > rows, regardless of which XML document it comes from.  With the new
> > implementation, the grammar no longer accepts it.  To count output rows,
> > you still need to use row_number().  Maybe this is okay.  This is the
> > example from the docs, and I add another XML document with two more rows
> > for xmltable.  Look at the three numbering columns ...
> >
> 
> It is expected - now tablefunc are not special case of SRF, so it lost all
> SRF functionality. It is not critical lost - it supports internally FOR
> ORDINALITY column, and classic ROW_NUMBER can be used. It can be enhanced
> to support WITH ORDINALITY in future, but I have not any use case for it.

Fine.

After looking at the new executor code a bit, I noticed that we don't
need the resultSlot anymore; we can use the ss_ScanTupleSlot instead.
Because resultSlot was being used in the xml.c code (which already
appeared a bit dubious to me), I changed the interface so that instead
the things that it read from it are passed as parameters -- namely, in
InitBuilder we pass natts, and in GetValue we pass typid and typmod.

Secondly, I noticed we have the FetchRow routine produce a minimal
tuple, put it in a slot; then its caller takes the slot and put the
tuple in the tuplestore.  This is pointless; we can just have FetchRow
put the tuple in the tuplestore directly and not bother with any slot
manipulations there.  This simplifies the code a bit.

Here's v47 with those changes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-03-02 22:35 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2017-03-02 19:32 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
>
> > So in the old (non-executor-node) implementation, you could attach WITH
> > ORDINALITY to the xmltable expression and it would count the output
> > rows, regardless of which XML document it comes from.  With the new
> > implementation, the grammar no longer accepts it.  To count output rows,
> > you still need to use row_number().  Maybe this is okay.  This is the
> > example from the docs, and I add another XML document with two more rows
> > for xmltable.  Look at the three numbering columns ...
> >
>
> It is expected - now tablefunc are not special case of SRF, so it lost all
> SRF functionality. It is not critical lost - it supports internally FOR
> ORDINALITY column, and classic ROW_NUMBER can be used. It can be enhanced
> to support WITH ORDINALITY in future, but I have not any use case for it.

Fine.

After looking at the new executor code a bit, I noticed that we don't
need the resultSlot anymore; we can use the ss_ScanTupleSlot instead.
Because resultSlot was being used in the xml.c code (which already
appeared a bit dubious to me), I changed the interface so that instead
the things that it read from it are passed as parameters -- namely, in
InitBuilder we pass natts, and in GetValue we pass typid and typmod.

I had similar feeling 

Secondly, I noticed we have the FetchRow routine produce a minimal
tuple, put it in a slot; then its caller takes the slot and put the
tuple in the tuplestore.  This is pointless; we can just have FetchRow
put the tuple in the tuplestore directly and not bother with any slot
manipulations there.  This simplifies the code a bit.


has sense

attached update with fixed tests

Regards

Pavel

 
Here's v47 with those changes.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Pavel Stehule wrote:

> attached update with fixed tests

Heh, I noticed that you removed the libxml "context" lines that
differentiate xml.out from xml_2.out when doing this.  My implementation
emits those lines, so it was failing for me.  I restored them.

I also changed a few things to avoid copying into TableFuncScanState
things that come from the TableFunc itself, since the executor state
node can grab them from the plan node.  Let's do that.  So instead of
"evalcols" the code now checks that the column list is empty; and also,
read the ordinality column number from the plan node.

I have to bounce this back to you one more time, hopefully the last one
I hope.  Two things:

1. Please verify that pg_stat_statements behaves correctly.  The patch
doesn't have changes to contrib/ so without testing I'm guessing that it
doesn't work.  I think something very simple should do.

2. As I've complained many times, I find the way we manage an empty
COLUMNS clause pretty bad.  The standard doesn't require that syntax
(COLUMNS is required), and I don't like the implementation, so why not
provide the feature in a different way?  My proposal is to change the
column options in gram.y to be something like this:

xmltable_column_option_el:
            IDENT b_expr
                { $$ = makeDefElem($1, $2, @1); }
            | DEFAULT b_expr
                { $$ = makeDefElem("default", $2, @1); }
            | FULL VALUE_P
                { $$ = makeDefElem("full_value", NULL, @1); }
            | NOT NULL_P
                { $$ = makeDefElem("is_not_null", (Node *) makeInteger(true), @1); }
            | NULL_P
                { $$ = makeDefElem("is_not_null", (Node *) makeInteger(false), @1); }
        ;

Note the FULL VALUE.  Then we can process it like

            else if (strcmp(defel->defname, "full_value") == 0)
            {
                if (fc->colexpr != NULL)
                    ereport(ERROR,
                            (errcode(ERRCODE_SYNTAX_ERROR),
                             errmsg("FULL ROW may not be specified together with PATH"),
                             parser_errposition(defel->location)));
                fc->full_row = true;
            }

So if you want the full XML value of the row, you have to specify it,

 .. XMLTABLE ( ... COLUMNS ..., whole_row xml FULL VALUE, ... )

This has the extra feature that you can add, say, an ORDINALITY column
together with the XML value, something that you cannot do with the
current implementation.

It doesn't have to be FULL VALUE, but I couldn't think of anything
better.  (I didn't want to add any new keywords for this.)  If you have
a better idea, let's discuss.

Code-wise, this completely removes the "else" block in transformRangeTableFunc
which I marked with an XXX comment.  That's a good thing -- let's get
rid of that.  Also, it should remove the need for the separate "if
!columns" case in tfuncLoadRows.  All those cases would become part of
the normal code path instead of special cases.  I think
XmlTableSetColumnFilter doesn't need any change (we just don't call if
for the FULL VALUE row); and XmlTableGetValue needs a special case that
if the column filter is NULL (i.e. SetColumnFilter wasn't called for
that column) then return the whole row.


Of course, this opens an implementation issue: how do you annotate
things from parse analysis till execution?  The current TableFunc
structure doesn't help, because there are only lists of column names and
expressions; and we can't use the case of a NULL colexpr, because that
case is already used by the column filter being the column name (a
feature required by the standard).  A simple way would be to have a new
"colno" struct member, to store a column number for the column marked
FULL VALUE (just like ordinalitycol).  This means you can't have more
than one of those FULL VALUE columns, but that seems okay.


(Of course, this means that the two cases that have no COLUMNS in the
"xmltable" production in gram.y should go away).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> attached update with fixed tests

Heh, I noticed that you removed the libxml "context" lines that
differentiate xml.out from xml_2.out when doing this.  My implementation
emits those lines, so it was failing for me.  I restored them.

I also changed a few things to avoid copying into TableFuncScanState
things that come from the TableFunc itself, since the executor state
node can grab them from the plan node.  Let's do that.  So instead of
"evalcols" the code now checks that the column list is empty; and also,
read the ordinality column number from the plan node.

I have to bounce this back to you one more time, hopefully the last one
I hope.  Two things:

1. Please verify that pg_stat_statements behaves correctly.  The patch
doesn't have changes to contrib/ so without testing I'm guessing that it
doesn't work.  I think something very simple should do.

2. As I've complained many times, I find the way we manage an empty
COLUMNS clause pretty bad.  The standard doesn't require that syntax
(COLUMNS is required), and I don't like the implementation, so why not
provide the feature in a different way?  My proposal is to change the
column options in gram.y to be something like this:

The clause COLUMNS is optional on Oracle and DB2
 
So I prefer a Oracle, DB2 design. If you are strongly against it, then we can remove it to be ANSI/SQL only.

I am don't see an good idea to introduce third syntax.


xmltable_column_option_el:
                        IDENT b_expr
                                { $$ = makeDefElem($1, $2, @1); }
                        | DEFAULT b_expr
                                { $$ = makeDefElem("default", $2, @1); }
                        | FULL VALUE_P
                                { $$ = makeDefElem("full_value", NULL, @1); }
                        | NOT NULL_P
                                { $$ = makeDefElem("is_not_null", (Node *) makeInteger(true), @1); }
                        | NULL_P
                                { $$ = makeDefElem("is_not_null", (Node *) makeInteger(false), @1); }
                ;

Note the FULL VALUE.  Then we can process it like

                        else if (strcmp(defel->defname, "full_value") == 0)
                        {
                                if (fc->colexpr != NULL)
                                        ereport(ERROR,
                                                        (errcode(ERRCODE_SYNTAX_ERROR),
                                                         errmsg("FULL ROW may not be specified together with PATH"),
                                                         parser_errposition(defel->location)));
                                fc->full_row = true;
                        }

So if you want the full XML value of the row, you have to specify it,

 .. XMLTABLE ( ... COLUMNS ..., whole_row xml FULL VALUE, ... )

This has the extra feature that you can add, say, an ORDINALITY column
together with the XML value, something that you cannot do with the
current implementation.

It doesn't have to be FULL VALUE, but I couldn't think of anything
better.  (I didn't want to add any new keywords for this.)  If you have
a better idea, let's discuss.

I don't see a introduction own syntax as necessary solution here - use Oracle, DB2 compatible syntax, or ANSI.

It is partially corner case - the benefit of this case is almost bigger compatibility with mentioned databases.
 

Code-wise, this completely removes the "else" block in transformRangeTableFunc
which I marked with an XXX comment.  That's a good thing -- let's get
rid of that.  Also, it should remove the need for the separate "if
!columns" case in tfuncLoadRows.  All those cases would become part of
the normal code path instead of special cases.  I think
XmlTableSetColumnFilter doesn't need any change (we just don't call if
for the FULL VALUE row); and XmlTableGetValue needs a special case that
if the column filter is NULL (i.e. SetColumnFilter wasn't called for
that column) then return the whole row.


Of course, this opens an implementation issue: how do you annotate
things from parse analysis till execution?  The current TableFunc
structure doesn't help, because there are only lists of column names and
expressions; and we can't use the case of a NULL colexpr, because that
case is already used by the column filter being the column name (a
feature required by the standard).  A simple way would be to have a new
"colno" struct member, to store a column number for the column marked
FULL VALUE (just like ordinalitycol).  This means you can't have more
than one of those FULL VALUE columns, but that seems okay.


(Of course, this means that the two cases that have no COLUMNS in the
"xmltable" production in gram.y should go away).

You are commiter, and you should to decide - as first I prefer current state, as second a remove this part - it should be good for you too, because code that you don't like will be left.

I dislike introduce new syntax - this case is not too important for this.

Regards

Pavel


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-03-03 19:42 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> attached update with fixed tests

Heh, I noticed that you removed the libxml "context" lines that
differentiate xml.out from xml_2.out when doing this.  My implementation
emits those lines, so it was failing for me.  I restored them.

I also changed a few things to avoid copying into TableFuncScanState
things that come from the TableFunc itself, since the executor state
node can grab them from the plan node.  Let's do that.  So instead of
"evalcols" the code now checks that the column list is empty; and also,
read the ordinality column number from the plan node.

I have to bounce this back to you one more time, hopefully the last one
I hope.  Two things:

1. Please verify that pg_stat_statements behaves correctly.  The patch
doesn't have changes to contrib/ so without testing I'm guessing that it
doesn't work.  I think something very simple should do.

2. As I've complained many times, I find the way we manage an empty
COLUMNS clause pretty bad.  The standard doesn't require that syntax
(COLUMNS is required), and I don't like the implementation, so why not
provide the feature in a different way?  My proposal is to change the
column options in gram.y to be something like this:

The clause COLUMNS is optional on Oracle and DB2
 
So I prefer a Oracle, DB2 design. If you are strongly against it, then we can remove it to be ANSI/SQL only.

I am don't see an good idea to introduce third syntax.


xmltable_column_option_el:
                        IDENT b_expr
                                { $$ = makeDefElem($1, $2, @1); }
                        | DEFAULT b_expr
                                { $$ = makeDefElem("default", $2, @1); }
                        | FULL VALUE_P
                                { $$ = makeDefElem("full_value", NULL, @1); }
                        | NOT NULL_P
                                { $$ = makeDefElem("is_not_null", (Node *) makeInteger(true), @1); }
                        | NULL_P
                                { $$ = makeDefElem("is_not_null", (Node *) makeInteger(false), @1); }
                ;

Note the FULL VALUE.  Then we can process it like

                        else if (strcmp(defel->defname, "full_value") == 0)
                        {
                                if (fc->colexpr != NULL)
                                        ereport(ERROR,
                                                        (errcode(ERRCODE_SYNTAX_ERROR),
                                                         errmsg("FULL ROW may not be specified together with PATH"),
                                                         parser_errposition(defel->location)));
                                fc->full_row = true;
                        }

So if you want the full XML value of the row, you have to specify it,

 .. XMLTABLE ( ... COLUMNS ..., whole_row xml FULL VALUE, ... )

This has the extra feature that you can add, say, an ORDINALITY column
together with the XML value, something that you cannot do with the
current implementation.

It doesn't have to be FULL VALUE, but I couldn't think of anything
better.  (I didn't want to add any new keywords for this.)  If you have
a better idea, let's discuss.

I don't see a introduction own syntax as necessary solution here - use Oracle, DB2 compatible syntax, or ANSI.

It is partially corner case - the benefit of this case is almost bigger compatibility with mentioned databases.
 

Code-wise, this completely removes the "else" block in transformRangeTableFunc
which I marked with an XXX comment.  That's a good thing -- let's get
rid of that.  Also, it should remove the need for the separate "if
!columns" case in tfuncLoadRows.  All those cases would become part of
the normal code path instead of special cases.  I think
XmlTableSetColumnFilter doesn't need any change (we just don't call if
for the FULL VALUE row); and XmlTableGetValue needs a special case that
if the column filter is NULL (i.e. SetColumnFilter wasn't called for
that column) then return the whole row.


Of course, this opens an implementation issue: how do you annotate
things from parse analysis till execution?  The current TableFunc
structure doesn't help, because there are only lists of column names and
expressions; and we can't use the case of a NULL colexpr, because that
case is already used by the column filter being the column name (a
feature required by the standard).  A simple way would be to have a new
"colno" struct member, to store a column number for the column marked
FULL VALUE (just like ordinalitycol).  This means you can't have more
than one of those FULL VALUE columns, but that seems okay.


(Of course, this means that the two cases that have no COLUMNS in the
"xmltable" production in gram.y should go away).

You are commiter, and you should to decide - as first I prefer current state, as second a remove this part - it should be good for you too, because code that you don't like will be left.

I dislike introduce new syntax - this case is not too important for this.


I am able to prepare reduced version if we do a agreement

Regards

Pavel
 
Regards

Pavel


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Pavel Stehule wrote:
> 2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > 2. As I've complained many times, I find the way we manage an empty
> > COLUMNS clause pretty bad.  The standard doesn't require that syntax
> > (COLUMNS is required), and I don't like the implementation, so why not
> > provide the feature in a different way?  My proposal is to change the
> > column options in gram.y to be something like this:
> 
> The clause COLUMNS is optional on Oracle and DB2
> 
> So I prefer a Oracle, DB2 design. If you are strongly against it, then we
> can remove it to be ANSI/SQL only.
> 
> I am don't see an good idea to introduce third syntax.

OK.  I think trying to be syntax compatible with DB2 or Oracle is a lost
cause, because the syntax used in the XPath expressions seems different
-- I think Oracle uses XQuery (which we don't support) and DB2 uses ...
not sure what it is, but it doesn't work in our implementation
(stuff like '$d/employees/emp' in the row expression.)

In existing applications using those Oracle/DB2, is it common to omit
the COLUMNS clause?  I searched for "xmltable oracle" and had a look at
the first few hits outside of the oracle docs:
http://viralpatel.net/blogs/oracle-xmltable-tutorial/
http://www.dba-oracle.com/t_xmltable.htm
http://stackoverflow.com/questions/12690868/how-to-use-xmltable-in-oracle
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9533111800346252295
http://stackoverflow.com/questions/1222570/what-is-an-xmltable
https://community.oracle.com/thread/3955198

Not a single one of these omit the COLUMNS clause (though the second one
mentions that the clause can be omitted).

I also looked at a few samples with DB2 -- same thing; it is possible,
but is it common?

Anyway, I noticed that "xml PATH '.'" can be used to obtain the full XML
of the row, which I think is the feature I wanted, so I think we're
covered and we can omit the case with no COLUMNS, since we already have
the feature in another way.  No need to implement anything further, and
we can rip out the special case I don't like.  Example:

CREATE TABLE EMPLOYEES
(      id     integer,      data   XML
);
INSERT INTO EMPLOYEES    VALUES (1, '<Employees>   <Employee emplid="1111" type="admin">
<firstname>John</firstname>      <lastname>Watson</lastname>       <age>30</age>       <email>johnwatson@sh.com</email>
 </Employee>   <Employee emplid="2222" type="admin">       <firstname>Sherlock</firstname>
<lastname>Homes</lastname>      <age>32</age>       <email>sherlock@sh.com</email>   </Employee>   <Employee
emplid="3333"type="user">       <firstname>Jim</firstname>       <lastname>Moriarty</lastname>       <age>52</age>
<email>jim@sh.com</email>   </Employee>   <Employee emplid="4444" type="user">       <firstname>Mycroft</firstname>
 <lastname>Holmes</lastname>       <age>41</age>       <email>mycroft@sh.com</email>   </Employee>
 
</Employees>');

This is with COLUMNS omitted:

alvherre=# select xmltable.* from employees, xmltable('/Employees/Employee' passing data);                xmltable
          
 
──────────────────────────────────────────<Employee emplid="1111" type="admin">   ↵        <firstname>John</firstname>
  ↵        <lastname>Watson</lastname>     ↵        <age>30</age>                   ↵
<email>johnwatson@sh.com</email>↵   </Employee><Employee emplid="2222" type="admin">   ↵
<firstname>Sherlock</firstname>↵        <lastname>Homes</lastname>      ↵        <age>32</age>                   ↵
 <email>sherlock@sh.com</email>  ↵    </Employee><Employee emplid="3333" type="user">    ↵
<firstname>Jim</firstname>     ↵        <lastname>Moriarty</lastname>   ↵        <age>52</age>                   ↵
 <email>jim@sh.com</email>       ↵    </Employee><Employee emplid="4444" type="user">    ↵
<firstname>Mycroft</firstname> ↵        <lastname>Holmes</lastname>     ↵        <age>41</age>                   ↵
 <email>mycroft@sh.com</email>   ↵    </Employee>
 

and this is what you get with "xml PATH '.'" (I threw in ORDINALITY just
for fun):

alvherre=# select xmltable.* from employees, xmltable('/Employees/Employee' passing data columns row_number for
ordinality,emp xml path '.');row_number │                   emp                    
 
────────────┼──────────────────────────────────────────         1 │ <Employee emplid="1111" type="admin">   ↵
│        <firstname>John</firstname>     ↵           │         <lastname>Watson</lastname>     ↵           │
<age>30</age>                  ↵           │         <email>johnwatson@sh.com</email>↵           │     </Employee>
  2 │ <Employee emplid="2222" type="admin">   ↵           │         <firstname>Sherlock</firstname> ↵           │
 <lastname>Homes</lastname>      ↵           │         <age>32</age>                   ↵           │
<email>sherlock@sh.com</email> ↵           │     </Employee>         3 │ <Employee emplid="3333" type="user">    ↵
    │         <firstname>Jim</firstname>      ↵           │         <lastname>Moriarty</lastname>   ↵           │
 <age>52</age>                   ↵           │         <email>jim@sh.com</email>       ↵           │     </Employee>
    4 │ <Employee emplid="4444" type="user">    ↵           │         <firstname>Mycroft</firstname>  ↵           │
   <lastname>Holmes</lastname>     ↵           │         <age>41</age>                   ↵           │
<email>mycroft@sh.com</email>  ↵           │     </Employee>
 

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-03-03 21:04 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2017-03-03 19:15 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > 2. As I've complained many times, I find the way we manage an empty
> > COLUMNS clause pretty bad.  The standard doesn't require that syntax
> > (COLUMNS is required), and I don't like the implementation, so why not
> > provide the feature in a different way?  My proposal is to change the
> > column options in gram.y to be something like this:
>
> The clause COLUMNS is optional on Oracle and DB2
>
> So I prefer a Oracle, DB2 design. If you are strongly against it, then we
> can remove it to be ANSI/SQL only.
>
> I am don't see an good idea to introduce third syntax.

OK.  I think trying to be syntax compatible with DB2 or Oracle is a lost
cause, because the syntax used in the XPath expressions seems different
-- I think Oracle uses XQuery (which we don't support) and DB2 uses ...
not sure what it is, but it doesn't work in our implementation
(stuff like '$d/employees/emp' in the row expression.)

100% compatibility is not possible - but XPath is subset of XQuery and in reality - the full XQuery examples of XMLTABLE is not often.

Almost all examples of usage XMLTABLE, what I found in blogs, uses XPath only
 

In existing applications using those Oracle/DB2, is it common to omit
the COLUMNS clause?  I searched for "xmltable oracle" and had a look at
the first few hits outside of the oracle docs:
http://viralpatel.net/blogs/oracle-xmltable-tutorial/
http://www.dba-oracle.com/t_xmltable.htm
http://stackoverflow.com/questions/12690868/how-to-use-xmltable-in-oracle
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9533111800346252295
http://stackoverflow.com/questions/1222570/what-is-an-xmltable
https://community.oracle.com/thread/3955198

Not a single one of these omit the COLUMNS clause (though the second one
mentions that the clause can be omitted).

I also looked at a few samples with DB2 -- same thing; it is possible,
but is it common?

I don't think so it is common - it is corner case - and I can live without it well
 

Anyway, I noticed that "xml PATH '.'" can be used to obtain the full XML
of the row, which I think is the feature I wanted, so I think we're
covered and we can omit the case with no COLUMNS, since we already have
the feature in another way.  No need to implement anything further, and
we can rip out the special case I don't like.  Example:

yes,
 

CREATE TABLE EMPLOYEES
(
       id     integer,
       data   XML
);
INSERT INTO EMPLOYEES
     VALUES (1, '<Employees>
    <Employee emplid="1111" type="admin">
        <firstname>John</firstname>
        <lastname>Watson</lastname>
        <age>30</age>
        <email>johnwatson@sh.com</email>
    </Employee>
    <Employee emplid="2222" type="admin">
        <firstname>Sherlock</firstname>
        <lastname>Homes</lastname>
        <age>32</age>
        <email>sherlock@sh.com</email>
    </Employee>
    <Employee emplid="3333" type="user">
        <firstname>Jim</firstname>
        <lastname>Moriarty</lastname>
        <age>52</age>
        <email>jim@sh.com</email>
    </Employee>
    <Employee emplid="4444" type="user">
        <firstname>Mycroft</firstname>
        <lastname>Holmes</lastname>
        <age>41</age>
        <email>mycroft@sh.com</email>
    </Employee>
</Employees>');

This is with COLUMNS omitted:

alvherre=# select xmltable.* from employees, xmltable('/Employees/Employee' passing data);
                 xmltable
──────────────────────────────────────────
 <Employee emplid="1111" type="admin">   ↵
         <firstname>John</firstname>     ↵
         <lastname>Watson</lastname>     ↵
         <age>30</age>                   ↵
         <email>johnwatson@sh.com</email>↵
     </Employee>
 <Employee emplid="2222" type="admin">   ↵
         <firstname>Sherlock</firstname> ↵
         <lastname>Homes</lastname>      ↵
         <age>32</age>                   ↵
         <email>sherlock@sh.com</email>  ↵
     </Employee>
 <Employee emplid="3333" type="user">    ↵
         <firstname>Jim</firstname>      ↵
         <lastname>Moriarty</lastname>   ↵
         <age>52</age>                   ↵
         <email>jim@sh.com</email>       ↵
     </Employee>
 <Employee emplid="4444" type="user">    ↵
         <firstname>Mycroft</firstname>  ↵
         <lastname>Holmes</lastname>     ↵
         <age>41</age>                   ↵
         <email>mycroft@sh.com</email>   ↵
     </Employee>

and this is what you get with "xml PATH '.'" (I threw in ORDINALITY just
for fun):

alvherre=# select xmltable.* from employees, xmltable('/Employees/Employee' passing data columns row_number for ordinality, emp xml path '.');
 row_number │                   emp
────────────┼──────────────────────────────────────────
          1 │ <Employee emplid="1111" type="admin">   ↵
            │         <firstname>John</firstname>     ↵
            │         <lastname>Watson</lastname>     ↵
            │         <age>30</age>                   ↵
            │         <email>johnwatson@sh.com</email>↵
            │     </Employee>
          2 │ <Employee emplid="2222" type="admin">   ↵
            │         <firstname>Sherlock</firstname> ↵
            │         <lastname>Homes</lastname>      ↵
            │         <age>32</age>                   ↵
            │         <email>sherlock@sh.com</email>  ↵
            │     </Employee>
          3 │ <Employee emplid="3333" type="user">    ↵
            │         <firstname>Jim</firstname>      ↵
            │         <lastname>Moriarty</lastname>   ↵
            │         <age>52</age>                   ↵
            │         <email>jim@sh.com</email>       ↵
            │     </Employee>
          4 │ <Employee emplid="4444" type="user">    ↵
            │         <firstname>Mycroft</firstname>  ↵
            │         <lastname>Holmes</lastname>     ↵
            │         <age>41</age>                   ↵
            │         <email>mycroft@sh.com</email>   ↵
            │     </Employee>

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:
Hi

I used your idea about special columns when COLUMNS are not explicitly defined.

All lines that you are dislike removed.

Now, almost all code, related to this behave, is in next few lines.

+   /*
+    * Use implicit column when it is necessary. The COLUMNS clause is optional
+    * on Oracle and DB2. In this case a result is complete row of XML type.
+    */
+   if (rtf->columns == NIL)
+   {
+       RangeTableFuncCol *fc = makeNode(RangeTableFuncCol);
+       A_Const *n = makeNode(A_Const);
+
+       fc->colname = "xmltable";
+       fc->typeName = makeTypeNameFromOid(XMLOID, -1);
+       n->val.type = T_String;
+       n->val.val.str = ".";
+       n->location = -1;
+
+       fc->colexpr = (Node *) n;
+       rtf->columns = list_make1(fc);
+   }

all regress tests passing.

Regards

Pavel


Attachment

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Pavel Stehule wrote:
> Hi
> 
> I used your idea about special columns when COLUMNS are not explicitly
> defined.
> 
> All lines that you are dislike removed.

I just pushed XMLTABLE, after some additional changes.  Please test it
thoroughly and report any problems.

I didn't add the change you proposed here to keep COLUMNS optional;
instead, I just made COLUMNS mandatory.  I think what you propose here
is not entirely out of the question, but you left out ruleutils.c
support for it, so I decided to leave it aside for now so that I could
get this patch out of my plate once and for all.  If you really want
that feature, you can submit another patch for it and discuss with the
RMT whether it belongs in PG10 or not.

Some changes I made:
* I added some pg_stat_statements support.  It works fine for simple
tests, but deeper testing of it would be appreciated.

* I removed the "buildercxt" memory context.  It seemed mostly
pointless, and I was disturbed by the MemoryContextResetOnly().
Per-value memory still uses the per-value memory context, but the rest
of the stuff is in the per-query context, which should be pretty much
the same.

* Desultory stylistic changes

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-03-08 17:01 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> Hi
>
> I used your idea about special columns when COLUMNS are not explicitly
> defined.
>
> All lines that you are dislike removed.

I just pushed XMLTABLE, after some additional changes.  Please test it
thoroughly and report any problems.

Thank you 

I didn't add the change you proposed here to keep COLUMNS optional;
instead, I just made COLUMNS mandatory.  I think what you propose here
is not entirely out of the question, but you left out ruleutils.c
support for it, so I decided to leave it aside for now so that I could
get this patch out of my plate once and for all.  If you really want
that feature, you can submit another patch for it and discuss with the
RMT whether it belongs in PG10 or not.

It is interesting feature - because it replaces XPATH function, but not important enough.

For daily work the default schema support is much more interesting. 
 

Some changes I made:
* I added some pg_stat_statements support.  It works fine for simple
tests, but deeper testing of it would be appreciated.

* I removed the "buildercxt" memory context.  It seemed mostly
pointless, and I was disturbed by the MemoryContextResetOnly().
Per-value memory still uses the per-value memory context, but the rest
of the stuff is in the per-query context, which should be pretty much
the same.

* Desultory stylistic changes

ok

Regards

Pavel 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Pavel Stehule wrote:
> 2017-03-08 17:01 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > I didn't add the change you proposed here to keep COLUMNS optional;
> > instead, I just made COLUMNS mandatory.  I think what you propose here
> > is not entirely out of the question, but you left out ruleutils.c
> > support for it, so I decided to leave it aside for now so that I could
> > get this patch out of my plate once and for all.  If you really want
> > that feature, you can submit another patch for it and discuss with the
> > RMT whether it belongs in PG10 or not.
> 
> It is interesting feature - because it replaces XPATH function, but not
> important enough.

OK.

> For daily work the default schema support is much more interesting.

Let's see that one, then.  It was part of the original submission so
depending on how the patch we looks can still cram it in.  But other
patches have priority for me now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] patch: function xmltable

From
Pavel Stehule
Date:


2017-03-08 17:32 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:
> 2017-03-08 17:01 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > I didn't add the change you proposed here to keep COLUMNS optional;
> > instead, I just made COLUMNS mandatory.  I think what you propose here
> > is not entirely out of the question, but you left out ruleutils.c
> > support for it, so I decided to leave it aside for now so that I could
> > get this patch out of my plate once and for all.  If you really want
> > that feature, you can submit another patch for it and discuss with the
> > RMT whether it belongs in PG10 or not.
>
> It is interesting feature - because it replaces XPATH function, but not
> important enough.

OK.

> For daily work the default schema support is much more interesting.

Let's see that one, then.  It was part of the original submission so
depending on how the patch we looks can still cram it in.  But other
patches have priority for me now.

It is theme for 11

Thank you very much

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] patch: function xmltable

From
Alvaro Herrera
Date:
Pavel Stehule wrote:
> 2017-03-08 17:32 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:

> > > For daily work the default schema support is much more interesting.
> >
> > Let's see that one, then.  It was part of the original submission so
> > depending on how the patch we looks can still cram it in.  But other
> > patches have priority for me now.
> 
> It is theme for 11

Ah, great.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services