Thread: How to share the result data of separated plan

How to share the result data of separated plan

From
Hitoshi Harada
Date:
While designing writeable CTEs, I found it needs a new infrastructure
that helps to share the result data among separated PlannedStmt.

Up to now, in a portal we only have multiple plans that share nothing
but relations, then executor starts, runs and ends separately.
Writeable CTEs need the result tuplestore of separated plans that were
executed before main query . In a prototype Marko and I made adds a
List of DerivedTable to Portal,  which holds each tuplestore and
tupledesc set and the previously-executed result of the child plans
are stored in these structure.

Here we have some problems:

- the process to make tuplestore receiver, to set it to executor of
child plans, to run it and to update snapshot is very common but very
local in pquery.c. We need similar copy in explain.c and functions.c
- we are not sure where to save this List of DerivedTable. In pquery.c
it is Portal, and we can see it in ExplainState in explan.c. But very
messy...
- Wait, isn't there any problem to share data without relations among
separated plans?

The most conservative way to solve it is to make it by rewriter not by
planner to separated executions with temporary tables instead of
tuplestore. In this way we don't do so much in planner and writeable
CTEs is only a rewrite process. The first reason why we are hacking
the planner not rewriter is that the main purpose of writeable CTEs is
the performance gained by using tuplestore instead of temporary
tables. In the second place, we will be able to extend it to combine
RECURSIVE queries (which may be discarded in the first version for
simplicity).

Please find attached patch (pquery.c only) as a prototype we are
working to see what is done about the problem. PlannedStmt now has
childStmt member to store separated child plans(PlannedStmts) and the
portal calls FillDerivedTables() to execute them and fill tuplestores.
Tuplestores are saved in the portal member ddts. They will be referred
by main query so they are passed to CreateQueryDesc() on the main
query execution.

I'm very sorry the explanation is poor, but if you have some thoughts,
please let us know.

Regards,

--
Hitoshi Harada

Attachment

Re: How to share the result data of separated plan

From
Tom Lane
Date:
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> Up to now, in a portal we only have multiple plans that share nothing
> but relations, then executor starts, runs and ends separately.
> Writeable CTEs need the result tuplestore of separated plans that were
> executed before main query .

Um ... why?  I thought the whole point of breaking out ModifyTable
as a separate node type was so that a query involving writeable CTEs
would still be just one plan tree.
        regards, tom lane


Re: How to share the result data of separated plan

From
Marko Tiikkaja
Date:
On 2010-11-07 6:23 PM +0200, Tom Lane wrote:
> Hitoshi Harada<umi.tanuki@gmail.com>  writes:
>> Up to now, in a portal we only have multiple plans that share nothing
>> but relations, then executor starts, runs and ends separately.
>> Writeable CTEs need the result tuplestore of separated plans that were
>> executed before main query .
>
> Um ... why?  I thought the whole point of breaking out ModifyTable
> as a separate node type was so that a query involving writeable CTEs
> would still be just one plan tree.

We tried that for 9.0 and it didn't work.  Almost all work for 9.1 has 
been spent on creating an infrastructure for running the executor 
separately for every WITH list element when wCTEs are present.


Regards,
Marko Tiikkaja


Re: How to share the result data of separated plan

From
Tom Lane
Date:
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> On 2010-11-07 6:23 PM +0200, Tom Lane wrote:
>> Um ... why?  I thought the whole point of breaking out ModifyTable
>> as a separate node type was so that a query involving writeable CTEs
>> would still be just one plan tree.

> We tried that for 9.0 and it didn't work.  Almost all work for 9.1 has 
> been spent on creating an infrastructure for running the executor 
> separately for every WITH list element when wCTEs are present.

I guess I shoulda been paying closer attention :-(.  That really, really
seems like fundamentally the wrong direction.  What was it that was
unfixable about the other way?  If it is unfixable, should we revert
ModifyTable?
        regards, tom lane


Re: How to share the result data of separated plan

From
Marko Tiikkaja
Date:
On 2010-11-07 8:08 PM +0200, Tom Lane wrote:
> Marko Tiikkaja<marko.tiikkaja@cs.helsinki.fi>  writes:
>> On 2010-11-07 6:23 PM +0200, Tom Lane wrote:
>>> Um ... why?  I thought the whole point of breaking out ModifyTable
>>> as a separate node type was so that a query involving writeable CTEs
>>> would still be just one plan tree.
>
>> We tried that for 9.0 and it didn't work.  Almost all work for 9.1 has
>> been spent on creating an infrastructure for running the executor
>> separately for every WITH list element when wCTEs are present.
>
> I guess I shoulda been paying closer attention :-(.  That really, really
> seems like fundamentally the wrong direction.  What was it that was
> unfixable about the other way?  If it is unfixable, should we revert
> ModifyTable?

The immediate problem at hand was that scan nodes cached the number of 
blocks a relation has and didn't see new tuples added by wCTEs.  That's 
not unfixable, but the consensus seemed to be that running the executor 
separately is better.

ModifyTable nodes still save us from duplicating a lot of code; you can 
just run an executor on a plan and expect it to work, even it's a DML 
statement.  Reverting it would be a huge mistake in my opinion.


Regards,
Marko Tiikkaja


Re: How to share the result data of separated plan

From
Robert Haas
Date:
On Sun, Nov 7, 2010 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
>> On 2010-11-07 6:23 PM +0200, Tom Lane wrote:
>>> Um ... why?  I thought the whole point of breaking out ModifyTable
>>> as a separate node type was so that a query involving writeable CTEs
>>> would still be just one plan tree.
>
>> We tried that for 9.0 and it didn't work.  Almost all work for 9.1 has
>> been spent on creating an infrastructure for running the executor
>> separately for every WITH list element when wCTEs are present.
>
> I guess I shoulda been paying closer attention :-(.  That really, really
> seems like fundamentally the wrong direction.  What was it that was
> unfixable about the other way?  If it is unfixable, should we revert
> ModifyTable?

The relevant thread is here:

http://archives.postgresql.org/pgsql-hackers/2010-02/msg00783.php

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: How to share the result data of separated plan

From
Hitoshi Harada
Date:
2010/11/8 Tom Lane <tgl@sss.pgh.pa.us>:
> Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
>> On 2010-11-07 6:23 PM +0200, Tom Lane wrote:
>>> Um ... why?  I thought the whole point of breaking out ModifyTable
>>> as a separate node type was so that a query involving writeable CTEs
>>> would still be just one plan tree.
>
>> We tried that for 9.0 and it didn't work.  Almost all work for 9.1 has
>> been spent on creating an infrastructure for running the executor
>> separately for every WITH list element when wCTEs are present.
>
> I guess I shoulda been paying closer attention :-(.  That really, really
> seems like fundamentally the wrong direction.  What was it that was
> unfixable about the other way?  If it is unfixable, should we revert
> ModifyTable?

Especially snapshot should not be updated during execution so we
decided to separate plans and restart executor.

But reading closer your response, it occurred to me that ModifyTable
can be the storage to be shared among separated plans instead of newly
invented DtScan. Are you suggesting embed ModifyTable into the main
query, that is also on the top of child plans? In this way we can
still separate plans and scan the result of child plans without much
modification of portal/explain?

Regards,

--
Hitoshi Harada


Re: How to share the result data of separated plan

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Nov 7, 2010 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I guess I shoulda been paying closer attention :-(. �That really, really
>> seems like fundamentally the wrong direction. �What was it that was
>> unfixable about the other way? �If it is unfixable, should we revert
>> ModifyTable?

> The relevant thread is here:
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00783.php

My opinion is still the same as here:
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00688.php

namely, that all we should be worrying about is a tuplestore full of
RETURNING tuples.  Any other side-effects of a DML subquery should
*not* be visible to the calling query, and therefore all this argument
about snapshots and seqscan limits is beside the point.
        regards, tom lane


Re: How to share the result data of separated plan

From
Tom Lane
Date:
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> But reading closer your response, it occurred to me that ModifyTable
> can be the storage to be shared among separated plans instead of newly
> invented DtScan. Are you suggesting embed ModifyTable into the main
> query, that is also on the top of child plans? In this way we can
> still separate plans and scan the result of child plans without much
> modification of portal/explain?

There is noplace where we need to "scan the result of child plans".
We need to scan the set of RETURNING tuples emitted by a child
ModifyTable node.  That has nothing whatever to do with what it wrote to
disk.  We would hardly wish to try to reconstruct those tuples from
re-reading what it wrote to disk anyway --- that idea fails completely
for a DELETE RETURNING subquery, for example.
        regards, tom lane


Re: How to share the result data of separated plan

From
Marko Tiikkaja
Date:
On 2010-11-08 6:38 PM +0200, Tom Lane wrote:
> Robert Haas<robertmhaas@gmail.com>  writes:
>> On Sun, Nov 7, 2010 at 1:08 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>> I guess I shoulda been paying closer attention :-(.  That really, really
>>> seems like fundamentally the wrong direction.  What was it that was
>>> unfixable about the other way?  If it is unfixable, should we revert
>>> ModifyTable?
>
>> The relevant thread is here:
>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00783.php
>
> My opinion is still the same as here:
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00688.php
>
> namely, that all we should be worrying about is a tuplestore full of
> RETURNING tuples.  Any other side-effects of a DML subquery should
> *not* be visible to the calling query, and therefore all this argument
> about snapshots and seqscan limits is beside the point.

What happened to:

http://archives.postgresql.org/pgsql-hackers/2009-10/msg00566.php ?


Regards,
Marko Tiikkaja


Re: How to share the result data of separated plan

From
Hitoshi Harada
Date:
2010/11/9 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Nov 7, 2010 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I guess I shoulda been paying closer attention :-(.  That really, really
>>> seems like fundamentally the wrong direction.  What was it that was
>>> unfixable about the other way?  If it is unfixable, should we revert
>>> ModifyTable?
>
>> The relevant thread is here:
>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00783.php
>
> My opinion is still the same as here:
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00688.php
>
> namely, that all we should be worrying about is a tuplestore full of
> RETURNING tuples.  Any other side-effects of a DML subquery should
> *not* be visible to the calling query, and therefore all this argument
> about snapshots and seqscan limits is beside the point.

Current consensus says:

WITH x AS (SELECT count(*) FROM t), y AS (DELETE FROM t), z AS (SELECT
count(*) FROM t) SELECT x.count, z.count FROM x, z;

should return 0 for z.count but some number of original rows for
x.count. So I think you need to read the underlying table itself as
well as the emitted data of ModfyTable (or the result of any writeable
CTE queries) in *predictable order*. To make it happen, we need CCI in
each execution of child plans.

Regards,

--
Hitoshi Harada


Re: How to share the result data of separated plan

From
Tom Lane
Date:
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> On 2010-11-08 6:38 PM +0200, Tom Lane wrote:
>> My opinion is still the same as here:
>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00688.php
>> 
>> namely, that all we should be worrying about is a tuplestore full of
>> RETURNING tuples.  Any other side-effects of a DML subquery should
>> *not* be visible to the calling query, and therefore all this argument
>> about snapshots and seqscan limits is beside the point.

> What happened to:
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00566.php ?

On the whole, I think that's overspecifying the behavior.  Compare what
happens if you do the same thing in one query now; that is, you have
say an UPDATE query and within that you invoke some function that looks
directly at the target table.  Is it going to see a consistent view of
the data?  No, it's going to see a partially updated table.

The argument I'm making at the moment is that a query containing DML
CTEs should all execute with the same snapshot, meaning that no part of
the query should see the effects of any other part.  If, within the
query, you execute volatile functions that look at the target tables,
they will see an unspecified subset of the updates as having been
applied already.  This is the same as what happens with a volatile
function in an updating query now, and we've not heard many complaints
about it.

The alternative is to artificially serialize the DML CTEs, which
while it does have some advantages doesn't seem like a win overall.
In particular, defining that as the behavior will greatly constrain
our abilities to optimize anything in future.  I think that a typical
use-case will be something like
WITH u AS (UPDATE data_table SET ... RETURNING ...)INSERT INTO log_table SELECT * FROM u;

If that UPDATE touches a lot of rows, users will be unhappy if the
RETURNING data builds up in memory rather than getting streamed directly
to log_table.  (I'm not saying this has to work that way on day zero,
but I am thinking we'd better be able to do it eventually.)
        regards, tom lane


Re: How to share the result data of separated plan

From
Tom Lane
Date:
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> 2010/11/9 Tom Lane <tgl@sss.pgh.pa.us>:
>> My opinion is still the same as here:
>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00688.php

> Current consensus says:

> WITH x AS (SELECT count(*) FROM t), y AS (DELETE FROM t), z AS (SELECT
> count(*) FROM t) SELECT x.count, z.count FROM x, z;

> should return 0 for z.count but some number of original rows for
> x.count.

Consensus according to who?  It's at least as consistent for all the
queries to start from the same snapshot, meaning that x and z would
produce the same results (independent of what y does).

It might be worth inspecting the SQL2011 draft to see if they provide
any guidance on what ought to happen here.
        regards, tom lane


Re: How to share the result data of separated plan

From
Hitoshi Harada
Date:
2010/11/9 Tom Lane <tgl@sss.pgh.pa.us>:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> 2010/11/9 Tom Lane <tgl@sss.pgh.pa.us>:
>>> My opinion is still the same as here:
>>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00688.php
>
>> Current consensus says:
>
>> WITH x AS (SELECT count(*) FROM t), y AS (DELETE FROM t), z AS (SELECT
>> count(*) FROM t) SELECT x.count, z.count FROM x, z;
>
>> should return 0 for z.count but some number of original rows for
>> x.count.
>
> Consensus according to who?  It's at least as consistent for all the
> queries to start from the same snapshot, meaning that x and z would
> produce the same results (independent of what y does).

Consensus according to the pgsql-hackers archive (I'll collect some
links if you want). But to tell the truth, while trying to implement
it I've been feeling it is really bad design to separate plans not
queries (as rewriter). Through overall the PostgreSQL as an SQL
engine, it is not designed to do so. Especially if you imagine letting
the CCI-style writeable CTEs in SQL functions makes disaster; SQL
functions prepare the plan on the function init and wCTEs need to
execute the child plan since it is necessary before the
CreateQueryDesc, then it results in that only child plans are executed
but main query are not in case of the function wasn't called. Very
inconsistent.

> It might be worth inspecting the SQL2011 draft to see if they provide
> any guidance on what ought to happen here.

I agree we need decision of what to do (or what we want) with this feature.


Regards,

--
Hitoshi Harada


Re: How to share the result data of separated plan

From
Marko Tiikkaja
Date:
On 2010-11-08 7:26 PM +0200, Tom Lane wrote:
> Marko Tiikkaja<marko.tiikkaja@cs.helsinki.fi>  writes:
>> On 2010-11-08 6:38 PM +0200, Tom Lane wrote:
>>> My opinion is still the same as here:
>>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00688.php
>>>
>>> namely, that all we should be worrying about is a tuplestore full of
>>> RETURNING tuples.  Any other side-effects of a DML subquery should
>>> *not* be visible to the calling query, and therefore all this argument
>>> about snapshots and seqscan limits is beside the point.
>
>> What happened to:
>> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00566.php ?
>
> The alternative is to artificially serialize the DML CTEs, which
> while it does have some advantages doesn't seem like a win overall.
> In particular, defining that as the behavior will greatly constrain
> our abilities to optimize anything in future.  I think that a typical
> use-case will be something like
>
>     WITH u AS (UPDATE data_table SET ... RETURNING ...)
>     INSERT INTO log_table SELECT * FROM u;
>
> If that UPDATE touches a lot of rows, users will be unhappy if the
> RETURNING data builds up in memory rather than getting streamed directly
> to log_table.  (I'm not saying this has to work that way on day zero,
> but I am thinking we'd better be able to do it eventually.)

We've discussed this before and the consensus was that as long as we 
don't change the results, we can optimize the materialization away. 
This seems to be a textbook example of such a case.


Regards,
Marko Tiikkaja


Re: How to share the result data of separated plan

From
Robert Haas
Date:
On Mon, Nov 8, 2010 at 12:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
>> On 2010-11-08 6:38 PM +0200, Tom Lane wrote:
>>> My opinion is still the same as here:
>>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00688.php
>>>
>>> namely, that all we should be worrying about is a tuplestore full of
>>> RETURNING tuples.  Any other side-effects of a DML subquery should
>>> *not* be visible to the calling query, and therefore all this argument
>>> about snapshots and seqscan limits is beside the point.
>
>> What happened to:
>> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00566.php ?
>
> On the whole, I think that's overspecifying the behavior.  Compare what
> happens if you do the same thing in one query now; that is, you have
> say an UPDATE query and within that you invoke some function that looks
> directly at the target table.  Is it going to see a consistent view of
> the data?  No, it's going to see a partially updated table.
>
> The argument I'm making at the moment is
> [exactly the opposite of what I said before]

You've now vetoed both "using the same snapshot" and "using different
snapshots" at different points in time.  Had you taken your current
position at this time a year ago, we might well have this feature in
9.0.  I guess that's water under the bridge at this point, but I
believe that a pretty substantial amount of work has gone into
whacking this patch around based on your previously-expressed views as
to how it should be implemented, and they seem to have just shifted
again.

Frankly, I buy both arguments.  I think that if we use the same
snapshot, we are going to get people complaining about really screwy
behavior if the same table is used more than once in the same query
(BEFORE triggers already make my head explode).  I think if we use
different snapshots, optimization is going to be rather difficult, and
people are going to complain about the performance.  So I don't know
what the right thing to do is, but it is not the case that we are
designing this tabula rasa.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: How to share the result data of separated plan

From
Tom Lane
Date:
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> On 2010-11-08 7:26 PM +0200, Tom Lane wrote:
>> The alternative is to artificially serialize the DML CTEs, which
>> while it does have some advantages doesn't seem like a win overall.

> We've discussed this before and the consensus was that as long as we 
> don't change the results, we can optimize the materialization away. 

No, because the problem is mainly about what might happen if
user-defined functions choose to look at the target tables.  We can't
really tell what triggers are going to do, to take one item that the
planner has no access to.

I think we have to decide up front which implementation behavior
it's going to be, and if we go with serialized queries, we'll be
stuck with that.
        regards, tom lane


Re: How to share the result data of separated plan

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> You've now vetoed both "using the same snapshot" and "using different
> snapshots" at different points in time.

Yeah, I did change my mind between 2009-10 and 2010-02.  I was not aware
that I'm not allowed to change my mind ...
        regards, tom lane


Re: How to share the result data of separated plan

From
Robert Haas
Date:
On Mon, Nov 8, 2010 at 1:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> You've now vetoed both "using the same snapshot" and "using different
>> snapshots" at different points in time.
>
> Yeah, I did change my mind between 2009-10 and 2010-02.  I was not aware
> that I'm not allowed to change my mind ...

I didn't mean to say that you aren't, just that it's hard to hit a
moving target.  As far as I was aware until yesterday, your 2009-10
opinion was still the most current one.  Your email also gave the
(perhaps incorrect) impression that you didn't remember the previous
conversations on this topic.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: How to share the result data of separated plan

From
Marko Tiikkaja
Date:
On 2010-11-08 8:30 PM +0200, Tom Lane wrote:
> Marko Tiikkaja<marko.tiikkaja@cs.helsinki.fi>  writes:
>> On 2010-11-08 7:26 PM +0200, Tom Lane wrote:
>>> The alternative is to artificially serialize the DML CTEs, which
>>> while it does have some advantages doesn't seem like a win overall.
>
>> We've discussed this before and the consensus was that as long as we
>> don't change the results, we can optimize the materialization away.
>
> No, because the problem is mainly about what might happen if
> user-defined functions choose to look at the target tables.  We can't
> really tell what triggers are going to do, to take one item that the
> planner has no access to.

The relevant thread seems to be this one:

http://archives.postgresql.org/pgsql-hackers/2009-11/msg00003.php

and I do agree with what you said there.


Regards,
Marko Tiikkaja