Re: How to share the result data of separated plan - Mailing list pgsql-hackers

From Tom Lane
Subject Re: How to share the result data of separated plan
Date
Msg-id 26443.1289237182@sss.pgh.pa.us
Whole thread Raw
In response to Re: How to share the result data of separated plan  (Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi>)
Responses Re: How to share the result data of separated plan
Re: How to share the result data of separated plan
List pgsql-hackers
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> On 2010-11-08 6:38 PM +0200, Tom Lane wrote:
>> My opinion is still the same as here:
>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00688.php
>> 
>> namely, that all we should be worrying about is a tuplestore full of
>> RETURNING tuples.  Any other side-effects of a DML subquery should
>> *not* be visible to the calling query, and therefore all this argument
>> about snapshots and seqscan limits is beside the point.

> What happened to:
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00566.php ?

On the whole, I think that's overspecifying the behavior.  Compare what
happens if you do the same thing in one query now; that is, you have
say an UPDATE query and within that you invoke some function that looks
directly at the target table.  Is it going to see a consistent view of
the data?  No, it's going to see a partially updated table.

The argument I'm making at the moment is that a query containing DML
CTEs should all execute with the same snapshot, meaning that no part of
the query should see the effects of any other part.  If, within the
query, you execute volatile functions that look at the target tables,
they will see an unspecified subset of the updates as having been
applied already.  This is the same as what happens with a volatile
function in an updating query now, and we've not heard many complaints
about it.

The alternative is to artificially serialize the DML CTEs, which
while it does have some advantages doesn't seem like a win overall.
In particular, defining that as the behavior will greatly constrain
our abilities to optimize anything in future.  I think that a typical
use-case will be something like
WITH u AS (UPDATE data_table SET ... RETURNING ...)INSERT INTO log_table SELECT * FROM u;

If that UPDATE touches a lot of rows, users will be unhappy if the
RETURNING data builds up in memory rather than getting streamed directly
to log_table.  (I'm not saying this has to work that way on day zero,
but I am thinking we'd better be able to do it eventually.)
        regards, tom lane


pgsql-hackers by date:

Previous
From: Hitoshi Harada
Date:
Subject: Re: How to share the result data of separated plan
Next
From: Tom Lane
Date:
Subject: Re: How to share the result data of separated plan