Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT .. - Mailing list pgsql-hackers

Simon Riggs <simon@2ndQuadrant.com> writes:
> On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote:
>> I don't know. I'd find UPDATE/DELETE ORDER BY something rather
>> useful.

> Perhaps if an index exists to provide an ordering that makes it clear
> what this means, then yes.

The $64 question is whether we'd accept an implementation that fails
if the target table has children (ie, is partitioned).  That seems
to me to not be up to the project's usual quality expectations, but
maybe if there's enough demand for a partial solution we should do so.

It strikes me that a big part of the problem here is that the current
support for this case assumes that the children don't all have the
same rowtype.  Which is important if you think of "child table" as
meaning "subclass in the OO sense".  But for ordinary partitioning
cases it's just useless complexity, and ModifyTable isn't the only
thing that suffers from that useless complexity.

If we had a notion of "partitioned table" that involved a restriction
that all the child tables have the exact same rowtype, we could implement
UPDATE/DELETE in a much saner fashion --- just one plan tree, not one
per child table --- and it would be possible to support UPDATE/DELETE
ORDER BY LIMIT with no more work than for the single-table case.
So that might shift the calculation as to whether we're willing to
accept a partial implementation.

Another idea is that the main reason we do things like this is the
assumption that for UPDATE, ModifyTable receives complete new rows
that only need to be pushed back into the table (and hence have
to already match the rowtype of the specific child table).  What if
we got rid of that and had the incoming tuples just have the target
row identifier (tableoid+TID) and the values for the updated columns?
ModifyTable then would have to visit the old row (something it must
do anyway, NB), pull out the values for the not-to-be-updated columns,
form the final tuple and store it.  It could implement this separately
for each child table, with a different mapping of which columns receive
the updates.  This eliminates the whole multiple-plan-tree business
at a stroke ... and TBH, it's not immediately obvious that this would
not be as efficient or more so than the way we do UPDATEs today, even
in the single-target-table case.  Pumping all those not-updated column
values through the plan tree isn't free.  The more I think about it,
the more promising this sounds --- though I confess to being badly
undercaffeinated at the moment, so maybe there's some fatal problem
I'm missing.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_class.relpages/allvisible probably shouldn't be a int4
Next
From: Tom Lane
Date:
Subject: Re: postgresql.auto.conf read from wrong directory