Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT .. - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT .. |
Date | |
Msg-id | 1598.1399826841@sss.pgh.pa.us Whole thread Raw |
In response to | Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT .. (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..
Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT .. Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT .. Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT .. Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT .. |
List | pgsql-hackers |
Simon Riggs <simon@2ndQuadrant.com> writes: > On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote: >> I don't know. I'd find UPDATE/DELETE ORDER BY something rather >> useful. > Perhaps if an index exists to provide an ordering that makes it clear > what this means, then yes. The $64 question is whether we'd accept an implementation that fails if the target table has children (ie, is partitioned). That seems to me to not be up to the project's usual quality expectations, but maybe if there's enough demand for a partial solution we should do so. It strikes me that a big part of the problem here is that the current support for this case assumes that the children don't all have the same rowtype. Which is important if you think of "child table" as meaning "subclass in the OO sense". But for ordinary partitioning cases it's just useless complexity, and ModifyTable isn't the only thing that suffers from that useless complexity. If we had a notion of "partitioned table" that involved a restriction that all the child tables have the exact same rowtype, we could implement UPDATE/DELETE in a much saner fashion --- just one plan tree, not one per child table --- and it would be possible to support UPDATE/DELETE ORDER BY LIMIT with no more work than for the single-table case. So that might shift the calculation as to whether we're willing to accept a partial implementation. Another idea is that the main reason we do things like this is the assumption that for UPDATE, ModifyTable receives complete new rows that only need to be pushed back into the table (and hence have to already match the rowtype of the specific child table). What if we got rid of that and had the incoming tuples just have the target row identifier (tableoid+TID) and the values for the updated columns? ModifyTable then would have to visit the old row (something it must do anyway, NB), pull out the values for the not-to-be-updated columns, form the final tuple and store it. It could implement this separately for each child table, with a different mapping of which columns receive the updates. This eliminates the whole multiple-plan-tree business at a stroke ... and TBH, it's not immediately obvious that this would not be as efficient or more so than the way we do UPDATEs today, even in the single-target-table case. Pumping all those not-updated column values through the plan tree isn't free. The more I think about it, the more promising this sounds --- though I confess to being badly undercaffeinated at the moment, so maybe there's some fatal problem I'm missing. regards, tom lane
pgsql-hackers by date: