Thread: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Rukh Meski

Date:

22 February 2014, 23:49:48

Hello hackers,

I know you're busy wrapping up the 9.4 release, so please ignore this patch.



♜

Attachment

update_delete_order_by_limit_v0.diff

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Peter Geoghegan

Date:

22 February 2014, 23:57:00

On Sat, Feb 22, 2014 at 3:46 PM, Rukh Meski <rukh.meski@yahoo.ca> wrote:
> I know you're busy wrapping up the 9.4 release, so please ignore this patch.

I think you should describe what the patch does, why you believe the
feature is necessary, and perhaps how it compares to other, similar
things. You have documentation changes here, but that doesn't really
tell us why this is important.


-- 
Peter Geoghegan

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Rukh Meski

Date:

23 February 2014, 00:05:35

On Saturday, February 22, 2014 11:57:06 PM, Peter Geoghegan <pg@heroku.com> wrote:

I think you should describe what the patch does, why you believe the
feature is necessary, and perhaps how it compares to other, similar
things. You have documentation changes here, but that doesn't really
tell us why this is important.

Sorry, I wanted to minimize the attention my message attracts.  I mostly posted it to let people know I plan on working
onthis for 9.5 to avoid duplicated effort.  I will post more documentation and my reasons for wanting this feature in
postgrelater, if that's all right. 



♜

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Robert Haas

Date:

25 February 2014, 16:49:43

On Sat, Feb 22, 2014 at 7:02 PM, Rukh Meski <rukh.meski@yahoo.ca> wrote:
> Sorry, I wanted to minimize the attention my message attracts.  I mostly posted it to let people know I plan on
workingon this for 9.5 to avoid duplicated effort.  I will post more documentation and my reasons for wanting this
featurein postgre later, if that's all right. 

I've wanted this more than once.  I suspect it's a pretty hard project, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Pavel Golub

Date:

26 February 2014, 11:20:14

Hello, Robert.

You wrote:

RH> On Sat, Feb 22, 2014 at 7:02 PM, Rukh Meski <rukh.meski@yahoo.ca> wrote:
>> Sorry, I wanted to minimize the attention my message attracts.  I mostly posted it to let people know I plan on
workingon this for 9.5 to avoid duplicated effort.  I will post more documentation and my reasons for wanting this
featurein postgre later, if that's all right.
 

RH> I've wanted this more than once.  I suspect it's a pretty hard project, though.

+1 from me. This is the exciting functionality. There was even a poll
in my blog year ago:
http://pgolub.wordpress.com/2012/11/23/do-we-need-limit-clause-in-update-and-delete-statements-for-postgresql/

So the results were (for those who don't want check the post):
Yes, for functionality: 194 (61.4%)
No way!                 78 (24.7%)
Do not care             44 (13.9%)

RH> -- 
RH> Robert Haas
RH> EnterpriseDB: http://www.enterprisedb.com
RH> The Enterprise PostgreSQL Company





-- 
With best wishes,Pavel                          mailto:pavel@gf.microolap.com

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Rukh Meski

Date:

12 March 2014, 21:15:34

Hi,

Here's an updated patch.  I had to push the LIMIT processing into ModifyTable to make the behaviour sane in parallel
scenarios. As usual, please ignore if you're busy with 9.4.  I will work on better docs and more tests from now on and
ampreparing to make a solid case for adding this. 



♜

Attachment

update_delete_order_by_limit_v1.diff

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Rukh Meski

Date:

12 March 2014, 22:19:20

Oops.  Of course shouldn't try and change how INSERT works.  Latest version attached.



♜

Attachment

update_delete_order_by_limit_v2.diff

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Amit Kapila

Date:

11 May 2014, 05:38:02

On Thu, Mar 13, 2014 at 3:49 AM, Rukh Meski <rukh.meski@yahoo.ca> wrote:
> Oops.  Of course shouldn't try and change how INSERT works.  Latest version attached.

I had given a brief look into this patch and found that the
implementation for Update .. ORDER BY is not appropriate for
inheritance tables.

It just tries to sort for individual tables in inheritance hierarchy
which can give wrong behaviour.

As an example try below case:
CREATE TABLE cities ( name       text, population real, altitude   int
);

CREATE TABLE capitals ( state      char(2)
) INHERITS (cities);

insert rows in both tables and then try to see the plan of below
Update statement
postgres=# explain update cities set population=150 where altitude<25 order by n
ame limit 1;                              QUERY PLAN
------------------------------------------------------------------------Update on cities  (cost=2.65..28.80 rows=396
width=47) ->  Sort  (cost=2.65..2.74 rows=39 width=42)        Sort Key: cities.name        ->  Seq Scan on cities
(cost=0.00..2.45rows=39 width=42)              Filter: (altitude < 25)  ->  Sort  (cost=25.16..26.05 rows=357 width=48)
      Sort Key: capitals.name        ->  Seq Scan on capitals  (cost=0.00..23.38 rows=357 width=48)
Filter:(altitude < 25)Planning time: 0.292 ms
 
(10 rows)

Here as it sorts for individual tables, the final result could be wrong.

As far as I can trace from the previous discussion of this feature,
you need to find the solution for below 2 key problems for
UPDATE ... ORDER BY:

1. How will you sort the rows from different tables in inheritance
hierarchy especially when they contain different columns as in
above example.

2. How would ModifyTable knows which table row came from.

Tom Lane has explained these problems in a very clear manner
in his below mail and shared his opinion about this feature as
well.
http://www.postgresql.org/message-id/26819.1291133045@sss.pgh.pa.us

So I think if you want to implement this feature, then lets first
try to find a solution for above problems.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Simon Riggs

Date:

11 May 2014, 08:33:24

On 11 May 2014 07:37, Amit Kapila <amit.kapila16@gmail.com> wrote:

> Tom Lane has explained these problems in a very clear manner
> in his below mail and shared his opinion about this feature as
> well.
> http://www.postgresql.org/message-id/26819.1291133045@sss.pgh.pa.us

I don't have Tom's wonderfully articulate way of saying things, so
I'll say it my way:

If you want to do this, you already can already write a query that has
the same effect. But supporting the syntax directly to execute a
statement with an undefinable outcome is a pretty bad idea, and worse
than that, there's a ton of useful things that we *do* want that would
be a much higher priority for work than this. If you support Postgres,
prioritise, please.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Andres Freund

Date:

11 May 2014, 09:18:47

On 2014-05-11 10:33:10 +0200, Simon Riggs wrote:
> On 11 May 2014 07:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> > Tom Lane has explained these problems in a very clear manner
> > in his below mail and shared his opinion about this feature as
> > well.
> > http://www.postgresql.org/message-id/26819.1291133045@sss.pgh.pa.us
> 
> I don't have Tom's wonderfully articulate way of saying things, so
> I'll say it my way:
> 
> If you want to do this, you already can already write a query that has
> the same effect. But supporting the syntax directly to execute a
> statement with an undefinable outcome is a pretty bad idea, and worse
> than that, there's a ton of useful things that we *do* want that would
> be a much higher priority for work than this. If you support Postgres,
> prioritise, please.

I don't know. I'd find UPDATE/DELETE ORDER BY something rather
useful. It's required to avoid deadlocks in many scenarios and it's not
that obvious how to write the queries in a correct manner.
LIMIT would be a nice bonus for queues, especially if we can get SKIP
LOCKED.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Rukh Meski

Date:

11 May 2014, 10:06:33

On Sun, May 11, 2014 at 10:33 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 11 May 2014 07:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>> Tom Lane has explained these problems in a very clear manner
>> in his below mail and shared his opinion about this feature as
>> well.
>> http://www.postgresql.org/message-id/26819.1291133045@sss.pgh.pa.us
>
> I don't have Tom's wonderfully articulate way of saying things, so
> I'll say it my way:
>
> If you want to do this, you already can already write a query that has
> the same effect. But supporting the syntax directly to execute a
> statement with an undefinable outcome is a pretty bad idea, and worse
> than that, there's a ton of useful things that we *do* want that would
> be a much higher priority for work than this. If you support Postgres,
> prioritise, please.

Yes, you can already achieve what this feature can do by other means,
but this feature makes these cases 1) more efficient in terms of how
much work has to be done 2) simpler and more elegant to write.  But
frankly I find it a bit insulting that I shouldn't work on this based
on other people's priorities.  This is a high priority item for me.

I'll look at the problem reported by Amit and come back with a
solution and my rationale for adding this feature.

♜

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Simon Riggs

Date:

11 May 2014, 10:08:06

On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-05-11 10:33:10 +0200, Simon Riggs wrote:
>> On 11 May 2014 07:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> > Tom Lane has explained these problems in a very clear manner
>> > in his below mail and shared his opinion about this feature as
>> > well.
>> > http://www.postgresql.org/message-id/26819.1291133045@sss.pgh.pa.us
>>
>> I don't have Tom's wonderfully articulate way of saying things, so
>> I'll say it my way:
>>
>> If you want to do this, you already can already write a query that has
>> the same effect. But supporting the syntax directly to execute a
>> statement with an undefinable outcome is a pretty bad idea, and worse
>> than that, there's a ton of useful things that we *do* want that would
>> be a much higher priority for work than this. If you support Postgres,
>> prioritise, please.
>
> I don't know. I'd find UPDATE/DELETE ORDER BY something rather
> useful.

Perhaps if an index exists to provide an ordering that makes it clear
what this means, then yes.

Forcing Rukh to implement ORDER BY when an index doesn't exist sounds
like useless work though, so if we were to accept that this statement
gives an ERROR in the absence of such an index, it would make sense
all round.

> It's required to avoid deadlocks in many scenarios and it's not
> that obvious how to write the queries in a correct manner.
> LIMIT would be a nice bonus for queues, especially if we can get SKIP
> LOCKED.

With SKIP LOCKED it begins to have some use. Thanks for the nudge.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Tom Lane

Date:

11 May 2014, 16:47:30

Simon Riggs <simon@2ndQuadrant.com> writes:
> On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote:
>> I don't know. I'd find UPDATE/DELETE ORDER BY something rather
>> useful.

> Perhaps if an index exists to provide an ordering that makes it clear
> what this means, then yes.

The $64 question is whether we'd accept an implementation that fails
if the target table has children (ie, is partitioned).  That seems
to me to not be up to the project's usual quality expectations, but
maybe if there's enough demand for a partial solution we should do so.

It strikes me that a big part of the problem here is that the current
support for this case assumes that the children don't all have the
same rowtype.  Which is important if you think of "child table" as
meaning "subclass in the OO sense".  But for ordinary partitioning
cases it's just useless complexity, and ModifyTable isn't the only
thing that suffers from that useless complexity.

If we had a notion of "partitioned table" that involved a restriction
that all the child tables have the exact same rowtype, we could implement
UPDATE/DELETE in a much saner fashion --- just one plan tree, not one
per child table --- and it would be possible to support UPDATE/DELETE
ORDER BY LIMIT with no more work than for the single-table case.
So that might shift the calculation as to whether we're willing to
accept a partial implementation.

Another idea is that the main reason we do things like this is the
assumption that for UPDATE, ModifyTable receives complete new rows
that only need to be pushed back into the table (and hence have
to already match the rowtype of the specific child table).  What if
we got rid of that and had the incoming tuples just have the target
row identifier (tableoid+TID) and the values for the updated columns?
ModifyTable then would have to visit the old row (something it must
do anyway, NB), pull out the values for the not-to-be-updated columns,
form the final tuple and store it.  It could implement this separately
for each child table, with a different mapping of which columns receive
the updates.  This eliminates the whole multiple-plan-tree business
at a stroke ... and TBH, it's not immediately obvious that this would
not be as efficient or more so than the way we do UPDATEs today, even
in the single-target-table case.  Pumping all those not-updated column
values through the plan tree isn't free.  The more I think about it,
the more promising this sounds --- though I confess to being badly
undercaffeinated at the moment, so maybe there's some fatal problem
I'm missing.
        regards, tom lane

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Andres Freund

Date:

11 May 2014, 21:42:51

On 2014-05-11 12:47:21 -0400, Tom Lane wrote:
> Another idea is that the main reason we do things like this is the
> assumption that for UPDATE, ModifyTable receives complete new rows
> that only need to be pushed back into the table (and hence have
> to already match the rowtype of the specific child table).  What if
> we got rid of that and had the incoming tuples just have the target
> row identifier (tableoid+TID) and the values for the updated columns?
> ModifyTable then would have to visit the old row (something it must
> do anyway, NB), pull out the values for the not-to-be-updated columns,
> form the final tuple and store it.  It could implement this separately
> for each child table, with a different mapping of which columns receive
> the updates.  This eliminates the whole multiple-plan-tree business
> at a stroke ... and TBH, it's not immediately obvious that this would
> not be as efficient or more so than the way we do UPDATEs today, even
> in the single-target-table case.  Pumping all those not-updated column
> values through the plan tree isn't free.  The more I think about it,
> the more promising this sounds --- though I confess to being badly
> undercaffeinated at the moment, so maybe there's some fatal problem
> I'm missing.

Yes, that sounds like a rather good plan. There's probably some fun
keeping the executor state straight when switching more frequently than
now and we'd probably need some (implicitly?) added type coercions? I
also agree, while there probably are some cases where'd be slower, that the
majority of cases will be faster.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Amit Kapila

Date:

13 May 2014, 06:03:15

On Sun, May 11, 2014 at 10:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote:
>>> I don't know. I'd find UPDATE/DELETE ORDER BY something rather
>>> useful.
>
>> Perhaps if an index exists to provide an ordering that makes it clear
>> what this means, then yes.
>
> The $64 question is whether we'd accept an implementation that fails
> if the target table has children (ie, is partitioned).  That seems
> to me to not be up to the project's usual quality expectations, but
> maybe if there's enough demand for a partial solution we should do so.
>
> It strikes me that a big part of the problem here is that the current
> support for this case assumes that the children don't all have the
> same rowtype.  Which is important if you think of "child table" as
> meaning "subclass in the OO sense".  But for ordinary partitioning
> cases it's just useless complexity, and ModifyTable isn't the only
> thing that suffers from that useless complexity.
>
> If we had a notion of "partitioned table" that involved a restriction
> that all the child tables have the exact same rowtype, we could implement
> UPDATE/DELETE in a much saner fashion --- just one plan tree, not one
> per child table --- and it would be possible to support UPDATE/DELETE
> ORDER BY LIMIT with no more work than for the single-table case.
> So that might shift the calculation as to whether we're willing to
> accept a partial implementation.

I think there are many use cases where current inheritance mechanism
is used for partitioning the table without adding new columns in child
table, so if we could support UPDATE/DELETE .. ORDER BY for
those cases, then it will be quite useful, but not sure if it is viable to
see simpler implementation for this case along with keeping current logic.

> Another idea is that the main reason we do things like this is the
> assumption that for UPDATE, ModifyTable receives complete new rows
> that only need to be pushed back into the table (and hence have
> to already match the rowtype of the specific child table).  What if
> we got rid of that and had the incoming tuples just have the target
> row identifier (tableoid+TID) and the values for the updated columns?
> ModifyTable then would have to visit the old row (something it must
> do anyway, NB), pull out the values for the not-to-be-updated columns,
> form the final tuple and store it.  It could implement this separately
> for each child table, with a different mapping of which columns receive
> the updates.

How about sorting step, are you thinking to have MergeAppend
node for it beneath ModifyTable?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Tom Lane

Date:

13 May 2014, 14:35:03

Amit Kapila <amit.kapila16@gmail.com> writes:
> How about sorting step, are you thinking to have MergeAppend
> node for it beneath ModifyTable?

Well yeah, that's pretty much the point.
        regards, tom lane

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Rukh Meski

Date:

13 May 2014, 19:45:29

On Sun, May 11, 2014 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The $64 question is whether we'd accept an implementation that fails
> if the target table has children (ie, is partitioned).  That seems
> to me to not be up to the project's usual quality expectations, but
> maybe if there's enough demand for a partial solution we should do so.
>
> It strikes me that a big part of the problem here is that the current
> support for this case assumes that the children don't all have the
> same rowtype.  Which is important if you think of "child table" as
> meaning "subclass in the OO sense".  But for ordinary partitioning
> cases it's just useless complexity, and ModifyTable isn't the only
> thing that suffers from that useless complexity.
>
> If we had a notion of "partitioned table" that involved a restriction
> that all the child tables have the exact same rowtype, we could implement
> UPDATE/DELETE in a much saner fashion --- just one plan tree, not one
> per child table --- and it would be possible to support UPDATE/DELETE
> ORDER BY LIMIT with no more work than for the single-table case.
> So that might shift the calculation as to whether we're willing to
> accept a partial implementation.

None of the use cases I have in mind would ever (have to) use this on
a parent table; in the worst case it might make sense to do it on the
child tables individually.  Personally, I think that just refusing to
operate on tables with children is a reasonable start.  I have no
interest in working on improving partitioning, but I don't think
pushing this feature back in the hopes that someone else will would
help anyone.

But definitely only my two cents on this issue.


♜

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Amit Kapila

Date:

14 May 2014, 04:41:14

On Tue, May 13, 2014 at 8:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> How about sorting step, are you thinking to have MergeAppend
>> node for it beneath ModifyTable?
>
> Well yeah, that's pretty much the point.

IIUC, the way new design will work is that for new tuple we will now
get tableoid+TID, modified column values as an input (for inheritance
tables we will get this for all child tables as well) for ModifyTable
and get old tuple (which in current case will be provided by MergeAppend
or in general by some scan node) from some node beneath the
ModifyTable.  It then matches the tableoid from old tuple with appropriate
tableoid incase of child tables and then form the new tuple for that
tableoid using old tuple and modified column values.
In this case can we safely assume that we will always get tableoid from
old tuple, ideally it should be there but just not sure and another minor
point is won't we get TID from old tuple (tuple we get from node beneath
ModifyTable), what's the need to pass for new tuple?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Robert Haas

Date:

14 May 2014, 15:26:54

On Sun, May 11, 2014 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote:
>>> I don't know. I'd find UPDATE/DELETE ORDER BY something rather
>>> useful.
>
>> Perhaps if an index exists to provide an ordering that makes it clear
>> what this means, then yes.
>
> The $64 question is whether we'd accept an implementation that fails
> if the target table has children (ie, is partitioned).

I'd say "no".  Partitioning is important, and we need to make it more
seamless and better-integrated, not add new warts.

> That seems
> to me to not be up to the project's usual quality expectations, but
> maybe if there's enough demand for a partial solution we should do so.

I like this feature, but if I were searching for places where it makes
sense to loosen our project's usual quality expectations, this isn't
where I'd start.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Jeff Janes

Date:

15 May 2014, 21:45:45

On Wed, May 14, 2014 at 8:26 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, May 11, 2014 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote:
>>> I don't know. I'd find UPDATE/DELETE ORDER BY something rather
>>> useful.
>
>> Perhaps if an index exists to provide an ordering that makes it clear
>> what this means, then yes.
>
> The $64 question is whether we'd accept an implementation that fails
> if the target table has children (ie, is partitioned).

I'd say "no". Partitioning is important, and we need to make it more
seamless and better-integrated, not add new warts.

I think the importance of partitioning argues the other way on this issue. Where I most wanted a LIMIT clause on DELETE is where I was moving tuples from one partition to a different one in a transactional way using bite-size chunks that wouldn't choke the live system with locks or with IO.

So the DELETE was always running against either a child by name, or against ONLY parent, not against the whole inheritance tree. Not being able to do this on single partitions makes partitioning harder, not easier.

Sure, I can select the nth smallest ctid and then "WITH T AS (DELETE FROM ONLY foo WHERE ctid < :that RETURNING *) INSERT INTO bar SELECT * from T", but how annoying.

> That seems
> to me to not be up to the project's usual quality expectations, but
> maybe if there's enough demand for a partial solution we should do so.

I like this feature, but if I were searching for places where it makes
sense to loosen our project's usual quality expectations, this isn't
where I'd start.

In this case I'd much rather have half a loaf rather than none at all. We wouldn't be adding warts to partitioning, but removing warts from the simpler case.

Cheers,

Jeff

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Heikki Linnakangas

Date:

24 June 2014, 09:09:25

On 05/13/2014 10:45 PM, Rukh Meski wrote:
> On Sun, May 11, 2014 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The $64 question is whether we'd accept an implementation that fails
>> if the target table has children (ie, is partitioned).  That seems
>> to me to not be up to the project's usual quality expectations, but
>> maybe if there's enough demand for a partial solution we should do so.
>>
>> It strikes me that a big part of the problem here is that the current
>> support for this case assumes that the children don't all have the
>> same rowtype.  Which is important if you think of "child table" as
>> meaning "subclass in the OO sense".  But for ordinary partitioning
>> cases it's just useless complexity, and ModifyTable isn't the only
>> thing that suffers from that useless complexity.
>>
>> If we had a notion of "partitioned table" that involved a restriction
>> that all the child tables have the exact same rowtype, we could implement
>> UPDATE/DELETE in a much saner fashion --- just one plan tree, not one
>> per child table --- and it would be possible to support UPDATE/DELETE
>> ORDER BY LIMIT with no more work than for the single-table case.
>> So that might shift the calculation as to whether we're willing to
>> accept a partial implementation.
>
> None of the use cases I have in mind would ever (have to) use this on
> a parent table; in the worst case it might make sense to do it on the
> child tables individually.  Personally, I think that just refusing to
> operate on tables with children is a reasonable start.  I have no
> interest in working on improving partitioning, but I don't think
> pushing this feature back in the hopes that someone else will would
> help anyone.

IMHO this needs to work with inheritance if we are to accept it. It 
would be a rather strange limitation for no apparent reason, other than 
that we didn't bother to implement it. It doesn't seem very difficult in 
theory to add the table OID to the plan as a junk column, and use that 
in the ModifyTable node to know which table a row came from.

In any case, the patch as it stands is clearly not acceptable, because 
it just produces wrong results with inheritance. I'm marking it as 
returned with feedback in the commitfest app. I'd suggest that you solve 
the inheritance problems and resubmit.

Per the docs in the patch:

> +  <para>
> +   If the <literal>LIMIT</> (or <literal>FETCH FIRST</>) clause
> +   is present, processing will stop after the system has attempted
> +   to delete the specified amount of rows.  In particular, if a row
> +   was concurrently changed not to match the given <literal>WHERE</>
> +   clause, it will count towards the <literal>LIMIT</> despite it
> +   not being actually deleted.  Unlike in <literal>SELECT</>, the
> +   <literal>OFFSET</literal> clause is not available in
> +   <literal>DELETE</>.
> +  </para>

That behavior with READ COMMITTED mode and concurrent changes is 
surprising. Do we really want it to behave like that, and if so, why?

Why is OFFSET not supported? Not that I care much for that, but I'm curious.

- Heikki

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Marko Tiikkaja

Date:

09 July 2014, 15:12:46

Hi Amit,

On 5/14/14 6:41 AM, Amit Kapila wrote:
> IIUC, the way new design will work is that for new tuple we will now
> get tableoid+TID, modified column values as an input (for inheritance
> tables we will get this for all child tables as well) for ModifyTable
> and get old tuple (which in current case will be provided by MergeAppend
> or in general by some scan node) from some node beneath the
> ModifyTable.  It then matches the tableoid from old tuple with appropriate
> tableoid incase of child tables and then form the new tuple for that
> tableoid using old tuple and modified column values.

Having now read the discussion upthread a bit more carefully, I think 
one of us is confused.  AIUI, what was suggested was that the plan nodes 
below the ModifyTable node would only give you back the modified 
columns, the tableoid and the TID of the tuple, and no "old values" at 
all.  This might be a reasonable approach, but I haven't given it that 
much thought yet.

> In this case can we safely assume that we will always get tableoid from
> old tuple, ideally it should be there but just not sure

It has to be there or otherwise the scheme won't work.  Is there a 
specific case you're worried about?

> and another minor
> point is won't we get TID from old tuple (tuple we get from node beneath
> ModifyTable), what's the need to pass for new tuple?

I don't understand this part, could you rephrase?

.marko

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Marko Tiikkaja

Date:

09 July 2014, 15:15:30

On 5/11/14 6:47 PM, Tom Lane wrote:
> The $64 question is whether we'd accept an implementation that fails
> if the target table has children (ie, is partitioned).  That seems
> to me to not be up to the project's usual quality expectations, but
> maybe if there's enough demand for a partial solution we should do so.

I think that partial support is better than no support unless there are 
concerns about forwards compatibility.  I don't see such concerns having 
been expressed for this feature.

.marko

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Rukh Meski

Date:

09 July 2014, 18:36:26

On Tue, Jun 24, 2014 at 04:08 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> IMHO this needs to work with inheritance if we are to accept it. It would be
> a rather strange limitation for no apparent reason, other than that we
> didn't bother to implement it. It doesn't seem very difficult in theory to
> add the table OID to the plan as a junk column, and use that in the
> ModifyTable node to know which table a row came from.

I can have a go at that, but I'm somewhat afraid of the performance
implications this might have.  And it's not just users of this feature
that would pay the penalty, it would be everyone.

> Per the docs in the patch:
>
>> +  <para>
>> +   If the <literal>LIMIT</> (or <literal>FETCH FIRST</>) clause
>> +   is present, processing will stop after the system has attempted
>> +   to delete the specified amount of rows.  In particular, if a row
>> +   was concurrently changed not to match the given <literal>WHERE</>
>> +   clause, it will count towards the <literal>LIMIT</> despite it
>> +   not being actually deleted.  Unlike in <literal>SELECT</>, the
>> +   <literal>OFFSET</literal> clause is not available in
>> +   <literal>DELETE</>.
>> +  </para>
>
>
> That behavior with READ COMMITTED mode and concurrent changes is surprising.
> Do we really want it to behave like that, and if so, why?

Oh, oops.  Looks like I didn't submit the latest version of the patch
for the commit fest, where I had fixed the documentation.  It doesn't
work that way anymore, as we really don't want it to work that way.

> Why is OFFSET not supported? Not that I care much for that, but I'm curious.

I thought it seemed weird.  But it's supported for FOR UPDATE, so
maybe we should support it here as well.


♜

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Amit Kapila

Date:

10 July 2014, 03:44:51

On Wed, Jul 9, 2014 at 8:42 PM, Marko Tiikkaja <marko@joh.to> wrote:
>
> Hi Amit,
>
>
> On 5/14/14 6:41 AM, Amit Kapila wrote:
>>
>> IIUC, the way new design will work is that for new tuple we will now
>> get tableoid+TID, modified column values as an input (for inheritance
>> tables we will get this for all child tables as well) for ModifyTable
>> and get old tuple (which in current case will be provided by MergeAppend
>> or in general by some scan node) from some node beneath the
>> ModifyTable. It then matches the tableoid from old tuple with appropriate
>> tableoid incase of child tables and then form the new tuple for that
>> tableoid using old tuple and modified column values.
>
>
> Having now read the discussion upthread a bit more carefully, I think one of us is confused. AIUI, what was suggested was that the plan nodes below the ModifyTable node would only give you back the modified columns, the tableoid and the TID of the tuple, and no "old values" at all.

Plan node below ModifyTable will be a scan node, it will give you old

tuple, whats the use of getting modified columns from it. We need

modified columns for new tuple which can be input for ModifyTuple.

>> In this case can we safely assume that we will always get tableoid from
>> old tuple, ideally it should be there but just not sure
>
>
> It has to be there or otherwise the scheme won't work. Is there a specific case you're worried about?
>
>
>> and another minor
>> point is won't we get TID from old tuple (tuple we get from node beneath
>> ModifyTable), what's the need to pass for new tuple?
>
>
> I don't understand this part, could you rephrase?

Basically, I wanted to say that apart from modified columns, we just

need to pass table OID. If I am reading correctly, the same is

mentioned by Heikki as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Marko Tiikkaja

Date:

10 July 2014, 09:17:48

On 7/10/14 5:44 AM, Amit Kapila wrote:
> Basically, I wanted to say that apart from modified columns, we just
> need to pass table OID.  If I am reading correctly, the same is
> mentioned by Heikki as well.

Yes, Heikki was talking about that approach.  I was more interested in 
the significantly more invasive approach Tom and Andres talked about 
upthread, which your email was a response to.

.marko

Re: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..

From

Marko Tiikkaja

Date:

20 July 2014, 23:15:24

On 2014-06-24 11:08, Heikki Linnakangas wrote:
> IMHO this needs to work with inheritance if we are to accept it. It
> would be a rather strange limitation for no apparent reason, other than
> that we didn't bother to implement it. It doesn't seem very difficult in
> theory to add the table OID to the plan as a junk column, and use that
> in the ModifyTable node to know which table a row came from.

So I've been trying to look at this with Rukh, and it doesn't seem at 
all as easy as you make it out to be.  Consider the following example:
  =# create table foo(a int);  =# create table foo_c1(a int, b int) inherits (foo);  =# create table foo_c2(a int, c
text)inherits (foo);

  =# update foo set a = a+1;

Currently, the way that works is that the ModifyTable node knows of 
three plans, all with different target lists, targeting each of the 
tables separately.  To make $SUBJECT work with inheritance, we would 
somehow have to get all three different types of tuples through an 
Append node to the ModifyTable (with ctid and tableoid as junk columns).  I can see how the approach Tom and Andres
talkabout upthread could 

work, but that's a helluva lot of work just so we can check the 
inheritance check box for this feature, even though it's not clear 
anyone would actually want to use this on inheritance sets.  Other 
approach I can think of would be to create some kind of a Frankenstein 
tuple which would satisfy the needs of all tables in the set, but that 
sounds really awful.

Or we could just forget that we ever had inheritance and assume that 
partitioning is the only interesting use case.  But it's not clear to me 
that we could assume the output to be the same in every case even then.

If someone has a clear idea on how this could be implemented in a 
reasonable time, we'd be happy to hear about it.

.marko