Thread: 9.5: UPDATE/DELETE .. ORDER BY .. LIMIT ..
Hello hackers, I know you're busy wrapping up the 9.4 release, so please ignore this patch. ♜
Attachment
On Sat, Feb 22, 2014 at 3:46 PM, Rukh Meski <rukh.meski@yahoo.ca> wrote: > I know you're busy wrapping up the 9.4 release, so please ignore this patch. I think you should describe what the patch does, why you believe the feature is necessary, and perhaps how it compares to other, similar things. You have documentation changes here, but that doesn't really tell us why this is important. -- Peter Geoghegan
On Saturday, February 22, 2014 11:57:06 PM, Peter Geoghegan <pg@heroku.com> wrote: I think you should describe what the patch does, why you believe the feature is necessary, and perhaps how it compares to other, similar things. You have documentation changes here, but that doesn't really tell us why this is important. Sorry, I wanted to minimize the attention my message attracts. I mostly posted it to let people know I plan on working onthis for 9.5 to avoid duplicated effort. I will post more documentation and my reasons for wanting this feature in postgrelater, if that's all right. ♜
On Sat, Feb 22, 2014 at 7:02 PM, Rukh Meski <rukh.meski@yahoo.ca> wrote: > Sorry, I wanted to minimize the attention my message attracts. I mostly posted it to let people know I plan on workingon this for 9.5 to avoid duplicated effort. I will post more documentation and my reasons for wanting this featurein postgre later, if that's all right. I've wanted this more than once. I suspect it's a pretty hard project, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello, Robert. You wrote: RH> On Sat, Feb 22, 2014 at 7:02 PM, Rukh Meski <rukh.meski@yahoo.ca> wrote: >> Sorry, I wanted to minimize the attention my message attracts. I mostly posted it to let people know I plan on workingon this for 9.5 to avoid duplicated effort. I will post more documentation and my reasons for wanting this featurein postgre later, if that's all right. RH> I've wanted this more than once. I suspect it's a pretty hard project, though. +1 from me. This is the exciting functionality. There was even a poll in my blog year ago: http://pgolub.wordpress.com/2012/11/23/do-we-need-limit-clause-in-update-and-delete-statements-for-postgresql/ So the results were (for those who don't want check the post): Yes, for functionality: 194 (61.4%) No way! 78 (24.7%) Do not care 44 (13.9%) RH> -- RH> Robert Haas RH> EnterpriseDB: http://www.enterprisedb.com RH> The Enterprise PostgreSQL Company -- With best wishes,Pavel mailto:pavel@gf.microolap.com
Hi, Here's an updated patch. I had to push the LIMIT processing into ModifyTable to make the behaviour sane in parallel scenarios. As usual, please ignore if you're busy with 9.4. I will work on better docs and more tests from now on and ampreparing to make a solid case for adding this. ♜
Attachment
Oops. Of course shouldn't try and change how INSERT works. Latest version attached. ♜
Attachment
On Thu, Mar 13, 2014 at 3:49 AM, Rukh Meski <rukh.meski@yahoo.ca> wrote: > Oops. Of course shouldn't try and change how INSERT works. Latest version attached. I had given a brief look into this patch and found that the implementation for Update .. ORDER BY is not appropriate for inheritance tables. It just tries to sort for individual tables in inheritance hierarchy which can give wrong behaviour. As an example try below case: CREATE TABLE cities ( name text, population real, altitude int ); CREATE TABLE capitals ( state char(2) ) INHERITS (cities); insert rows in both tables and then try to see the plan of below Update statement postgres=# explain update cities set population=150 where altitude<25 order by n ame limit 1; QUERY PLAN ------------------------------------------------------------------------Update on cities (cost=2.65..28.80 rows=396 width=47) -> Sort (cost=2.65..2.74 rows=39 width=42) Sort Key: cities.name -> Seq Scan on cities (cost=0.00..2.45rows=39 width=42) Filter: (altitude < 25) -> Sort (cost=25.16..26.05 rows=357 width=48) Sort Key: capitals.name -> Seq Scan on capitals (cost=0.00..23.38 rows=357 width=48) Filter:(altitude < 25)Planning time: 0.292 ms (10 rows) Here as it sorts for individual tables, the final result could be wrong. As far as I can trace from the previous discussion of this feature, you need to find the solution for below 2 key problems for UPDATE ... ORDER BY: 1. How will you sort the rows from different tables in inheritance hierarchy especially when they contain different columns as in above example. 2. How would ModifyTable knows which table row came from. Tom Lane has explained these problems in a very clear manner in his below mail and shared his opinion about this feature as well. http://www.postgresql.org/message-id/26819.1291133045@sss.pgh.pa.us So I think if you want to implement this feature, then lets first try to find a solution for above problems. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 11 May 2014 07:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > Tom Lane has explained these problems in a very clear manner > in his below mail and shared his opinion about this feature as > well. > http://www.postgresql.org/message-id/26819.1291133045@sss.pgh.pa.us I don't have Tom's wonderfully articulate way of saying things, so I'll say it my way: If you want to do this, you already can already write a query that has the same effect. But supporting the syntax directly to execute a statement with an undefinable outcome is a pretty bad idea, and worse than that, there's a ton of useful things that we *do* want that would be a much higher priority for work than this. If you support Postgres, prioritise, please. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-05-11 10:33:10 +0200, Simon Riggs wrote: > On 11 May 2014 07:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Tom Lane has explained these problems in a very clear manner > > in his below mail and shared his opinion about this feature as > > well. > > http://www.postgresql.org/message-id/26819.1291133045@sss.pgh.pa.us > > I don't have Tom's wonderfully articulate way of saying things, so > I'll say it my way: > > If you want to do this, you already can already write a query that has > the same effect. But supporting the syntax directly to execute a > statement with an undefinable outcome is a pretty bad idea, and worse > than that, there's a ton of useful things that we *do* want that would > be a much higher priority for work than this. If you support Postgres, > prioritise, please. I don't know. I'd find UPDATE/DELETE ORDER BY something rather useful. It's required to avoid deadlocks in many scenarios and it's not that obvious how to write the queries in a correct manner. LIMIT would be a nice bonus for queues, especially if we can get SKIP LOCKED. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sun, May 11, 2014 at 10:33 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 11 May 2014 07:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > >> Tom Lane has explained these problems in a very clear manner >> in his below mail and shared his opinion about this feature as >> well. >> http://www.postgresql.org/message-id/26819.1291133045@sss.pgh.pa.us > > I don't have Tom's wonderfully articulate way of saying things, so > I'll say it my way: > > If you want to do this, you already can already write a query that has > the same effect. But supporting the syntax directly to execute a > statement with an undefinable outcome is a pretty bad idea, and worse > than that, there's a ton of useful things that we *do* want that would > be a much higher priority for work than this. If you support Postgres, > prioritise, please. Yes, you can already achieve what this feature can do by other means, but this feature makes these cases 1) more efficient in terms of how much work has to be done 2) simpler and more elegant to write. But frankly I find it a bit insulting that I shouldn't work on this based on other people's priorities. This is a high priority item for me. I'll look at the problem reported by Amit and come back with a solution and my rationale for adding this feature. ♜
On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-05-11 10:33:10 +0200, Simon Riggs wrote: >> On 11 May 2014 07:37, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> > Tom Lane has explained these problems in a very clear manner >> > in his below mail and shared his opinion about this feature as >> > well. >> > http://www.postgresql.org/message-id/26819.1291133045@sss.pgh.pa.us >> >> I don't have Tom's wonderfully articulate way of saying things, so >> I'll say it my way: >> >> If you want to do this, you already can already write a query that has >> the same effect. But supporting the syntax directly to execute a >> statement with an undefinable outcome is a pretty bad idea, and worse >> than that, there's a ton of useful things that we *do* want that would >> be a much higher priority for work than this. If you support Postgres, >> prioritise, please. > > I don't know. I'd find UPDATE/DELETE ORDER BY something rather > useful. Perhaps if an index exists to provide an ordering that makes it clear what this means, then yes. Forcing Rukh to implement ORDER BY when an index doesn't exist sounds like useless work though, so if we were to accept that this statement gives an ERROR in the absence of such an index, it would make sense all round. > It's required to avoid deadlocks in many scenarios and it's not > that obvious how to write the queries in a correct manner. > LIMIT would be a nice bonus for queues, especially if we can get SKIP > LOCKED. With SKIP LOCKED it begins to have some use. Thanks for the nudge. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote: >> I don't know. I'd find UPDATE/DELETE ORDER BY something rather >> useful. > Perhaps if an index exists to provide an ordering that makes it clear > what this means, then yes. The $64 question is whether we'd accept an implementation that fails if the target table has children (ie, is partitioned). That seems to me to not be up to the project's usual quality expectations, but maybe if there's enough demand for a partial solution we should do so. It strikes me that a big part of the problem here is that the current support for this case assumes that the children don't all have the same rowtype. Which is important if you think of "child table" as meaning "subclass in the OO sense". But for ordinary partitioning cases it's just useless complexity, and ModifyTable isn't the only thing that suffers from that useless complexity. If we had a notion of "partitioned table" that involved a restriction that all the child tables have the exact same rowtype, we could implement UPDATE/DELETE in a much saner fashion --- just one plan tree, not one per child table --- and it would be possible to support UPDATE/DELETE ORDER BY LIMIT with no more work than for the single-table case. So that might shift the calculation as to whether we're willing to accept a partial implementation. Another idea is that the main reason we do things like this is the assumption that for UPDATE, ModifyTable receives complete new rows that only need to be pushed back into the table (and hence have to already match the rowtype of the specific child table). What if we got rid of that and had the incoming tuples just have the target row identifier (tableoid+TID) and the values for the updated columns? ModifyTable then would have to visit the old row (something it must do anyway, NB), pull out the values for the not-to-be-updated columns, form the final tuple and store it. It could implement this separately for each child table, with a different mapping of which columns receive the updates. This eliminates the whole multiple-plan-tree business at a stroke ... and TBH, it's not immediately obvious that this would not be as efficient or more so than the way we do UPDATEs today, even in the single-target-table case. Pumping all those not-updated column values through the plan tree isn't free. The more I think about it, the more promising this sounds --- though I confess to being badly undercaffeinated at the moment, so maybe there's some fatal problem I'm missing. regards, tom lane
On 2014-05-11 12:47:21 -0400, Tom Lane wrote: > Another idea is that the main reason we do things like this is the > assumption that for UPDATE, ModifyTable receives complete new rows > that only need to be pushed back into the table (and hence have > to already match the rowtype of the specific child table). What if > we got rid of that and had the incoming tuples just have the target > row identifier (tableoid+TID) and the values for the updated columns? > ModifyTable then would have to visit the old row (something it must > do anyway, NB), pull out the values for the not-to-be-updated columns, > form the final tuple and store it. It could implement this separately > for each child table, with a different mapping of which columns receive > the updates. This eliminates the whole multiple-plan-tree business > at a stroke ... and TBH, it's not immediately obvious that this would > not be as efficient or more so than the way we do UPDATEs today, even > in the single-target-table case. Pumping all those not-updated column > values through the plan tree isn't free. The more I think about it, > the more promising this sounds --- though I confess to being badly > undercaffeinated at the moment, so maybe there's some fatal problem > I'm missing. Yes, that sounds like a rather good plan. There's probably some fun keeping the executor state straight when switching more frequently than now and we'd probably need some (implicitly?) added type coercions? I also agree, while there probably are some cases where'd be slower, that the majority of cases will be faster. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sun, May 11, 2014 at 10:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote: >>> I don't know. I'd find UPDATE/DELETE ORDER BY something rather >>> useful. > >> Perhaps if an index exists to provide an ordering that makes it clear >> what this means, then yes. > > The $64 question is whether we'd accept an implementation that fails > if the target table has children (ie, is partitioned). That seems > to me to not be up to the project's usual quality expectations, but > maybe if there's enough demand for a partial solution we should do so. > > It strikes me that a big part of the problem here is that the current > support for this case assumes that the children don't all have the > same rowtype. Which is important if you think of "child table" as > meaning "subclass in the OO sense". But for ordinary partitioning > cases it's just useless complexity, and ModifyTable isn't the only > thing that suffers from that useless complexity. > > If we had a notion of "partitioned table" that involved a restriction > that all the child tables have the exact same rowtype, we could implement > UPDATE/DELETE in a much saner fashion --- just one plan tree, not one > per child table --- and it would be possible to support UPDATE/DELETE > ORDER BY LIMIT with no more work than for the single-table case. > So that might shift the calculation as to whether we're willing to > accept a partial implementation. I think there are many use cases where current inheritance mechanism is used for partitioning the table without adding new columns in child table, so if we could support UPDATE/DELETE .. ORDER BY for those cases, then it will be quite useful, but not sure if it is viable to see simpler implementation for this case along with keeping current logic. > Another idea is that the main reason we do things like this is the > assumption that for UPDATE, ModifyTable receives complete new rows > that only need to be pushed back into the table (and hence have > to already match the rowtype of the specific child table). What if > we got rid of that and had the incoming tuples just have the target > row identifier (tableoid+TID) and the values for the updated columns? > ModifyTable then would have to visit the old row (something it must > do anyway, NB), pull out the values for the not-to-be-updated columns, > form the final tuple and store it. It could implement this separately > for each child table, with a different mapping of which columns receive > the updates. How about sorting step, are you thinking to have MergeAppend node for it beneath ModifyTable? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > How about sorting step, are you thinking to have MergeAppend > node for it beneath ModifyTable? Well yeah, that's pretty much the point. regards, tom lane
On Sun, May 11, 2014 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The $64 question is whether we'd accept an implementation that fails > if the target table has children (ie, is partitioned). That seems > to me to not be up to the project's usual quality expectations, but > maybe if there's enough demand for a partial solution we should do so. > > It strikes me that a big part of the problem here is that the current > support for this case assumes that the children don't all have the > same rowtype. Which is important if you think of "child table" as > meaning "subclass in the OO sense". But for ordinary partitioning > cases it's just useless complexity, and ModifyTable isn't the only > thing that suffers from that useless complexity. > > If we had a notion of "partitioned table" that involved a restriction > that all the child tables have the exact same rowtype, we could implement > UPDATE/DELETE in a much saner fashion --- just one plan tree, not one > per child table --- and it would be possible to support UPDATE/DELETE > ORDER BY LIMIT with no more work than for the single-table case. > So that might shift the calculation as to whether we're willing to > accept a partial implementation. None of the use cases I have in mind would ever (have to) use this on a parent table; in the worst case it might make sense to do it on the child tables individually. Personally, I think that just refusing to operate on tables with children is a reasonable start. I have no interest in working on improving partitioning, but I don't think pushing this feature back in the hopes that someone else will would help anyone. But definitely only my two cents on this issue. ♜
On Tue, May 13, 2014 at 8:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> How about sorting step, are you thinking to have MergeAppend >> node for it beneath ModifyTable? > > Well yeah, that's pretty much the point. IIUC, the way new design will work is that for new tuple we will now get tableoid+TID, modified column values as an input (for inheritance tables we will get this for all child tables as well) for ModifyTable and get old tuple (which in current case will be provided by MergeAppend or in general by some scan node) from some node beneath the ModifyTable. It then matches the tableoid from old tuple with appropriate tableoid incase of child tables and then form the new tuple for that tableoid using old tuple and modified column values. In this case can we safely assume that we will always get tableoid from old tuple, ideally it should be there but just not sure and another minor point is won't we get TID from old tuple (tuple we get from node beneath ModifyTable), what's the need to pass for new tuple? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sun, May 11, 2014 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote: >>> I don't know. I'd find UPDATE/DELETE ORDER BY something rather >>> useful. > >> Perhaps if an index exists to provide an ordering that makes it clear >> what this means, then yes. > > The $64 question is whether we'd accept an implementation that fails > if the target table has children (ie, is partitioned). I'd say "no". Partitioning is important, and we need to make it more seamless and better-integrated, not add new warts. > That seems > to me to not be up to the project's usual quality expectations, but > maybe if there's enough demand for a partial solution we should do so. I like this feature, but if I were searching for places where it makes sense to loosen our project's usual quality expectations, this isn't where I'd start. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, May 14, 2014 at 8:26 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, May 11, 2014 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:I'd say "no". Partitioning is important, and we need to make it more
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 11 May 2014 11:18, Andres Freund <andres@2ndquadrant.com> wrote:
>>> I don't know. I'd find UPDATE/DELETE ORDER BY something rather
>>> useful.
>
>> Perhaps if an index exists to provide an ordering that makes it clear
>> what this means, then yes.
>
> The $64 question is whether we'd accept an implementation that fails
> if the target table has children (ie, is partitioned).
seamless and better-integrated, not add new warts.
I think the importance of partitioning argues the other way on this issue. Where I most wanted a LIMIT clause on DELETE is where I was moving tuples from one partition to a different one in a transactional way using bite-size chunks that wouldn't choke the live system with locks or with IO.
So the DELETE was always running against either a child by name, or against ONLY parent, not against the whole inheritance tree. Not being able to do this on single partitions makes partitioning harder, not easier.
Sure, I can select the nth smallest ctid and then "WITH T AS (DELETE FROM ONLY foo WHERE ctid < :that RETURNING *) INSERT INTO bar SELECT * from T", but how annoying.
> That seemsI like this feature, but if I were searching for places where it makes
> to me to not be up to the project's usual quality expectations, but
> maybe if there's enough demand for a partial solution we should do so.
sense to loosen our project's usual quality expectations, this isn't
where I'd start.
In this case I'd much rather have half a loaf rather than none at all. We wouldn't be adding warts to partitioning, but removing warts from the simpler case.
Cheers,
Jeff
On 05/13/2014 10:45 PM, Rukh Meski wrote: > On Sun, May 11, 2014 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The $64 question is whether we'd accept an implementation that fails >> if the target table has children (ie, is partitioned). That seems >> to me to not be up to the project's usual quality expectations, but >> maybe if there's enough demand for a partial solution we should do so. >> >> It strikes me that a big part of the problem here is that the current >> support for this case assumes that the children don't all have the >> same rowtype. Which is important if you think of "child table" as >> meaning "subclass in the OO sense". But for ordinary partitioning >> cases it's just useless complexity, and ModifyTable isn't the only >> thing that suffers from that useless complexity. >> >> If we had a notion of "partitioned table" that involved a restriction >> that all the child tables have the exact same rowtype, we could implement >> UPDATE/DELETE in a much saner fashion --- just one plan tree, not one >> per child table --- and it would be possible to support UPDATE/DELETE >> ORDER BY LIMIT with no more work than for the single-table case. >> So that might shift the calculation as to whether we're willing to >> accept a partial implementation. > > None of the use cases I have in mind would ever (have to) use this on > a parent table; in the worst case it might make sense to do it on the > child tables individually. Personally, I think that just refusing to > operate on tables with children is a reasonable start. I have no > interest in working on improving partitioning, but I don't think > pushing this feature back in the hopes that someone else will would > help anyone. IMHO this needs to work with inheritance if we are to accept it. It would be a rather strange limitation for no apparent reason, other than that we didn't bother to implement it. It doesn't seem very difficult in theory to add the table OID to the plan as a junk column, and use that in the ModifyTable node to know which table a row came from. In any case, the patch as it stands is clearly not acceptable, because it just produces wrong results with inheritance. I'm marking it as returned with feedback in the commitfest app. I'd suggest that you solve the inheritance problems and resubmit. Per the docs in the patch: > + <para> > + If the <literal>LIMIT</> (or <literal>FETCH FIRST</>) clause > + is present, processing will stop after the system has attempted > + to delete the specified amount of rows. In particular, if a row > + was concurrently changed not to match the given <literal>WHERE</> > + clause, it will count towards the <literal>LIMIT</> despite it > + not being actually deleted. Unlike in <literal>SELECT</>, the > + <literal>OFFSET</literal> clause is not available in > + <literal>DELETE</>. > + </para> That behavior with READ COMMITTED mode and concurrent changes is surprising. Do we really want it to behave like that, and if so, why? Why is OFFSET not supported? Not that I care much for that, but I'm curious. - Heikki
Hi Amit, On 5/14/14 6:41 AM, Amit Kapila wrote: > IIUC, the way new design will work is that for new tuple we will now > get tableoid+TID, modified column values as an input (for inheritance > tables we will get this for all child tables as well) for ModifyTable > and get old tuple (which in current case will be provided by MergeAppend > or in general by some scan node) from some node beneath the > ModifyTable. It then matches the tableoid from old tuple with appropriate > tableoid incase of child tables and then form the new tuple for that > tableoid using old tuple and modified column values. Having now read the discussion upthread a bit more carefully, I think one of us is confused. AIUI, what was suggested was that the plan nodes below the ModifyTable node would only give you back the modified columns, the tableoid and the TID of the tuple, and no "old values" at all. This might be a reasonable approach, but I haven't given it that much thought yet. > In this case can we safely assume that we will always get tableoid from > old tuple, ideally it should be there but just not sure It has to be there or otherwise the scheme won't work. Is there a specific case you're worried about? > and another minor > point is won't we get TID from old tuple (tuple we get from node beneath > ModifyTable), what's the need to pass for new tuple? I don't understand this part, could you rephrase? .marko
On 5/11/14 6:47 PM, Tom Lane wrote: > The $64 question is whether we'd accept an implementation that fails > if the target table has children (ie, is partitioned). That seems > to me to not be up to the project's usual quality expectations, but > maybe if there's enough demand for a partial solution we should do so. I think that partial support is better than no support unless there are concerns about forwards compatibility. I don't see such concerns having been expressed for this feature. .marko
On Tue, Jun 24, 2014 at 04:08 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > IMHO this needs to work with inheritance if we are to accept it. It would be > a rather strange limitation for no apparent reason, other than that we > didn't bother to implement it. It doesn't seem very difficult in theory to > add the table OID to the plan as a junk column, and use that in the > ModifyTable node to know which table a row came from. I can have a go at that, but I'm somewhat afraid of the performance implications this might have. And it's not just users of this feature that would pay the penalty, it would be everyone. > Per the docs in the patch: > >> + <para> >> + If the <literal>LIMIT</> (or <literal>FETCH FIRST</>) clause >> + is present, processing will stop after the system has attempted >> + to delete the specified amount of rows. In particular, if a row >> + was concurrently changed not to match the given <literal>WHERE</> >> + clause, it will count towards the <literal>LIMIT</> despite it >> + not being actually deleted. Unlike in <literal>SELECT</>, the >> + <literal>OFFSET</literal> clause is not available in >> + <literal>DELETE</>. >> + </para> > > > That behavior with READ COMMITTED mode and concurrent changes is surprising. > Do we really want it to behave like that, and if so, why? Oh, oops. Looks like I didn't submit the latest version of the patch for the commit fest, where I had fixed the documentation. It doesn't work that way anymore, as we really don't want it to work that way. > Why is OFFSET not supported? Not that I care much for that, but I'm curious. I thought it seemed weird. But it's supported for FOR UPDATE, so maybe we should support it here as well. ♜
On Wed, Jul 9, 2014 at 8:42 PM, Marko Tiikkaja <marko@joh.to> wrote:
>
> Hi Amit,
>
>
> On 5/14/14 6:41 AM, Amit Kapila wrote:
>>
>> IIUC, the way new design will work is that for new tuple we will now
>> get tableoid+TID, modified column values as an input (for inheritance
>> tables we will get this for all child tables as well) for ModifyTable
>> and get old tuple (which in current case will be provided by MergeAppend
>> or in general by some scan node) from some node beneath the
>> ModifyTable. It then matches the tableoid from old tuple with appropriate
>> tableoid incase of child tables and then form the new tuple for that
>> tableoid using old tuple and modified column values.
>
>
> Having now read the discussion upthread a bit more carefully, I think one of us is confused. AIUI, what was suggested was that the plan nodes below the ModifyTable node would only give you back the modified columns, the tableoid and the TID of the tuple, and no "old values" at all.
>
> Hi Amit,
>
>
> On 5/14/14 6:41 AM, Amit Kapila wrote:
>>
>> IIUC, the way new design will work is that for new tuple we will now
>> get tableoid+TID, modified column values as an input (for inheritance
>> tables we will get this for all child tables as well) for ModifyTable
>> and get old tuple (which in current case will be provided by MergeAppend
>> or in general by some scan node) from some node beneath the
>> ModifyTable. It then matches the tableoid from old tuple with appropriate
>> tableoid incase of child tables and then form the new tuple for that
>> tableoid using old tuple and modified column values.
>
>
> Having now read the discussion upthread a bit more carefully, I think one of us is confused. AIUI, what was suggested was that the plan nodes below the ModifyTable node would only give you back the modified columns, the tableoid and the TID of the tuple, and no "old values" at all.
Plan node below ModifyTable will be a scan node, it will give you old
tuple, whats the use of getting modified columns from it. We need
modified columns for new tuple which can be input for ModifyTuple.
>> In this case can we safely assume that we will always get tableoid from
>> old tuple, ideally it should be there but just not sure
>
>
> It has to be there or otherwise the scheme won't work. Is there a specific case you're worried about?
>
>
>> and another minor
>> point is won't we get TID from old tuple (tuple we get from node beneath
>> ModifyTable), what's the need to pass for new tuple?
>
>
> I don't understand this part, could you rephrase?
>> In this case can we safely assume that we will always get tableoid from
>> old tuple, ideally it should be there but just not sure
>
>
> It has to be there or otherwise the scheme won't work. Is there a specific case you're worried about?
>
>
>> and another minor
>> point is won't we get TID from old tuple (tuple we get from node beneath
>> ModifyTable), what's the need to pass for new tuple?
>
>
> I don't understand this part, could you rephrase?
Basically, I wanted to say that apart from modified columns, we just
need to pass table OID. If I am reading correctly, the same is
On 7/10/14 5:44 AM, Amit Kapila wrote: > Basically, I wanted to say that apart from modified columns, we just > need to pass table OID. If I am reading correctly, the same is > mentioned by Heikki as well. Yes, Heikki was talking about that approach. I was more interested in the significantly more invasive approach Tom and Andres talked about upthread, which your email was a response to. .marko
On 2014-06-24 11:08, Heikki Linnakangas wrote: > IMHO this needs to work with inheritance if we are to accept it. It > would be a rather strange limitation for no apparent reason, other than > that we didn't bother to implement it. It doesn't seem very difficult in > theory to add the table OID to the plan as a junk column, and use that > in the ModifyTable node to know which table a row came from. So I've been trying to look at this with Rukh, and it doesn't seem at all as easy as you make it out to be. Consider the following example: =# create table foo(a int); =# create table foo_c1(a int, b int) inherits (foo); =# create table foo_c2(a int, c text)inherits (foo); =# update foo set a = a+1; Currently, the way that works is that the ModifyTable node knows of three plans, all with different target lists, targeting each of the tables separately. To make $SUBJECT work with inheritance, we would somehow have to get all three different types of tuples through an Append node to the ModifyTable (with ctid and tableoid as junk columns). I can see how the approach Tom and Andres talkabout upthread could work, but that's a helluva lot of work just so we can check the inheritance check box for this feature, even though it's not clear anyone would actually want to use this on inheritance sets. Other approach I can think of would be to create some kind of a Frankenstein tuple which would satisfy the needs of all tables in the set, but that sounds really awful. Or we could just forget that we ever had inheritance and assume that partitioning is the only interesting use case. But it's not clear to me that we could assume the output to be the same in every case even then. If someone has a clear idea on how this could be implemented in a reasonable time, we'd be happy to hear about it. .marko