Thread: DELETE with LIMIT (or my first hack)
Hi,

Frequently I have accidents with DELETE/UPDATE commands. In fact, sometimes over the last 8 or 9 years (ok, a lot of times) I forget the entire WHERE clause, or write a "not so perfect" WHERE clause, with an awful surprise. There are no words for the horror every time I see that the number of affected rows is not the one or two I expected, but the entire table. So I planned to make a hack to make the "LIMIT" directive available to the "DELETE" command.

So, can anyone help me with how to do this? This is my plan: 1) change the lex grammar (where's the file?); 2) change the parser to accept the new grammar; 3) change the executor to stop after "n" successful iterations. Is this correct?

Greets,
--
Daniel Loureiro
------------------------------
http://diffcoder.blogspot.com/
On Mon, Nov 29, 2010 at 9:08 PM, Daniel Loureiro <loureirorg@gmail.com> wrote:
>
> 3) change the executor to stop after "n" successful iterations. Is
> this correct ?
>

No. It means you will delete the first n tuples that happen to be found; if you don't have a WHERE clause, it is very possible you will delete something you don't want to. The correct solution is to always run DELETEs inside a transaction, and only issue a COMMIT once you have seen the right thing happen.

Besides, I think this has been proposed and rejected before.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
Good point. But when you use a LIMIT in a SELECT statement you WANT n random tuples - is it wrong to get random tuples? So, by the same logic, is it wrong to delete n random tuples? Besides, if you want to DELETE just one tuple, why does the executor have to scan the entire table rather than stop after finding that one tuple? Why should the LIMIT clause speed up only SELECT statements? If the programmer knows the expected number of affected rows, why not use it to speed up DELETE/UPDATE?

cheers,
--
Daniel Loureiro
http://diffcoder.blogspot.com/

2010/11/30 Jaime Casanova <jaime@2ndquadrant.com>:
> no. it means you will delete the n first tuples that happen to be
> found, if you don't have a WHERE clause that means is very possible
> you delete something you don't want to... the correct solution is to
> use always try DELETE's inside transactions and only if you see the
> right thing happening issue a COMMIT
>
> besides i think this has been proposed and rejected before
On Mon, Nov 29, 2010 at 9:55 PM, Daniel Loureiro <loureirorg@gmail.com> wrote:
> good point. But when you use a LIMIT in a SELECT statement you WANT n RANDOM
> tuples

No. At least IMHO, the only sensible way LIMIT is useful is with an ORDER BY clause, which makes the results very well defined...

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL
On Mon, Nov 29, 2010 at 9:08 PM, Daniel Loureiro <loureirorg@gmail.com> wrote:
> frequently i have accidents with DELETE/UPDATE commands. In fact, sometimes
> in the last 8 or 9 years (ok, a lot of times) I forget the entire WHERE
> clause or have a "not so perfectly" WHERE clause, with an awful suprise.
>
> So, can anyone help-me in how to do this ? This its my plan: 1) change the
> lex grammar (wheres the file ?) 2) change the parser to accept the new
> grammar 3) change the executor to stop after "n" successful iterations. Is
> this correct ?

I don't think your use case sounds very compelling - as Jaime says, you could still easily blow away data that you have no easy way to get back - but I agree that DELETE (or UPDATE) is useful in combination with LIMIT.

For example, suppose you want to roll your own replication solution for a table with no primary key. So you set up some triggers. Whenever you see an INSERT on the source table, you do a matching INSERT on the target table. When you see a DELETE on the source table, you do a DELETE on the target table that constrains all the columns to be equal and also includes LIMIT 1. Similarly for UPDATE. Then, your boss gives you a big raise and commends you for your awesome programming skills. Woot!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
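Robert's replication-trigger idea depends on exactly the missing feature. As a rough sketch (the LIMIT 1 here is the *proposed* syntax, not something PostgreSQL accepts, and all table and column names are invented):

```sql
-- Hypothetical: inside a replication trigger on the source table,
-- remove exactly one matching copy on the replica.  Without LIMIT 1,
-- every duplicate row on the target would be deleted at once.
DELETE FROM target_tab
 WHERE a = OLD.a
   AND b = OLD.b
   AND c = OLD.c
 LIMIT 1;
```

The point is that with no primary key there is no other way to address "just one" of several identical rows in plain SQL.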
On Mon, Nov 29, 2010 at 10:09 PM, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> On Mon, Nov 29, 2010 at 9:55 PM, Daniel Loureiro <loureirorg@gmail.com> wrote:
>> good point. But when you use a LIMIT in a SELECT statement you WANT n RANDOM
>> tuples
>
> no. at least IMHO the only sensible way that LIMIT is usefull is with
> an ORDER BY clause with make the results very well defined...

That's not 100% true - it can sometimes be very useful when digging through a database to grab 50 rows from a table just to get a feel for what kind of stuff is in there. Maybe it's stupid, but I find it handy.

But even granting the premise, that's an argument for making DELETE support both ORDER BY and LIMIT, not for supporting neither of them. For example, suppose we're trying to govern an ancient Greek democracy:

http://en.wikipedia.org/wiki/Ostracism

DELETE FROM residents_of_athens ORDER BY ostracism_votes DESC LIMIT 1;

I think the executor already pretty much knows how to do this. The planner might need some fiddling to hand over the correct instructions, not sure. But this might not even be super hard, though Daniel might want to pick something a little less ambitious for his very first project, because debugging planner and executor problems is not so easy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11/29/2010 10:19 PM, Robert Haas wrote:
> For example, suppose we're trying to govern an ancient Greek democracy:
>
> http://en.wikipedia.org/wiki/Ostracism
>
> DELETE FROM residents_of_athens ORDER BY ostracism_votes DESC LIMIT 1;

I'm not sure this is a very good example. Assuming there isn't a tie, I'd do it like this:

    DELETE FROM residents_of_athens
    WHERE ostracism_votes >= 6000
      and ostracism_votes =
        (SELECT max(ostracism_votes)
         FROM residents_of_athens);

I can't say I'd be excited by this feature. In quite a few years of writing SQL I don't recall ever wanting such a gadget.

cheers

andrew
On Mon, Nov 29, 2010 at 11:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> I'm not sure this is a very good example. Assuming there isn't a tie, I'd do
> it like this:
>
> DELETE FROM residents_of_athens
> WHERE ostracism_votes >= 6000
>   and ostracism_votes =
>     (SELECT max(ostracism_votes)
>      FROM residents_of_athens);

That might be a lot less efficient, though, and sometimes it's not OK to delete more than one record. Imagine, for example, wanting to dequeue the work item with the highest priority. Sure, you can use SELECT ... LIMIT to identify one and then DELETE it by some other key, but DELETE .. ORDER BY .. RETURNING .. LIMIT would be cool, and would let you do it with just one scan.

> I can't say I'd be excited by this feature. In quite a few years of writing
> SQL I don't recall ever wanting such a gadget.

It's something I've wanted periodically, though not badly enough to do the work to make it happen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Nov 30, 2010 at 05:09, Jaime Casanova <jaime@2ndquadrant.com> wrote:
> at least IMHO the only sensible way that LIMIT is usefull is with
> an ORDER BY clause with make the results very well defined...

DELETE with LIMIT is also useful for deleting things in batches, so you can do large deletes on a live system without starving other users of I/O. In this case deletion order doesn't matter (it's more efficient to delete rows in physical table order) -- ORDER BY isn't necessary.

Regards,
Marti
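In the *proposed* (hypothetical, not yet existing) syntax, the batch pattern Marti describes would look like this; table and column names are invented:

```sql
-- Hypothetical proposed syntax: delete at most 10000 matching rows per
-- statement.  The client re-runs the statement, in its own short
-- transaction, until the reported row count is 0, so no single
-- transaction holds locks or generates dead tuples for very long.
DELETE FROM big_log
 WHERE logged_at < now() - interval '90 days'
 LIMIT 10000;
```

Because order is irrelevant here, no ORDER BY is needed, and the executor would be free to remove rows in physical order.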
Hi all,

The workaround recommended some time ago by Tom is:

DELETE FROM residents_of_athens WHERE ctid = any(array(SELECT ctid FROM
residents_of_athens ORDER BY ostracism_votes DESC LIMIT 1));

It is about as efficient as the requested feature would be, just uglier to write down. I use it all the time when batch-deleting something large (to avoid long-running transactions and to not crash slony). It also helps to vacuum frequently if you do that on a large amount of data...

Cheers,
Csaba.

On Tue, 2010-11-30 at 00:05 -0500, Robert Haas wrote:
> That might be a lot less efficient, though, and sometimes it's not OK
> to delete more than one record. Imagine, for example, wanting to
> dequeue the work item with the highest priority. Sure, you can use
> SELECT ... LIMIT to identify one and then DELETE it by some other key,
> but DELETE .. ORDER BY .. RETURNING .. LIMIT would be cool, and would
> let you do it with just one scan.
>
> It's something I've wanted periodically, though not badly enough to do
> the work to make it happen.
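For batch deletion today, the ctid workaround above generalizes into a loop driven by the client; a sketch with invented table and column names:

```sql
-- Delete old rows 10000 at a time using the ctid trick.  Run this
-- statement repeatedly (each run in its own transaction) until it
-- reports "DELETE 0", vacuuming between rounds so the freed space
-- can be reused.
DELETE FROM audit_log
 WHERE ctid = ANY (ARRAY(
     SELECT ctid FROM audit_log
      WHERE logged_at < now() - interval '90 days'
      LIMIT 10000));
```

This keeps each transaction short, which is the same benefit the proposed DELETE ... LIMIT would provide directly.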
On Mon, Nov 29, 2010 at 10:50 PM, Marti Raudsepp <marti@juffo.org> wrote:
> DELETE with LIMIT is also useful for deleting things in batches, so
> you can do large deletes on a live system without starving other users
> from I/O. In this case deletion order doesn't matter (it's more
> efficient to delete rows in physical table order) -- ORDER BY isn't
> necessary.

++ I have a lot of DELETEs with LIMIT in my (mysql) environment for this reason.

--
Rob Wultsch
wultsch@gmail.com
On Tue, Nov 30, 2010 at 4:25 AM, Csaba Nagy <ncslists@googlemail.com> wrote:
> The workaround recommended some time ago by Tom is:
>
> DELETE FROM residents_of_athens WHERE ctid = any(array(SELECT ctid FROM
> residents_of_athens ORDER BY ostracism_votes DESC LIMIT 1));
>
> It is about as efficient as the requested feature would be, just uglier
> to write down. I use it all the time when batch-deleting something large
> (to avoid long running transactions and to not crash slony).

That's a very elegant hack, but not exactly obvious to a novice user or, say, me. So I think it'd be nicer to have the obvious syntax work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Nov 29, 2010 at 11:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> I can't say I'd be excited by this feature. In quite a few years of writing
>> SQL I don't recall ever wanting such a gadget.

> It's something I've wanted periodically, though not badly enough to do
> the work to make it happen.

It would certainly look like nothing but a crude hack if the feature is only available for DELETE and not UPDATE. Unfortunately, the UPDATE case would be an order of magnitude harder (think inheritance trees where the children aren't all alike).

			regards, tom lane
On 11/30/2010 09:57 AM, Csaba Nagy wrote:
> So it is really an ideological thing and not lack of demand or
> implementation attempts... I for myself can't write working C code
> anyway, so I got my peace with the workaround - I wish you good luck
> arguing Tom :-)

We need a convincing use case for it. So far the only one that's seemed at all convincing to me is the one about deleting in batches. But that might be enough.

As for it being illogical, I don't think it's any more so than

    DELETE FROM foo WHERE random() < 0.1;

and you can do that today.

cheers

andrew
On Tue, Nov 30, 2010 at 10:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> It's something I've wanted periodically, though not badly enough to do
>> the work to make it happen.
>
> It would certainly look like nothing but a crude hack if the feature is
> only available for DELETE and not UPDATE.

I'm not sure this is true, given Andrew's comment that the bulk deletion argument is the only one he finds compelling, but I'd surely be in favor of supporting both.

> Unfortunately, the UPDATE
> case would be an order of magnitude harder (think inheritance trees
> where the children aren't all alike).

I don't understand why there's anything more to this than sticking a Limit node either immediately above or immediately below the ModifyTable node.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Nov 30, 2010 at 10:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Unfortunately, the UPDATE
>> case would be an order of magnitude harder (think inheritance trees
>> where the children aren't all alike).

> I don't understand why there's anything more to this than sticking a
> Limit node either immediately above or immediately below the
> ModifyTable node.

1. You need to support ORDER BY too, otherwise I *will* be on the warpath against this as a foot-gun with no redeeming social value.

2. So what you need is Sort underneath Limit underneath ModifyTable. Putting them above it would be quite the wrong semantics.

3. This doesn't work tremendously well for inheritance trees, where ModifyTable acts as sort of an implicit Append node. You can't just funnel all the tuples through one Sort or Limit node because they aren't all the same rowtype. (Limit might perhaps not care, but Sort will.) But you can't have a separate Sort/Limit for each table either, because that would give the wrong behavior. Another problem with funneling all the rows through one Sort/Limit is that ModifyTable would need to know which table each row came from, so it can apply the modify to the right table.

I don't offhand see a solution other than integrating the responsibility for limit-counting and sorting into ModifyTable itself, making it into an unholy union of ModifyTable+Limit+MergeAppend (with the individual inputs required to deliver sorted outputs separately). That's sufficiently ugly, and probably bad for performance in the normal case, that I don't think it's going to be acceptable for such a marginal feature.

Or I guess you could try to persuade us that DELETE/UPDATE with ORDER BY or LIMIT doesn't need to support inherited target tables. I wouldn't bet on that proposal flying either.

			regards, tom lane
Hi Robert,

On Tue, 2010-11-30 at 09:19 -0500, Robert Haas wrote:
> That's a very elegant hack, but not exactly obvious to a novice user
> or, say, me. So I think it'd be nicer to have the obvious syntax
> work.

I fully agree - but you first have to convince the core hackers that this is not just a foot-gun. This was discussed many times in the past; patches were also offered (perhaps not complete ones, but proving that there is an itch getting scratched):

http://archives.postgresql.org/pgsql-patches/2002-09/msg00255.php

The reaction:

http://archives.postgresql.org/pgsql-patches/2002-09/msg00256.php

There are other discussions too. If I remember correctly, Tom once admitted that the core of implementing the feature would likely consist in letting it work, as the infrastructure is there to do it but it is actively disabled. I can't find the mail now though.

So it is really an ideological thing and not lack of demand or implementation attempts... I for myself can't write working C code anyway, so I have made my peace with the workaround - I wish you good luck arguing Tom :-)

Cheers,
Csaba.
On Tue, Nov 30, 2010 at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 1. You need to support ORDER BY too, otherwise I *will* be on the
> warpath against this as a foot-gun with no redeeming social value.

Will you be wielding a Tom-ahawk?

> 2. So what you need is Sort underneath Limit underneath ModifyTable.
> Putting them above it would be quite the wrong semantics.

OK.

> 3. This doesn't work tremendously well for inheritance trees, where
> ModifyTable acts as sort of an implicit Append node. You can't just
> funnel all the tuples through one Sort or Limit node because they aren't
> all the same rowtype. (Limit might perhaps not care, but Sort will.)
> But you can't have a separate Sort/Limit for each table either, because
> that would give the wrong behavior.

Could you possibly have ModifyTable -> Limit -> MergeAppend?

> Or I guess you could try to persuade us that DELETE/UPDATE with ORDER BY
> or LIMIT doesn't need to support inherited target tables. I wouldn't
> bet on that proposal flying either.

I've spent enough time worrying about the fact that tables with inheritance children don't behave as nicely as those that don't to have any interest in going in the other direction.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> 3. This doesn't work tremendously well for inheritance trees, where
> ModifyTable acts as sort of an implicit Append node. You can't just
> funnel all the tuples through one Sort or Limit node because they aren't
> all the same rowtype. (Limit might perhaps not care, but Sort will.)
> But you can't have a separate Sort/Limit for each table either, because
> that would give the wrong behavior.

So I guess I chose the wrong hack to start with.

Just out of curiosity, why isn't the result of the WHERE filter (in SELECT/DELETE/UPDATE) put in memory, i.e. an array of ctids, like a buffer, and then processed by the SELECT/DELETE/UPDATE at once?

Greets,
--
Daniel Loureiro
To me the key is security - it's an anti-DBA-with-lack-of-attention feature. If I forget the WHERE statement, I will delete some valid tuples and mess up the db, but that's less bad than wiping the entire table. A DBA who never forgot a WHERE in a DELETE is not a DBA. Just kidding, but this happens often enough.

Is there another option to implement this? Is it possible to do it with plugins/extensions (in a Firefox-browser style)?

Sds,
--
Daniel Loureiro
------------------------------

2010/11/30 Andrew Dunstan <andrew@dunslane.net>:
> We need a convincing use case for it. So far the only one that's seemed
> at all convincing to me is the one about deleting in batches. But that
> might be enough.
>
> As for it being illogical, I don't think it's any more so than
>
>     DELETE FROM foo WHERE random() < 0.1;
>
> and you can do that today.
Daniel Loureiro <daniel@termasa.com.br> wrote:
> to me the key its security - its a anti-DBA-with-lack-of-attention
> feature.

Well, it seems pretty weak to me for that purpose. You still trash data, and you don't have any immediate clue as to what. If you wanted protection from that you'd want more of an "assert limit" that would fail if the affected row count was above what you specified.

For me the best solution is to develop good habits. I first type my statement as "SELECT * FROM ..." and after reviewing the results arrow up and replace "SELECT *" with "DELETE". If there's enough volatility or complexity to make that insufficient insurance, I begin a transaction. That way I can not only review row counts but run queries against the modified data to confirm correct modification before issuing a COMMIT (or ROLLBACK).

The batching of updates so that vacuums can make space available for re-use is more compelling to me, but still pretty iffy, since the work-arounds aren't that hard to find.

-Kevin
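Kevin's review-then-modify habit can be made concrete; a sketch with invented table and column names:

```sql
-- Step 1: inspect exactly what the predicate matches.
SELECT * FROM accounts WHERE last_login < '2009-01-01';

-- Step 2: recall the query, swap the verb, and run it inside a
-- transaction so the reported row count can be reviewed first.
BEGIN;
DELETE FROM accounts WHERE last_login < '2009-01-01';
-- If the reported count matches expectations (and follow-up queries
-- against the modified data look right):
COMMIT;
-- Otherwise:
-- ROLLBACK;
```

Unlike a LIMIT safety net, this catches a bad WHERE clause before any data is permanently lost.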
On 11/30/2010 02:12 PM, Kevin Grittner wrote:
> Daniel Loureiro <daniel@termasa.com.br> wrote:
>
>> to me the key its security - its a anti-DBA-with-lack-of-attention
>> feature.
>
> Well, it seems pretty weak to me for that purpose. You still trash
> data, and you don't have any immediate clue as to what.

I agree, that argument is completely misconceived. If the DBA is paying enough attention to use LIMIT, s/he should be paying enough attention not to do damage in the first place. If that were the only argument in its favor I'd be completely against the feature.

cheers

andrew
> On 11/30/2010 02:12 PM, Kevin Grittner wrote:
>> Daniel Loureiro <daniel@termasa.com.br> wrote:
>>
>>> to me the key its security - its a anti-DBA-with-lack-of-attention
>>> feature.
>>
>> Well, it seems pretty weak to me for that purpose. You still trash
>> data, and you don't have any immediate clue as to what.
>
> I agree, that argument is completely misconceived. If the DBA is paying
> enough attention to use LIMIT, s/he should be paying enough attention
> not to do damage in the first place. If that were the only argument in
> its favor I'd be completely against the feature.

I don't buy the argument either; why would you put a LIMIT there and delete one row by accident when you could put a BEGIN; in front and not do any damage at all?

Regards,
Marko Tiikkaja
On Tue, 2010-11-30 at 11:12 -0500, Robert Haas wrote:
> > 3. This doesn't work tremendously well for inheritance trees, where
> > ModifyTable acts as sort of an implicit Append node. You can't just
> > funnel all the tuples through one Sort or Limit node because they aren't
> > all the same rowtype.
>
> Could you possibly have ModifyTable -> Limit -> MergeAppend?

Before MergeAppend knows which tuple to produce, it needs to see the tuples (at least the first one from each of its children), meaning that it needs to pull them through ModifyTable; and at that point it's already too late.

Also, assuming LIMIT K, MergeAppend will have N children, meaning N limits, meaning an effective limit of K*N rather than K.

Can you be a little more specific about what you mean?

Regards,
Jeff Davis
While reading this thread, I thought of two things I think we could do if this feature was implemented:

1. Sort large UPDATEs/DELETEs so they are done in heap order

This is actually a TODO item. I imagine it would be possible to do something like:

DELETE FROM foo USING (...) ORDER BY ctid;

with this patch to help this case.

2. Reducing deadlocks in big UPDATEs/DELETEs

One problem that sometimes occurs when doing multiple multi-row UPDATEs or DELETEs concurrently is that the transactions end up working on the same rows, but in a different order. One could use an ORDER BY clause to make sure the transactions don't deadlock.

Thoughts?

Regards,
Marko Tiikkaja
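Marko's second point can be illustrated with the *proposed* (hypothetical, not yet existing) DELETE ... ORDER BY syntax; the table and columns are invented:

```sql
-- Hypothetical proposed syntax.  If two sessions each run this
-- concurrently, both acquire row locks in ascending id order, so they
-- can never hold the same pair of rows locked in opposite orders,
-- which is the classic deadlock scenario for bulk DELETEs.
DELETE FROM work_items
 WHERE state = 'done'
 ORDER BY id;
```

Without the ORDER BY, each session locks rows in whatever order its scan visits them, and overlapping scans in different orders can deadlock.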
On Tuesday 30 November 2010 20:24:52 Marko Tiikkaja wrote:
> I don't buy the argument either; why would you put a LIMIT there and
> delete one row by accident when you could put a BEGIN; in front and not
> do any damage at all?

Because the delete of the whole table may take awfully long?

Andres
On Tue, Nov 30, 2010 at 9:24 PM, Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote:
> I don't buy the argument either; why would you put a LIMIT there and delete
> one row by accident when you could put a BEGIN; in front and not do any
> damage at all?

It is valuable as a DBA carelessness/typo catcher only if it is imposed by default (in line with Kevin's point), and only if it rolls back rather than reduces the number of affected rows (as per Marko).

We have implemented a damage-limitation solution similar to this with triggers on an MSSQL database, and it has worked for the specific environment it's in. The safety net is basically that the DBA has to set an environment variable before a very large delete or update operation. If the operation is recognised as being beyond the threshold size, the environment variable is checked - if it is set, the transaction passes and the variable is reset; if not, the transaction is rolled back.

It should be possible to implement something along these lines in triggers; all that would be needed is a structure for defining the (optional) limits on potentially destructive operations. More flexible options, or options based on the number of rows in a table, will rapidly increase the performance impact of the triggers - but may make them more useful.
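A PostgreSQL analogue of the guard described above could be sketched as a per-row trigger; this is only a rough illustration (all object names, the 1000-row threshold, and the `guard.*` setting names are invented), and the counter is per-transaction rather than per-statement, so several DELETEs in one transaction share the budget:

```sql
-- Hypothetical sketch: abort once a DELETE has touched more than 1000
-- rows, unless the DBA has explicitly set guard.override first.
CREATE FUNCTION guard_mass_delete() RETURNS trigger AS $$
DECLARE
    n int;
BEGIN
    -- Bump a transaction-local counter kept in a custom setting.
    n := coalesce(nullif(current_setting('guard.deleted', true), ''), '0')::int + 1;
    PERFORM set_config('guard.deleted', n::text, true);  -- true = local to txn

    IF n > 1000
       AND current_setting('guard.override', true) IS DISTINCT FROM 'on' THEN
        RAISE EXCEPTION 'refusing to delete more than 1000 rows; '
                        'SET guard.override = ''on'' to allow';
    END IF;
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER guard_mass_delete
    BEFORE DELETE ON important_table
    FOR EACH ROW EXECUTE PROCEDURE guard_mass_delete();
```

Raising the exception aborts the whole statement, which gives the roll-back-rather-than-truncate-the-operation behaviour argued for above; the per-row counter update is, as noted, a real performance cost.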
I'm not sure if there is a way to persist data (like a row count) between per-row triggers so that the operation could be aborted at the limit rather than only once all the rows had been updated (potentially a big performance gain).

Alastair "Bell" Turner

Technical Lead ^F5
On 11/30/2010 03:16 PM, Andres Freund wrote:
> On Tuesday 30 November 2010 20:24:52 Marko Tiikkaja wrote:
>> I don't buy the argument either; why would you put a LIMIT there and
>> delete one row by accident when you could put a BEGIN; in front and not
>> do any damage at all?
> Because the delete of the whole table may take awfully long?

I don't see that that has anything to do with restricting damage. LIMIT might be useful for the reason you give, but not as any sort of protection against DBA carelessness. That's what the discussion above is about.

cheers

andrew
On Tue, Nov 30, 2010 at 2:45 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> Could you possibly have ModifyTable -> Limit -> MergeAppend?
>
> Before MergeAppend knows which tuple to produce, it needs to see the
> tuples (at least the first one from each of its children), meaning that
> it needs to pull them through ModifyTable; and at that point it's
> already too late.
>
> Also, assuming LIMIT K, MergeAppend will have N children, meaning N
> limits, meaning an effective limit of K*N rather than K.
>
> Can you be a little more specific about what you mean?

You seem to be imagining the MergeAppend node on top, but I had it in the other order in my mind. The ModifyTable node would be the outermost plan node, pulling from the Limit, which would deliver the first n table rows from the MergeAppend, which would be responsible for getting them from the various child tables.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes: > While reading this thread, I thought of two things I think we could do > if this feature was implemented: > 1. Sort large UPDATE/DELETEs so it is done in heap order > This is actually a TODO item. I imagine it would be possible to do > something like: > DELETE FROM foo USING (...) ORDER BY ctid; > with this patch to help this case. Well, that's strictly an implementation detail; it is not a reason to expose ORDER BY to the user, and even less of a reason to invent LIMIT. It also hasn't got any of the problems we were discussing with inheritance situations, since it'd be perfectly OK (in fact probably desirable) to sort each table's rows separately. > 2. Reducing deadlocks in big UPDATE/DELETEs > One problem that sometimes occurs when doing multiple multi-row UPDATEs > or DELETEs concurrently is that the transactions end up working on the > same rows, but in a different order. One could use an ORDER BY clause > to make sure the transactions don't deadlock. That, on the other hand, seems like potentially a valid use-case. Note that the user-given order would have to override any internal attempt to order by ctid for this to be usable. I had thought of a slightly different application, which could be summarized with this example: UPDATE sometab SET somecol = nextval('seq') ORDER BY id; with the expectation that somecol's values would then fall in the same order as the id column. Unfortunately, that won't actually *work* reliably, the reason being that ORDER BY is applied after targetlist computation. I think enough people would get burnt this way that we'd have popular demand to make ORDER BY work differently in UPDATE than it does in SELECT, which seems rather ugly not only from the definitional side but the implementation side. (DELETE escapes this issue because it has no user-definable elements in its targetlist, which is another way that DELETE is simpler here.) regards, tom lane
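For what it's worth, the "somecol values in id order" effect Tom describes can be had today without any ORDER BY on the UPDATE at all, by computing each row's ordinal in a subquery. A minimal sketch using Python's sqlite3 as a stand-in for a real server (the table name `sometab` comes from Tom's example; everything else here is illustrative):

```python
import sqlite3

# Workaround sketch for "UPDATE ... SET somecol = nextval('seq') ORDER BY id":
# derive the ordinal of each row by id in a correlated subquery, so the UPDATE
# itself needs no ordering guarantees at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometab (id INTEGER PRIMARY KEY, somecol INTEGER)")
conn.executemany("INSERT INTO sometab (id) VALUES (?)", [(i,) for i in (30, 10, 20)])

# somecol gets 1..N in id order, regardless of physical row order
conn.execute("""
    UPDATE sometab
       SET somecol = (SELECT COUNT(*) FROM sometab s2 WHERE s2.id <= sometab.id)
""")
print(conn.execute("SELECT id, somecol FROM sometab ORDER BY id").fetchall())
# expect somecol to follow id order: [(10, 1), (20, 2), (30, 3)]
```

In PostgreSQL the same idea is usually written with `row_number() OVER (ORDER BY id)` in a subquery joined back to the table, which avoids the O(n^2) correlated COUNT.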
Andres Freund <andres@anarazel.de> writes: > On Tuesday 30 November 2010 20:24:52 Marko Tiikkaja wrote: >> I don't buy the argument either; why would you put a LIMIT there and >> delete one row by accident when you could put a BEGIN; in front and not >> do any damage at all? > Because the delete of the whole table may take awfully long? Then you just C-c and that's your ROLLBACK. Been there, seen that (a developer came to me sweating over maybe-lost data — his chance was that forgetting the WHERE clause, it did take long enough for him to C-c by reflex, the oops moment). But more to the point, I don't see that we're this much on the policy side of things rather than on the mechanism side. This feature has real appealing usages (cheap work queues, anti-deadlock, huge data purges with no production locking — you do that in little steps in a loop). To summarize, people that are arguing against are saying they will not themselves put time on the feature more than anything else, I think. I don't see us refusing a good implementation on the grounds that misuse is possible. After all, advisory locks are session based, to name another great foot gun. If you don't think it's big enough, think about web environments and pgbouncer in transaction pooling mode. Loads of fun. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Robert Haas <robertmhaas@gmail.com> writes: > You seem to be imagining the MergeAppend node on top, but I had it in > the other order in my mind. The ModifyTable node would be the > outermost plan node, pulling from the Limit, which would deliver the > first n table rows from the MergeAppend, which would be reponsible for > getting it from the various child tables. That's just a variation of the Sort/Limit/ModifyTable approach. It doesn't fix the problem of how ModifyTable knows which table each row came from, and it doesn't fix the problem of the rows not being all the same rowtype. (In fact it makes the latter worse, since now MergeAppend has to be included in whatever kluge you invent to work around it.) regards, tom lane
Excerpts from Daniel Loureiro's message of mar nov 30 15:04:17 -0300 2010: > So I guess that I have choose the wrong hack to start. So it seems :-D -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, 2010-11-30 at 15:52 -0500, Robert Haas wrote: > On Tue, Nov 30, 2010 at 2:45 PM, Jeff Davis <pgsql@j-davis.com> wrote: > > On Tue, 2010-11-30 at 11:12 -0500, Robert Haas wrote: > >> > >> Could you possibly have ModifyTable -> Limit -> MergeAppend? > > > > Before MergeAppend knows which tuple to produce, it needs to see the > > tuples (at least the first one from each of its children), meaning that > > it needs to pull them through ModifyTable; and at that point it's > > already too late. > > > > You seem to be imagining the MergeAppend node on top Yes, I assumed that the tuples flowed in the direction of the arrows ;) Now that I think about it, your representation makes some sense given our EXPLAIN output. Regards, Jeff Davis
Daniel Loureiro wrote: > > 3. This doesn't work tremendously well for inheritance trees, where > > ModifyTable acts as sort of an implicit Append node. You can't just > > funnel all the tuples through one Sort or Limit node because they aren't > > all the same rowtype. (Limit might perhaps not care, but Sort will.) > > But you can't have a separate Sort/Limit for each table either, because > > that would give the wrong behavior. Another problem with funneling all > > the rows through one Sort/Limit is that ModifyTable did need to know > > which table each row came from, so it can apply the modify to the right > > table. > > So I guess that I have choose the wrong hack to start. > > Just for curiosity, why the result of "WHERE" filter (in > SELECT/DELETE/UPDATE) is not put in memory, i.e. an array of ctid, like an > buffer and then executed by SELECT/DELETE/UPDATE at once ? Informix dbaccess would prompt a user for confirmation if it saw a DELETE with no WHERE. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
It's pretty clear to me that there are two different needs here, both linked to DELETE/UPDATE behavior.
A) a MySQL-like feature that DELETEs/UPDATEs just K tuples
B) a feature to protect the database in case the DBA forgets the WHERE clause
I think the first feature is pretty reasonable, for many reasons - some of them listed below (not in order of importance):
1) MySQL compatibility: it will make porting between the two easier
2) speed: why scan the whole table if only one row is expected to be affected?
3) the possibility of batching operations (paginated UPDATE/DELETE)
4) ease of use in some operations (like deleting the row with the highest Y value): this one needs ORDER BY as well
5) some other independent (and possibly weird) needs that I'm forgetting
The second feature is something to make PostgreSQL more secure: in other words, armor against the DBA. The syntax would be something like "DELETE ... ASSERT 1", or an explicit keyword, like "DELETEO ...". The mechanism would be to raise an error and roll back if the command affects more than the specified number of tuples. IMHO this is a very weird syntax and very much non-standard SQL, so I don't believe it is a truly necessary feature. OK, I know that I started this discussion around this weird feature rather than the first, reasonable one, but it was good for instigating other thoughts.
Sds,
--
Daniel Loureiro
2010/11/30 Bruce Momjian <bruce@momjian.us>
> Informix dbaccess would prompt a user for confirmation if it saw a
> DELETE with no WHERE.
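Incidentally, the "abort if more rows than expected are affected" behavior Daniel sketches as feature B can already be approximated on the client side today. A hedged sketch in Python with sqlite3 standing in for a server (the `accounts` table and the `guarded_delete` helper are made up for illustration):

```python
import sqlite3

# Client-side sketch of feature B: run the DELETE inside a transaction,
# check the affected-row count, and roll back on a mismatch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(i, 100) for i in range(10)])
conn.commit()

def guarded_delete(conn, sql, params=(), expected=1):
    """Execute a DELETE; commit only if it affects exactly `expected` rows."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected:
        conn.rollback()
        raise RuntimeError(f"expected {expected} rows, got {cur.rowcount}; rolled back")
    conn.commit()
    return cur.rowcount

guarded_delete(conn, "DELETE FROM accounts WHERE id = ?", (3,))   # ok: exactly 1 row
try:
    guarded_delete(conn, "DELETE FROM accounts")                  # forgot the WHERE
except RuntimeError as e:
    print(e)   # the other 9 rows survive the rollback
```

This gives the error-and-rollback semantics without any new syntax, which is part of why the server-side feature is a hard sell.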
Hi,
Actually, introducing LIMIT and OFFSET with ORDER BY to DELETE/UPDATE would make it much easier to push data from one table (say, a "queue" table) to another, and to fetch chunks of queue entries while updating their status in one statement. Right now I have to do SELECT ... ORDER BY ... LIMIT and then some magic with arrays of IDs and updates/deletes, or UPDATE ... WHERE id IN (SELECT ... ORDER BY ... LIMIT) RETURNING ... to make that work. So it is still possible with a WHERE clause, though I am not quite sure it is as efficient as the direct approach would be. And speaking of pushing data from one table to another, what I would really like to be able to do is something like:
INSERT INTO ...
DELETE FROM ... WHERE... ORDER BY.. [LIMIT...]
RETURNING...;
This would also be quite efficient when re-arranging data between table partitions (there, LIMIT/OFFSET would just be a nice-to-have for reducing the chunk size of the data being moved).
Additionally, we quite often need to clean up some log tables based not on timestamps but on the number of rows in those tables, leaving only the newest N records in a table... OFFSET would be really cool to have for that use case as well...
With best regards,
-- Valentine Gogichashvili
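The `UPDATE ... WHERE id IN (SELECT ... ORDER BY ... LIMIT)` workaround Valentine mentions can be sketched as follows, using Python's sqlite3 as a stand-in for a real server (the `queue` schema and the status values are made up for illustration):

```python
import sqlite3

# Claim a chunk of queue entries by updating exactly the rows picked out
# by an ordered, limited subquery -- the workaround for the missing
# "UPDATE ... ORDER BY ... LIMIT" syntax.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, status TEXT DEFAULT 'new')")
conn.executemany("INSERT INTO queue (id) VALUES (?)", [(i,) for i in range(1, 11)])

CHUNK = 3
conn.execute("""
    UPDATE queue SET status = 'taken'
     WHERE id IN (SELECT id FROM queue
                   WHERE status = 'new'
                   ORDER BY id LIMIT ?)
""", (CHUNK,))
taken = conn.execute("SELECT id FROM queue WHERE status = 'taken' ORDER BY id").fetchall()
print(taken)   # the three oldest 'new' entries: [(1,), (2,), (3,)]
```

In PostgreSQL one would add RETURNING to the UPDATE to get the claimed rows back in the same statement.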
On 2010-12-01 1:46 PM, Valentine Gogichashvili wrote: > And speaking about pushing > data from one table to another, what I really would like to be able to do > would be also something like: > > INSERT INTO ... > DELETE FROM ... WHERE... ORDER BY.. [LIMIT...] > RETURNING...; > > this would be also quite efficient when re-arranging data in table > partitions There already are plans for implementing this (and actually a patch in the latest commitfest, look for "writeable CTEs"), sans the ORDER BY and LIMIT part. Regards, Marko Tiikkaja
On Wed, Dec 1, 2010 at 4:01 AM, Daniel Loureiro <loureirorg@gmail.com> wrote: > A) an feature MySQL-like which will DELETE/UPDATE just K tuples > B) an feature to protect the database in case the DBA forget the "WHERE" > statement > MySQL has B as well. To quote the manual: "For beginners, a useful startup option is --safe-updates (or --i-am-a-dummy, which has the same effect). This option was introduced in MySQL 3.23.11. It is helpful for cases when you might have issued a DELETE FROM tbl_name statement but forgotten the WHERE clause. Normally, such a statement deletes all rows from the table. With --safe-updates, you can delete rows only by specifying the key values that identify them. This helps prevent accidents. ... * You are not permitted to execute an UPDATE or DELETE statement unless you specify a key constraint in the WHERE clause or provide a LIMIT clause (or both). For example: UPDATE tbl_name SET not_key_column=val WHERE key_column=val; UPDATE tbl_name SET not_key_column=val LIMIT 1; * The server limits all large SELECT results to 1,000 rows unless the statement includes a LIMIT clause. * The server aborts multiple-table SELECT statements that probably need to examine more than 1,000,000 row combinations." I have actually suggested that a certain subset of my users only connect to the database if they are willing to use the --i-am-a-dummy flag. -- Rob Wultsch wultsch@gmail.com
On 01.12.2010 15:37, Rob Wultsch wrote: > "For beginners, a useful startup option is --safe-updates (or > --i-am-a-dummy, which has the same effect). This option was introduced > in MySQL 3.23.11. It is helpful for cases when you might have issued a > DELETE FROM tbl_name statement but forgotten the WHERE clause. > Normally, such a statement deletes all rows from the table. With > --safe-updates, you can delete rows only by specifying the key values > that identify them. This helps prevent accidents. Is it really up to the database to decide what queries are ok? It's the task of the developers to test their applications.
Mario Weilguni <roadrunner6@gmx.at> wrote: > Is it really up to the database to decide what queries are ok? > It's the task of the developers to test their applications. We're talking about ad hoc queries here, entered directly through psql or similar. -Kevin
> We need a convincing use case for it. So far the only one that's seemed > at all convincing to me is the one about deleting in batches. But that > might be enough. Queueing. If unlogged tables are in 9.1, then using PostgreSQL as the backend for a queue becomes a sensible thing to do. And what is a "pop" off a queue other than: DELETE FROM my_queue ORDER BY age LIMIT 1; For this reason, I think accepting a good patch for DELETE would be worthwhile even if we don't have UPDATE yet. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
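Without DELETE ... LIMIT, the queue "pop" Josh shows has to go through a subquery today. A sketch using Python's sqlite3 as a stand-in (concurrency is ignored here; in PostgreSQL one would typically add RETURNING, and FOR UPDATE on the subselect, to make the pop safe under contention):

```python
import sqlite3

# Workaround for "DELETE FROM my_queue ORDER BY age LIMIT 1":
# pick the target row with an ordered, limited subquery instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_queue (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO my_queue VALUES (?, ?)", [(1, 30), (2, 50), (3, 10)])

conn.execute("""
    DELETE FROM my_queue
     WHERE id = (SELECT id FROM my_queue ORDER BY age LIMIT 1)
""")
# the entry with the smallest age (id=3, age=10) has been popped
print(conn.execute("SELECT id FROM my_queue ORDER BY id").fetchall())  # [(1,), (2,)]
```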
Hey,
I don't clearly understand why anybody would perform a DELETE
directly from a psql terminal on a production system. WHY?
I can't see what the problem with DELETE without a WHERE clause
is for application developers, or why the DBMS should "protect" them
from DELETE FROM table.
PS. Anybody can run rm -rf from the shell as root. So what?..
--
// Dmitriy.
2010/12/1 Kevin Grittner <Kevin.Grittner@wicourts.gov>
> Mario Weilguni <roadrunner6@gmx.at> wrote:
> > Is it really up to the database to decide what queries are ok?
> > It's the task of the developers to test their applications.
> We're talking about ad hoc queries here, entered directly through
> psql or similar.
> -Kevin
--
// Dmitriy.
On tis, 2010-11-30 at 14:20 -0500, Andrew Dunstan wrote: > I agree, that argument is completely misconceived. If the DBA is > paying enough attention to use LIMIT, s/he should be paying enough > attention not to do damage in the first place. If that were the only > argument in its favor I'd be completely against the feature. I don't have any use for DELETE with LIMIT, but UPDATE with LIMIT could be very useful if you are doing full-table updates and you don't have enough space so you do it in chunks.
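The chunked full-table UPDATE Peter describes can be sketched as a client-side loop today: without UPDATE ... LIMIT, each pass is bounded by a limited subquery over the not-yet-updated rows. Illustrative sketch in Python with sqlite3 (the `big` table, the `done` flag, and the doubling update are all made up):

```python
import sqlite3

# Full-table update in chunks: each pass touches at most CHUNK rows and
# commits, so no single transaction has to rewrite the whole table at once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, val INTEGER, done INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO big (id, val) VALUES (?, ?)", [(i, i) for i in range(1, 8)])

CHUNK = 3
passes = 0
while True:
    cur = conn.execute("""
        UPDATE big SET val = val * 2, done = 1
         WHERE id IN (SELECT id FROM big WHERE done = 0 ORDER BY id LIMIT ?)
    """, (CHUNK,))
    conn.commit()          # each chunk commits separately, keeping transactions small
    if cur.rowcount == 0:  # nothing left to update
        break
    passes += 1
print(passes)  # 7 rows in chunks of 3 -> 3 passes
```

On a real server the `done` flag could instead be any predicate that distinguishes updated from not-yet-updated rows, at the cost of rescanning on each pass.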
Peter Eisentraut wrote: > On tis, 2010-11-30 at 14:20 -0500, Andrew Dunstan wrote: > > I agree, that argument is completely misconceived. If the DBA is > > paying enough attention to use LIMIT, s/he should be paying enough > > attention not to do damage in the first place. If that were the only > > argument in its favor I'd be completely against the feature. > > I don't have any use for DELETE with LIMIT, but UPDATE with LIMIT could > be very useful if you are doing full-table updates and you don't have > enough space so you do it in chunks. So should this now be a TODO item? Text? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Dec 2, 2010 at 7:12 PM, Bruce Momjian <bruce@momjian.us> wrote: > Peter Eisentraut wrote: >> On tis, 2010-11-30 at 14:20 -0500, Andrew Dunstan wrote: >> > I agree, that argument is completely misconceived. If the DBA is >> > paying enough attention to use LIMIT, s/he should be paying enough >> > attention not to do damage in the first place. If that were the only >> > argument in its favor I'd be completely against the feature. >> >> I don't have any use for DELETE with LIMIT, but UPDATE with LIMIT could >> be very useful if you are doing full-table updates and you don't have >> enough space so you do it in chunks. > > So should this now be a TODO item? Text? Allow DELETE and UPDATE to be used with LIMIT and ORDER BY. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Thu, Dec 2, 2010 at 7:12 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Peter Eisentraut wrote: > >> On tis, 2010-11-30 at 14:20 -0500, Andrew Dunstan wrote: > >> > I agree, that argument is completely misconceived. If the DBA is > >> > paying enough attention to use LIMIT, s/he should be paying enough > >> > attention not to do damage in the first place. If that were the only > >> > argument in its favor I'd be completely against the feature. > >> > >> I don't have any use for DELETE with LIMIT, but UPDATE with LIMIT could > >> be very useful if you are doing full-table updates and you don't have > >> enough space so you do it in chunks. > > > > So should this now be a TODO item? Text? > > Allow DELETE and UPDATE to be used with LIMIT and ORDER BY. Done: Allow DELETE and UPDATE to be used with LIMIT and ORDER BY * http://archives.postgresql.org/pgsql-hackers/2010-11/msg01997.php * http://archives.postgresql.org/pgsql-hackers/2010-12/msg00021.php -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +