Thread: Implementation of LIMIT on DELETE and UPDATE statements (rel to 7.2.1)
From
srb@cuci.nl (Stephen R. van den Berg)
Date:
The patch allows for a LIMIT clause on a DELETE or UPDATE statement.

Why is this needed?

- I have a table which has several identical entries, and I want to delete or update just one of them (I don't care which one, obviously). And, no, I cannot use OIDs because they'd represent unwanted overhead (the table contains a lot of entries).
- It allows you to speed up DELETE or UPDATE statements which are known in advance to match only one record by adding a LIMIT 1.
- It makes migrations from MySQL to PostgreSQL easier (MySQL already supports LIMIT on DELETEs and UPDATEs). It might simplify other migrations as well; I'm not sure which other DBMSes support the construct.

The patch includes documentation updates (which might need editing to match the look and feel of the rest of the documentation).

When checking the patches, please pay attention to the three extra warnings the yacc file now generates. I looked into it, but can't actually see the problem (I'm probably overlooking something).

-- 
Sincerely, srb@cuci.nl Stephen R. van den Berg (AKA BuGless).
"Sleep: A completely inadequate substitute for caffeine."
Attachment
srb@cuci.nl (Stephen R. van den Berg) writes:
> The patch allows for a LIMIT clause on a DELETE or UPDATE statement.

And how exactly do you control *which* tuple(s) get deleted or updated, if the WHERE clause selects more than the limit? This just seems like a really bad idea ...

> Why is this needed?

I find none of these arguments compelling.

> - I have a table which has several identical entries, and I want to
> delete or update just one of them (I don't care which one, obviously).
> And, no, I cannot use OIDs because they'd represent unwanted overhead
> (the table contains a lot of entries).

Then use ctid.

> - It allows you to speed up DELETE or UPDATE statements which are known
> in advance to match only one record by adding a LIMIT 1.

Have you got any evidence that there's a meaningful speedup?

> - It makes migrations from MySQL to PostgreSQL easier (MySQL already
> supports LIMIT on DELETEs and UPDATEs).

Just because MySQL is willing to implement nonstandard bad ideas doesn't mean we are. In any case the idea that this might provide some amount of compatibility is illusory: the odds are good that we'd delete or update a different tuple than they do, because of implementation differences. An application that actually depends on MySQL's behavior would surely be broken.

> When checking the patches, please pay attention to the three extra warnings
> the yacc file now generates.

We have a zero-tolerance policy on yacc warnings.

regards, tom lane
Re: Implementation of LIMIT on DELETE and UPDATE statements (rel to 7.2.1)
From
srb@cuci.nl (Stephen R. van den Berg)
Date:
Tom Lane wrote:
>srb@cuci.nl (Stephen R. van den Berg) writes:
>> The patch allows for a LIMIT clause on a DELETE or UPDATE statement.
>And how exactly do you control *which* tuple(s) get deleted or updated,
>if the WHERE clause selects more than the limit?

You don't. The idea is that the database deletes at most x items. It's documented as such; it's supposed to work this way.

It *can* be used as a safeguard against catastrophic failure of the (programmer or) application driving the database. I.e. "DELETE FROM tabley WHERE x=3 LIMIT 1;" will never delete more than one item, even if the silly programmer used the wrong column (say the condition should have been "z=3", but x=3 happens to match all table entries).

>This just seems like a really bad idea ...

It's logical and consistent (it works as advertised) and doesn't cost much implementation-wise.

>> - I have a table which has several identical entries, and I want to
>> delete or update just one of them (I don't care which one, obviously).
>> And, no, I cannot use OIDs because they'd represent unwanted overhead
>> (the table contains a lot of entries).
>Then use ctid.

Hmmm, I didn't know about ctid. It does seem to allow me to distinguish values. It will require a SELECT followed by a DELETE or UPDATE though, AFAICS. But I agree that it seems to solve my problem.

>> - It allows you to speed up DELETE or UPDATE statements which are known
>> in advance to match only one record by adding a LIMIT 1.
>Have you got any evidence that there's a meaningful speedup?

No. I just noted this as a side effect of my solving the delete/update problem above. The actual speedup depends on the query planner. I presume that the planner is likely to spend less time optimising the query if it knows in advance that it's going to need just one result row.

>> - It makes migrations from MySQL to PostgreSQL easier (MySQL already
>> supports LIMIT on DELETEs and UPDATEs).
>Just because MySQL is willing to implement nonstandard bad ideas doesn't
>mean we are. In any case the idea that this might provide some amount
>of compatibility is illusory: the odds are good that we'd delete or
>update a different tuple than they do, because of implementation
>differences. An application that actually depends on MySQL's behavior
>would surely be broken.

MySQL documents that the actual record being deleted is "random". Any application which expects and uses this feature in its documented way would work equally well on PostgreSQL (and yes, the records deleted might differ, but for all intents and purposes they are the same anyway; if not, the WHERE clause is not specific enough).

>> When checking the patches, please pay attention to the three extra warnings
>> the yacc file now generates.
>We have a zero-tolerance policy on yacc warnings.

I already assumed that, which is why I noted that they are generated by my proposed patch. If my patch is accepted, I'm willing to eliminate the warnings. It's just that to an experienced PostgreSQL hacker these warnings might have been trivial to fix, whereas my familiarity with the PostgreSQL source code is currently based on a 20-minute cursory reading of it while patching.

Thing is, I spent 4 weeks (off and on) trying to find an efficient workaround for the delete-just-one-of-a-set-of-identical-records problem in PostgreSQL, and as I finally got fed up with it, I unpacked the source and looked at whether adding the LIMIT clause was difficult or not. It turned out it was not.

IMO the patch still has its merits; but if it's not accepted, I'm content with a slightly elaborate and even more non-standard ctid hack.

-- 
Sincerely, srb@cuci.nl Stephen R. van den Berg (AKA BuGless).
"Sleep: A completely inadequate substitute for caffeine."
Re: Implementation of LIMIT on DELETE and UPDATE statements (rel to 7.2.1)
From
Alvaro Herrera
Date:
En Sat, 21 Sep 2002 19:14:20 +0200 srb@cuci.nl (Stephen R. van den Berg) escribió:
> Tom Lane wrote:
> >srb@cuci.nl (Stephen R. van den Berg) writes:
> >> The patch allows for a LIMIT clause on a DELETE or UPDATE statement.
>
> >And how exactly do you control *which* tuple(s) get deleted or updated,
> >if the WHERE clause selects more than the limit?
>
> It *can* be used as a safeguard against catastrophic failure of
> the (programmer or) application driving the database.
> I.e. "DELETE FROM tabley WHERE x=3 LIMIT 1;" will never delete more
> than one item, even if the silly programmer used the wrong column (say
> the condition should have been "z=3", but x=3 happens to match all
> table entries).

In this case, use an explicit transaction "just to be sure", and if you find that more tuples were deleted than should have been, roll back. As is, this is a dangerous feature because it's not predictable.

> >This just seems like a really bad idea ...
>
> It's logical and consistent (it works as advertised) and doesn't cost
> much implementation-wise.

It's not consistent, because it can delete/update different rows given the same dataset and the same query. It's not logical if you look at it from the user's point of view. It may be internally, but that's another story.

> >Have you got any evidence that there's a meaningful speedup?
>
> No. I just noted this as a side effect of my solving the delete/update
> problem above.
> The actual speedup depends on the query planner. I presume that
> the planner is likely to spend less time optimising the query if it
> knows in advance that it's going to need just one result row.

I suspect you'd have to tweak the planner...

> MySQL documents that the actual record being deleted is "random".

I think this feature would be much more useful if you could use ORDER BY and an expression on the LIMIT clause instead of just a number. And the corresponding OFFSET clause should be added as well. So one can say "drop the three worst customers" or "change to 'gold' the status of the customers with total > $10000 last week". I don't know if this can be done in one query with the current feature set.

-- Alvaro Herrera (<alvherre[a]atentus.com>)
"Lo esencial es invisible para los ojos" (A. de Saint-Exupéry)
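[Editor's note: the "drop the three worst customers" case is already expressible in plain SQL with a keyed subquery, no LIMIT on DELETE needed. A minimal sketch using Python's bundled sqlite3 as a stand-in database; the schema and data are invented for illustration.]

```python
import sqlite3

# Hypothetical customers table; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, total INTEGER)")
conn.executemany("INSERT INTO customers (name, total) VALUES (?, ?)",
                 [("a", 10), ("b", 5), ("c", 50), ("d", 2), ("e", 30)])

# "Drop the three worst customers": order by total, take three,
# then delete by primary key -- deterministic, unlike a bare LIMIT on DELETE.
conn.execute("""
    DELETE FROM customers
    WHERE id IN (SELECT id FROM customers ORDER BY total ASC LIMIT 3)
""")

remaining = [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY total")]
print(remaining)  # ['e', 'c'] -- only the two best customers survive
```

Because the subquery pins down *which* rows by ordering on a real column and deleting by key, the result does not depend on plan choice the way a bare LIMIT on DELETE would.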
Re: Implementation of LIMIT on DELETE and UPDATE statements (rel to 7.2.1)
From
srb@cuci.nl (Stephen R. van den Berg)
Date:
Alvaro Herrera wrote:
>srb@cuci.nl (Stephen R. van den Berg) escribió:
>> Tom Lane wrote:
>> >srb@cuci.nl (Stephen R. van den Berg) writes:
>> It *can* be used as a safeguard against catastrophic failure of
>> the (programmer or) application driving the database.
>> I.e. "DELETE FROM tabley WHERE x=3 LIMIT 1;" will never delete more
>> than one item, even if the silly programmer used the wrong column
>In this case, use an explicit transaction "just to be sure", and if you
>find that more tuples were deleted than should have been, roll back.

Ok, granted. This will work in the interactive case. It will not work in the case of a bug in an application program, unless you enclose every delete and update in a transaction which is checked for tuples modified (and checking the number of tuples modified is not possible in many application languages).

>> It's logical and consistent (it works as advertised) and doesn't cost
>> much implementation-wise.
>It's not consistent, because it can delete/update different rows, given
>the same dataset and the same query. It's not logical if you look at it
>from the user's point of view. It may be internally, but that's another story.

IMHO it is logical, because if it's documented to be a random tuple, then it should be a random tuple. There's no user that can complain about that (unless he/she does not consult the manual).

>> >Have you got any evidence that there's a meaningful speedup?
>> No. I just noted this as a side effect of my solving the
>> delete/update problem above.
>> The actual speedup depends on the query planner. I presume that
>> the planner is likely to spend less time optimising the query if it
>> knows in advance that it's going to need just one result row.
>I suspect you'd have to tweak the planner...

I may be mistaken, but I believe I noticed a small changelog entry somewhere before 7.2.1 which read that the planner now considers LIMIT when optimising the query.

>> MySQL documents that the actual record being deleted is "random".
>I think this feature would be much more useful if you could use ORDER BY
>and an expression on the LIMIT clause instead of just a number. And the
>corresponding OFFSET clause should be added as well. So one can say
>"drop the three worst customers" or "change to 'gold' the status of the
>customers with total > $10000 last week". I don't know if this can be
>done in one query with the current feature set.

As far as I could determine, this appears to be rather easy to add if so desired (the engine supports it already).

-- 
Sincerely, srb@cuci.nl Stephen R. van den Berg (AKA BuGless).
Do more than anyone expects, and pretty soon everyone will expect more.
Re: Implementation of LIMIT on DELETE and UPDATE statements (rel to 7.2.1)
From
srb@cuci.nl (Stephen R. van den Berg)
Date:
Stephen R. van den Berg wrote:
>Tom Lane wrote:
>>Then use ctid.
>Hmmm, I didn't know about ctid. It does seem to allow me to distinguish
>values. It will require a SELECT followed by a DELETE or UPDATE though,
>AFAICS. But I agree that it seems to solve my problem.

Funny, actually. I just changed my application from doing:

    DELETE FROM a WHERE b=3 LIMIT 1;

to:

    DELETE FROM a WHERE ctid=(SELECT ctid FROM a WHERE b=3 LIMIT 1);

It works. Actually, it works *identically*, i.e. the actual row deleted is just as randomly chosen as in the DELETE with LIMIT case. The first version looks cleaner (and more understandable) to me, though.

Incidentally, using a SELECT without an ORDER BY but with a LIMIT is documented to give unpredictable results, yet users are expected to cope with this fact, but are expected to have problems with a similar fact in an UPDATE or DELETE statement? Somehow the argumentation is not conclusive.

-- 
Sincerely, srb@cuci.nl Stephen R. van den Berg (AKA BuGless).
Do more than anyone expects, and pretty soon everyone will expect more.
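[Editor's note: for readers who want to try the ctid-style workaround without a patched PostgreSQL at hand, here is a small sketch using Python's bundled sqlite3, with SQLite's rowid standing in for ctid. The table and data are made up to mirror the example above.]

```python
import sqlite3

# Three identical rows, mimicking the duplicate-entries problem.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (b INTEGER)")
conn.executemany("INSERT INTO a (b) VALUES (?)", [(3,), (3,), (3,)])

# Delete at most one of the identical rows -- whichever one the
# scalar subquery happens to pick (the "random tuple" semantics
# discussed in this thread).
conn.execute("DELETE FROM a WHERE rowid = (SELECT rowid FROM a WHERE b = 3 LIMIT 1)")

count = conn.execute("SELECT COUNT(*) FROM a WHERE b = 3").fetchone()[0]
print(count)  # 2 -- exactly one duplicate was removed
```

The same shape works against stock PostgreSQL by substituting ctid for rowid, as the message above demonstrates.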
Re: Implementation of LIMIT on DELETE and UPDATE statements (rel to 7.2.1)
From
Alvaro Herrera
Date:
En Sun, 22 Sep 2002 01:19:24 +0200 srb@cuci.nl (Stephen R. van den Berg) escribió:
> Incidentally, using a SELECT without an ORDER BY but with a LIMIT is
> documented to give unpredictable results, yet users are expected to cope
> with this fact, but are expected to have problems with a similar fact in
> an UPDATE or DELETE statement?
> Somehow the argumentation is not conclusive.

Yes, I was thinking the same thing when I answered earlier. I am in the same position as you here (meaning someone who has contributed some patch), so my opinion doesn't have a lot of weight; but as I already said, the feature has some value with the ORDER BY added, and the LIMIT/OFFSET thing expanded to allow expressions (this last part is in TODO). Clearly an ORDER BY clause without LIMIT doesn't make any sense; but it does with it.

-- Alvaro Herrera (<alvherre[a]atentus.com>)
"Para tener mas hay que desear menos"
> srb@cuci.nl (Stephen R. van den Berg) escribió:
>> Incidentally, using a SELECT without an ORDER BY but with a LIMIT is
>> documented to give unpredictable results, yet users are expected to cope
>> with this fact, but are expected to have problems with a similar fact in
>> an UPDATE or DELETE statement?

Well, IMHO there's a big difference between documented-unpredictable output from a documented-unpredictable query and documented-unpredictable changes in the database state. There is not a lot of use for the latter, AFAICS.

Alvaro Herrera <alvherre@atentus.com> writes:
> as I already said, the feature has some value with the ORDER BY added,
> and the LIMIT/OFFSET thing expanded to allow expressions (this last part
> is in TODO).

I'd have more confidence in the usefulness of the idea if it included ORDER BY to make the LIMIT predictable. But before you run off and implement that: does MySQL support such a thing? If not, the argument of improving compatibility still doesn't hold any water...

regards, tom lane
Tom Lane wrote:
> > srb@cuci.nl (Stephen R. van den Berg) escribió:
> >> Incidentally, using a SELECT without an ORDER BY but with a LIMIT is
> >> documented to give unpredictable results, yet users are expected to cope
> >> with this fact, but are expected to have problems with a similar fact in
> >> an UPDATE or DELETE statement?
>
> Well, IMHO there's a big difference between documented-unpredictable
> output from a documented-unpredictable query and documented-unpredictable
> changes in the database state. There is not a lot of use for the latter,
> AFAICS.
>
> Alvaro Herrera <alvherre@atentus.com> writes:
> > as I already said, the feature has some value with the ORDER BY added,
> > and the LIMIT/OFFSET thing expanded to allow expressions (this last part
> > is in TODO).
>
> I'd have more confidence in the usefulness of the idea if it included
> ORDER BY to make the LIMIT predictable. But before you run off and
> implement that: does MySQL support such a thing? If not, the argument
> of improving compatibility still doesn't hold any water...

I see no reason to add stuff to UPDATE/DELETE when a subquery does the job just as well. It just seems like bloat.

-- Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Hello!

On Sun, 22 Sep 2002, Bruce Momjian wrote:
> I see no reason to add stuff to UPDATE/DELETE when a subquery does the
> job just as well. It just seems like bloat.

That may look funny, but it can be useful. Imagine a typical usage of LIMIT/OFFSET: pagination of web output. Say the output is fetched through "select id,body from articles limit 10 offset 20". Now the content admin, surfing the content and looking at, say, page 2, wants to drop all info on THAT page 2. Guess how it could ease the life of a programmer? 8)

-- WBR, Yury Bokhoncovich, Senior System Administrator, NOC of F1 Group.
Phone: +7 (3832) 106228, ext.140, E-mail: byg@center-f1.ru.
Unix is like a wigwam -- no Gates, no Windows, and an Apache inside.
On Mon, 23 Sep 2002, Yury Bokhoncovich wrote:
> Hello!
>
> On Sun, 22 Sep 2002, Bruce Momjian wrote:
> > I see no reason to add stuff to UPDATE/DELETE when a subquery does the
> > job just as well. It just seems like bloat.
>
> That may look funny, but it can be useful.
> Imagine a typical usage of LIMIT/OFFSET: pagination of web output.
> Say the output is fetched through "select id,body from articles limit 10
> offset 20".
> Now the content admin, surfing the content and looking at, say, page 2,
> wants to drop all info on THAT page 2.
> Guess how it could ease the life of a programmer? 8)

I *really* hope no one (mis)uses LIMIT/OFFSET like that. Really. Without an ORDER BY there's no guarantee that the rows will be in the same order across two consecutive statements. It's ugly but marginally OK for SELECTs; doing the same with statements that modify data is frightening. Even with an ORDER BY, concurrent modifications will make that iffy in non-serializable transactions. If the SELECT and DELETE are in separate transactions, oh boy.

Deleting a random matching row seems okay if that's really what the user wants (if only because it makes deleting duplicates easier).
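[Editor's note: the "deleting duplicates" use case mentioned above can also be done today without any LIMIT on DELETE. A sketch using Python's bundled sqlite3, with SQLite's rowid playing the role ctid would in PostgreSQL; the table is hypothetical.]

```python
import sqlite3

# A table with duplicate values; we want to keep exactly one of each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [(1,), (1,), (2,), (2,), (2,), (3,)])

# Keep the lowest rowid per value, delete every other copy.
conn.execute("""
    DELETE FROM t
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM t GROUP BY v)
""")

values = [row[0] for row in conn.execute("SELECT v FROM t ORDER BY v")]
print(values)  # [1, 2, 3] -- one survivor per value
```

Unlike a random-tuple DELETE ... LIMIT, this pins down exactly which copy survives (the one with the smallest row identifier), so it is repeatable.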
Re: Implementation of LIMIT on DELETE and UPDATE statements (rel to 7.2.1)
From
srb@cuci.nl (Stephen R. van den Berg)
Date:
Tom Lane wrote:
>> srb@cuci.nl (Stephen R. van den Berg) escribió:
>>> Incidentally, using a SELECT without an ORDER BY but with a LIMIT is
>>> documented to give unpredictable results, yet users are expected to cope
>>> with this fact, but are expected to have problems with a similar fact in
>>> an UPDATE or DELETE statement?
>Well, IMHO there's a big difference between documented-unpredictable
>output from a documented-unpredictable query and documented-unpredictable
>changes in the database state. There is not a lot of use for the latter,
>AFAICS.

There is, in the case of duplicate entries where you want to delete just one of them.

>I'd have more confidence in the usefulness of the idea if it included
>ORDER BY to make the LIMIT predictable. But before you run off and
>implement that: does MySQL support such a thing? If not, the argument
>of improving compatibility still doesn't hold any water...

MySQL supports ORDER BY in conjunction with LIMIT on a DELETE; on an UPDATE it seems to support just LIMIT, no ORDER BY.

However, I do concede that a subselect is able to deal with this problem in standard SQL in most cases. The only actual improvement in readability/portability would be the case of multiple identical tuples of which you only want to delete or update a few.

As far as bloat is concerned, the engine supports it already; it basically boils down to a brief yacc-syntax extension. It does allow for a more orthogonal syntax across SELECT/DELETE/UPDATE, which is a plus, I'd say.

Anyway, since ctids solve my problem, I'm not particularly keen on getting the LIMIT support on UPDATE/DELETE anymore. I still think that the ctid solution is ugly and non-portable. But that's a value judgement I'm not qualified to make about PostgreSQL. Your call:

- Require a ctid non-standard solution.
- Or allow for a non-standard-yet-fully-orthogonal LIMIT implementation (with ORDER BY support if you like).

Whatever you pick, I'm happy with; I'll provide patches for version two if so desired.

-- 
Sincerely, srb@cuci.nl Stephen R. van den Berg (AKA BuGless).
"To err is human, to debug ... divine."
Yury Bokhoncovich <byg@center-f1.ru> writes:
> Imagine a typical usage of LIMIT/OFFSET: pagination of web output.
> Say the output is fetched through "select id,body from articles limit 10
> offset 20".
> Now the content admin, surfing the content and looking at, say, page 2,
> wants to drop all info on THAT page 2.
> Guess how it could ease the life of a programmer? 8)

Only until the first time he drops the wrong page that way.

The above is really a perfect example of why this feature isn't safe: it would lead people to make unwarranted assumptions. The fact that such-and-such a tuple appeared second in the output of LIMIT 10 OFFSET 20 does *not* mean that it would be selected by LIMIT 1 OFFSET 21. The planner is entitled to (and often does) choose different query plans depending on the limit/offset values.

regards, tom lane
srb@cuci.nl (Stephen R. van den Berg) writes:
> Anyway, since ctids solve my problem, I'm not particularly keen on
> getting the LIMIT support on UPDATE/DELETE anymore.
> I still think that the ctid solution is ugly and non-portable.

Well, they're both ugly and nonportable, IMHO. To me the deciding argument is that the LIMIT approach is also dangerous, because it encourages people to use incompletely-specified queries to modify their data. See other messages in this thread for examples.

regards, tom lane
On Mon, 23 Sep 2002, Stephen R. van den Berg wrote:
> Anyway, since ctids solve my problem, I'm not particularly keen on
> getting the LIMIT support on UPDATE/DELETE anymore.
> I still think that the ctid solution is ugly and non-portable.
> But that's a value judgement I'm not qualified to make about PostgreSQL.
> Your call:
> - Require a ctid non-standard solution.
> - Or allow for a non-standard-yet-fully-orthogonal LIMIT implementation
> (with ORDER BY support if you like).
> Whatever you pick, I'm happy with; I'll provide patches for
> version two if so desired.

I'm not going to get into the issue of whether it's good or not, really, but have you tested your patch with multiple updates? ISTM that it's likely to have the same problem that SELECT FOR UPDATE does when combined with LIMIT, which is that it may return fewer rows than the limit if a row is modified such that it no longer meets an attached WHERE clause. I noticed this recently while trying to use LIMIT with FK statements. I haven't actually put in the patch to try it, however.

Example (on reasonably recent development 7.3):

    create table test (a int);
    insert into test values (1);
    insert into test values (2);
    insert into test values (3);

    T1: begin;
    T2: begin;
    T1: update test set a=4 where a=1;
    T2: select * from test where a<3 for update limit 1;  [this blocks]
    T1: commit;
    [T2 now returns 0 rows]
Tom Lane dijo:
> Yury Bokhoncovich <byg@center-f1.ru> writes:
> > Imagine a typical usage of LIMIT/OFFSET: pagination of web output.
> > Say the output is fetched through "select id,body from articles limit 10
> > offset 20".
> > Now the content admin, surfing the content and looking at, say, page 2,
> > wants to drop all info on THAT page 2.
> > Guess how it could ease the life of a programmer? 8)

I don't understand. It's somewhat more difficult to grab all the primary keys of the currently selected items (and you can put a Javascript button with "select all in this page"), this I concede. But how is it better to be unsure whether you are really deleting what you want to delete? Suppose another admin is also deleting, and the LIMIT/OFFSET shifts between the time the page is presented and the time the "delete these" button is pressed... "Hey, PostgreSQL is stupid," they'll say. "How can they offer such an unsafe misfeature."

-- Alvaro Herrera (<alvherre[a]atentus.com>)
"God is real, unless declared as int"
Tom Lane wrote:
> > srb@cuci.nl (Stephen R. van den Berg) escribió:
> >> Incidentally, using a SELECT without an ORDER BY but with a LIMIT is
> >> documented to give unpredictable results, yet users are expected to cope
> >> with this fact, but are expected to have problems with a similar fact in
> >> an UPDATE or DELETE statement?
>
> Well, IMHO there's a big difference between documented-unpredictable
> output from a documented-unpredictable query and documented-unpredictable
> changes in the database state. There is not a lot of use for the latter,
> AFAICS.

The next thing we could implement is

    DELETE SOMETHING FROM SOME TABLE [OR NOT];

Very useful for the type of programmer that needs the proposed LIMIT patch. It's the only way those pelletheads can for sure blame the error on PostgreSQL.

Sarcasm aside, folks, I am 100% with Tom here. No LIMIT on UPDATE or DELETE.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Hello!

On Mon, 23 Sep 2002, Tom Lane wrote:
> The above is really a perfect example of why this feature isn't safe:
> it would lead people to make unwarranted assumptions. The fact that
> such-and-such a tuple appeared second in the output of LIMIT 10 OFFSET
> 20 does *not* mean that it would be selected by LIMIT 1 OFFSET 21.

Agreed. But if records are inserted by INSERT (times) VALUES (now()) and those are selected ORDER BY times, the result is probably well determined. ;)

> The planner is entitled to (and often does) choose different query plans
> depending on the limit/offset values.

Hm... I had always supposed that LIMIT/OFFSET is a trick to get the functionality of cursors without their overhead. And nobody wonders when there's a sequential scan in a cursor up to a required bunch of records, then a bulk delete/update of those. I don't insist on the feature, just some thoughts. BTW, doing LIMIT/OFFSET without ORDER BY is useless anyway IMHO. :)

-- WBR, Yury Bokhoncovich, Senior System Administrator, NOC of F1 Group.
Phone: +7 (3832) 106228, ext.140, E-mail: byg@center-f1.ru.
Unix is like a wigwam -- no Gates, no Windows, and an Apache inside.
On Wed, 25 Sep 2002, Yury Bokhoncovich wrote:
> Hello!
>
> On Mon, 23 Sep 2002, Tom Lane wrote:
> > The above is really a perfect example of why this feature isn't safe:
> > it would lead people to make unwarranted assumptions. The fact that
> > such-and-such a tuple appeared second in the output of LIMIT 10 OFFSET
> > 20 does *not* mean that it would be selected by LIMIT 1 OFFSET 21.
>
> Agreed. But if records are inserted by INSERT (times) VALUES (now()) and
> those are selected ORDER BY times, the result is probably well
> determined. ;)

Only if you're in serializable isolation mode or if there's never more than one concurrent update transaction. Otherwise, concurrent updates that have committed between the select and the delete could change the set of rows you see and therefore the rows you delete. TANSTAAFL, unfortunately.