Thread: SELECT FOR UPDATE and LIMIT 1 behave oddly
Guys, Summary: SELECT FOR UPDATE and LIMIT behave oddly when combined Affects: 7.4.3 (not tested yet on other versions) Severity: Annoyance Description: If you attempt to lock a row "off the top" of a table by using SELECT ... FOR UPDATE LIMIT 1, any blocked transaction will have no rows returned when the lock ends. This is counter-intuitive and wierd. It is easily worked around, though, since the LIMIT 1 is really superfluous; possibly we don't want to fix it, just put a warning in the docs. Test Case: primer=# create table some_que ( sequence int, done boolean ); CREATE TABLE primer=# insert into some_que values ( 1, false ); primer=# insert into some_que values ( 2, false ); primer=# insert into some_que values ( 3, false ); primer=# insert into some_que values ( 4, false ); primer=# insert into some_que values ( 5, false ); primer=# insert into some_que values ( 6, false ); TRANSACTION A: primer=# begin; BEGIN primer=# select * from some_que where done = false order by sequence limit 1 for update; sequence | done ----------+------ 1 | f TRANSACTION B: primer=# begin; BEGIN primer=# select * from some_que where done = false order by sequence limit 1 for update; TRANSACTION A: primer=# update some_que set done = true where sequence = 1; UPDATE 1 primer=# commit; COMMIT TRANSACTION B: sequence | done ----------+------ (0 rows) ... as you can see, it falsely reports no rows.
Josh Berkus <josh@agliodbs.com> writes: > Summary: SELECT FOR UPDATE and LIMIT behave oddly when combined The FOR UPDATE part executes after the LIMIT part. Arguably this is a bad thing, but I'm concerned about the compatibility issues if we change it. regards, tom lane
Tom, > The FOR UPDATE part executes after the LIMIT part. Arguably this is a > bad thing, but I'm concerned about the compatibility issues if we change > it. In that case, maybe I should do a doc patch warning people not to combine them? Hmmm .... come to think of it, is there any easy way to query "give me the first row which is not locked"? If I tie pg_locks to a query, will I get wierd effects? Just musing .... -- Josh Berkus Aglio Database Solutions San Francisco
On Thu, 2004-10-14 at 14:02, Tom Lane wrote: > The FOR UPDATE part executes after the LIMIT part. Arguably this is a > bad thing, but I'm concerned about the compatibility issues if we change > it. I agree backward compat is a concern, but it seems pretty clear to me that this is not the optimal behavior. If there are any people who actually need the old behavior, they can nest the FOR UPDATE in a FROM-clause subselect: SELECT * FROM foo FOR UPDATE LIMIT 5; -- used to lock the whole table SELECT * FROM (SELECT * FROM foo FOR UPDATE) x LIMIT 5; -- will always do so Would it be sufficient to put a large warning in the "incompatibilities" section of the release notes? -Neil
Neil Conway <neilc@samurai.com> writes: > I agree backward compat is a concern, but it seems pretty clear to me > that this is not the optimal behavior. If there are any people who > actually need the old behavior, they can nest the FOR UPDATE in a > FROM-clause subselect: > SELECT * FROM foo FOR UPDATE LIMIT 5; -- used to lock the whole table > SELECT * FROM (SELECT * FROM foo FOR UPDATE) x LIMIT 5; -- will always > do so Allowing FOR UPDATE in sub-selects opens a can of worms that I do not think we'll be able to re-can (at least not without the proverbial larger size of can). The fundamental question about the above construct is: exactly which rows did it lock? And what's your proof that that set is what it *should* have locked? What if some of the locked rows didn't get returned to the client? Even if LIMIT happens to work in a convenient way, allowing FOR UPDATE inside a subselect would expose us to a lot of other cases (joins and aggregates for instance) that I don't believe we can guarantee pleasant behavior for. My recollection is that the original FOR UPDATE and LIMIT behaviors were both implemented at the top level in execMain.c, and at that time LIMIT effectively executed after FOR UPDATE. We later pushed LIMIT down to become a plan node, which was a good idea in every respect except that it changed the order of application of these two behaviors. I'm afraid of the semantic consequences of pushing down FOR UPDATE into a plan node however. Maybe it can be made to work but I think a lot of very careful thought will need to go into it. regards, tom lane
On Fri, 2004-10-15 at 14:22, Tom Lane wrote: > Allowing FOR UPDATE in sub-selects opens a can of worms that I do not > think we'll be able to re-can (at least not without the proverbial > larger size of can). Ah, I see. I had tried some trivial queries to determine if we supported FOR UPDATE in subqueries, such as: select * from def, abc, (select * from abc for update) x; But of course a more careful examination shows that we don't (I'd guess the planner is transforming the above subquery into a join). I think it would make sense to reject the above query for the sake of consistency. It seems that would be easy to do by rejecting FOR UPDATE of subqueries in the analysis phase, rather than going to the trouble of explicitly allowing them (see analyze.c circa line 2753) and then rejecting them in the planner. BTW, FOR UPDATE's interaction with LIMIT is not undocumented -- we actually document the opposite of what we implement. From the SELECT ref page: FOR UPDATE may appear before LIMIT for compatibility with PostgreSQL versions before 7.3. It effectively executes after LIMIT, however, and so that is the recommended place to write it. > The fundamental question about the above construct is: exactly which > rows did it lock? I'm not sure I understand. The rows the query locks should be the result set of the subquery. Also, I think it only makes sense to allow FOR UPDATE in FROM-clause subselects, and it would also be reasonable to disable some subquery optimizations (e.g. subquery pullup) when FOR UPDATE is specified. > What if some of the locked rows didn't get returned to the client? In the case of SELECT ... FOR UPDATE LIMIT x, exactly the same condition applies: some number of locked rows will not be returned to the client. -Neil
Neil Conway <neilc@samurai.com> writes: > On Fri, 2004-10-15 at 14:22, Tom Lane wrote: >> What if some of the locked rows didn't get returned to the client? > In the case of SELECT ... FOR UPDATE LIMIT x, exactly the same condition > applies: some number of locked rows will not be returned to the client. Au contraire: every row that gets locked will be returned to the client. The gripe at hand is that the number of such rows may be smaller than the client wished, because the LIMIT step is applied before we do the FOR UPDATE step (which has not only the effect of locking selected rows, but of disregarding rows that were deleted since our query snapshot). regards, tom lane
On Fri, 2004-10-15 at 15:30, Tom Lane wrote: > Au contraire: every row that gets locked will be returned to the client. > The gripe at hand is that the number of such rows may be smaller than > the client wished, because the LIMIT step is applied before we do the > FOR UPDATE step Ah, my apologies -- I misunderstood. Clearly not enough coffee this morning :-) Sorry for the noise. -Neil
Tom, Neil, > > Au contraire: every row that gets locked will be returned to the client. > > The gripe at hand is that the number of such rows may be smaller than > > the client wished, because the LIMIT step is applied before we do the > > FOR UPDATE step As I said, I think this can be taken care of with a doc patch. The truth is that FOR UPDATE LIMIT is not really terribly useful (it will still block outer queries to that table with the same LIMIT clause, so why not lock the whole table?). I propose that I add this sentence to the Docs: -------------- Please not that, since LIMIT is applied before FOR UPDATE, rows which disappear from the target set while waiting for a lock may result in less than LIMIT # of rows being returned. This can result in unintuitive behavior, so FOR UPDATE and LIMIT should only be combined after significant testing. --------------- Here's a question, though, for my education: It's possible to query "Please lock the first row which is not already locked" by including pg_locks, pg_class and xmax in the query set. Tom warned that this could result in a race condition. If the query-and-lock were a single statement, how would a race condition result? How could I test for it? -- Josh Berkus Aglio Database Solutions San Francisco
On Fri, 2004-10-15 at 17:09, Josh Berkus wrote: > I propose that I add this sentence to the Docs: > > -------------- > Please not that, since LIMIT is applied before FOR UPDATE, rows which ^^^ I assume this should be "note". It took me a little time to parse your plaintive appeal correctly. :-) > disappear from the target set while waiting for a lock may result in less > than LIMIT # of rows being returned. This can result in unintuitive > behavior, so FOR UPDATE and LIMIT should only be combined after significant > testing. > --------------- -- Oliver Elphick olly@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA ======================================== "But be ye doers of the word, and not hearers only, deceiving your own selves." James 1:22
Bruce, Ah, yes, which reminds me I need to generate that doc patch. > I am wondering if a documentation warning about the use of FOR UPDATE > and LIMIT is a good idea. If we can't be sure the LIMIT will return a > guaranteed number of rows, should we just disallow that combination? I > realize such a case is rare. Should we emit a warning when it happens? Well, limit+for update can be useful under some circumstances, as long as you understand its limitations. We found a workaround. So I'd oppose disallowing it. -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus wrote: > Tom, Neil, > > > > Au contraire: every row that gets locked will be returned to the client. > > > The gripe at hand is that the number of such rows may be smaller than > > > the client wished, because the LIMIT step is applied before we do the > > > FOR UPDATE step > > As I said, I think this can be taken care of with a doc patch. The truth is > that FOR UPDATE LIMIT is not really terribly useful (it will still block > outer queries to that table with the same LIMIT clause, so why not lock the > whole table?). I propose that I add this sentence to the Docs: > > -------------- > Please not that, since LIMIT is applied before FOR UPDATE, rows which > disappear from the target set while waiting for a lock may result in less > than LIMIT # of rows being returned. This can result in unintuitive > behavior, so FOR UPDATE and LIMIT should only be combined after significant > testing. > --------------- > > Here's a question, though, for my education: It's possible to query "Please > lock the first row which is not already locked" by including pg_locks, > pg_class and xmax in the query set. Tom warned that this could result in a > race condition. If the query-and-lock were a single statement, how would a > race condition result? How could I test for it? I am wondering if a documentation warning about the use of FOR UPDATE and LIMIT is a good idea. If we can't be sure the LIMIT will return a guaranteed number of rows, should we just disallow that combination? I realize such a case is rare. Should we emit a warning when it happens? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Andrea, > i'm sorry for the curiosity.... but > could you share, if it's possible, this workaround? ;) > (if it's not the one you describe at the beginning thread > e.g. don't use LIMIT 1) Well, we actually roped in the pg_locks view to do a "SELECT the first row not already locked for update". Then added some code on the client end for error handling, like race conditions and no rows being returned, both of which happen in production. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
hi! Josh Berkus wrote: > Bruce, > [snip] > > > Well, limit+for update can be useful under some circumstances, as long as you > understand its limitations. We found a workaround. i'm sorry for the curiosity.... but could you share, if it's possible, this workaround? ;) andrea
tnks for the hint ;) I'll try something similar here. Andrea Josh Berkus wrote: > Andrea, > > >>i'm sorry for the curiosity.... but >>could you share, if it's possible, this workaround? ;) >>(if it's not the one you describe at the beginning thread >> e.g. don't use LIMIT 1) > > > Well, we actually roped in the pg_locks view to do a "SELECT the first row not > already locked for update". Then added some code on the client end for > error handling, like race conditions and no rows being returned, both of > which happen in production. >
I have documented the possible problem with LIMIT and FOR UPDATE. I also remove the mention that FOR UPDATE can appear before LIMIT for pre-7.3 compatibility. Patch applied to CVS HEAD only. --------------------------------------------------------------------------- Josh Berkus wrote: > Tom, Neil, > > > > Au contraire: every row that gets locked will be returned to the client. > > > The gripe at hand is that the number of such rows may be smaller than > > > the client wished, because the LIMIT step is applied before we do the > > > FOR UPDATE step > > As I said, I think this can be taken care of with a doc patch. The truth is > that FOR UPDATE LIMIT is not really terribly useful (it will still block > outer queries to that table with the same LIMIT clause, so why not lock the > whole table?). I propose that I add this sentence to the Docs: > > -------------- > Please not that, since LIMIT is applied before FOR UPDATE, rows which > disappear from the target set while waiting for a lock may result in less > than LIMIT # of rows being returned. This can result in unintuitive > behavior, so FOR UPDATE and LIMIT should only be combined after significant > testing. > --------------- > > Here's a question, though, for my education: It's possible to query "Please > lock the first row which is not already locked" by including pg_locks, > pg_class and xmax in the query set. Tom warned that this could result in a > race condition. If the query-and-lock were a single statement, how would a > race condition result? How could I test for it? > > -- > Josh Berkus > Aglio Database Solutions > San Francisco > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: doc/src/sgml/ref/select.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/ref/select.sgml,v retrieving revision 1.83 diff -c -c -r1.83 select.sgml *** doc/src/sgml/ref/select.sgml 8 Apr 2005 00:59:58 -0000 1.83 --- doc/src/sgml/ref/select.sgml 22 Apr 2005 04:15:06 -0000 *************** *** 830,840 **** </para> <para> ! <literal>FOR UPDATE</literal> may appear before ! <literal>LIMIT</literal> for compatibility with ! <productname>PostgreSQL</productname> versions before 7.3. It ! effectively executes after <literal>LIMIT</literal>, however, and ! so that is the recommended place to write it. </para> </refsect2> </refsect1> --- 830,842 ---- </para> <para> ! It is possible for a <command>SELECT</> command using both ! <literal>LIMIT</literal> and <literal>FOR UPDATE</literal> ! clauses to return fewer rows than specified by <literal>LIMIT</literal>. ! This is because <literal>LIMIT</> selects a number of rows, ! but might then block requesting a <literal>FOR UPDATE</literal> lock. ! Once the <literal>SELECT</> unblocks, the query qualifiation might not ! be met and the row not be returned by <literal>SELECT</>. </para> </refsect2> </refsect1>