Thread: possible row locking bug in 7.0.3 & 7.1
I'm having a problem with functions written in SQL. Specifically, they don't seem to be adhering to Postgres locking rules. For the record, I'm using postgres 7.0.3, installed from RPMs, on Red Hat 6.2. I got the same results with postgres 7.1 beta 6, installed from sources. Here's what I'm seeing: (psql input represented by '<<'; output represented by '>>'.) session1<< create table idseq session1<< ( session1<< name varchar(32) not null, session1<< id int8 not null default 0 session1<< ); session1>> CREATE session1<< insert into idseq values ('myid'); session1>> INSERT 18734 1 Each row in the table is supposed to represent a named numeric sequence, much like the sequences built into postgres. (Mine use an int8 though, so their values can be much higher.) session1<< create function nextid( varchar(32)) returns int8 as ' session1<< select * from idseq where name = $1::text for update; session1<< update idseq set id = id + 1 where name = $1::text; session1<< select id from idseq where name = $1::text; session1<< ' language 'sql'; session1>> CREATE The idea here is that the select...for update within the nextid() function will establish a row level lock, preventing two concurrent function calls from overlapping. Next, I test with two sessions as follows: session1<< begin; session1>> BEGIN session2<< begin; session2>> BEGIN session1<< select nextid('myid'); session1>> nextid session1>> -------- session1>> 1 session1>> (1 row) session2<< select nextid('myid'); (session2 blocks until session1 completes its transaction) session1<< commit; session1>> COMMIT (session2 resumes) session2>> nextid session2>> -------- session2>> 0 session2>> (1 row) What gives??? I expected the second call to nextid() to return 2! session2<< commit; session2>> COMMIT session2<< select * from idseq; session2>> name | id session2>> ------+---- session2>> myid | 2 session2>> (1 row) session1<< select * from idseq; session1>> name | id session1>> ------+---- session1>> myid | 2 session1>> (1 row) As you can see, my nextid() function is not synchronized the way I hoped. I don't know why, though. Can someone help? I'm going to try out some of my SPI functions with 7.1 beta 6, to see if they exhibit a locking problem as well. Thanks, Forest Wilkinson
Forest Wilkinson <fspam@home.com> writes: > session1<< create function nextid( varchar(32)) returns int8 as ' > session1<< select * from idseq where name = $1::text for update; > session1<< update idseq set id = id + 1 where name = $1::text; > session1<< select id from idseq where name = $1::text; > session1<< ' language 'sql'; > [ doesn't work as expected in parallel transactions ] This is a fairly interesting example. What I find is that at the final SELECT, the function can see both the tuple outdated by the other transaction AND the new tuple it has inserted. (You can demonstrate that by doing select count(id) instead of select id.) Whichever one happens to be visited first is the one that gets returned by the function, and that's generally the older one in this example. MVCC seems to be operating as designed here, more or less. The outdated tuple is inserted by a known-committed transaction, and deleted by a transaction that's also committed, but one that committed *since the start of the current transaction*. So its effects should not be visible to the SELECT, and therefore the tuple should be visible. The anomalous behavior is not really in the final SELECT, but in the earlier commands that were able to see the effects of a transaction committed later than the start of the second session's transaction. The workaround for Forest is to make the final SELECT be a SELECT FOR UPDATE, so that it's playing by the same rules as the earlier commands. But I wonder whether we ought to rethink the MVCC rules so that that's not necessary. I have no idea how we might change the rules though. If nothing else, we should document this issue better: SELECT and SELECT FOR UPDATE have different visibility rules, so you probably don't want to intermix them. regards, tom lane
At 18:14 27/03/01 -0500, Tom Lane wrote: >Forest Wilkinson <fspam@home.com> writes: >> session1<< create function nextid( varchar(32)) returns int8 as ' >> session1<< select * from idseq where name = $1::text for update; >> session1<< update idseq set id = id + 1 where name = $1::text; >> session1<< select id from idseq where name = $1::text; >> session1<< ' language 'sql'; >> [ doesn't work as expected in parallel transactions ] > >What I find is that at the final >SELECT, the function can see both the tuple outdated by the other >transaction AND the new tuple it has inserted. Surely we should distinguish between real new tuples, and new tuple versions? I don't think it's ever reasonable behaviour to see two versions of the same row. >(You can demonstrate >that by doing select count(id) instead of select id.) Whichever one >happens to be visited first is the one that gets returned by the >function, and that's generally the older one in this example. > >MVCC seems to be operating as designed here, more or less. The outdated >tuple is inserted by a known-committed transaction, and deleted by a >transaction that's also committed, but one that committed *since the >start of the current transaction*. So its effects should not be visible >to the SELECT, and therefore the tuple should be visible. The anomalous >behavior is not really in the final SELECT, but in the earlier commands >that were able to see the effects of a transaction committed later than >the start of the second session's transaction. Looking at the docs, I see that 'SERIALIZABLE' has the same visibility rules as 'READ COMMITTED', which is very confusing. I expect that a Read Committed TX should see committed changes for a TX that commits during the first TX (although this may need to be limited to TXs started before the first TX, but I'm not sure). If this is not the case, then we never get non-repeatable reads, AFAICT: P2 (��Non-repeatable read��): SQL-transaction T1 reads a row. SQL-transaction T2 then modifies or deletes that rowand performs a COMMIT. If T1 then attempts to reread the row, it may receive the modified value or discover thatthe row has been deleted. which is one of the differences between SERIALIZABLE and READ-COMMITTED. >The workaround for Forest is to make the final SELECT be a SELECT FOR >UPDATE, so that it's playing by the same rules as the earlier commands. Eek. Does this seem good to you? I would expect that SELECT and SELECT...FOR UPDATE should return the same result set. >But I wonder whether we ought to rethink the MVCC rules so that that's >not necessary. I have no idea how we might change the rules though. Disallowing visibility of two versions of the same row would help. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: >> The workaround for Forest is to make the final SELECT be a SELECT FOR >> UPDATE, so that it's playing by the same rules as the earlier commands. > Eek. Does this seem good to you? I did call it a workaround ;-) I don't think that we dare try to make any basic changes in MVCC for 7.1 at this late hour, so Forest is going to have to live with that answer for awhile. But I would like to see a cleaner answer in future releases. As I've opined before, the whole EvalPlanQual mechanism strikes me as essentially bogus in any case... regards, tom lane
On Tuesday 27 March 2001 15:14, Tom Lane wrote: > Forest Wilkinson <fspam@home.com> writes: > > session1<< create function nextid( varchar(32)) returns int8 as ' > > session1<< select * from idseq where name = $1::text for update; > > session1<< update idseq set id = id + 1 where name = $1::text; > > session1<< select id from idseq where name = $1::text; > > session1<< ' language 'sql'; > > [ doesn't work as expected in parallel transactions ] [snip] > The workaround for Forest is to make the final SELECT be a SELECT FOR > UPDATE, so that it's playing by the same rules as the earlier commands. > But I wonder whether we ought to rethink the MVCC rules so that that's > not necessary. I have no idea how we might change the rules though. > If nothing else, we should document this issue better: SELECT and SELECT > FOR UPDATE have different visibility rules, so you probably don't want > to intermix them. My, that's ugly. (But thanks for the workaround.) If I remember correctly, UPDATE establishes a lock on the affected rows, which will block another UPDATE on the same rows for the duration of the transaction. If that's true, shouldn't I be able to achieve my desired behavior by removing the initial as follows: create function nextid( varchar(32)) returns int8 as ' update idseq set id = id + 1 where name = $1::text; select id fromidseq where name = $1::text; ' language 'sql'; Or, would I still have to add FOR UPDATE to that final SELECT? Forest
Forest Wilkinson <fspam@home.com> writes: > If I remember correctly, UPDATE establishes a lock on the affected rows, > which will block another UPDATE on the same rows for the duration of the > transaction. If that's true, shouldn't I be able to achieve my desired > behavior by removing the initial as follows: > create function nextid( varchar(32)) returns int8 as ' > update idseq set id = id + 1 where name = $1::text; > select id from idseq where name = $1::text; > ' language 'sql'; > Or, would I still have to add FOR UPDATE to that final SELECT? You're right, the initial SELECT FOR UPDATE is a waste of cycles considering that you're not using the value it returns. But you'll still need the last select to be FOR UPDATE so that it plays by the same rules as the UPDATE does. regards, tom lane
Tom Lane wrote: > > Philip Warner <pjw@rhyme.com.au> writes: > >> The workaround for Forest is to make the final SELECT be a SELECT FOR > >> UPDATE, so that it's playing by the same rules as the earlier commands. > > > Eek. Does this seem good to you? > > I did call it a workaround ;-) > > I don't think that we dare try to make any basic changes in MVCC for 7.1 > at this late hour, so Forest is going to have to live with that answer > for awhile. But I would like to see a cleaner answer in future > releases. Is it the MVCC's restriction that each query inside a function must use the same snapshot ? > As I've opined before, the whole EvalPlanQual mechanism > strikes me as essentially bogus in any case... > How would you change it ? UPDATE/SELECT FOR UPDATE have to SELECT/UPDATE the latest tuples. I don't think of any simple way for 'SELECT FOR UPDATE' to have the same visibility as simple SELECT. regards, Hiroshi Inoue