Thread: FOR KEY LOCK foreign keys
Hello,

As previously commented, here's a proposal with patch to turn foreign key
checks into something less intrusive.

The basic idea, as proposed by Simon Riggs, was discussed in a previous
pgsql-hackers thread here:
http://archives.postgresql.org/message-id/AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com

It goes like this: instead of acquiring a shared lock on the involved tuple,
we acquire only a "key lock", that is, something that prevents the tuple
from going away entirely but not from having fields updated that are not
covered by any unique index.

As discussed, this is still more restrictive than necessary (we could lock
only those columns that are involved in the foreign key being checked), but
that has all sorts of implementation-level problems, so we settled for this,
which is still much better than the current state of affairs.

I published about this here:
http://commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/

So, as a rough design:

1. Create a new SELECT locking clause. For now, we're calling it
   SELECT FOR KEY LOCK.
2. This will acquire a new type of lock on the tuple, dubbed a "keylock".
3. This lock will conflict with DELETE, SELECT FOR UPDATE, and
   SELECT FOR SHARE.
4. It also conflicts with UPDATE if the UPDATE modifies an attribute
   indexed by a unique index.

Here's a patch for this, on which I need to do some more testing and update
docs. Some patch details:

1. We use a new bit in t_infomask for HEAP_XMAX_KEY_LOCK, 0x0010.
2. Key-locking a tuple means setting the XMAX_KEY_LOCK bit and setting the
   Xmax to the locker (just like the other lock marks). If the tuple is
   already key-locked, a MultiXactId needs to be created from the original
   locker(s) and the new transaction.
3. The original tuple needs to be marked with the Cmax of the locking
   command, to prevent it from being seen in the same transaction.
4. A non-conflicting update to the tuple must carry forward some fields
   from the original tuple into the updated copy. Those include Xmax,
   XMAX_IS_MULTI, XMAX_KEY_LOCK, and the CommandId and COMBO_CID flag.
5. We check for the is-indexed condition early in heap_update. This check
   is independent of the HOT check, which occurs later in the routine.
6. The relcache entry now keeps two lists of indexed attributes; the new
   one only covers unique indexes. Both lists are built in a single pass
   over the index list and saved in the relcache entry, so a heap_update
   call only does this once. The main difference between the two checks is
   that the one for HOT is done after the tuple has been toasted. That
   cannot be done for this check, because the toaster runs too late. This
   means some work is duplicated. We could optimize this further.

Something else that might be of interest: the patch as presented here does
NOT solve the deadlock problem originally presented by Joel Jacobson. It
does solve the second, simpler example I presented in my blog article
referenced above, however. I need to have a closer look at that problem to
figure out if we could fix the deadlock too.

I need to thank Simon Riggs for the original idea, and Robert Haas for some
thoughtful discussion on IM that helped me figure out some roadblocks. Of
course, without the pgsql-hackers discussion there wouldn't be any patch at
all.

I also have to apologize to everyone for the lateness of this. Some severe
illness brought me down, then the holiday season slowed everything almost
to a halt, then a rushed but very much welcome move to a larger house
prevented me from dedicating the time I originally intended. All those
things are settled now, hopefully.

--
Álvaro Herrera
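The conflict rules in the rough design above can be sketched in miniature.
In the following, the HEAP_XMAX_KEY_LOCK value (0x0010) is taken from the
patch description; the other mask value, the Operation enum, and the helper
function are illustrative stand-ins, not PostgreSQL's actual definitions:

```c
#include <stdbool.h>
#include <stdint.h>

#define HEAP_XMAX_KEY_LOCK     0x0010   /* new key-lock bit, per the patch */
#define HEAP_XMAX_SHARED_LOCK  0x0080   /* stand-in value for illustration */

typedef enum
{
    OP_DELETE,
    OP_SELECT_FOR_UPDATE,
    OP_SELECT_FOR_SHARE,
    OP_UPDATE
} Operation;

/*
 * Would an operation conflict with an existing key lock on a tuple?
 * Points 3-4 of the design: DELETE, SELECT FOR UPDATE and SELECT FOR
 * SHARE always conflict; UPDATE conflicts only when it modifies an
 * attribute covered by a unique index.
 */
static bool
conflicts_with_key_lock(uint16_t infomask, Operation op,
                        bool modifies_unique_indexed_attr)
{
    if (!(infomask & HEAP_XMAX_KEY_LOCK))
        return false;               /* tuple is not key-locked */

    if (op == OP_UPDATE)
        return modifies_unique_indexed_attr;

    return true;                    /* DELETE, FOR UPDATE, FOR SHARE */
}
```

This is just the decision rule; the patch itself implements it inside
heap_update and heap_lock_tuple.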
Attachment
On Jan 13, 2011, at 1:58 PM, Alvaro Herrera wrote:

> Something else that might be of interest: the patch as presented here
> does NOT solve the deadlock problem originally presented by Joel
> Jacobson. It does solve the second, simpler example I presented in my
> blog article referenced above, however. I need to have a closer look at
> that problem to figure out if we could fix the deadlock too.

Sounds like a big win already. Should this be considered a WIP patch,
though, if you still plan to look at Joel's deadlock example?

Best,

David
On Fri, Jan 14, 2011 at 1:00 PM, David E. Wheeler <david@kineticode.com> wrote:
> On Jan 13, 2011, at 1:58 PM, Alvaro Herrera wrote:
>
>> Something else that might be of interest: the patch as presented here
>> does NOT solve the deadlock problem originally presented by Joel
>> Jacobson. It does solve the second, simpler example I presented in my
>> blog article referenced above, however. I need to have a closer look at
>> that problem to figure out if we could fix the deadlock too.
>
> Sounds like a big win already. Should this be considered a WIP patch,
> though, if you still plan to look at Joel's deadlock example?

Alvaro, are you planning to add this to the CF?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from David E. Wheeler's message of Fri Jan 14 15:00:48 -0300 2011:
> On Jan 13, 2011, at 1:58 PM, Alvaro Herrera wrote:
>
> > Something else that might be of interest: the patch as presented here
> > does NOT solve the deadlock problem originally presented by Joel
> > Jacobson. It does solve the second, simpler example I presented in my
> > blog article referenced above, however. I need to have a closer look at
> > that problem to figure out if we could fix the deadlock too.
>
> Sounds like a big win already. Should this be considered a WIP patch,
> though, if you still plan to look at Joel's deadlock example?

Not necessarily -- we can implement that as a later refinement/improvement.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Robert Haas's message of Fri Jan 14 15:08:27 -0300 2011:
> On Fri, Jan 14, 2011 at 1:00 PM, David E. Wheeler <david@kineticode.com> wrote:
> > On Jan 13, 2011, at 1:58 PM, Alvaro Herrera wrote:
> >
> >> Something else that might be of interest: the patch as presented here
> >> does NOT solve the deadlock problem originally presented by Joel
> >> Jacobson. It does solve the second, simpler example I presented in my
> >> blog article referenced above, however. I need to have a closer look at
> >> that problem to figure out if we could fix the deadlock too.
> >
> > Sounds like a big win already. Should this be considered a WIP patch,
> > though, if you still plan to look at Joel's deadlock example?
>
> Alvaro, are you planning to add this to the CF?

Eh, yes.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hi,

This is a first level of review for the patch. I finally didn't get as much
time as I hoped I would, so I couldn't get familiar with the locking
internals and machinery… as a result, I can't comment much on the code.

The patch applies cleanly (patch moves one hunk all by itself) and compiles
with no warnings. It includes no docs, and I think it will be necessary to
document the user-visible SELECT … FOR KEY LOCK OF x feature.

Code-wise, very few comments here. Reading the patch, it looks like the new
code has been there from the beginning. I only have one question, about a
variable naming:

!       COPY_SCALAR_FIELD(forUpdate);
!       COPY_SCALAR_FIELD(strength);

forUpdate used to be a boolean; strength is now one of LCS_FORUPDATE,
LCS_FORSHARE or LCS_FORKEYLOCK. I wonder if that's a fortunate naming here,
but IANANS (I Am Not A Native Speaker).

Alvaro Herrera <alvherre@commandprompt.com> writes:
> As previously commented, here's a proposal with patch to turn foreign
> key checks into something less intrusive.
>
> The basic idea, as proposed by Simon Riggs, was discussed in a previous
> pgsql-hackers thread here:
> http://archives.postgresql.org/message-id/AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com

This link here provides a test case that will issue a deadlock, and

> Something else that might be of interest: the patch as presented here
> does NOT solve the deadlock problem originally presented by Joel

Indeed, that's the first thing I tried… I'm not sure why fixing the
deadlock issue wouldn't be in this patch's scope?

The thing that I'm able to confirm by running this test case is that the RI
trigger check is done with the new code from the patch:

CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."a" x WHERE "aid"
OPERATOR(pg_catalog.=) $1 FOR KEY LOCK OF x"

Sorry for not posting more tests yet, but seeing how late I am to find the
time for the first-level review, I figured I might as well send that
already.
I will try some other test cases, but sure enough, that should be part of
the user-level documentation…

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
On Sat, Jan 22, 2011 at 4:25 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Hi,
>
> This is a first level of review for the patch. I finally didn't get as
> much time as I hoped I would, so couldn't get familiar with the locking
> internals and machinery… as a result, I can't much comment on the code.
>
> The patch applies cleanly (patch moves one hunk all by itself) and
> compiles with no warning. It includes no docs, and I think it will be
> required to document the user visible SELECT … FOR KEY LOCK OF x new
> feature.

I feel like this should be called "KEY SHARE" rather than "KEY LOCK". It's
essentially a weaker version of the SHARE lock we have now, but that's not
clear from the name.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 13, 2011 at 23:58, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> It goes like this: instead of acquiring a shared lock on the involved
> tuple, we only acquire a "key lock", that is, something that prevents
> the tuple from going away entirely but not from updating fields that are
> not covered by any unique index.
>
> As discussed, this is still more restrictive than necessary (we could
> lock only those columns that are involved in the foreign key being
> checked), but that has all sorts of implementation level problems, so we
> settled for this, which is still much better than the current state of
> affairs.

Seems to me that you can go a bit further without much trouble, if you only
consider indexes that *can* be referenced by foreign keys -- indexes that
don't have expressions or predicates. I frequently create unique indexes on
(lower(name)) where I want case-insensitive unique indexes, or use
predicates like WHERE deleted=false to allow duplicates after deleting the
old item.

So, instead of:

    if (indexInfo->ii_Unique)

you can write:

    if (indexInfo->ii_Unique &&
        indexInfo->ii_Expressions == NIL &&
        indexInfo->ii_Predicate == NIL)

This would slightly simplify RelationGetIndexAttrBitmap() because you no
longer have to worry about including columns that are part of index
expressions/predicates. I guess rd_uindexattr should be renamed to
something like rd_keyindexattr or rd_keyattr.

Is this worthwhile? I can write and submit a patch if it sounds good.

Regards,
Marti
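Marti's restriction above can be modeled in a few lines. ToyIndex and the
64-bit attribute mask are illustrative simplifications (PostgreSQL uses
IndexInfo and Bitmapset); the condition mirrors the
ii_Unique / ii_Expressions / ii_Predicate test quoted above:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model: when building the set of "key" columns, consider only
 * unique indexes that have neither expressions nor predicates, since
 * only those can back a foreign key.
 */
typedef struct
{
    bool     unique;
    bool     has_expressions;   /* e.g. UNIQUE ON (lower(name))   */
    bool     has_predicate;     /* e.g. ... WHERE deleted = false */
    uint64_t attrs;             /* bit n set => column n is indexed */
} ToyIndex;

static uint64_t
key_columns(const ToyIndex *indexes, int n)
{
    uint64_t keyattrs = 0;

    for (int i = 0; i < n; i++)
    {
        /* analogue of: ii_Unique && ii_Expressions == NIL
         *                         && ii_Predicate == NIL */
        if (indexes[i].unique &&
            !indexes[i].has_expressions &&
            !indexes[i].has_predicate)
            keyattrs |= indexes[i].attrs;
    }
    return keyattrs;
}
```

Expression and partial unique indexes thus contribute nothing to the key
bitmap, so updates touching only their columns would not conflict with key
locks.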
Hi Alvaro,

On Thu, Jan 13, 2011 at 06:58:09PM -0300, Alvaro Herrera wrote:
> As previously commented, here's a proposal with patch to turn foreign
> key checks into something less intrusive.
>
> The basic idea, as proposed by Simon Riggs, was discussed in a previous
> pgsql-hackers thread here:
> http://archives.postgresql.org/message-id/AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com
>
> It goes like this: instead of acquiring a shared lock on the involved
> tuple, we only acquire a "key lock", that is, something that prevents
> the tuple from going away entirely but not from updating fields that are
> not covered by any unique index.

First off, this is highly valuable work. My experience echoes that of some
other commenter (I *think* it was Josh Berkus, but I can't find the
original reference now): this is the #1 cause of production deadlocks. To
boot, the patch is small and fits cleanly into the current code.

The patch had a trivial conflict in planner.c, plus plenty of offsets. I've
attached the rebased patch that I used for review. For anyone following
along, all the interesting hunks touch heapam.c; the rest is largely
mechanical. A "diff -w" patch is also considerably easier to follow.

Incidentally, HeapTupleSatisfiesMVCC has some bits of code like this (not
new):

    /* MultiXacts are currently only allowed to lock tuples */
    Assert(tuple->t_infomask & HEAP_IS_LOCKED);

They're specifically only allowed for SHARE and KEY locks, right?
heap_lock_tuple seems to assume as much.

Having read [1], I tried to work out what kind of table-level lock we must
hold before proceeding with a DDL operation that changes the set of "key"
columns. The thing we must prevent is an UPDATE making a concurrent
decision about its need to conflict with a FOR KEY LOCK lock. Therefore,
it's sufficient for the DDL to take ShareLock. CREATE INDEX does just this,
so we're good.
[1] http://archives.postgresql.org/message-id/22196.1282757644@sss.pgh.pa.us

I observe visibility breakage with this test case:

-- Setup
BEGIN;
DROP TABLE IF EXISTS child, parent;
CREATE TABLE parent (
    parent_key  int PRIMARY KEY,
    aux         text NOT NULL
);
CREATE TABLE child (
    child_key   int PRIMARY KEY,
    parent_key  int NOT NULL REFERENCES parent
);
INSERT INTO parent VALUES (1, 'foo');
COMMIT;
TABLE parent; -- set hint bit
SELECT to_hex(t_infomask::int), * FROM heap_page_items(get_raw_page('parent', 0));

 to_hex | lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
--------+----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
 902    |  1 |   8160 |        1 |     32 |   1125 |      0 |       33 | (0,1)  |           2 |       2306 |     24 | NULL   | NULL

-- Interleaved part
P0: BEGIN; INSERT INTO child VALUES (1, 1);
P1: BEGIN;

SELECT to_hex(t_infomask::int), * FROM heap_page_items(get_raw_page('parent', 0));

 to_hex | lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
--------+----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
 112    |  1 |   8160 |        1 |     32 |   1125 |   1126 |       33 | (0,1)  |           2 |        274 |     24 | NULL   | NULL

UPDATE parent SET aux = 'baz'; -- UPDATE 1
TABLE parent; -- 0 rows
SELECT to_hex(t_infomask::int), * FROM heap_page_items(get_raw_page('parent', 0));

 to_hex | lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
--------+----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
 102    |  1 |   8160 |        1 |     32 |   1125 |   1128 |        0 | (0,2)  |       16386 |        258 |     24 | NULL   | NULL
 2012   |  2 |   8128 |        1 |     32 |   1128 |   1126 |     2249 | (0,2)  |      -32766 |       8210 |     24 | NULL   | NULL

The problem seems to be that funny t_cid (2249).
Tracing through heap_update, the new code is not setting t_cid during this
test case.

My own deadlock test case, which is fixed by the patch, uses the same
setup. Its interleaved part is as follows:

P0: INSERT INTO child VALUES (1, 1);
P1: INSERT INTO child VALUES (2, 1);
P0: UPDATE parent SET aux = 'bar';
P1: UPDATE parent SET aux = 'baz';

> As discussed, this is still more restrictive than necessary (we could
> lock only those columns that are involved in the foreign key being
> checked), but that has all sorts of implementation level problems, so we
> settled for this, which is still much better than the current state of
> affairs.

Agreed. What about locking only the columns that are actually used in any
incoming foreign key (not just the FK in question at the time)? We'd just
have more work to do on a cold relcache, a pg_depend scan per unique index.
Usually, each of my tables has no more than one candidate key referenced by
FOREIGN KEY constraints: the explicit or notional primary key. I regularly
add UNIQUE indexes not used by any foreign key, though. YMMV. Given this
optimization, constraining the lock even further by individual FOREIGN KEY
constraint would be utterly unimportant for my databases.

> I published about this here:
> http://commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/
>
> So, as a rough design,
>
> 1. Create a new SELECT locking clause. For now, we're calling it SELECT FOR KEY LOCK
> 2. This will acquire a new type of lock in the tuple, dubbed a "keylock".
> 3. This lock will conflict with DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE.

It does not conflict with SELECT FOR SHARE, does it?

> 4. It also conflicts with UPDATE if the UPDATE modifies an attribute
>    indexed by a unique index.
This is the per-tuple lock conflict table before your change:

  FOR SHARE   conflicts with FOR UPDATE
  FOR UPDATE  conflicts with FOR UPDATE and FOR SHARE

After:

  FOR KEY LOCK  conflicts with FOR UPDATE
  FOR SHARE     conflicts with FOR UPDATE
  FOR UPDATE    conflicts with FOR UPDATE, FOR SHARE,
                (FOR KEY LOCK if cols <@ keycols)

The odd thing here is the checking of an outside condition to decide
whether locks conflict. Normally, to get a different conflict list, we add
another lock type. What about this?

  FOR KEY SHARE   conflicts with FOR KEY UPDATE
  FOR SHARE       conflicts with FOR KEY UPDATE, FOR UPDATE
  FOR UPDATE      conflicts with FOR KEY UPDATE, FOR UPDATE, FOR SHARE
  FOR KEY UPDATE  conflicts with FOR KEY UPDATE, FOR UPDATE, FOR SHARE,
                  FOR KEY SHARE

This would also fix Joel's test case. A disadvantage is that we'd check for
changes in FK-referenced columns even when there's no key lock activity.
That seems acceptable, but it's a point for debate.

Either way, SELECT ... FOR UPDATE will probably end up different than a
true update. The full behavior relies on having an old tuple to bear the
UPDATE lock and a new tuple to bear the KEY lock. In the current patch,
SELECT ... FOR UPDATE blocks on KEY just like SHARE. So there will be that
wart in the conflict lists, no matter what.

> Here's a patch for this, on which I need to do some more testing and
> update docs.
>
> Some patch details:
>
> 1. We use a new bit in t_infomask for HEAP_XMAX_KEY_LOCK, 0x0010.
> 2. Key-locking a tuple means setting the XMAX_KEY_LOCK bit, and setting the
>    Xmax to the locker (just like the other lock marks). If the tuple is
>    already key-locked, a MultiXactId needs to be created from the
>    original locker(s) and the new transaction.

Makes sense.

> 3. The original tuple needs to be marked with the Cmax of the locking
>    command, to prevent it from being seen in the same transaction.

Could you elaborate on this requirement?

> 4.
>    A non-conflicting update to the tuple must carry forward some fields
>    from the original tuple into the updated copy. Those include Xmax,
>    XMAX_IS_MULTI, XMAX_KEY_LOCK, and the CommandId and COMBO_CID flag.

HeapTupleHeaderGetCmax() has this assertion:

    /* We do not store cmax when locking a tuple */
    Assert(!(tup->t_infomask & (HEAP_MOVED | HEAP_IS_LOCKED)));

Assuming that assertion is still valid, there will never be a HEAP_COMBOCID
flag to copy. Right?

> 5. We check for the is-indexed condition early in heap_update. This
>    check is independent of the HOT check, which occurs later in the
>    routine.
> 6. The relcache entry now keeps two lists of indexed attributes; the new
>    one only covers unique indexes. Both lists are built in a single
>    pass over the index list and saved in the relcache entry, so a
>    heap_update call only does this once. The main difference between
>    the two checks is that the one for HOT is done after the tuple has
>    been toasted. This cannot be done for this check, because the
>    toaster runs too late. This means some work is duplicated. We
>    could optimize this further.

Seems reasonable.

> Something else that might be of interest: the patch as presented here
> does NOT solve the deadlock problem originally presented by Joel
> Jacobson. It does solve the second, simpler example I presented in my
> blog article referenced above, however. I need to have a closer look at
> that problem to figure out if we could fix the deadlock too.

One thing that helped me to think through Joel's test case is that the two
middle statements take tuple-level locks, but that's inessential. Granted,
FOR UPDATE tuple locks are by far the most common kind of blocking in
production.
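The four-level conflict matrix proposed above can be written down as a
symmetric lookup table. The mode names are this proposal's (FOR KEY SHARE,
FOR SHARE, FOR UPDATE, FOR KEY UPDATE), not identifiers from the patch:

```c
#include <stdbool.h>

typedef enum
{
    LS_KEYSHARE,    /* FOR KEY SHARE  */
    LS_SHARE,       /* FOR SHARE      */
    LS_UPDATE,      /* FOR UPDATE     */
    LS_KEYUPDATE    /* FOR KEY UPDATE */
} LockStrength;

/* conflicts[held][requested], transcribed from the matrix above */
static const bool conflicts[4][4] = {
    /* held \ req:    KEYSHARE SHARE  UPDATE KEYUPDATE */
    /* LS_KEYSHARE  */ { false, false, false, true },
    /* LS_SHARE     */ { false, false, true,  true },
    /* LS_UPDATE    */ { false, true,  true,  true },
    /* LS_KEYUPDATE */ { true,  true,  true,  true },
};
```

A lock-acquisition routine could then consult this table directly, instead
of checking an outside condition (the "cols <@ keycols" test) at conflict
time; the condition would instead decide which lock strength an UPDATE
requests.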
Here's another formulation that also still gets a deadlock:

P1: BEGIN;
P2: BEGIN;
P1: UPDATE A SET Col1 = 1 WHERE AID = 1;  -- FOR UPDATE tuple lock
P2: LOCK TABLE pg_am IN ROW SHARE MODE;
P1: LOCK TABLE pg_am IN ROW SHARE MODE;   -- blocks
P2: UPDATE B SET Col2 = 1 WHERE BID = 2;  -- blocks for KEY => deadlock

As best I can tell, the explanation is that this patch only improves things
when the FOR KEY LOCK precedes the FOR UPDATE. Splitting out FOR KEY UPDATE
fixes that. It would also optimize this complement to your own blog post
example, which still blocks needlessly:

-- Session 1
CREATE TABLE foo (a int PRIMARY KEY, b text);
CREATE TABLE bar (a int NOT NULL REFERENCES foo);
INSERT INTO foo VALUES (42);
BEGIN;
UPDATE foo SET b = 'Hello World';

-- Session 2
INSERT INTO bar VALUES (42);

Automated tests would go a long way toward building confidence that this
patch does the right thing. Thanks to the SSI patch, we now have an in-tree
test framework for testing interleaved transactions. The only thing it
needs to be suitable for this work is a way to handle blocked commands. If
you like, I can try to whip something up for that.

Hunk-specific comments (based on diff -w version of patch):

> *** a/src/backend/access/heap/heapam.c
> --- b/src/backend/access/heap/heapam.c
> ***************
> *** 2484,2489 **** l2:
> --- 2487,2508 ----
>   		xwait = HeapTupleHeaderGetXmax(oldtup.t_data);
>   		infomask = oldtup.t_data->t_infomask;
>
> + 		/*
> + 		 * if it's only key-locked and we're not updating an indexed column,
> + 		 * we can act though MayBeUpdated was returned, but the resulting tuple
> + 		 * needs a bunch of fields copied from the original.
> + 		 */
> + 		if ((infomask & HEAP_XMAX_KEY_LOCK) &&
> + 			!(infomask & HEAP_XMAX_SHARED_LOCK) &&
> + 			HeapSatisfiesHOTUpdate(relation, keylck_attrs,
> + 								   &oldtup, newtup))
> + 		{
> + 			result = HeapTupleMayBeUpdated;
> + 			keylocked_update = true;
> + 		}

The condition for getting here is "result == HeapTupleBeingUpdated &&
wait".
If !wait, we'd never get the chance to see if this would avoid the wait. Currently all callers pass wait = true, so this is academic. > + > + if (!keylocked_update) > + { > LockBuffer(buffer, BUFFER_LOCK_UNLOCK); > > /* > *************** > *** 2563,2568 **** l2: > --- 2582,2588 ---- > else > result = HeapTupleUpdated; > } > + } > > if (crosscheck != InvalidSnapshot && result == HeapTupleMayBeUpdated) > { > *************** > *** 2609,2621 **** l2: > > newtup->t_data->t_infomask &= ~(HEAP_XACT_MASK); > newtup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK); > ! newtup->t_data->t_infomask |= (HEAP_XMAX_INVALID | HEAP_UPDATED); > HeapTupleHeaderSetXmin(newtup->t_data, xid); > - HeapTupleHeaderSetCmin(newtup->t_data, cid); > - HeapTupleHeaderSetXmax(newtup->t_data, 0); /* for cleanliness */ > newtup->t_tableOid = RelationGetRelid(relation); > > /* > * Replace cid with a combo cid if necessary. Note that we already put > * the plain cid into the new tuple. > */ > --- 2629,2671 ---- > > newtup->t_data->t_infomask &= ~(HEAP_XACT_MASK); > newtup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK); > ! newtup->t_data->t_infomask |= HEAP_UPDATED; > HeapTupleHeaderSetXmin(newtup->t_data, xid); > newtup->t_tableOid = RelationGetRelid(relation); > > /* > + * If this update is touching a tuple that was key-locked, we need to > + * carry forward some bits from the old tuple into the new copy. > + */ > + if (keylocked_update) > + { > + HeapTupleHeaderSetXmax(newtup->t_data, > + HeapTupleHeaderGetXmax(oldtup.t_data)); > + newtup->t_data->t_infomask |= (oldtup.t_data->t_infomask & > + (HEAP_XMAX_IS_MULTI | > + HEAP_XMAX_KEY_LOCK)); > + /* > + * we also need to copy the combo CID stuff, but only if the original > + * tuple was created by us; otherwise the combocid module complains > + * (Alternatively we could use HeapTupleHeaderGetRawCommandId) > + */ This comment should describe why it's correct, not just indicate that another module complains if we do otherwise. 
> + if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(oldtup.t_data))) > + { > + newtup->t_data->t_infomask |= (oldtup.t_data->t_infomask & > + HEAP_COMBOCID); HeapTupleHeaderSetCmin unsets HEAP_COMBOCID, so this is a no-op. > + HeapTupleHeaderSetCmin(newtup->t_data, > + HeapTupleHeaderGetCmin(oldtup.t_data)); > + } > + > + } > + else > + { > + newtup->t_data->t_infomask |= HEAP_XMAX_INVALID; > + HeapTupleHeaderSetXmax(newtup->t_data, 0); /* for cleanliness */ > + HeapTupleHeaderSetCmin(newtup->t_data, cid); > + } As mentioned above, this code can fail to set Cmin entirely. > + > + /* > * Replace cid with a combo cid if necessary. Note that we already put > * the plain cid into the new tuple. > */ > *************** > *** 3142,3148 **** heap_lock_tuple(Relation relation, HeapTuple tuple, Buffer *buffer, > LOCKMODE tuple_lock_type; > bool have_tuple_lock = false; > > ! tuple_lock_type = (mode == LockTupleShared) ? ShareLock : ExclusiveLock; > > *buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid)); > LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); > --- 3192,3211 ---- > LOCKMODE tuple_lock_type; > bool have_tuple_lock = false; > > ! /* in FOR KEY LOCK mode, we use a share lock temporarily */ I found this comment confusing. The first several times I read it, I thought it meant that we start out by setting HEAP_XMAX_SHARED_LOCK in the tuple, then downgrade it. However, this is talking about the ephemeral heavyweight lock. Maybe it's just me, but consider deleting this comment. > ! switch (mode) > ! { > ! case LockTupleShared: > ! case LockTupleKeylock: > ! tuple_lock_type = ShareLock; > ! break; > ! case LockTupleExclusive: > ! tuple_lock_type = ExclusiveLock; > ! break; > ! default: > ! elog(ERROR, "invalid tuple lock mode"); > ! tuple_lock_type = 0; /* keep compiler quiet */ > ! 
} > > *buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid)); > LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); > *************** > *** 3175,3192 **** l3: > LockBuffer(*buffer, BUFFER_LOCK_UNLOCK); > > /* > ! * If we wish to acquire share lock, and the tuple is already > ! * share-locked by a multixact that includes any subtransaction of the > * current top transaction, then we effectively hold the desired lock > * already. We *must* succeed without trying to take the tuple lock, > * else we will deadlock against anyone waiting to acquire exclusive > * lock. We don't need to make any state changes in this case. > */ > ! if (mode == LockTupleShared && > (infomask & HEAP_XMAX_IS_MULTI) && > MultiXactIdIsCurrent((MultiXactId) xwait)) > { > ! Assert(infomask & HEAP_XMAX_SHARED_LOCK); > /* Probably can't hold tuple lock here, but may as well check */ > if (have_tuple_lock) > UnlockTuple(relation, tid, tuple_lock_type); > --- 3238,3255 ---- > LockBuffer(*buffer, BUFFER_LOCK_UNLOCK); > > /* > ! * If we wish to acquire a key or share lock, and the tuple is already > ! * share- or key-locked by a multixact that includes any subtransaction of the > * current top transaction, then we effectively hold the desired lock > * already. We *must* succeed without trying to take the tuple lock, > * else we will deadlock against anyone waiting to acquire exclusive > * lock. We don't need to make any state changes in this case. > */ > ! if ((mode == LockTupleShared || mode == LockTupleKeylock) && > (infomask & HEAP_XMAX_IS_MULTI) && > MultiXactIdIsCurrent((MultiXactId) xwait)) > { > ! Assert(infomask & HEAP_IS_SHARE_LOCKED); > /* Probably can't hold tuple lock here, but may as well check */ > if (have_tuple_lock) > UnlockTuple(relation, tid, tuple_lock_type); If we're upgrading from KEY LOCK to a SHARE, we can't take this shortcut. At a minimum, we need to update t_infomask. 
Then there's a choice: do we queue up normally and risk deadlock, or do we
skip the heavyweight lock queue and risk starvation? Your last blog post
suggests a preference for the latter. I haven't formed a strong preference,
but given this behavior ...

P0: FOR SHARE -- acquired
P1: UPDATE    -- blocks
P2: FOR SHARE -- blocks

... I'm not sure why making the first lock FOR KEY LOCK ought to change
things. Some documentation may be in order about the deadlock hazards of
mixing FOR SHARE locks with foreign key usage.

> ***************
> *** 3217,3226 **** l3:
>   			have_tuple_lock = true;
>   		}
>
> ! 		if (mode == LockTupleShared && (infomask & HEAP_XMAX_SHARED_LOCK))
>   		{
>   			/*
> ! 			 * Acquiring sharelock when there's at least one sharelocker
>   			 * already. We need not wait for him/them to complete.
>   			 */
>   			LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
> --- 3280,3290 ----
>   			have_tuple_lock = true;
>   		}
>
> ! 		if ((mode == LockTupleShared || mode == LockTupleKeylock) &&
> ! 			(infomask & HEAP_IS_SHARE_LOCKED))
>   		{
>   			/*
> ! 			 * Acquiring sharelock or keylock when there's at least one such locker
>   			 * already. We need not wait for him/them to complete.
>   			 */
>   			LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);

Likewise: we cannot implicitly upgrade someone else's KEY LOCK to SHARE.

> ***************
> *** 3476,3482 **** l3:
>   		xlrec.target.tid = tuple->t_self;
>   		xlrec.locking_xid = xid;
>   		xlrec.xid_is_mxact = ((new_infomask & HEAP_XMAX_IS_MULTI) != 0);
> ! 		xlrec.shared_lock = (mode == LockTupleShared);
>   		rdata[0].data = (char *) &xlrec;
>   		rdata[0].len = SizeOfHeapLock;
>   		rdata[0].buffer = InvalidBuffer;
> --- 3543,3549 ----
>   		xlrec.target.tid = tuple->t_self;
>   		xlrec.locking_xid = xid;
>   		xlrec.xid_is_mxact = ((new_infomask & HEAP_XMAX_IS_MULTI) != 0);
> ! 		xlrec.lock_strength = mode == LockTupleShared ? 's' : mode == LockTupleKeylock ? 'k' : 'x';

Seems strange having these character literals. Why not just cast the mode
to a char?
Could even set the enum values to the ASCII values of those characters, if you were so inclined. Happily, they fall in the right order. > rdata[0].data = (char *) &xlrec; > rdata[0].len = SizeOfHeapLock; > rdata[0].buffer = InvalidBuffer; > *** a/src/backend/executor/execMain.c > --- b/src/backend/executor/execMain.c > *************** > *** 112,119 **** lnext: > /* okay, try to lock the tuple */ > if (erm->markType == ROW_MARK_EXCLUSIVE) > lockmode = LockTupleExclusive; > ! else > lockmode = LockTupleShared; > > test = heap_lock_tuple(erm->relation, &tuple, &buffer, > &update_ctid, &update_xmax, > --- 112,126 ---- > /* okay, try to lock the tuple */ > if (erm->markType == ROW_MARK_EXCLUSIVE) > lockmode = LockTupleExclusive; > ! else if (erm->markType == ROW_MARK_SHARE) > lockmode = LockTupleShared; > + else if (erm->markType == ROW_MARK_KEYLOCK) > + lockmode = LockTupleKeylock; > + else > + { > + elog(ERROR, "unsupported rowmark type"); > + lockmode = LockTupleExclusive; /* keep compiler quiet */ > + } A switch statement would be more consistent with what you've done elsewhere. > *** a/src/backend/nodes/outfuncs.c > --- b/src/backend/nodes/outfuncs.c > *************** > *** 2181,2187 **** _outRowMarkClause(StringInfo str, RowMarkClause *node) > WRITE_NODE_TYPE("ROWMARKCLAUSE"); > > WRITE_UINT_FIELD(rti); > ! WRITE_BOOL_FIELD(forUpdate); > WRITE_BOOL_FIELD(noWait); > WRITE_BOOL_FIELD(pushedDown); > } > --- 2181,2187 ---- > WRITE_NODE_TYPE("ROWMARKCLAUSE"); > > WRITE_UINT_FIELD(rti); > ! WRITE_BOOL_FIELD(strength); WRITE_ENUM_FIELD? > WRITE_BOOL_FIELD(noWait); > WRITE_BOOL_FIELD(pushedDown); > } > *** a/src/backend/nodes/readfuncs.c > --- b/src/backend/nodes/readfuncs.c > *************** > *** 299,305 **** _readRowMarkClause(void) > READ_LOCALS(RowMarkClause); > > READ_UINT_FIELD(rti); > ! READ_BOOL_FIELD(forUpdate); > READ_BOOL_FIELD(noWait); > READ_BOOL_FIELD(pushedDown); > > --- 299,305 ---- > READ_LOCALS(RowMarkClause); > > READ_UINT_FIELD(rti); > ! 
READ_BOOL_FIELD(strength); READ_ENUM_FIELD? > *** a/src/backend/optimizer/plan/planner.c > --- b/src/backend/optimizer/plan/planner.c > *************** > *** 1887,1896 **** preprocess_rowmarks(PlannerInfo *root) > newrc = makeNode(PlanRowMark); > newrc->rti = newrc->prti = rc->rti; > newrc->rowmarkId = ++(root->glob->lastRowMarkId); > ! if (rc->forUpdate) > newrc->markType = ROW_MARK_EXCLUSIVE; > ! else > newrc->markType = ROW_MARK_SHARE; > newrc->noWait = rc->noWait; > newrc->isParent = false; > > --- 1887,1904 ---- > newrc = makeNode(PlanRowMark); > newrc->rti = newrc->prti = rc->rti; > newrc->rowmarkId = ++(root->glob->lastRowMarkId); > ! switch (rc->strength) > ! { > ! case LCS_FORUPDATE: > newrc->markType = ROW_MARK_EXCLUSIVE; > ! break; > ! case LCS_FORSHARE: > newrc->markType = ROW_MARK_SHARE; > + break; > + case LCS_FORKEYLOCK: > + newrc->markType = ROW_MARK_KEYLOCK; > + break; > + } This needs a "default" clause throwing an error. (Seems like the default could be in #ifdef USE_ASSERT_CHECKING, but we don't seem to ever do that.) > *** a/src/backend/tcop/utility.c > --- b/src/backend/tcop/utility.c > *************** > *** 2205,2214 **** CreateCommandTag(Node *parsetree) > else if (stmt->rowMarks != NIL) > { > /* not 100% but probably close enough */ > ! if (((RowMarkClause *) linitial(stmt->rowMarks))->forUpdate) > tag = "SELECT FOR UPDATE"; > ! else > tag = "SELECT FOR SHARE"; > } > else > tag = "SELECT"; > --- 2205,2225 ---- > else if (stmt->rowMarks != NIL) > { > /* not 100% but probably close enough */ > ! switch (((RowMarkClause *) linitial(stmt->rowMarks))->strength) > ! { > ! case LCS_FORUPDATE: > tag = "SELECT FOR UPDATE"; > ! break; > ! case LCS_FORSHARE: > tag = "SELECT FOR SHARE"; > + break; > + case LCS_FORKEYLOCK: > + tag = "SELECT FOR KEY LOCK"; > + break; > + default: > + tag = "???"; > + break; elog(ERROR) in the default clause, perhaps? See earlier comment. 
> *** a/src/backend/utils/adt/ruleutils.c > --- b/src/backend/utils/adt/ruleutils.c > *************** > *** 2837,2848 **** get_select_query_def(Query *query, deparse_context *context, > if (rc->pushedDown) > continue; > > ! if (rc->forUpdate) > ! appendContextKeyword(context, " FOR UPDATE", > -PRETTYINDENT_STD, PRETTYINDENT_STD, 0); > ! else > appendContextKeyword(context, " FOR SHARE", > -PRETTYINDENT_STD, PRETTYINDENT_STD, 0); > appendStringInfo(buf, " OF %s", > quote_identifier(rte->eref->aliasname)); > if (rc->noWait) > --- 2837,2858 ---- > if (rc->pushedDown) > continue; > > ! switch (rc->strength) > ! { > ! case LCS_FORKEYLOCK: > ! appendContextKeyword(context, " FOR KEY LOCK", > -PRETTYINDENT_STD, PRETTYINDENT_STD, 0); > ! break; > ! case LCS_FORSHARE: > appendContextKeyword(context, " FOR SHARE", > -PRETTYINDENT_STD, PRETTYINDENT_STD, 0); > + break; > + case LCS_FORUPDATE: > + appendContextKeyword(context, " FOR UPDATE", > + -PRETTYINDENT_STD, PRETTYINDENT_STD, 0); > + break; > + } Another switch statement; see earlier comment. > *** a/src/backend/utils/cache/relcache.c > --- b/src/backend/utils/cache/relcache.c > *************** > *** 3661,3675 **** RelationGetIndexAttrBitmap(Relation relation) > --- 3665,3688 ---- > int attrnum = indexInfo->ii_KeyAttrNumbers[i]; > > if (attrnum != 0) > + { > indexattrs = bms_add_member(indexattrs, > attrnum - FirstLowInvalidHeapAttributeNumber); > + if (indexInfo->ii_Unique) > + uindexattrs = bms_add_member(uindexattrs, > + attrnum - FirstLowInvalidHeapAttributeNumber); > + } > } > > /* Collect all attributes used in expressions, too */ > pull_varattnos((Node *) indexInfo->ii_Expressions, &indexattrs); > + if (indexInfo->ii_Unique) > + pull_varattnos((Node *) indexInfo->ii_Expressions, &uindexattrs); No need; as Marti mentioned, such indexes are not usable for FOREIGN KEY. 
> > /* Collect all attributes in the index predicate, too */ > pull_varattnos((Node *) indexInfo->ii_Predicate, &indexattrs); > + if (indexInfo->ii_Unique) > + pull_varattnos((Node *) indexInfo->ii_Predicate, &uindexattrs); Likewise. > *** a/src/include/access/htup.h > --- b/src/include/access/htup.h > *************** > *** 163,174 **** typedef HeapTupleHeaderData *HeapTupleHeader; > #define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */ > #define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */ > #define HEAP_HASOID 0x0008 /* has an object-id field */ > ! /* bit 0x0010 is available */ > #define HEAP_COMBOCID 0x0020 /* t_cid is a combo cid */ > #define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */ > #define HEAP_XMAX_SHARED_LOCK 0x0080 /* xmax is shared locker */ > /* if either LOCK bit is set, xmax hasn't deleted the tuple, only locked it */ > ! #define HEAP_IS_LOCKED (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_SHARED_LOCK) > #define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */ > #define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */ > #define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */ > --- 163,177 ---- > #define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */ > #define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */ > #define HEAP_HASOID 0x0008 /* has an object-id field */ > ! #define HEAP_XMAX_KEY_LOCK 0x0010 /* xmax is a "key" locker */ > #define HEAP_COMBOCID 0x0020 /* t_cid is a combo cid */ > #define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */ > #define HEAP_XMAX_SHARED_LOCK 0x0080 /* xmax is shared locker */ > + /* if either SHARE or KEY lock bit is set, this is a "shared" lock */ > + #define HEAP_IS_SHARE_LOCKED (HEAP_XMAX_SHARED_LOCK | HEAP_XMAX_KEY_LOCK) > /* if either LOCK bit is set, xmax hasn't deleted the tuple, only locked it */ "either" should now be "any". > ! #define HEAP_IS_LOCKED (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_SHARED_LOCK | \ > ! 
HEAP_XMAX_KEY_LOCK) > #define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */ > #define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */ > #define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */ > *** a/src/include/nodes/parsenodes.h > --- b/src/include/nodes/parsenodes.h > *************** > *** 554,571 **** typedef struct DefElem > } DefElem; > > /* > ! * LockingClause - raw representation of FOR UPDATE/SHARE options > * > * Note: lockedRels == NIL means "all relations in query". Otherwise it > * is a list of RangeVar nodes. (We use RangeVar mainly because it carries > * a location field --- currently, parse analysis insists on unqualified > * names in LockingClause.) > */ > typedef struct LockingClause > { > NodeTag type; > List *lockedRels; /* FOR UPDATE or FOR SHARE relations */ > ! bool forUpdate; /* true = FOR UPDATE, false = FOR SHARE */ > bool noWait; /* NOWAIT option */ > } LockingClause; > > --- 554,579 ---- > } DefElem; > > /* > ! * LockingClause - raw representation of FOR UPDATE/SHARE/KEY LOCK options > * > * Note: lockedRels == NIL means "all relations in query". Otherwise it > * is a list of RangeVar nodes. (We use RangeVar mainly because it carries > * a location field --- currently, parse analysis insists on unqualified > * names in LockingClause.) > */ > + typedef enum LockClauseStrength > + { > + /* order is important -- see applyLockingClause */ > + LCS_FORKEYLOCK, > + LCS_FORSHARE, > + LCS_FORUPDATE > + } LockClauseStrength; > + It's sure odd having this enum precisely mirror LockTupleMode. Is there precedent for this? They are at opposite ends of processing stack, I suppose. > typedef struct LockingClause > { > NodeTag type; > List *lockedRels; /* FOR UPDATE or FOR SHARE relations */ > ! 
LockClauseStrength strength; > bool noWait; /* NOWAIT option */ > } LockingClause; > > *************** > *** 839,856 **** typedef struct WindowClause > * parser output representation of FOR UPDATE/SHARE clauses > * > * Query.rowMarks contains a separate RowMarkClause node for each relation > ! * identified as a FOR UPDATE/SHARE target. If FOR UPDATE/SHARE is applied > ! * to a subquery, we generate RowMarkClauses for all normal and subquery rels > ! * in the subquery, but they are marked pushedDown = true to distinguish them > ! * from clauses that were explicitly written at this query level. Also, > ! * Query.hasForUpdate tells whether there were explicit FOR UPDATE/SHARE > ! * clauses in the current query level. > */ > typedef struct RowMarkClause > { > NodeTag type; > Index rti; /* range table index of target relation */ > ! bool forUpdate; /* true = FOR UPDATE, false = FOR SHARE */ > bool noWait; /* NOWAIT option */ > bool pushedDown; /* pushed down from higher query level? */ > } RowMarkClause; > --- 847,864 ---- > * parser output representation of FOR UPDATE/SHARE clauses > * > * Query.rowMarks contains a separate RowMarkClause node for each relation > ! * identified as a FOR UPDATE/SHARE/KEY LOCK target. If one of these clauses > ! * is applied to a subquery, we generate RowMarkClauses for all normal and > ! * subquery rels in the subquery, but they are marked pushedDown = true to > ! * distinguish them from clauses that were explicitly written at this query > ! * level. Also, Query.hasForUpdate tells whether there were explicit FOR > ! * UPDATE/SHARE clauses in the current query level. Need a "/KEY LOCK" in the last sentence. > */ > typedef struct RowMarkClause > { > NodeTag type; > Index rti; /* range table index of target relation */ > ! LockClauseStrength strength; > bool noWait; /* NOWAIT option */ > bool pushedDown; /* pushed down from higher query level? 
*/ > } RowMarkClause; I'd like to do some more testing around HOT and TOAST, plus run performance tests. Figured I should get this much fired off, though. Thanks, nm
Attachment
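An aside on the lock_strength suggestion in the review above: if the LockTupleMode members were given the ASCII values of their WAL-record characters, filling the xlog field really would reduce to a cast. A minimal sketch in Python — the names mirror the C enum, but the enum values shown here are an assumption, not what the patch does:

```python
# Hypothetical mirror of LockTupleMode, with each member assigned the
# ASCII value of its WAL-record character ('k', 's', 'x').
LockTupleKeylock   = ord('k')   # 107
LockTupleShared    = ord('s')   # 115
LockTupleExclusive = ord('x')   # 120

def lock_strength_char(mode):
    # With ASCII-valued members, "cast the mode to a char" is all that
    # is needed to fill xlrec.lock_strength.
    return chr(mode)

# As noted above, the characters happen to sort in increasing strength.
assert LockTupleKeylock < LockTupleShared < LockTupleExclusive
```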
Excerpts from Noah Misch's message of vie feb 11 04:13:22 -0300 2011: Hello, First, thanks for the very thorough review. > On Thu, Jan 13, 2011 at 06:58:09PM -0300, Alvaro Herrera wrote: > Incidentally, HeapTupleSatisfiesMVCC has some bits of code like this (not new): > > /* MultiXacts are currently only allowed to lock tuples */ > Assert(tuple->t_infomask & HEAP_IS_LOCKED); > > They're specifically only allowed for SHARE and KEY locks, right? > heap_lock_tuple seems to assume as much. Yeah, since FOR UPDATE acquires an exclusive lock on the tuple, you can't have a multixact there. Maybe we can make the assert more specific; I'll have a look. > [ test case with funny visibility behavior ] Looking into the visibility bug. > > I published about this here: > > http://commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/ > > > > So, as a rough design, > > > > 1. Create a new SELECT locking clause. For now, we're calling it SELECT FOR KEY LOCK > > 2. This will acquire a new type of lock in the tuple, dubbed a "keylock". > > 3. This lock will conflict with DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE. > > It does not conflict with SELECT FOR SHARE, does it? It doesn't; I think I copied old text there. (I had originally thought that they would conflict, but I had to change that due to implementation restrictions). > The odd thing here is the checking of an outside condition to decide whether > locks conflict. Normally, to get a different conflict list, we add another lock > type. What about this? > > FOR KEY SHARE conflicts with FOR KEY UPDATE > FOR SHARE conflicts with FOR KEY UPDATE, FOR UPDATE > FOR UPDATE conflicts with FOR KEY UPDATE, FOR UPDATE, FOR SHARE > FOR KEY UPDATE conflicts with FOR KEY UPDATE, FOR UPDATE, FOR SHARE, FOR KEY SHARE Hmm, let me see about this. > > 3. The original tuple needs to be marked with the Cmax of the locking > > command, to prevent it from being seen in the same transaction. 
> > Could you elaborate on this requirement? Consider an open cursor with a snapshot prior to the lock. If we leave the old tuple as is, the cursor would see that old tuple as visible. But the locked copy of the tuple is also visible, because the Cmax is just a locker, not an updater. > > 4. A non-conflicting update to the tuple must carry forward some fields > > from the original tuple into the updated copy. Those include Xmax, > > XMAX_IS_MULTI, XMAX_KEY_LOCK, and the CommandId and COMBO_CID flag. > > HeapTupleHeaderGetCmax() has this assertion: > > /* We do not store cmax when locking a tuple */ > Assert(!(tup->t_infomask & (HEAP_MOVED | HEAP_IS_LOCKED))); > > Assuming that assertion is still valid, there will never be a HEAP_COMBOCID flag > to copy. Right? Hmm, I think the assert is wrong, but I'm still paging in the details of the patch after being away from it for so long. Let me think more about it. > [ Lots more stuff ] I'll give careful consideration to all this. Thanks again for the detailed review. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
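For concreteness, the four-level conflict matrix quoted above can be modeled in a few lines. This is only a sketch of the proposal, not of the patch as posted; the mode names come from Noah's mail, and the index-sum rule is just one compact way to express the table:

```python
# Row-lock modes in increasing strength, as proposed in the review.
MODES = ['FOR KEY SHARE', 'FOR SHARE', 'FOR UPDATE', 'FOR KEY UPDATE']

def conflicts(a, b):
    """Each mode conflicts with the modes at least as strong as its
    mirror image in the ordering, so summing the two indexes expresses
    the whole table in one comparison."""
    return MODES.index(a) + MODES.index(b) >= len(MODES) - 1
```

For example, `conflicts('FOR KEY SHARE', 'FOR SHARE')` is false — the key-lock case that motivates the patch — while `conflicts('FOR SHARE', 'FOR UPDATE')` remains true, matching the quoted table.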
On Fri, Feb 11, 2011 at 02:15:20PM -0300, Alvaro Herrera wrote: > Excerpts from Noah Misch's message of vie feb 11 04:13:22 -0300 2011: > > On Thu, Jan 13, 2011 at 06:58:09PM -0300, Alvaro Herrera wrote: > > > 3. The original tuple needs to be marked with the Cmax of the locking > > > command, to prevent it from being seen in the same transaction. > > > > Could you elaborate on this requirement? > > Consider an open cursor with a snapshot prior to the lock. If we leave > the old tuple as is, the cursor would see that old tuple as visible. > But the locked copy of the tuple is also visible, because the Cmax is > just a locker, not an updater. Thanks. Today, a lock operation leaves t_cid unchanged, and an update fills its own cid into Cmax of the old tuple and Cmin of the new tuple. So, the cursor would only see the old tuple. What will make that no longer sufficient?
Excerpts from Noah Misch's message of vie feb 11 04:13:22 -0300 2011: > I observe visibility breakage with this test case: > > [ ... ] > > The problem seems to be that funny t_cid (2249). Tracing through heap_update, > the new code is not setting t_cid during this test case. So I can fix this problem by simply adding a call to HeapTupleHeaderSetCmin when the stuff about ComboCid does not hold, but seeing that screenful plus the subsequent call to HeapTupleHeaderAdjustCmax feels wrong. I think this needs to be rethought ... -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Feb 11, 2011 at 09:13, Noah Misch <noah@leadboat.com> wrote: > The patch had a trivial conflict in planner.c, plus plenty of offsets. I've > attached the rebased patch that I used for review. For anyone following along, > all the interesting hunks touch heapam.c; the rest is largely mechanical. A > "diff -w" patch is also considerably easier to follow. Here's a simple patch for the RelationGetIndexAttrBitmap() function, as explained in my last post. I don't know if it's any help to you, but since I wrote it I might as well send it up. This applies on top of Noah's rebased patch. I did some tests and it seems to work, although I also hit the same visibility bug as Noah. Test case I used: THREAD A: create table foo (pk int primary key, ak int); create unique index on foo (ak) where ak != 0; create unique index on foo ((-ak)); create table bar (foo_pk int references foo (pk)); insert into foo values(1,1); begin; insert into bar values(1); THREAD B: begin; update foo set ak=2 where ak=1; Regards, Marti
Attachment
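The behavior under discussion — RelationGetIndexAttrBitmap() building a second attribute set restricted to unique indexes, with expression and predicate attributes left out of it because (per the review comments) such indexes cannot back a foreign key — can be sketched as follows. The index-descriptor dicts are hypothetical stand-ins for IndexInfo, and the sample list mirrors Marti's test scenario:

```python
def collect_index_attrs(indexes):
    """One pass over the index list, yielding the two sets the relcache
    entry would keep: all indexed columns, and unique-index columns."""
    indexattrs, uindexattrs = set(), set()
    for ix in indexes:
        # Plain key columns go into both sets when the index is unique.
        indexattrs |= ix['key_attrs']
        if ix['unique']:
            uindexattrs |= ix['key_attrs']
        # Expression and predicate columns still matter for the HOT
        # check, but not for uindexattrs.
        indexattrs |= ix['expr_attrs'] | ix['predicate_attrs']
    return indexattrs, uindexattrs

# Marti's scenario from above: a plain PK, a partial unique index on
# ak, and an expression unique index on (-ak).
sample = [
    {'key_attrs': {'pk'}, 'unique': True,
     'expr_attrs': set(), 'predicate_attrs': set()},
    {'key_attrs': {'ak'}, 'unique': True,
     'expr_attrs': set(), 'predicate_attrs': {'ak'}},
    {'key_attrs': set(), 'unique': True,
     'expr_attrs': {'ak'}, 'predicate_attrs': set()},
]
indexattrs, uindexattrs = collect_index_attrs(sample)
```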
Excerpts from Marti Raudsepp's message of lun feb 14 19:39:25 -0300 2011: > On Fri, Feb 11, 2011 at 09:13, Noah Misch <noah@leadboat.com> wrote: > > The patch had a trivial conflict in planner.c, plus plenty of offsets. I've > > attached the rebased patch that I used for review. For anyone following along, > > all the interesting hunks touch heapam.c; the rest is largely mechanical. A > > "diff -w" patch is also considerably easier to follow. > > Here's a simple patch for the RelationGetIndexAttrBitmap() function, > as explained in my last post. I don't know if it's any help to you, > but since I wrote it I might as well send it up. This applies on top > of Noah's rebased patch. Got it, thanks. > I did some tests and it seems to work, although I also hit the same > visibility bug as Noah. Yeah, that bug is fixed with the attached, though I am rethinking this bit. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
On Mon, Feb 14, 2011 at 6:49 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Excerpts from Marti Raudsepp's message of lun feb 14 19:39:25 -0300 2011: >> On Fri, Feb 11, 2011 at 09:13, Noah Misch <noah@leadboat.com> wrote: >> > The patch had a trivial conflict in planner.c, plus plenty of offsets. I've >> > attached the rebased patch that I used for review. For anyone following along, >> > all the interesting hunks touch heapam.c; the rest is largely mechanical. A >> > "diff -w" patch is also considerably easier to follow. >> >> Here's a simple patch for the RelationGetIndexAttrBitmap() function, >> as explained in my last post. I don't know if it's any help to you, >> but since I wrote it I might as well send it up. This applies on top >> of Noah's rebased patch. > > Got it, thanks. > >> I did some tests and it seems to work, although I also hit the same >> visibility bug as Noah. > > Yeah, that bug is fixed with the attached, though I am rethinking this > bit. I am thinking that the statute of limitations has expired on this patch, and that we should mark it Returned with Feedback and continue working on it for 9.2. I know it's a valuable feature, but I think we're out of time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Feb 15, 2011, at 1:15 PM, Robert Haas wrote: >> Yeah, that bug is fixed with the attached, though I am rethinking this >> bit. > > I am thinking that the statute of limitations has expired on this > patch, and that we should mark it Returned with Feedback and continue > working on it for 9.2. I know it's a valuable feature, but I think > we're out of time. How is such a determination made, exactly? Best, David
Excerpts from Robert Haas's message of mar feb 15 18:15:38 -0300 2011: > I am thinking that the statute of limitations has expired on this > patch, and that we should mark it Returned with Feedback and continue > working on it for 9.2. I know it's a valuable feature, but I think > we're out of time. Okay, I've marked it as such in the commitfest app. It'll be in 9.2's first commitfest. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> How is such a determination made, exactly? It's Feb 15th, and portions of the patch need a rework according to the author. I'm with Robert on this one. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Fri, Feb 11, 2011 at 02:13:22AM -0500, Noah Misch wrote: > Automated tests would go a long way toward building confidence that this patch > does the right thing. Thanks to the SSI patch, we now have an in-tree test > framework for testing interleaved transactions. The only thing it needs to be > suitable for this work is a way to handle blocked commands. If you like, I can > try to whip something up for that. [off-list ACK followed] Here's a patch implementing that. It applies to master, with or without your KEY LOCK patch also applied, though the expected outputs reflect the improvements from your patch. I add three isolation test specs: fk-contention: blocking-only test case from your blog post fk-deadlock: the deadlocking test case I used during patch review fk-deadlock2: Joel Jacobson's deadlocking test case When a spec permutation would have us run a command in a currently-blocked session, we cannot implement that permutation. Such permutations represent impossible real-world scenarios, anyway. For now, I just explicitly name the valid permutations in each spec file. If the test harness detects this problem, we abort the current test spec. It might be nicer to instead cancel all outstanding queries, issue rollbacks in all sessions, and continue with other permutations. I hesitated to do that, because we currently leave all transaction control in the hands of the test spec. I only support one waiting command at a time. As long as one command continues to wait, I run other commands to completion synchronously. This decision has no impact on the current test specs, which all have two sessions. It avoided a touchy policy decision concerning deadlock detection. If two commands have blocked, it may be that a third command needs to run before they will unblock, or it may be that the two commands have formed a deadlock. We won't know for sure until deadlock_timeout elapses.
If it's possible to run the next step in the permutation (i.e., it uses a different session from any blocked command), we can either do so immediately or wait out the deadlock_timeout first. The latter slows the test suite, but it makes the output more natural -- more like what one would typically see after running the commands by hand. If anyone can think of a sound general policy, that would be helpful. For now, I've punted. With a default postgresql.conf, deadlock_timeout constitutes most of the run time. Reduce it to 20ms to accelerate things when running the tests repeatedly. Since timing dictates which query participating in a deadlock will be chosen for cancellation, the expected outputs bearing deadlock errors are unstable. I'm not sure how much it will come up in practice, so I have not included expected output variations to address this. I think this will work on Windows as well as pgbench does, but I haven't verified that. Sorry for the delay on this. nm
Attachment
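The validity rule described in the message above — a permutation cannot be implemented if it would send a command to a session whose previous command is still blocked — can be sketched as a small checker. This is a hypothetical model, not the harness's actual code: the real isolationtester discovers waits at run time, whereas here the blocking/unblocking relationships are supplied up front:

```python
def is_runnable(permutation, blocking_steps, unblocked_by):
    """permutation: list of (session, step) pairs in execution order.
    blocking_steps: steps known to block when issued.
    unblocked_by: maps a blocked step to the step that releases it."""
    blocked = {}  # session -> the step it is currently waiting in
    for session, step in permutation:
        if session in blocked:
            return False  # would send a query to a busy session
        # Completing this step may release a waiting session.
        for s, waiting in list(blocked.items()):
            if unblocked_by.get(waiting) == step:
                del blocked[s]
        if step in blocking_steps:
            blocked[session] = step
    return True
```

For instance, with a FOR KEY LOCK-style scenario where session 2's UPDATE blocks until session 1 commits, the schedule lock/update/commit interleaved across sessions is runnable, while any schedule that asks session 2 for another command while its UPDATE waits is not.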
I hope this hasn't been forgotten. But I can't see that it has been committed or moved into the commitfest process? Jesper On 2011-03-11 16:51, Noah Misch wrote: > On Fri, Feb 11, 2011 at 02:13:22AM -0500, Noah Misch wrote: >> Automated tests would go a long way toward building confidence that this patch >> does the right thing. Thanks to the SSI patch, we now have an in-tree test >> framework for testing interleaved transactions. The only thing it needs to be >> suitable for this work is a way to handle blocked commands. If you like, I can >> try to whip something up for that. > [off-list ACK followed] > > Here's a patch implementing that. It applies to master, with or without your > KEY LOCK patch also applied, though the expected outputs reflect the > improvements from your patch. I add three isolation test specs: > > fk-contention: blocking-only test case from your blog post > fk-deadlock: the deadlocking test case I used during patch review > fk-deadlock2: Joel Jacobson's deadlocking test case > > When a spec permutation would have us run a command in a currently-blocked > session, we cannot implement that permutation. Such permutations represent > impossible real-world scenarios, anyway. For now, I just explicitly name the > valid permutations in each spec file. If the test harness detects this problem, > we abort the current test spec. It might be nicer to instead cancel all > outstanding queries, issue rollbacks in all sessions, and continue with other > permutations. I hesitated to do that, because we currently leave all > transaction control in the hands of the test spec. > > I only support one waiting command at a time. As long as one commands continues > to wait, I run other commands to completion synchronously. This decision has no > impact on the current test specs, which all have two sessions. It avoided a > touchy policy decision concerning deadlock detection.
If two commands have > blocked, it may be that a third command needs to run before they will unblock, > or it may be that the two commands have formed a deadlock. We won't know for > sure until deadlock_timeout elapses. If it's possible to run the next step in > the permutation (i.e., it uses a different session from any blocked command), we > can either do so immediately or wait out the deadlock_timeout first. The latter > slows the test suite, but it makes the output more natural -- more like what one > would typically after running the commands by hand. If anyone can think of a > sound general policy, that would be helpful. For now, I've punted. > > With a default postgresql.conf, deadlock_timeout constitutes most of the run > time. Reduce it to 20ms to accelerate things when running the tests repeatedly. > > Since timing dictates which query participating in a deadlock will be chosen for > cancellation, the expected outputs bearing deadlock errors are unstable. I'm > not sure how much it will come up in practice, so I have not included expected > output variations to address this. > > I think this will work on Windows as well as pgbench does, but I haven't > verified that. > > Sorry for the delay on this. >
On Sun, Jun 19, 2011 at 06:30:41PM +0200, Jesper Krogh wrote: > I hope this hasn't been forgotten. But I cant see it has been committed > or moved > into the commitfest process? If you're asking about that main patch for $SUBJECT rather than those isolationtester changes specifically, I can't speak to the plans for it. I wasn't planning to move the test suite work forward independent of the core patch it serves, but we could do that if there's another application. Thanks, nm > On 2011-03-11 16:51, Noah Misch wrote: >> On Fri, Feb 11, 2011 at 02:13:22AM -0500, Noah Misch wrote: >>> Automated tests would go a long way toward building confidence that this patch >>> does the right thing. Thanks to the SSI patch, we now have an in-tree test >>> framework for testing interleaved transactions. The only thing it needs to be >>> suitable for this work is a way to handle blocked commands. If you like, I can >>> try to whip something up for that. >> [off-list ACK followed] >> >> Here's a patch implementing that. It applies to master, with or without your >> KEY LOCK patch also applied, though the expected outputs reflect the >> improvements from your patch. I add three isolation test specs: >> >> fk-contention: blocking-only test case from your blog post >> fk-deadlock: the deadlocking test case I used during patch review >> fk-deadlock2: Joel Jacobson's deadlocking test case
On 2011-06-20 22:11, Noah Misch wrote: > On Sun, Jun 19, 2011 at 06:30:41PM +0200, Jesper Krogh wrote: >> I hope this hasn't been forgotten. But I cant see it has been committed >> or moved >> into the commitfest process? > If you're asking about that main patch for $SUBJECT rather than those > isolationtester changes specifically, I can't speak to the plans for it. I > wasn't planning to move the test suite work forward independent of the core > patch it serves, but we could do that if there's another application. Yes, I was actually asking about the main patch for foreign key locks. Jesper -- Jesper
Excerpts from Noah Misch's message of vie mar 11 12:51:14 -0300 2011: > On Fri, Feb 11, 2011 at 02:13:22AM -0500, Noah Misch wrote: > > Automated tests would go a long way toward building confidence that this patch > > does the right thing. Thanks to the SSI patch, we now have an in-tree test > > framework for testing interleaved transactions. The only thing it needs to be > > suitable for this work is a way to handle blocked commands. If you like, I can > > try to whip something up for that. > [off-list ACK followed] > > Here's a patch implementing that. It applies to master, with or without your > KEY LOCK patch also applied, though the expected outputs reflect the > improvements from your patch. I add three isolation test specs: > > fk-contention: blocking-only test case from your blog post > fk-deadlock: the deadlocking test case I used during patch review > fk-deadlock2: Joel Jacobson's deadlocking test case Thanks for this patch. I have applied it, adjusting the expected output of these tests to the HEAD code. I'll adjust it when I commit the fklocks patch, I guess, but it seemed simpler to have it out of the way; besides it might end up benefitting other people who might be messing with the locking code. > I only support one waiting command at a time. As long as one commands continues > to wait, I run other commands to completion synchronously. Should be fine for now, I guess. > I think this will work on Windows as well as pgbench does, but I haven't > verified that. We will find out shortly. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, Jul 12, 2011 at 05:59:01PM -0400, Alvaro Herrera wrote: > Excerpts from Noah Misch's message of vie mar 11 12:51:14 -0300 2011: > > On Fri, Feb 11, 2011 at 02:13:22AM -0500, Noah Misch wrote: > > > Automated tests would go a long way toward building confidence that this patch > > > does the right thing. Thanks to the SSI patch, we now have an in-tree test > > > framework for testing interleaved transactions. The only thing it needs to be > > > suitable for this work is a way to handle blocked commands. If you like, I can > > > try to whip something up for that. > > [off-list ACK followed] > > > > Here's a patch implementing that. It applies to master, with or without your > > KEY LOCK patch also applied, though the expected outputs reflect the > > improvements from your patch. I add three isolation test specs: > > > > fk-contention: blocking-only test case from your blog post > > fk-deadlock: the deadlocking test case I used during patch review > > fk-deadlock2: Joel Jacobson's deadlocking test case > > Thanks for this patch. I have applied it, adjusting the expected output > of these tests to the HEAD code. I'll adjust it when I commit the > fklocks patch, I guess, but it seemed simpler to have it out of the way; > besides it might end up benefitting other people who might be messing > with the locking code. Great. There have been a few recent patches where I would have used this functionality to provide tests, so I'm glad to have it in. > > I think this will work on Windows as well as pgbench does, but I haven't > > verified that. > > We will find out shortly. I see you've added a fix for the MSVC animals; thanks. coypu failed during the run of the test due to a different session being chosen as the deadlock victim. We can now vary deadlock_timeout to prevent this; see attached fklocks-tests-deadlock_timeout.patch. This also makes the tests much faster on a default postgresql.conf. 
crake failed when it reported waiting on the first step of an existing isolation test ("two-ids.spec"). I will need to look into that further. Thanks, nm
Attachment
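Why varying deadlock_timeout pins the victim, as the patch above does: the deadlock check runs in whichever blocked backend's timer fires first, and the backend that detects the cycle cancels itself. A deliberately simplified model (the real outcome also depends on scheduling noise, which is exactly why equal timeouts made the expected output unstable):

```python
def expected_victim(wait_start, deadlock_timeout):
    """Return the session whose deadlock check fires first, i.e. the
    one minimizing wait_start + deadlock_timeout (times in ms)."""
    return min(wait_start,
               key=lambda s: wait_start[s] + deadlock_timeout[s])

# With equal timeouts the victim depends on which session started
# waiting first; giving one session a much shorter deadlock_timeout
# makes the chosen victim deterministic.
```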
Excerpts from Noah Misch's message of mié jul 13 01:34:10 -0400 2011: > coypu failed during the run of the test due to a different session being chosen > as the deadlock victim. We can now vary deadlock_timeout to prevent this; see > attached fklocks-tests-deadlock_timeout.patch. This also makes the tests much > faster on a default postgresql.conf. I applied your patch, thanks. I couldn't reproduce the failures without it, even running only the three new tests in a loop a few dozen times. > crake failed when it reported waiting on the first step of an existing isolation > test ("two-ids.spec"). I will need to look into that further. Actually, there are four failures in tests other than the two fixed by your patch. These are: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2011-07-12%2022:32:02 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2011-07-14%2016:27:00 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pitta&dt=2011-07-15%2015:00:08 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2011-07-15%2018:32:02 The last two are an identical failure in multiple-row-versions: *************** *** 1,11 **** Parsed test spec with 4 sessions starting permutation: rx1 wx2 c2 wx3 ry3 wy4 rz4 c4 c3 wz1 c1 ! step rx1: SELECT * FROM t WHERE id = 1000000; id txt 1000000 - step wx2: UPDATE t SET txt = 'b' WHERE id = 1000000; step c2: COMMIT; step wx3: UPDATE t SET txt = 'c' WHERE id = 1000000; step ry3: SELECT * FROM t WHERE id = 500000; --- 1,12 ---- Parsed test spec with 4 sessions starting permutation: rx1 wx2 c2 wx3 ry3 wy4 rz4 c4 c3 wz1 c1 ! step rx1: SELECT * FROM t WHERE id = 1000000; <waiting ...> ! step wx2: UPDATE t SET txt = 'b' WHERE id = 1000000; ! step rx1: <... completed>
id txt 1000000 step c2: COMMIT; step wx3: UPDATE t SET txt = 'c' WHERE id = 1000000; step ry3: SELECT * FROM t WHERE id = 500000; The other failure by crake in two-ids: *************** *** 440,447 **** step c3: COMMIT; starting permutation: rxwy2 wx1 ry3 c2 c3 c1 ! step rxwy2: update D2 set id = (select id+1 from D1); step wx1: update D1 set id = id + 1; step ry3: select id from D2; id --- 440,448 ---- step c3: COMMIT; starting permutation: rxwy2 wx1 ry3 c2 c3 c1 ! step rxwy2: update D2 set id = (select id+1 from D1); <waiting ...> step wx1: update D1 set id = id + 1; + step rxwy2: <... completed> step ry3: select id from D2; id And the most problematic one, in nightjar, is a failure to send two async commands, which is not supported by the new code: --- 255,260 ---- ERROR: could not serialize access due to read/write dependencies among transactions starting permutation: ry2 wx2 rx1 wy1 c2 c1 ! step ry2: SELECT count(*) FROM project WHERE project_manager = 1; <waiting ...> ! failed to send query: another command is already in progress -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Jul 15, 2011 at 07:01:26PM -0400, Alvaro Herrera wrote:
> Excerpts from Noah Misch's message of mié jul 13 01:34:10 -0400 2011:
>
> > coypu failed during the run of the test due to a different session being chosen
> > as the deadlock victim. We can now vary deadlock_timeout to prevent this; see
> > attached fklocks-tests-deadlock_timeout.patch. This also makes the tests much
> > faster on a default postgresql.conf.
>
> I applied your patch, thanks. I couldn't reproduce the failures without
> it, even running only the three new tests in a loop a few dozen times.

It's probably more likely to crop up on a loaded system. I did not actually
reproduce it myself. However, if you swap the timeouts, the opposite session
finds the deadlock. From there, I'm convinced that the right timing
perturbations could yield the symptom coypu exhibited.

> > crake failed when it reported waiting on the first step of an existing isolation
> > test ("two-ids.spec"). I will need to look into that further.
>
> Actually, there are four failures in tests other than the two fixed by
> your patch. These are:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2011-07-12%2022:32:02
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2011-07-14%2016:27:00
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pitta&dt=2011-07-15%2015:00:08
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2011-07-15%2018:32:02

Thanks for summarizing. These all boil down to lock waits not anticipated by
the test specs. Having pondered this, I've been able to come up with just one
explanation. If autovacuum runs VACUUM during the test and finds that it can
truncate dead space from the end of a relation, it will acquire an
AccessExclusiveLock. When I decrease autovacuum_naptime to 1s, I do see plenty
of pg_type and pg_attribute truncations during a test run.

When I sought to reproduce this, what I first saw instead was an indefinite
test suite hang.
That turned out to arise from an unrelated thinko -- I assumed that backend IDs
were stable for the life of the backend, but they're only stable for the life
of a pgstat snapshot. This fell down when a backend older than one of the test
backends exited during the test:

4199  2011-07-16 03:33:28.733 EDT DEBUG: forked new backend, pid=23984 socket=8
23984 2011-07-16 03:33:28.737 EDT LOG: statement: SET client_min_messages = warning;
23984 2011-07-16 03:33:28.739 EDT LOG: statement: SELECT i FROM pg_stat_get_backend_idset() t(i) WHERE pg_stat_get_backend_pid(i) = pg_backend_pid()
23985 2011-07-16 03:33:28.740 EDT DEBUG: autovacuum: processing database "postgres"
4199  2011-07-16 03:33:28.754 EDT DEBUG: forked new backend, pid=23986 socket=8
23986 2011-07-16 03:33:28.754 EDT LOG: statement: SET client_min_messages = warning;
4199  2011-07-16 03:33:28.755 EDT DEBUG: server process (PID 23985) exited with exit code 0
23986 2011-07-16 03:33:28.755 EDT LOG: statement: SELECT i FROM pg_stat_get_backend_idset() t(i) WHERE pg_stat_get_backend_pid(i) = pg_backend_pid()
4199  2011-07-16 03:33:28.766 EDT DEBUG: forked new backend, pid=23987 socket=8
23987 2011-07-16 03:33:28.766 EDT LOG: statement: SET client_min_messages = warning;
23987 2011-07-16 03:33:28.767 EDT LOG: statement: SELECT i FROM pg_stat_get_backend_idset() t(i) WHERE pg_stat_get_backend_pid(i) = pg_backend_pid()

This led isolationtester to initialize backend_ids = {1,2,2}, making us unable
to detect lock waits correctly. That's also consistent with the symptoms Rémi
Zara just reported.
With that fixed, I was able to reproduce the failure due to
autovacuum-truncate-induced transient waiting using this recipe:

- autovacuum_naptime = 1s
- src/test/isolation/Makefile changed to pass --use-existing during installcheck
- Run 'make installcheck' in a loop
- A concurrent session running this in a loop:
    CREATE TABLE churn (a int, b int, c int, d int, e int, f int, g int, h int);
    DROP TABLE churn;

That yields a steady stream of vacuum truncations, and an associated lock wait
generally capsized the suite within 5-10 runs. Frankly, I have some difficulty
believing that this mechanic alone produced all four failures you cite above;
I suspect I'm still missing some more-frequent cause. Any other theories on
which system background activities can cause a transient lock wait? It would
have to produce a "pgstat_report_waiting(true)" call, so I believe that
excludes all LWLock and lighter contention.

In any event, I have attached a patch that fixes the problems I have described
here. To ignore autovacuum, it only recognizes a wait when one of the
backends under test holds a conflicting lock. (It occurs to me that perhaps
we should expose a pg_lock_conflicts(lockmode_held text, lockmode_req text)
function to simplify this query -- this is a fairly common monitoring need.)

With that change in place, my setup survived through about fifty suite runs at
a time. The streak would end when session 2 would unexpectedly detect a
deadlock that session 1 should have detected. The session 1 deadlock_timeout
I chose, 20ms, is too aggressive. When session 2 is to issue the command that
completes the deadlock, it must do so before session 1 runs the deadlock
detector. Since we burn 10ms just noticing that the previous statement has
blocked, that left only 10ms to issue the next statement. This patch bumps
the figure from 20ms to 100ms; hopefully that will be enough for even a
decently-loaded virtual host.
We should keep it as low as is reasonable, because it contributes directly to
the isolation suite runtime. Each addition to deadlock_timeout slows the suite
by 12x that amount.

With this patch in its final form, I have completed 180+ suite runs without a
failure. In the absence of better theories on the cause for the buildfarm
failures, we should give the buildfarm a whirl with this patch.

I apologize for the quantity of errata this change is entailing.

Thanks,
nm

--
Noah Misch                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
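The wait check described above -- only treat a backend as blocked when another backend under test holds a conflicting lock on the same object -- amounts to a self-join over pg_locks. The following is a rough sketch, not the query from the attached patch; the parameter placeholders and the comment about the missing mode test are illustrative only:

```sql
-- Hypothetical sketch: is backend $1 waiting on a lock that one of the
-- backends under test (pid array $2) holds in some mode?
SELECT EXISTS (
  SELECT 1
    FROM pg_locks w                     -- the waiter's ungranted lock
    JOIN pg_locks h
      ON h.locktype = w.locktype
     AND h.database      IS NOT DISTINCT FROM w.database
     AND h.relation      IS NOT DISTINCT FROM w.relation
     AND h.page          IS NOT DISTINCT FROM w.page
     AND h.tuple         IS NOT DISTINCT FROM w.tuple
     AND h.transactionid IS NOT DISTINCT FROM w.transactionid
   WHERE NOT w.granted
     AND w.pid = $1
     AND h.granted
     AND h.pid = ANY ($2)               -- ignores autovacuum et al.
     -- ...plus a mode-conflict test, which is exactly where a
     -- pg_lock_conflicts(h.mode, w.mode) helper would come in
);
```

Because the held-lock side is restricted to the test backends' pids, a transient AccessExclusiveLock taken by autovacuum's truncation phase no longer registers as a wait.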
Noah Misch wrote:

> With this patch in its final form, I have completed 180+ suite runs
> without a failure.

The attached patch allows the tests to pass when
default_transaction_isolation is stricter than 'read committed'. This is a
slight change from the previously posted version of the files (because of a
change in the order of statements, based on the timeouts), and in patch form
this time.

Since `make installcheck-world` works at all isolation level defaults, as do
all previously included isolation tests, it seems like a good idea to keep
this up. It will simplify my testing of SSI changes, anyway.

-Kevin
Attachment
On Sat, Jul 16, 2011 at 01:03:31PM -0500, Kevin Grittner wrote:
> Noah Misch wrote:
>
> > With this patch in its final form, I have completed 180+ suite runs
> > without a failure.
>
> The attached patch allows the tests to pass when
> default_transaction_isolation is stricter than 'read committed'.
> This is a slight change from the previously posted version of the
> files (because of a change in the order of statements, based on the
> timeouts), and in patch form this time.
>
> Since `make installcheck-world` works at all isolation level
> defaults, as do all previously included isolation tests, it seems
> like a good idea to keep this up. It will simplify my testing of SSI
> changes, anyway.

This does seem sensible. Thanks.

--
Noah Misch                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
> Noah Misch wrote:
>
>> With this patch in its final form, I have completed 180+ suite
>> runs without a failure.
>
> The attached patch allows the tests to pass when
> default_transaction_isolation is stricter than 'read committed'.

Without these two patches the tests fail about one time out of three on my
machine at the office at the 'read committed' transaction isolation level,
and all the time at stricter levels. On my machine at home I haven't seen
the failures at 'read committed'. I don't know if this is Intel (at work)
versus AMD (at home) or what.

With both Noah's patch and mine I haven't yet seen a failure in either
environment, with a few dozen tries.

-Kevin
Excerpts from Kevin Grittner's message of sáb jul 16 14:03:31 -0400 2011:
> Noah Misch wrote:
>
> > With this patch in its final form, I have completed 180+ suite runs
> > without a failure.
>
> The attached patch allows the tests to pass when
> default_transaction_isolation is stricter than 'read committed'.
> This is a slight change from the previously posted version of the
> files (because of a change in the order of statements, based on the
> timeouts), and in patch form this time.

Thanks, applied. Sorry for the delay.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Excerpts from Kevin Grittner's message:
>> Noah Misch wrote:
>>
>>> With this patch in its final form, I have completed 180+ suite
>>> runs without a failure.
>>
>> The attached patch allows the tests to pass when
>> default_transaction_isolation is stricter than 'read committed'.
>> This is a slight change from the previously posted version of the
>> files (because of a change in the order of statements, based on
>> the timeouts), and in patch form this time.
>
> Thanks, applied. Sorry for the delay.

My patch was intended to supplement Noah's patch here:

http://archives.postgresql.org/pgsql-hackers/2011-07/msg00867.php

Without his patch, there is still random failure on my work machine at all
transaction isolation levels.

-Kevin
Excerpts from Kevin Grittner's message of mar jul 19 13:49:53 -0400 2011:
> Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > Excerpts from Kevin Grittner's message:
> >> Noah Misch wrote:
> >>
> >>> With this patch in its final form, I have completed 180+ suite
> >>> runs without a failure.
> >>
> >> The attached patch allows the tests to pass when
> >> default_transaction_isolation is stricter than 'read committed'.
> >> This is a slight change from the previously posted version of the
> >> files (because of a change in the order of statements, based on
> >> the timeouts), and in patch form this time.
> >
> > Thanks, applied. Sorry for the delay.
>
> My patch was intended to supplement Noah's patch here:

I'm aware of that, thanks. I'm getting that one in too, shortly.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Noah Misch's message of sáb jul 16 13:11:49 -0400 2011:

> In any event, I have attached a patch that fixes the problems I have described
> here. To ignore autovacuum, it only recognizes a wait when one of the
> backends under test holds a conflicting lock. (It occurs to me that perhaps
> we should expose a pg_lock_conflicts(lockmode_held text, lockmode_req text)
> function to simplify this query -- this is a fairly common monitoring need.)

Applied it. I agree that having such a utility function is worthwhile,
particularly if we're working on making pg_locks more usable as a whole.

(I wasn't able to reproduce Rémi's hangups here, so I wasn't able to
reproduce the other bits either.)

> With that change in place, my setup survived through about fifty suite runs at
> a time. The streak would end when session 2 would unexpectedly detect a
> deadlock that session 1 should have detected. The session 1 deadlock_timeout
> I chose, 20ms, is too aggressive. When session 2 is to issue the command that
> completes the deadlock, it must do so before session 1 runs the deadlock
> detector. Since we burn 10ms just noticing that the previous statement has
> blocked, that left only 10ms to issue the next statement. This patch bumps
> the figure from 20ms to 100ms; hopefully that will be enough for even a
> decently-loaded virtual host.

Committed this too.

> With this patch in its final form, I have completed 180+ suite runs without a
> failure. In the absence of better theories on the cause for the buildfarm
> failures, we should give the buildfarm a whirl with this patch.

Great. If there is some other failure mechanism, we'll find out ...

> I apologize for the quantity of errata this change is entailing.

No need to apologize. I might as well apologize myself because I didn't
detect these problems on review. But we don't do that -- we just fix the
problems and move on. It's great that you were able to come up with a fix
quickly.

And this is precisely why I committed this way ahead of the patch that it was
written to help: we're now not fixing problems in both simultaneously. By the
time we get that other patch in, this test harness will be fully robust.

Thanks for all your effort in this.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
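The pg_lock_conflicts(held, req) idea floated in this exchange is essentially a lookup into the table-lock conflict matrix. As an illustration only -- this is a standalone sketch in the style of the LockConflicts[] table in src/backend/storage/lmgr/lock.c, not code from any of the patches discussed here -- such a check could be table-driven like this:

```c
#include <stdbool.h>

/* Regular table-lock modes, in strength order as in PostgreSQL. */
typedef enum LockMode {
    NoLock = 0,
    AccessShareLock,            /* plain SELECT */
    RowShareLock,               /* SELECT FOR UPDATE/SHARE */
    RowExclusiveLock,           /* INSERT/UPDATE/DELETE */
    ShareUpdateExclusiveLock,   /* VACUUM, ANALYZE */
    ShareLock,                  /* CREATE INDEX */
    ShareRowExclusiveLock,
    ExclusiveLock,
    AccessExclusiveLock         /* ALTER TABLE, VACUUM truncation, ... */
} LockMode;

#define BIT(m) (1u << (m))

/* conflict_mask[m] has a bit set for every mode that conflicts with m */
static const unsigned conflict_mask[] = {
    [NoLock]           = 0,
    [AccessShareLock]  = BIT(AccessExclusiveLock),
    [RowShareLock]     = BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    [RowExclusiveLock] = BIT(ShareLock) | BIT(ShareRowExclusiveLock) |
                         BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    [ShareUpdateExclusiveLock] = BIT(ShareUpdateExclusiveLock) |
                         BIT(ShareLock) | BIT(ShareRowExclusiveLock) |
                         BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    [ShareLock]        = BIT(RowExclusiveLock) |
                         BIT(ShareUpdateExclusiveLock) |
                         BIT(ShareRowExclusiveLock) |
                         BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    [ShareRowExclusiveLock] = BIT(RowExclusiveLock) |
                         BIT(ShareUpdateExclusiveLock) | BIT(ShareLock) |
                         BIT(ShareRowExclusiveLock) |
                         BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    [ExclusiveLock]    = BIT(RowShareLock) | BIT(RowExclusiveLock) |
                         BIT(ShareUpdateExclusiveLock) | BIT(ShareLock) |
                         BIT(ShareRowExclusiveLock) |
                         BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
    [AccessExclusiveLock] = BIT(AccessShareLock) | BIT(RowShareLock) |
                         BIT(RowExclusiveLock) |
                         BIT(ShareUpdateExclusiveLock) | BIT(ShareLock) |
                         BIT(ShareRowExclusiveLock) |
                         BIT(ExclusiveLock) | BIT(AccessExclusiveLock),
};

/* true if a lock held in mode 'held' blocks a request for mode 'req' */
static bool lock_conflicts(LockMode held, LockMode req)
{
    return (conflict_mask[held] & BIT(req)) != 0;
}
```

Note that the matrix is symmetric, which is why a single mask per mode suffices for both the "held" and "requested" directions. A SQL-level pg_lock_conflicts would simply expose this same table by mode name.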
Hackers,

This is an updated version of the patch I introduced here:
http://archives.postgresql.org/message-id/1294953201-sup-2099@alvh.no-ip.org

Mainly, this patch addresses the numerous comments by Noah Misch here:
http://archives.postgresql.org/message-id/20110211071322.GB26971@tornado.leadboat.com

My thanks to Noah for the very exhaustive review and ideas.

I also removed the bit about copying the ComboCid to the new version of the
tuple during an update. I think that must have been the result of very fuzzy
thinking; I cannot find any reasoning that leads to it being necessary, or
even correct. I also included Marti Raudsepp's patch to consider only indexes
usable in foreign keys.

One thing I have not addressed is Noah's idea about creating a new lock mode,
KEY UPDATE, that would let us solve the initial problem that this patch set
out to resolve in the first place. I am not clear on exactly how that is to
be implemented, because currently heap_update and heap_delete do not grab any
kind of lock but instead do their own ad-hoc waiting. I think that might need
to be reshuffled a bit, to which I haven't gotten yet, and is a radical
enough idea that I would like it to be discussed by the hackers community at
large before setting sail on developing it. In the meantime, this patch does
improve the current situation quite a lot.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
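To recap the tuple-level mechanics from the original proposal: key-locking a tuple sets a new infomask bit (HEAP_XMAX_KEY_LOCK, 0x0010) and records the locker in Xmax; if the tuple is already key-locked, a MultiXactId combining the old locker(s) and the new one is needed. The following standalone sketch shows only the shape of that state transition -- DemoTupleHeader and key_lock_tuple are invented names, the HEAP_XMAX_IS_MULTI value is a stand-in, and the real patch of course builds an actual MultiXactId rather than just setting a flag:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; HEAP_XMAX_KEY_LOCK matches the patch
 * description, HEAP_XMAX_IS_MULTI is a stand-in. */
#define HEAP_XMAX_KEY_LOCK  0x0010  /* xmax holds a key lock */
#define HEAP_XMAX_IS_MULTI  0x1000  /* xmax is a MultiXactId */

typedef uint32_t TransactionId;

typedef struct DemoTupleHeader {
    TransactionId xmax;     /* locker, or MultiXactId when IS_MULTI set */
    uint16_t      infomask;
} DemoTupleHeader;

/* Mark a tuple as key-locked by xid.  On a second locker, a real
 * implementation would create a MultiXactId from the existing locker(s)
 * plus xid and store it in xmax; here we only flip the flag to show
 * the state transition. */
static void key_lock_tuple(DemoTupleHeader *tup, TransactionId xid)
{
    if (tup->infomask & HEAP_XMAX_KEY_LOCK)
        tup->infomask |= HEAP_XMAX_IS_MULTI;  /* xmax would become a multi */
    else
        tup->xmax = xid;                      /* first (sole) key locker */
    tup->infomask |= HEAP_XMAX_KEY_LOCK;
}
```

A non-conflicting UPDATE then has to carry Xmax and these infomask bits forward into the new tuple version, which is where the ComboCid question discussed in the message above came from.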
On Wed, Jul 27, 2011 at 7:16 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> One thing I have not addressed is Noah's idea about creating a new lock
> mode, KEY UPDATE, that would let us solve the initial problem that this
> patch set out to resolve in the first place. I am not clear on exactly how
> that is to be implemented, because currently heap_update and heap_delete
> do not grab any kind of lock but instead do their own ad-hoc waiting. I
> think that might need to be reshuffled a bit, to which I haven't gotten
> yet, and is a radical enough idea that I would like it to be discussed
> by the hackers community at large before setting sail on developing it.
> In the meantime, this patch does improve the current situation quite a
> lot.

I haven't looked at the patch yet, but do you have a pointer to Noah's
proposal? And/or a description of how it differs from what you implemented
here?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of mié ago 03 12:14:15 -0400 2011:
> On Wed, Jul 27, 2011 at 7:16 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > One thing I have not addressed is Noah's idea about creating a new lock
> > mode, KEY UPDATE, that would let us solve the initial problem that this
> > patch set out to resolve in the first place. I am not clear on exactly how
> > that is to be implemented, because currently heap_update and heap_delete
> > do not grab any kind of lock but instead do their own ad-hoc waiting. I
> > think that might need to be reshuffled a bit, to which I haven't gotten
> > yet, and is a radical enough idea that I would like it to be discussed
> > by the hackers community at large before setting sail on developing it.
> > In the meantime, this patch does improve the current situation quite a
> > lot.
>
> I haven't looked at the patch yet, but do you have a pointer to Noah's
> proposal? And/or a description of how it differs from what you
> implemented here?

Yes, see his review email here:
http://archives.postgresql.org/message-id/20110211071322.GB26971@tornado.leadboat.com

It's long, but search for the part where he talks about "KEY UPDATE". The way
my patch works is explained by Noah there.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support