Re: foreign key locks, 2nd attempt - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: foreign key locks, 2nd attempt |
Date | |
Msg-id | 20120117093913.GA13462@tornado.leadboat.com Whole thread Raw |
In response to | Re: foreign key locks, 2nd attempt (Alvaro Herrera <alvherre@commandprompt.com>) |
Responses |
Re: foreign key locks, 2nd attempt
|
List | pgsql-hackers |
On Sun, Jan 15, 2012 at 01:49:54AM -0300, Alvaro Herrera wrote: > - I'm not sure that the multixact truncation code is sane on > checkpoints. It might be that I need to tweak more the pg_control info > we keep about truncation. The whole truncation thing needs more > testing, too. My largest outstanding concern involves the possibility of MultiXactId wraparound. From my last review: This raises a notable formal hazard: it's possible to burn through theMultiXactId space faster than the regular TransactionIdspace. We could getinto a situation where pg_clog is covering 2B xids, and yet we need >4BMultiXactId to coverthat period. We had better at least notice this andhalt, if not have autovacuum actively prevent it. (That should have been 2B rather than 4B, since MultiXactId uses the same 2B-in-past, 2B-in-future behavior as regular xids.) Are we willing to guess that this will "never" happen and make recovery minimally possible? If so, we could have GetNewMultiXactId() grow defenses similar to GetNewTransactionId() and leave it at that. If not, we need to involve autovacuum. The other remaining high-level thing is to have key_attrs contain only columns actually referenced by FKs. > - Go over Noah's two reviews again and see if I missed anything; also > make sure I haven't missed anything from other reviewers. There are some, yes. > *** a/src/backend/access/heap/heapam.c > --- b/src/backend/access/heap/heapam.c > *************** > *** 2773,2783 **** l2: > } > else if (result == HeapTupleBeingUpdated && wait) > { > ! TransactionId xwait; > uint16 infomask; > > /* must copy state data before unlocking buffer */ > ! xwait = HeapTupleHeaderGetXmax(oldtup.t_data); > infomask = oldtup.t_data->t_infomask; > > LockBuffer(buffer, BUFFER_LOCK_UNLOCK); > --- 2848,2871 ---- > } > else if (result == HeapTupleBeingUpdated && wait) > { > ! TransactionId xwait; > uint16 infomask; > + bool none_remain = false; Nothing can ever set this variable to anything different. It seems that keep_xact == InvalidTransactionId substitutes well enough, though. > /* > * We may overwrite if previous xmax aborted, or if it committed but > ! * only locked the tuple without updating it, or if we are going to > ! * keep it around in Xmax. > */ The second possibility is just a subset of the third. > ! if (TransactionIdIsValid(keep_xmax) || > ! none_remain || > ! (oldtup.t_data->t_infomask & HEAP_XMAX_INVALID)) > result = HeapTupleMayBeUpdated; > else > result = HeapTupleUpdated; > *************** > *** 3314,3323 **** heap_tuple_attr_equals(TupleDesc tupdesc, int attrnum, > */ > static bool > HeapSatisfiesHOTUpdate(Relation relation, Bitmapset *hot_attrs, > ! HeapTuple oldtup, HeapTuple newtup) > { > int attrnum; > > while ((attrnum = bms_first_member(hot_attrs)) >= 0) > { > /* Adjust for system attributes */ > --- 3537,3549 ---- > */ > static bool > HeapSatisfiesHOTUpdate(Relation relation, Bitmapset *hot_attrs, > ! HeapTuple oldtup, HeapTuple newtup, bool empty_okay) > { > int attrnum; > > + if (!empty_okay && bms_is_empty(hot_attrs)) > + return false; When a table contains no key attributes, it seems arbitrary whether we call the key revoked or not. What is the motivation for this change? > ! /* > ! * If we're requesting KeyShare, and there's no update present, we > ! * don't need to wait. Even if there is an update, we can still > ! * continue if the key hasn't been modified. > ! * > ! * However, if there are updates, we need to walk the update chain > ! * to mark future versions of the row as locked, too. That way, if > ! * somebody deletes that future version, we're protected against the > ! * key going away. This locking of future versions could block > ! * momentarily, if a concurrent transaction is deleting a key; or it > ! * could return a value to the effect that the transaction deleting the > ! * key has already committed. So we do this before re-locking the > ! * buffer; otherwise this would be prone to deadlocks. Note that the TID > ! * we're locking was grabbed before we unlocked the buffer. For it to > ! * change while we're not looking, the other properties we're testing > ! * for below after re-locking the buffer would also change, in which > ! * case we would restart this loop above. > ! */ > ! if ((mode == LockTupleKeyShare) && > ! (HeapTupleHeaderInfomaskIsOnlyLocked(infomask) || > ! !(infomask2 & HEAP_UPDATE_KEY_REVOKED))) Isn't the OnlyLocked test redundant, a subset of the !KEY_REVOKED test? > { > ! /* if there are updates, follow the update chain */ > ! if (!HeapTupleHeaderInfomaskIsOnlyLocked(infomask)) > ! { > ! HTSU_Result res; > ! > ! res = heap_lock_updated_tuple(relation, tid, > ! GetCurrentTransactionId(), > ! mode); > ! if (res != HeapTupleMayBeUpdated) > ! { > ! result = res; > ! /* recovery code expects to have buffer lock held */ > ! LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); > ! goto failed; > ! } > ! } > ! > LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); > > /* > ! * Make sure it's still an appropriate lock, else start over. > */ > ! if (!HeapTupleHeaderIsOnlyLocked(tuple->t_data) && > ! (tuple->t_data->t_infomask2 & HEAP_UPDATE_KEY_REVOKED)) > goto l3; > + require_sleep = false; > + > + /* > + * Note we allow Xmax to change here; other updaters/lockers could > + * have modified it before we grabbed the buffer lock. However, > + * this is not a problem, because with the recheck we just did we > + * ensure that they still don't conflict with the lock we want. > + */ If an updater has appeared in the meantime, don't we need to go back and lock along its update chain? > } > > + /* > + * If we're requesting Share, we can similarly avoid sleeping if > + * there's no update and no exclusive lock present. > + */ > + if (mode == LockTupleShare && > + (infomask & (HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_IS_NOT_UPDATE)) && > + !(infomask & HEAP_XMAX_EXCL_LOCK)) > + { > LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); > > /* > ! * Make sure it's still an appropriate lock, else start over. See > ! * above about allowing xmax to change. > */ I agree that it should be safe here, though. > ! if (!(tuple->t_data->t_infomask & > ! (HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_IS_NOT_UPDATE)) || > ! (tuple->t_data->t_infomask & HEAP_XMAX_EXCL_LOCK)) > goto l3; > + require_sleep = false; > + } > ! elog(ERROR, "invalid lock mode in heap_tuple_lock"); "heap_lock_tuple" in that message. > ! /* > ! * Make sure there is no forward chain link in t_ctid. Note that in the > ! * cases where the tuple has been updated, we must not overwrite t_ctid, > ! * because it was set by the updater. Moreover, if the tuple has been > ! * updated, we need to follow the update chain to lock the new versions > ! * of the tuple as well. > ! * > ! * FIXME -- not 100% sure of the implications of this. > ! */ > ! if (HeapTupleHeaderInfomaskIsOnlyLocked(new_infomask)) > ! tuple->t_data->t_ctid = *tid; This seems right. > + /* > + * Given an original set of Xmax and infomask, and a transaction acquiring a > + * new lock of some mode, compute the new Xmax and corresponding infomask to > + * use on the tuple. > + * > + * Note that this might have side effects such as creating a new MultiXactId. > + * > + * Most callers will have called HeapTupleSatisfiesUpdate before this function; > + * that will have set the HEAP_XMAX_INVALID bit if the xmax was a MultiXactId > + * but it was not running anymore. There is a race condition, which is that the > + * MultiXactId may have finished since then, but that uncommon case is handled > + * within MultiXactIdExpand. > + * > + * There is a similar race condition possible when the old xmax was a regular > + * TransactionId. We test TransactionIdIsInProgress again just to narrow the > + * window, but it's still possible to end up creating an unnecessary > + * MultiXactId. Fortunately this is harmless. > + */ > + static void > + compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask, > + TransactionId add_to_xmax, LockTupleMode mode, > + TransactionId *result_xmax, uint16 *result_infomask) > + { > + TransactionId new_xmax; > + uint16 new_infomask = old_infomask; > + > + if (old_infomask & (HEAP_XMAX_INVALID | HEAP_XMAX_COMMITTED)) > + { > + /* > + * No previous locker, or it already finished; we just insert our own > + * TransactionId. > + */ > + switch (mode) > + { > + case LockTupleKeyShare: > + new_xmax = add_to_xmax; > + new_infomask |= HEAP_XMAX_KEYSHR_LOCK; > + break; > + case LockTupleShare: > + /* need a multixact here in any case */ > + new_xmax = MultiXactIdCreateSingleton(add_to_xmax, MultiXactStatusForShare); > + new_infomask |= GetMultiXactIdHintBits(new_xmax); > + break; > + case LockTupleUpdate: > + new_infomask |= HEAP_XMAX_EXCL_LOCK; > + new_xmax = xmax; Shouldn't that be "new_xmax = add_to_xmax"? > + break; > + default: > + elog(ERROR, "invalid lock mode"); > + new_xmax = InvalidTransactionId; /* keep compiler quiet */ > + } > + /* no other updater; just add myself */ > + } > + else if (old_infomask & HEAP_XMAX_IS_MULTI) > + { > + MultiXactStatus new_mxact_status; > + > + new_mxact_status = get_mxact_status_for_tuplelock(mode); > + /* > + * If the XMAX is already a MultiXactId, then we need to > + * expand it to include our own TransactionId. > + */ > + new_xmax = MultiXactIdExpand((MultiXactId) xmax, add_to_xmax, new_mxact_status); > + new_infomask |= GetMultiXactIdHintBits(new_xmax); > + } > + else if (TransactionIdIsInProgress(xmax)) > + { > + /* > + * If the XMAX is a valid, in-progress TransactionId, then we need to > + * create a new MultiXactId that includes both the old locker and our > + * own TransactionId. > + */ > + MultiXactStatus status; > + MultiXactStatus new_mxact_status; > + > + new_mxact_status = get_mxact_status_for_tuplelock(mode); > + > + if (old_infomask & HEAP_XMAX_EXCL_LOCK) > + status = MultiXactStatusForUpdate; > + else if (old_infomask & HEAP_XMAX_KEYSHR_LOCK) > + status = MultiXactStatusForKeyShare; > + else > + { > + status = MultiXactStatusUpdate; > + } > + > + /* FIXME need to verify the KEY_REVOKED bit, and block if it's set */ > + > + new_xmax = MultiXactIdCreate(xmax, status, add_to_xmax, new_mxact_status); > + new_infomask |= GetMultiXactIdHintBits(new_xmax); > + /* FIXME -- we need to add bits to the infomask here! */ > + } > + else if (mode == LockTupleShare) > + { > + MultiXactStatus new_mxact_status; > + > + /* > + * There's no hint bit for FOR SHARE, so we need a multixact > + * here no matter what. > + */ > + new_mxact_status = get_mxact_status_for_tuplelock(mode); > + new_xmax = MultiXactIdCreateSingleton(add_to_xmax, new_mxact_status); > + new_infomask |= GetMultiXactIdHintBits(new_xmax); > + } If you remove the conditional block above, the next conditional block will handle it fine. > + else > + { > + /* > + * Can get here iff the updating transaction was running when the > + * infomask was extracted from the tuple, but finished before > + * TransactionIdIsInProgress got to run. Treat it like there's no > + * locker in the tuple. > + */ > + switch (mode) > + { > + case LockTupleKeyShare: > + new_infomask |= HEAP_XMAX_KEYSHR_LOCK; > + new_xmax = xmax; Shouldn't that be add_to_xmax? > + break; > + case LockTupleShare: > + /* need a multixact here in any case */ > + new_xmax = MultiXactIdCreateSingleton(add_to_xmax, MultiXactStatusForShare); > + new_infomask |= GetMultiXactIdHintBits(new_xmax); > + break; > + case LockTupleUpdate: > + new_infomask |= HEAP_XMAX_EXCL_LOCK; > + new_xmax = xmax; > + break; > + default: > + elog(ERROR, "invalid lock mode"); > + new_xmax = InvalidTransactionId; /* keep compiler quiet */ > + } Can you rearrange conditional flow to avoid having two copies of this switch statement? > + } > + > + /* must unset the XMAX_INVALID bit */ > + new_infomask &= ~HEAP_XMAX_INVALID; > + > + *result_infomask = new_infomask; > + *result_xmax = new_xmax; > + } > + > + static HTSU_Result > + heap_lock_updated_tuple(Relation rel, ItemPointer tid, TransactionId xid, > + LockTupleMode mode) > + { This function could use a comment. > + ItemPointerData tupid; > + HeapTupleData mytup; > + Buffer buf; > + uint16 new_infomask, > + old_infomask; > + TransactionId xmax, > + new_xmax; > + > + ItemPointerCopy(tid, &tupid); > + > + restart: > + new_infomask = 0; > + new_xmax = InvalidTransactionId; > + ItemPointerCopy(&tupid, &(mytup.t_self)); > + > + l5: > + if (!heap_fetch(rel, SnapshotAny, &mytup, &buf, false, NULL)) > + elog(ERROR, "unable to fetch updated version of tuple"); > + > + /* > + * XXX we do not lock this tuple here; the theory is that it's sufficient > + * with the buffer lock we're about to grab. Any other code must be able > + * to cope with tuple lock specifics changing while they don't hold buffer > + * lock anyway. > + */ > + > + /* > + * We've got a more recent (updated) version of a tuple we locked. > + * We need to propagate the lock to it; here we don't sleep at all > + * or try to check visibility, we just inconditionally mark it as > + * locked by us. We only need to ensure we have buffer lock. > + */ > + LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE); > + > + old_infomask = mytup.t_data->t_infomask; > + xmax = HeapTupleHeaderGetRawXmax(mytup.t_data); > + > + if (!(old_infomask & HEAP_XMAX_INVALID) && > + (mytup.t_data->t_infomask & HEAP_UPDATE_KEY_REVOKED)) > + { > + TransactionId xmax; > + > + xmax = HeapTupleHeaderGetUpdateXid(mytup.t_data); > + if (TransactionIdIsCurrentTransactionId(xmax)) > + { > + UnlockReleaseBuffer(buf); > + return HeapTupleSelfUpdated; > + } Is reaching this code indeed possible, with cursors or something? > + else if (TransactionIdIsInProgress(xmax)) > + { > + UnlockReleaseBuffer(buf); > + XactLockTableWait(xmax); > + goto l5; What about just unlocking the buffer here and moving "l5" to after the heap_fetch()? We should not need to refetch. > + } > + else if (TransactionIdDidAbort(xmax)) > + ; /* okay to proceed */ > + else if (TransactionIdDidCommit(xmax)) > + { > + UnlockReleaseBuffer(buf); > + return HeapTupleUpdated; > + } > + } > + > + compute_new_xmax_infomask(xmax, old_infomask, xid, mode, > + &new_xmax, &new_infomask); > + > + START_CRIT_SECTION(); > + > + /* And set them. */ > + HeapTupleHeaderSetXmax(mytup.t_data, new_xmax); > + mytup.t_data->t_infomask = new_infomask; > + > + MarkBufferDirty(buf); > + > + /* > + * FIXME XLOG stuff goes here. Is it really necessary to have this, or > + * would it be sufficient to just WAL log the original tuple lock, and have > + * the replay code follow the update chain too? > + */ A single WAL record referencing the root tuple and a PageSetLSN()/PageSetTLI() on all affected pages should be sufficient. However, that requires a critical section from here until the writing of that WAL record. As the code stands, plenty can fail in the meantime. > + > + END_CRIT_SECTION(); > + > + LockBuffer(buf, BUFFER_LOCK_UNLOCK); > + > + /* found end of update chain? */ > + /* FIXME -- ISTM we must also check that Xmin in the new tuple matches > + * updating Xid of the old, as other routines do. */ Agreed. I suggest mimicing heap_get_latest_tid() here. > + if (ItemPointerEquals(&(mytup.t_self), &(mytup.t_data->t_ctid))) > + { > + ReleaseBuffer(buf); > + return HeapTupleMayBeUpdated; > + } > + > + /* tail recursion */ > + ItemPointerCopy(&(mytup.t_data->t_ctid), &tupid); You normally need at least a shared buffer content lock to read t_ctid. Any concurrent updater who changes t_ctid after we released the lock will also be copying the lock we just added, so there would be no need to continue up the chain. So, if you copied t_ctid outside of any content lock and then verified the xmax/xmin match per above, it might be fine. However, I wouldn't use that cleverness to merely shave a couple of instructions from the locked region. > + ReleaseBuffer(buf); > + goto restart; > + } > /* > * The tuple might be marked either XMAX_INVALID or XMAX_COMMITTED > ! * + LOCKED, possibly with IS_MULTI too. Normalize to INVALID just > ! * to be sure no one gets confused. Also get rid of the > ! * HEAP_UPDATE_KEY_REVOKED bit. > */ > ! tuple->t_infomask &= ~(HEAP_XMAX_COMMITTED | HEAP_LOCK_BITS | > ! HEAP_XMAX_IS_MULTI); > ! tuple->t_infomask &= ~HEAP_LOCK_BITS; The most recent line is redundant with its predecessor. > *** a/src/backend/access/transam/multixact.c > --- b/src/backend/access/transam/multixact.c > + #define MULTIXACT_DEBUG Omit the above line. > + /* > * MultiXactIdCreate > * Construct a MultiXactId representing two TransactionIds. > * > ! * The two XIDs must be different, or be requesting different lock modes. Why is it not sufficient to store the strongest type for a particular xid? > *************** > *** 775,786 **** RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset, > > prev_pageno = -1; > > ! for (i = 0; i < nxids; i++, offset++) > { > TransactionId *memberptr; > > pageno = MXOffsetToMemberPage(offset); > ! entryno = MXOffsetToMemberEntry(offset); > > if (pageno != prev_pageno) > { > --- 768,792 ---- > > prev_pageno = -1; > > ! for (i = 0; i < nmembers; i++, offset++) > { > TransactionId *memberptr; > + uint32 *flagsptr; > + uint32 flagsval; > + int bshift; > + int flagsoff; > + int memberoff; > + > + if (members[i].xid < 900) > + abort(); Leftover from testing? > *************** > *** 1222,1236 **** mXactCachePut(MultiXactId multi, int nxids, TransactionId *xids) > > #ifdef MULTIXACT_DEBUG > static char * > ! mxid_to_string(MultiXactId multi, int nxids, TransactionId *xids) > { > ! char *str = palloc(15 * (nxids + 1) + 4); > int i; > > ! snprintf(str, 47, "%u %d[%u", multi, nxids, xids[0]); > > ! for (i = 1; i < nxids; i++) > ! snprintf(str + strlen(str), 17, ", %u", xids[i]); > > strcat(str, "]"); > return str; > --- 1285,1327 ---- > > #ifdef MULTIXACT_DEBUG > static char * > ! mxstatus_to_string(MultiXactStatus status) > { > ! switch (status) > ! { > ! case MultiXactStatusForKeyShare: > ! return "keysh"; > ! case MultiXactStatusForShare: > ! return "sh"; > ! case MultiXactStatusForUpdate: > ! return "forupd"; > ! case MultiXactStatusUpdate: > ! return "upd"; > ! case MultiXactStatusKeyUpdate: > ! return "keyup"; > ! default: > ! elog(ERROR, "unrecognized multixact status %d", status); > ! return ""; > ! } > ! } > ! > ! static char * > ! mxid_to_string(MultiXactId multi, int nmembers, MultiXactMember *members) > ! { > ! static char *str = NULL; > int i; > > ! if (str != NULL) > ! pfree(str); > ! > ! str = MemoryContextAlloc(TopMemoryContext, 15 * (nmembers + 1) + 4); > > ! snprintf(str, 47, "%u %d[%u (%s)", multi, nmembers, members[0].xid, > ! mxstatus_to_string(members[0].status)); > ! > ! for (i = 1; i < nmembers; i++) > ! snprintf(str + strlen(str), 17, ", %u (%s)", members[i].xid, > ! mxstatus_to_string(members[i].status)); This could truncate: 10 chars from %u, 6 from %s, 5 constant chars. How about using a StringInfoData instead? > > strcat(str, "]"); > return str; > *** a/src/backend/utils/time/combocid.c > --- b/src/backend/utils/time/combocid.c > *************** > *** 118,126 **** HeapTupleHeaderGetCmax(HeapTupleHeader tup) > { > CommandId cid = HeapTupleHeaderGetRawCommandId(tup); > > /* We do not store cmax when locking a tuple */ > ! Assert(!(tup->t_infomask & (HEAP_MOVED | HEAP_IS_LOCKED))); > ! Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tup))); > > if (tup->t_infomask & HEAP_COMBOCID) > return GetRealCmax(cid); > --- 118,128 ---- > { > CommandId cid = HeapTupleHeaderGetRawCommandId(tup); > > + Assert(!(tup->t_infomask & HEAP_MOVED)); > /* We do not store cmax when locking a tuple */ The comment is deceptive now. > ! Assert(!HeapTupleHeaderIsOnlyLocked(tup)); > ! Assert((tup->t_infomask & HEAP_XMAX_IS_MULTI) || > ! TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmax(tup))); How about the more-specific "Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)))" in place of both of these asserts? > > if (tup->t_infomask & HEAP_COMBOCID) > return GetRealCmax(cid); > *** a/src/backend/utils/time/tqual.c > --- b/src/backend/utils/time/tqual.c I still haven't reviewed the tqual.c changes in detail, but I see that is has several FIXMEs. > *** a/src/test/isolation/isolationtester.c > --- b/src/test/isolation/isolationtester.c > *************** > *** 395,401 **** run_named_permutations(TestSpec * testspec) > Permutation *p = testspec->permutations[i]; > Step **steps; > > ! if (p->nsteps != nallsteps) > { > fprintf(stderr, "invalid number of steps in permutation %d\n", i + 1); > exit_nicely(); > --- 395,401 ---- > Permutation *p = testspec->permutations[i]; > Step **steps; > > ! if (p->nsteps > nallsteps) > { > fprintf(stderr, "invalid number of steps in permutation %d\n", i + 1); > exit_nicely(); > *************** > *** 565,570 **** run_permutation(TestSpec * testspec, int nsteps, Step ** steps) > --- 565,571 ---- > * steps from this session can run until it is unblocked, but it > * can only be unblocked by running steps from other sessions. > */ > + fflush(stdout); > fprintf(stderr, "invalid permutation detected\n"); > > /* Cancel the waiting statement from this session. */ Why these isolationtester.c changes? Thanks, nm
pgsql-hackers by date: