Re: Are ctid chaining loops safe without relation size checks? - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Are ctid chaining loops safe without relation size checks? |
Date | |
Msg-id | 20190515214401.k3z4npoisek3qfgq@alap3.anarazel.de Whole thread Raw |
In response to | Re: Are ctid chaining loops safe without relation size checks? (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Hi, On 2019-05-15 15:09:34 -0400, Tom Lane wrote: > (If we're not checking liveness of x_max before following the link, > we'd have trouble ...) I don't think we do everywhere - e.g. in heap_get_latest_tid() case that made me think about this there's only this as an xmax based loop termination: /* * After following a t_ctid link, we might arrive at an unrelated * tuple. Check for XMIN match. */ if (TransactionIdIsValid(priorXmax) && !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(tp.t_data))) { UnlockReleaseBuffer(buffer); break; } but that's after we already followed the link, and read the page (and it obviously isn't a liveliness check). Additionally, note that heap_get_latest_tid() does *not* terminate when it finds a visible tuple: /* * Check tuple visibility; if visible, set it as the new result * candidate. */ valid = HeapTupleSatisfiesVisibility(&tp, snapshot, buffer); CheckForSerializableConflictOut(valid, relation, &tp, buffer, snapshot); if (valid) *tid = ctid; it just continues to the next tuple version, if any. EvalPlanQualFetch() in <= 11 and heapam_tuple_lock() in master and heap_lock_updated_tuple_rec() don't have the problem that xmax might suddenly abort while following the chain, because they have code like: /* * If tuple is being updated by other transaction then we * have to wait for its commit/abort, or die trying. */ if (TransactionIdIsValid(SnapshotDirty.xmax)) { ReleaseBuffer(buffer); switch (wait_policy) { case LockWaitBlock: XactLockTableWait(SnapshotDirty.xmax, relation, &tuple->t_self, XLTW_FetchUpdated); break; case LockWaitSkip: if (!ConditionalXactLockTableWait(SnapshotDirty.xmax)) /* skip instead of waiting */ return TM_WouldBlock; break; case LockWaitError: if (!ConditionalXactLockTableWait(SnapshotDirty.xmax)) ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); break; } continue; /* loop back to repeat heap_fetch */ } but heap_get_latest_tid() doesn't have that logic. So I think the problem is just that heap_get_latest_tid() is missing this type of check. The reason for which presumably is this piece of intended functionality: * Actually, this gets the latest version that is visible according to * the passed snapshot. You can pass SnapshotDirty to get the very latest, * possibly uncommitted version. which means that neither can it block when xmax is still running, nor can it terminate when HeapTupleSatisfiesVisibility() returns true. There's no core code using a !mvcc snapshot however. Because it's relevant for other work we've talked about for v13, and for a potential fix: > Andres Freund <andres@anarazel.de> writes: > > Which I dutifully rewrote. But I'm actually not sure it's safe at all > > for heap to rely on t_ctid links to be valid. What prevents a ctid link > > to point to a page that's since been truncated away? > > Nothing, but when would the issue come up? The updated tuple must be > newer than the one pointing at it, so if it's dead then the one pointing > at it must be too, no? Well, the current tuple might not be dead, it might be UPDATE_IN_PROGRESS when start following the ctid chain. By the time we get to the next tuple, that UPDATE might have rolled back, vacuum came along, removed the new version of thew tuple (which then becomes DEAD, not RECENTLY_DEAD) and then truncated the relation. Currently that's not possible in the nodeTidscan.c case, because we'll have a lock preventing truncations. But if we were to allow truncations without an AEL, that'd be different. I went through a few possible ways to fix this: 1) Break out of heap_get_latest_tid()/ loop, if the ctid to be chained to is bigger than the block length. It can't be visible by any definition except SnapshotAny, and we could just disallow that. As there's a lock on the relation heap_get_latest_tid() is operating on, we can rely on that value not getting too outdated. But I don't think that's correct, because the newest version of the tuple *actually* might be beyond the end of the table at the beginning of the scan - we *do* allow extension of the table while somebody holds a lock on the table after all. I also don't like adding more assumptions that depend on preventing truncations while any other lock is held. 2) Just disallow SnapshotDirty/SnapshotAny for heap_get_latest_tid(), and break out of the loop if the current tuple is visible. There can't be a newer visible version anyway, and as long as the input tid for heap_get_latest_tid() points to something the calling transaction could see (even if in an earlier snapshot), there has to be a visible version somewhere. That'd not fix the tid.c callers, but they're essentially unused and weird anyway. I'm not sure if WHERE CURRENT OF with a WITH HOLD cursor is possible / would be a problem. In that case we might need to add a XactLockTableWait too. 3) Declare these problems as esoteric, and don't care. Greetings, Andres Freund
pgsql-hackers by date: