Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) - Mailing list pgsql-hackers
From | Jeff Janes |
---|---|
Subject | Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) |
Date | |
Msg-id | CAMkU=1wknx-aG5k=uyd2PW7nX1HMDoFF_jxOb4mY+jwf815vUg@mail.gmail.com Whole thread Raw |
In response to | Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) (Andres Freund <andres@anarazel.de>) |
Responses |
Re: HeapTupleSatisfiesToast() busted? (was atomic
pin/unpin causing errors)
Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) |
List | pgsql-hackers |
On Tue, May 10, 2016 at 4:02 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-05-10 15:53:38 -0700, Jeff Janes wrote: >> >> But isn't CreateCheckPoint called at the end of the checkpoint, not >> the start of it? > > No, CreateCheckPoint() does it all. > > > CreateCheckPoint(int flags) > { > ... > /* 1) determine redo pointer */ > WALInsertLockAcquireExclusive(); > curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos); > prevPtr = XLogBytePosToRecPtr(Insert->PrevBytePos); > WALInsertLockRelease(); > ... > /* 2) determine oid */ > LWLockAcquire(OidGenLock, LW_SHARED); > checkPoint.nextOid = ShmemVariableCache->nextOid; > if (!shutdown) > checkPoint.nextOid += ShmemVariableCache->oidCount; > LWLockRelease(OidGenLock); > ... > /* 3) actually checkpoint shared_buffers et al. */ > CheckPointGuts(checkPoint.redo, flags); > ... > /* 4) log the checkpoint */ > recptr = XLogInsert(RM_XLOG_ID, > shutdown ? XLOG_CHECKPOINT_SHUTDOWN : > XLOG_CHECKPOINT_ONLINE); > ... > } OK, got it. I don' t know how I missed the bigger picture of that function in the first place. But, another perhaps stupid question, why do we care what the value of nextOid was at the start of the last successfully completed checkpoint? Wouldn't it make more sense to populate that part of the record at the end, not the start, of the checkpoint? > > >> I don't understand how it could be out-of-date at that point. But >> obviously it is. > > A checkpoint logically "starts" at 1) in the above abbreviated > CreateCheckPoint(), that's where recovery starts when starting up from > that checkpoint. But inbetween 1) and 4) all other backends can continue > to insert WAL, and it'll be replayed *before* the checkpoint's record > itself. That means that if some backend generates a NEXTOID record > between 2) and 4) (with largers checkpoints we're looking at minutes to > an hour of time), it's effects will temporarily take effect (as in > ShmemVariableCache->nextOid is updated), but XLOG_CHECKPOINT_ONLINE's > replay will overwrite it unconditionally: > void > xlog_redo(XLogReaderState *record) > { > else if (info == XLOG_CHECKPOINT_ONLINE) > { > ... > /* ... but still treat OID counter as exact */ > LWLockAcquire(OidGenLock, LW_EXCLUSIVE); > ShmemVariableCache->nextOid = checkPoint.nextOid; > ShmemVariableCache->oidCount = 0; > LWLockRelease(OidGenLock); > > Makes sense? So it seems like we should unconditionally **not** do that when replaying XLOG_CHECKPOINT_ONLINE, right? If this is the checkpoint record at which we started the system then we would have done that "replay" into ShmemVariableCache->nextOid during StartupXLOG(), there is no reason to do it again when redo passes through that record the 2nd time. But, to round this out, all of this is formally only a hint on where to start OIDs so as to avoid performance problems as you busy-loop looking for an unused OID. The real correctness bug is in the hint-bit problem you discuss elsewhere that turns collisions into corruption, and this bug just makes that other one reachable? Cheers, Jeff
pgsql-hackers by date: