OKADA Satoshi <okada.satoshi@lab.ntt.co.jp> writes:
> I'm testing PITR using pgbench and postgresql ver.8.0bata.
> I think that result of recovery is wrong.
Many thanks for this bug report. I have developed a patch (attached)
that seems to fix it for me. Please test and see if you can still
cause the problem.
BTW, I found that it was much easier to cause the bug to happen if you
force a long commit delay. To do this you need to havefsync = oncommit_siblings = 1commit_delay = 100000
and while the pgbench run is going, start a new psql and dobegin;create table foo(f1 int);
(or any other command that modifies the database). If you are watching
things with "top" you should notice a drastic decrease in the CPU
consumption of the backend connected to pgbench as soon as you do this.
Then proceed with taking the on-line backup. You can end the extra psql
session after the backup is done, to allow pgbench to go back to
normal speed.
Doing it this way I got a failure just about every time (3 failures in 3
tries) whereas without, I had seen only one failure in four tries.
regards, tom lane
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/access/transam/xact.c,v
retrieving revision 1.177
diff -c -r1.177 xact.c
*** src/backend/access/transam/xact.c 3 Aug 2004 15:57:26 -0000 1.177
--- src/backend/access/transam/xact.c 10 Aug 2004 20:55:48 -0000
***************
*** 574,586 **** START_CRIT_SECTION(); /*
! * We only need to log the commit in XLOG if the transaction made
! * any transaction-controlled XLOG entries or will delete files. * (If it made no
transaction-controlledXLOG entries, its XID * appears nowhere in permanent storage, so no one else will ever
care
! * if it committed.) */ madeTCentries = (MyLastRecPtr.xrecoff != 0); if
(madeTCentries|| nrels > 0) { XLogRecData rdata[3];
--- 574,601 ---- START_CRIT_SECTION(); /*
! * If our transaction made any transaction-controlled XLOG entries,
! * we need to lock out checkpoint start between writing our XLOG
! * record and updating pg_clog. Otherwise it is possible for the
! * checkpoint to set REDO after the XLOG record but fail to flush the
! * pg_clog update to disk, leading to loss of the transaction commit
! * if we crash a little later. Slightly klugy fix for problem
! * discovered 2004-08-10.
! * * (If it made no transaction-controlled XLOG entries, its XID * appears nowhere in
permanentstorage, so no one else will ever care
! * if it committed; so it doesn't matter if we lose the commit flag.)
! *
! * Note we only need a shared lock. */ madeTCentries = (MyLastRecPtr.xrecoff != 0);
+ if (madeTCentries)
+ LWLockAcquire(CheckpointStartLock, LW_SHARED);
+
+ /*
+ * We only need to log the commit in XLOG if the transaction made
+ * any transaction-controlled XLOG entries or will delete files.
+ */ if (madeTCentries || nrels > 0) { XLogRecData rdata[3];
***************
*** 668,673 ****
--- 683,692 ---- TransactionIdCommitTree(nchildren, children); }
+ /* Unlock checkpoint lock if we acquired it */
+ if (madeTCentries)
+ LWLockRelease(CheckpointStartLock);
+ END_CRIT_SECTION(); }
***************
*** 850,855 ****
--- 869,876 ---- * * We do not flush XLOG to disk unless deleting files, since the * default
assumptionafter a crash would be that we aborted, anyway.
+ * For the same reason, we don't need to worry about interlocking
+ * against checkpoint start. */ if (MyLastRecPtr.xrecoff != 0 || nrels > 0) {
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/access/transam/xlog.c,v
retrieving revision 1.158
diff -c -r1.158 xlog.c
*** src/backend/access/transam/xlog.c 9 Aug 2004 16:26:01 -0000 1.158
--- src/backend/access/transam/xlog.c 10 Aug 2004 20:55:49 -0000
***************
*** 4699,4704 ****
--- 4699,4713 ---- checkPoint.ThisTimeLineID = ThisTimeLineID; checkPoint.time = time(NULL);
+ /*
+ * We must hold CheckpointStartLock while determining the checkpoint
+ * REDO pointer. This ensures that any concurrent transaction commits
+ * will be either not yet logged, or logged and recorded in pg_clog.
+ * See notes in RecordTransactionCommit().
+ */
+ LWLockAcquire(CheckpointStartLock, LW_EXCLUSIVE);
+
+ /* And we need WALInsertLock too */ LWLockAcquire(WALInsertLock, LW_EXCLUSIVE); /*
***************
*** 4731,4736 ****
--- 4740,4746 ---- ControlFile->checkPointCopy.redo.xrecoff) {
LWLockRelease(WALInsertLock);
+ LWLockRelease(CheckpointStartLock); LWLockRelease(CheckpointLock);
END_CRIT_SECTION(); return;
***************
*** 4789,4794 ****
--- 4799,4807 ---- * GetSnapshotData needs to get XidGenLock while holding SInvalLock, * so there's a risk of
deadlock.Need to find a better solution. See * pgsql-hackers discussion of 17-Dec-01.
+ *
+ * XXX actually, the whole UNDO code is dead code and unlikely to ever
+ * be revived, so the lack of a good solution here is not troubling. */ #ifdef NOT_USED checkPoint.undo
=GetUndoRecPtr();
***************
*** 4798,4808 **** #endif /*
! * Now we can release insert lock, allowing other xacts to proceed
! * even while we are flushing disk buffers. */ LWLockRelease(WALInsertLock); /* * Get the
otherinfo we need for the checkpoint record. */
--- 4811,4823 ---- #endif /*
! * Now we can release insert lock and checkpoint start lock, allowing
! * other xacts to proceed even while we are flushing disk buffers. */ LWLockRelease(WALInsertLock);
+ LWLockRelease(CheckpointStartLock);
+ /* * Get the other info we need for the checkpoint record. */
Index: src/include/storage/lwlock.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/include/storage/lwlock.h,v
retrieving revision 1.12
diff -c -r1.12 lwlock.h
*** src/include/storage/lwlock.h 11 Jun 2004 16:43:24 -0000 1.12
--- src/include/storage/lwlock.h 10 Aug 2004 20:55:49 -0000
***************
*** 36,41 ****
--- 36,42 ---- WALWriteLock, ControlFileLock, CheckpointLock,
+ CheckpointStartLock, RelCacheInitLock, BgWriterCommLock,