Re: PITR - Some data is not recovered. - Mailing list pgsql-hackers

From Tom Lane
Subject Re: PITR - Some data is not recovered.
Date
Msg-id 1690.1092173467@sss.pgh.pa.us
Whole thread Raw
In response to PITR - Some data is not recovered.  (OKADA Satoshi <okada.satoshi@lab.ntt.co.jp>)
Responses Re: PITR - Some data is not recovered.
List pgsql-hackers
OKADA Satoshi <okada.satoshi@lab.ntt.co.jp> writes:
> I'm testing PITR using pgbench and postgresql ver.8.0bata. 
> I think that result of recovery is wrong.

Many thanks for this bug report.  I have developed a patch (attached)
that seems to fix it for me.  Please test and see if you can still
cause the problem.

BTW, I found that it was much easier to cause the bug to happen if you
force a long commit delay.  To do this you need to havefsync = oncommit_siblings = 1commit_delay = 100000
and while the pgbench run is going, start a new psql and dobegin;create table foo(f1 int);
(or any other command that modifies the database).  If you are watching
things with "top" you should notice a drastic decrease in the CPU
consumption of the backend connected to pgbench as soon as you do this.
Then proceed with taking the on-line backup.  You can end the extra psql
session after the backup is done, to allow pgbench to go back to
normal speed.

Doing it this way I got a failure just about every time (3 failures in 3
tries) whereas without, I had seen only one failure in four tries.
        regards, tom lane


Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/access/transam/xact.c,v
retrieving revision 1.177
diff -c -r1.177 xact.c
*** src/backend/access/transam/xact.c    3 Aug 2004 15:57:26 -0000    1.177
--- src/backend/access/transam/xact.c    10 Aug 2004 20:55:48 -0000
***************
*** 574,586 ****         START_CRIT_SECTION();          /*
!          * We only need to log the commit in XLOG if the transaction made
!          * any transaction-controlled XLOG entries or will delete files.          * (If it made no
transaction-controlledXLOG entries, its XID          * appears nowhere in permanent storage, so no one else will ever
care
!          * if it committed.)          */         madeTCentries = (MyLastRecPtr.xrecoff != 0);         if
(madeTCentries|| nrels > 0)         {             XLogRecData rdata[3];
 
--- 574,601 ----         START_CRIT_SECTION();          /*
!          * If our transaction made any transaction-controlled XLOG entries,
!          * we need to lock out checkpoint start between writing our XLOG
!          * record and updating pg_clog.  Otherwise it is possible for the
!          * checkpoint to set REDO after the XLOG record but fail to flush the
!          * pg_clog update to disk, leading to loss of the transaction commit
!          * if we crash a little later.  Slightly klugy fix for problem
!          * discovered 2004-08-10.
!          *          * (If it made no transaction-controlled XLOG entries, its XID          * appears nowhere in
permanentstorage, so no one else will ever care
 
!          * if it committed; so it doesn't matter if we lose the commit flag.)
!          *
!          * Note we only need a shared lock.          */         madeTCentries = (MyLastRecPtr.xrecoff != 0);
+         if (madeTCentries)
+             LWLockAcquire(CheckpointStartLock, LW_SHARED);
+ 
+         /*
+          * We only need to log the commit in XLOG if the transaction made
+          * any transaction-controlled XLOG entries or will delete files.
+          */         if (madeTCentries || nrels > 0)         {             XLogRecData rdata[3];
***************
*** 668,673 ****
--- 683,692 ----             TransactionIdCommitTree(nchildren, children);         } 
+         /* Unlock checkpoint lock if we acquired it */
+         if (madeTCentries)
+             LWLockRelease(CheckpointStartLock);
+          END_CRIT_SECTION();     } 
***************
*** 850,855 ****
--- 869,876 ----          *          * We do not flush XLOG to disk unless deleting files, since the          * default
assumptionafter a crash would be that we aborted, anyway.
 
+          * For the same reason, we don't need to worry about interlocking
+          * against checkpoint start.          */         if (MyLastRecPtr.xrecoff != 0 || nrels > 0)         {
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/access/transam/xlog.c,v
retrieving revision 1.158
diff -c -r1.158 xlog.c
*** src/backend/access/transam/xlog.c    9 Aug 2004 16:26:01 -0000    1.158
--- src/backend/access/transam/xlog.c    10 Aug 2004 20:55:49 -0000
***************
*** 4699,4704 ****
--- 4699,4713 ----     checkPoint.ThisTimeLineID = ThisTimeLineID;     checkPoint.time = time(NULL); 
+     /*
+      * We must hold CheckpointStartLock while determining the checkpoint
+      * REDO pointer.  This ensures that any concurrent transaction commits
+      * will be either not yet logged, or logged and recorded in pg_clog.
+      * See notes in RecordTransactionCommit().
+      */
+     LWLockAcquire(CheckpointStartLock, LW_EXCLUSIVE);
+ 
+     /* And we need WALInsertLock too */     LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);      /*
***************
*** 4731,4736 ****
--- 4740,4746 ----             ControlFile->checkPointCopy.redo.xrecoff)         {
LWLockRelease(WALInsertLock);
+             LWLockRelease(CheckpointStartLock);             LWLockRelease(CheckpointLock);
END_CRIT_SECTION();            return;
 
***************
*** 4789,4794 ****
--- 4799,4807 ----      * GetSnapshotData needs to get XidGenLock while holding SInvalLock,      * so there's a risk of
deadlock.Need to find a better solution.  See      * pgsql-hackers discussion of 17-Dec-01.
 
+      *
+      * XXX actually, the whole UNDO code is dead code and unlikely to ever
+      * be revived, so the lack of a good solution here is not troubling.      */ #ifdef NOT_USED     checkPoint.undo
=GetUndoRecPtr();
 
***************
*** 4798,4808 **** #endif      /*
!      * Now we can release insert lock, allowing other xacts to proceed
!      * even while we are flushing disk buffers.      */     LWLockRelease(WALInsertLock);      /*      * Get the
otherinfo we need for the checkpoint record.      */
 
--- 4811,4823 ---- #endif      /*
!      * Now we can release insert lock and checkpoint start lock, allowing
!      * other xacts to proceed even while we are flushing disk buffers.      */     LWLockRelease(WALInsertLock); 
+     LWLockRelease(CheckpointStartLock);
+      /*      * Get the other info we need for the checkpoint record.      */
Index: src/include/storage/lwlock.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/include/storage/lwlock.h,v
retrieving revision 1.12
diff -c -r1.12 lwlock.h
*** src/include/storage/lwlock.h    11 Jun 2004 16:43:24 -0000    1.12
--- src/include/storage/lwlock.h    10 Aug 2004 20:55:49 -0000
***************
*** 36,41 ****
--- 36,42 ----     WALWriteLock,     ControlFileLock,     CheckpointLock,
+     CheckpointStartLock,     RelCacheInitLock,     BgWriterCommLock, 


pgsql-hackers by date:

Previous
From: Gavin Sherry
Date:
Subject: Re: ErrorContextCallback
Next
From: Christopher Kings-Lynne
Date:
Subject: Re: Missing French backend translations in the HEAD