Re: Skip checkpoint on promoting from streaming replication - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Skip checkpoint on promoting from streaming replication
Date
Msg-id 20120618.174217.155445557.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: Skip checkpoint on promoting from streaming replication  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Skip checkpoint on promoting from streaming replication
List pgsql-hackers
Hello, this is the new version of the patch.

Your patch introduced a new WAL record type, XLOG_END_OF_RECOVERY, to
mark the point where the TLI changes. But I think that information is
already stored in the history files and is ready to use in the
current code.

I looked into your first patch and read through the discussion on
it, and found that my understanding, that a TLI switch works for
crash recovery as well as for archive recovery, was half wrong. The
correct half is that it can work for crash recovery if we set
TimeLineID properly in StartupXLOG().

To achieve this, I added a new field 'latestTLI' (a better name is
welcome) and made it always track the latest TLI, independently of
checkpoints. StartupXLOG() then sets the recovery target by
referring to it. Additionally, in the previous patch I checked only
the checkpoint intervals, but this had no effect, as you said,
because as many WAL files are kept in pg_xlog as crash recovery
requires, as I already knew...
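
As a rough illustration of the idea above, here is a minimal sketch
in C. It is not code from the patch: MiniControlFile and the three
helper functions are hypothetical names, promotion is simplified to
a plain increment, and the assertion is only a sanity assumption of
this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Hypothetical, heavily reduced stand-in for ControlFileData. */
typedef struct MiniControlFile
{
    TimeLineID latestTLI;     /* tracks the newest TLI, checkpoint or not */
    TimeLineID checkPointTLI; /* TLI recorded in the last checkpoint */
} MiniControlFile;

/* Previous behaviour: start recovery from the last checkpoint's TLI. */
static TimeLineID
target_tli_from_checkpoint(const MiniControlFile *cf)
{
    return cf->checkPointTLI;
}

/* Behaviour sketched here: start recovery from latestTLI instead. */
static TimeLineID
target_tli_from_latest(const MiniControlFile *cf)
{
    /* Assumption of this sketch: latestTLI never lags the checkpoint TLI. */
    assert(cf->latestTLI >= cf->checkPointTLI);
    return cf->latestTLI;
}

/* Simplified promotion with no checkpoint: only latestTLI advances. */
static void
promote_without_checkpoint(MiniControlFile *cf)
{
    cf->latestTLI++;
}
```

With a control file starting at {1, 1}, calling
promote_without_checkpoint() leaves the checkpoint TLI at 1 while
latestTLI becomes 2, so only the second lookup yields the timeline
that a crash recovery should follow.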


The new patch seems to handle correctly a TLI change that is not
followed by a checkpoint. Archive recovery and PITR also seem to
work correctly. The test script for the former is attached too.

The new patch consists of two parts. These should perhaps be
treated as two separate patches.

1. Allow_TLI_Increment_without_Checkpoint_20120618.patch
 Removes the assumption, based on the 'convention', that the TLI should be incremented only at a shutdown checkpoint.
 This actually seems to cause no problem, as the comment says (This is not particularly critical).
 

2. Skip_Checkpoint_on_Promotion_20120618.patch
 Skips checkpoint if redo record can be read in-place.

3. Test script for TLI increment patch.
 This is only to show how the patch was tested. The point is to create a TLI increment point that is not followed by
 any kind of checkpoint. pg_controldata shows the following after running this test script. 'Latest timeline ID' is
 the new field.
 
  > pg_control version number:            923
  > Database cluster state:               in production
 !> Latest timeline ID:                   2
  > Latest checkpoint location:           0/2000058
  > Prior checkpoint location:            0/2000058
  > Latest checkpoint's REDO location:    0/2000020
 !> Latest checkpoint's TimeLineID:       1
 
 After crash recovery, this changes as follows:

  > Latest timeline ID:                   2
  > Latest checkpoint location:           0/54D9918
  > Prior checkpoint location:            0/2000058
  > Latest checkpoint's REDO location:    0/54D9918
  > Latest checkpoint's TimeLineID:       2
 
 Then we should see two 'ABCDE...' rows and two 'VWXYZ...' rows in the table after the crash recovery.
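
For reference, the decision made at the end of recovery by the
second patch (item 2) can be sketched roughly as below. This is a
standalone illustration, not the patch itself: the enum, the function
names and the boolean parameters are hypothetical, and
prev_checkpoint_readable stands in for "ReadCheckpointRecord()
returned a non-NULL record".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three possible outcomes. */
typedef enum EndOfRecoveryAction
{
    SKIP_CHECKPOINT,    /* checkpoint record readable in place: just log */
    REQUEST_CHECKPOINT, /* ask the background writer for a checkpoint */
    CREATE_CHECKPOINT   /* no bgwriter: create the checkpoint directly */
} EndOfRecoveryAction;

/* Mirrors the branch structure of the StartupXLOG() hunk in the patch. */
static EndOfRecoveryAction
end_of_recovery_action(bool bgwriter_launched, bool prev_checkpoint_readable)
{
    if (!bgwriter_launched)
        return CREATE_CHECKPOINT;
    return prev_checkpoint_readable ? SKIP_CHECKPOINT : REQUEST_CHECKPOINT;
}

/* Usage example: walks through every input combination. */
static void
check_end_of_recovery_action(void)
{
    assert(end_of_recovery_action(true, true) == SKIP_CHECKPOINT);
    assert(end_of_recovery_action(true, false) == REQUEST_CHECKPOINT);
    assert(end_of_recovery_action(false, true) == CREATE_CHECKPOINT);
    assert(end_of_recovery_action(false, false) == CREATE_CHECKPOINT);
}
```

The checkpoint is skipped only when the background writer is running
and the checkpoint record can still be read; every other case falls
back to the existing behaviour.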

What do you think about this?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0d68760..70b4972 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5276,6 +5276,7 @@ BootStrapXLOG(void)
     ControlFile->system_identifier = sysidentifier;
     ControlFile->state = DB_SHUTDOWNED;
     ControlFile->time = checkPoint.time;
+    ControlFile->latestTLI = ThisTimeLineID;
     ControlFile->checkPoint = checkPoint.redo;
     ControlFile->checkPointCopy = checkPoint;
@@ -6083,7 +6084,7 @@ StartupXLOG(void)
      * Initialize on the assumption we want to recover to the same timeline
      * that's active according to pg_control.
      */
-    recoveryTargetTLI = ControlFile->checkPointCopy.ThisTimeLineID;
+    recoveryTargetTLI = ControlFile->latestTLI;
 
     /*
      * Check for recovery control file, and if so set up state for offline
@@ -6100,11 +6101,11 @@ StartupXLOG(void)
      * timeline.
      */
     if (!list_member_int(expectedTLIs,
-                         (int) ControlFile->checkPointCopy.ThisTimeLineID))
+                         (int) ControlFile->latestTLI))
         ereport(FATAL,
                 (errmsg("requested timeline %u is not a child of database system timeline %u",
                         recoveryTargetTLI,
-                        ControlFile->checkPointCopy.ThisTimeLineID)));
+                        ControlFile->latestTLI)));
 
     /*
      * Save the selected recovery target timeline ID and
@@ -6791,9 +6792,12 @@ StartupXLOG(void)
      *
      * In a normal crash recovery, we can just extend the timeline we were in.
      */
+
+    ThisTimeLineID = findNewestTimeLine(recoveryTargetTLI);
+
     if (InArchiveRecovery)
     {
-        ThisTimeLineID = findNewestTimeLine(recoveryTargetTLI) + 1;
+        ThisTimeLineID++;
         ereport(LOG,
                 (errmsg("selected new timeline ID: %u", ThisTimeLineID)));
         writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
@@ -6946,6 +6950,7 @@ StartupXLOG(void)
     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
     ControlFile->state = DB_IN_PRODUCTION;
     ControlFile->time = (pg_time_t) time(NULL);
+    ControlFile->latestTLI = ThisTimeLineID;
     UpdateControlFile();
     LWLockRelease(ControlFileLock);
@@ -8710,12 +8715,6 @@ xlog_redo(XLogRecPtr lsn, XLogRecord *record)
             SpinLockRelease(&xlogctl->info_lck);
         }
 
-        /* TLI should not change in an on-line checkpoint */
-        if (checkPoint.ThisTimeLineID != ThisTimeLineID)
-            ereport(PANIC,
-                    (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
-                            checkPoint.ThisTimeLineID, ThisTimeLineID)));
-
         RecoveryRestartPoint(&checkPoint);
     }
     else if (info == XLOG_NOOP)
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 38c263c..7f2cdb8 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -192,6 +192,8 @@ main(int argc, char *argv[])
            dbState(ControlFile.state));
     printf(_("pg_control last modified:            %s\n"),
            pgctime_str);
+    printf(_("Latest timeline ID:                   %d\n"),
+           ControlFile.latestTLI);
     printf(_("Latest checkpoint location:           %X/%X\n"),
            ControlFile.checkPoint.xlogid,
            ControlFile.checkPoint.xrecoff);
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 5cff396..c78d483 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -21,7 +21,7 @@
 /* Version identifier for this pg_control format */
-#define PG_CONTROL_VERSION    922
+#define PG_CONTROL_VERSION    923
 
 /*
  * Body of CheckPoint XLOG records.  This is declared here because we keep
@@ -116,6 +116,7 @@ typedef struct ControlFileData
      */
     DBState        state;            /* see enum above */
     pg_time_t   time;            /* time stamp of last pg_control update */
+    TimeLineID  latestTLI;      /* latest TLI we reached */
     XLogRecPtr    checkPoint;        /* last check point record ptr */
     XLogRecPtr    prevCheckPoint; /* previous check point record ptr */
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 70b4972..574ecfb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6914,9 +6914,17 @@ StartupXLOG(void)
          * allows some extra error checking in xlog_redo.
          */
         if (bgwriterLaunched)
-            RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
-                              CHECKPOINT_IMMEDIATE |
-                              CHECKPOINT_WAIT);
+        {
+            checkPointLoc = ControlFile->prevCheckPoint;
+            record = ReadCheckpointRecord(checkPointLoc, 2);
+            if (record != NULL)
+                ereport(LOG,
+                        (errmsg("Checkpoint on recovery end was skipped")));
+            else
+                RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
+                                  CHECKPOINT_IMMEDIATE |
+                                  CHECKPOINT_WAIT);
+        }
         else
             CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE);
#! /bin/sh
export PGDATA1=/ext/horiguti/pgdata1
export PGDATA2=/ext/horiguti/pgdata2
echo Shutting down servers
pg_ctl -D $PGDATA2 stop -m i
pg_ctl -D $PGDATA1 stop -m i
sleep 5
echo Remove old clusters
rm -rf $PGDATA1 $PGDATA2 /tmp/hoge /tmp/hoge1
echo Creating master database cluster
initdb -D $PGDATA1 --no-locale --encoding=utf8
cp ~/work_repl/mast_conf/* $PGDATA1
echo Starting master
pg_ctl -D $PGDATA1 start
sleep 5
echo Taking base backup for slave
pg_basebackup -h /tmp -p 5432 -D $PGDATA2 -X stream
cp ~/work_repl/repl_conf/* $PGDATA2
echo Done, starting slave
pg_controldata $PGDATA2 > ~/control_01_before_slave_start
pg_ctl -D $PGDATA2 start
sleep 5
pg_controldata $PGDATA2 > ~/control_02_after_slave_start
echo creating database.
createdb $USER
echo Proceeding WALS
psql -h /tmp -p 5432 -c "create table foo (a text)";
psql -h /tmp -p 5432 -c "insert into foo (select repeat('abcde', 1000) from generate_series(1, 200000)); delete from foo;"
psql -h /tmp -p 5432 -c "insert into foo (select repeat('ABCDE', 10) from generate_series(1, 2));"
pg_controldata $PGDATA2 > ~/control_03_WAL_proceeded
echo Promoting slave
pg_ctl -D $PGDATA2 promote
sleep 5
pg_controldata $PGDATA2 > ~/control_04_After_promoted
echo "Killing PostgreSQL's without taking checkpoint"
psql -h /tmp -p 5433 -c "insert into foo (select repeat('VWXYZ', 10) from generate_series(1, 2));"
killall -9 postgres
pg_controldata $PGDATA2 > ~/control_05_Killed_without_checkpoint
rm -f /tmp/hoge /tmp/hoge1
echo DONE
