BUG #14109: pg_rewind fails to update target control file in one scenario - Mailing list pgsql-bugs

From johnlumby@hotmail.com
Subject BUG #14109: pg_rewind fails to update target control file in one scenario
Msg-id 20160424192549.2725.71787@wrigleys.postgresql.org
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      14109
Logged by:          John Lumby
Email address:      johnlumby@hotmail.com
PostgreSQL version: 9.5.1
Operating system:   linux 64-bit
Description:

Scenario:
 two systems currently in a working streaming replication relationship:
       Primary SystemA             Standby SystemB
 with no WAL queued and no inserts/updates/deletes being performed on SystemA.

  then in chronological sequence :
   .  shut down SystemA
   .  pg_ctl promote SystemB
              and verify SystemB is running correctly stand-alone
   .  pg_rewind SystemA
              output is something like
                    connected to server
                    fetched file "global/pg_control", length 8192
                    fetched file "pg_xlog/0000000D.history", length 388
                    servers diverged at WAL position 9/A90002A8 on timeline 12
                    no rewind required

   .  set up correct recovery.conf on SystemA
   .  start SystemA postgres server

 At this point, both SystemB and SystemA appear to be running correctly,
 but any insert/update/delete performed on SystemB is not replicated to
 SystemA.
 Also, the pg_stat_replication view on SystemB shows state 'startup', not
 'streaming'.

I believe there is a bug in pg_rewind for this scenario, where it finds that
the following conditions are true:
  1 - source and target cluster are not on the same timeline
  2 - the histories diverged exactly at the end of the
      shutdown checkpoint record on the target,
      so there are no WAL records in the target
      that don't belong in the source's history

The code then concludes that no rewind is needed.

That conclusion is correct as far as it goes.
However, what I believe *is* still needed is to update the target control file
with the new timeline and other information from the source.
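
The distinction between "rewind needed" and "control file update needed" can
be sketched as a small stand-alone C model. This is only an illustration of
the argument above, not pg_rewind's actual code: the struct names, the
make_lsn() helper and decide_rewind() are hypothetical, and the real program
applies further checks that are omitted here. The timeline IDs (12 and 13) and
the WAL position are taken from the output shown in the scenario.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inputs; these are not pg_rewind's real data structures. */
typedef struct
{
    uint32_t source_tli;   /* timeline of the source (promoted) cluster */
    uint32_t target_tli;   /* timeline of the target (old primary) */
    uint64_t chkptendrec;  /* end of the target's shutdown checkpoint record */
    uint64_t divergerec;   /* WAL position where the histories diverged */
} RewindInputs;

typedef struct
{
    bool rewind_needed;          /* must data files be rewound? */
    bool control_update_needed;  /* must the target control file be rewritten? */
} RewindDecision;

/* Build a 64-bit WAL position from the "hi/lo" form, e.g. 9/A90002A8. */
uint64_t
make_lsn(uint32_t hi, uint32_t lo)
{
    return ((uint64_t) hi << 32) | lo;
}

/*
 * Simplified model of the decision argued for in this report:
 * when the timelines differ, the control file needs updating even if
 * the target has no WAL beyond the point of divergence.
 */
RewindDecision
decide_rewind(const RewindInputs *in)
{
    RewindDecision d = { false, false };

    assert(in != NULL);

    /* Same timeline: nothing to rewind and nothing to update. */
    if (in->source_tli == in->target_tli)
        return d;

    /* Different timelines: the control file must reflect the new one. */
    d.control_update_needed = true;

    /* A rewind is needed only if the target has WAL past the divergence. */
    d.rewind_needed = (in->chkptendrec != in->divergerec);

    return d;
}
```

With the values from the scenario (source on timeline 13, target on timeline
12, both positions equal to 9/A90002A8), this model reports that no rewind is
needed but that a control file update is.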

This patch seems to fix the problem on my system :

--- src/bin/pg_rewind/pg_rewind.c.orig    2016-02-08 16:12:28.000000000 -0500
+++ src/bin/pg_rewind/pg_rewind.c    2016-04-24 14:50:52.646737233 -0400
@@ -247,7 +247,14 @@ main(int argc, char **argv)
              * needed.
              */
             if (chkptendrec == divergerec)
+            {
                 rewind_needed = false;
+                /*  however we must still copy the control file from source to target
+                 *  because of the timeline change.
+                 */
+                printf(_("no rewind required but will update global control file from source for increase in timeline.\n"));
+                goto updateControlFile;
+            }
             else
                 rewind_needed = true;
         }
@@ -318,6 +325,7 @@ main(int argc, char **argv)
     pg_log(PG_PROGRESS, "\ncreating backup label and updating control file\n");
     createBackupLabel(chkptredo, chkpttli, chkptrec);

+  updateControlFile:
     /*
      * Update control file of target. Make it ready to perform archive
      * recovery when restarting.
