Re: pg_rewind exiting with error code 1 when source and target are on the same timeline - Mailing list pgsql-bugs

From: Tom Lane
Subject: Re: pg_rewind exiting with error code 1 when source and target are on the same timeline
Msg-id: 937.1450134675@sss.pgh.pa.us
In response to: Re: pg_rewind exiting with error code 1 when source and target are on the same timeline  (Peter Eisentraut <peter_e@gmx.net>)
Responses: Re: pg_rewind exiting with error code 1 when source and target are on the same timeline  (Michael Paquier <michael.paquier@gmail.com>)
List: pgsql-bugs

Peter Eisentraut <peter_e@gmx.net> writes:
> On 12/3/15 11:10 PM, Michael Paquier wrote:
>> On Fri, Dec 4, 2015 at 12:22 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>>> After playing with this a bit, I think your patch is correct.  The code
>>> has drifted a bit in the meantime, so attached is an updated patch.

>> Thanks for looking at it.

> I committed this to master.  It's also on the 9.5 open item list, but if
> I backport it then the tests don't pass.  Still looking.  Not sure yet
> if this is because of code changes in pg_rewind master or test
> infrastructure changes in master.

I poked into this and found that the problem is that 9.5 is lacking the
hunks of commit e50cda78 that teach sanityChecks() to allow the control
file state to be DB_SHUTDOWNED_IN_RECOVERY, to wit

@@ -374,10 +380,11 @@ sanityChecks(void)
    /*
     * Target cluster better not be running. This doesn't guard against
     * someone starting the cluster concurrently. Also, this is probably more
-    * strict than necessary; it's OK if the master was not shut down cleanly,
-    * as long as it isn't running at the moment.
+    * strict than necessary; it's OK if the target node was not shut down
+    * cleanly, as long as it isn't running at the moment.
     */
-   if (ControlFile_target.state != DB_SHUTDOWNED)
+   if (ControlFile_target.state != DB_SHUTDOWNED &&
+       ControlFile_target.state != DB_SHUTDOWNED_IN_RECOVERY)
        pg_fatal("target server must be shut down cleanly\n");

    /*
@@ -385,75 +392,149 @@ sanityChecks(void)
     * server is shut down. There isn't any very strong reason for this
     * limitation, but better safe than sorry.
     */
-   if (datadir_source && ControlFile_source.state != DB_SHUTDOWNED)
+   if (datadir_source &&
+       ControlFile_source.state != DB_SHUTDOWNED &&
+       ControlFile_source.state != DB_SHUTDOWNED_IN_RECOVERY)
        pg_fatal("source data directory must be shut down cleanly\n");
 }

(Actually, only the second of these is critical to making the test
pass, but I think we should apply both of them if we apply either.)
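
For readers who want the relaxed condition in isolation, here is a minimal
sketch of what both hunks test.  It is not code from pg_rewind: the helper
name cluster_is_cleanly_shut_down() is hypothetical, and the DBState enum
is redeclared as a stand-in for the one in catalog/pg_control.h only so
that the fragment compiles on its own.

```c
#include <stdbool.h>

/*
 * Stand-in for the DBState enum from PostgreSQL's catalog/pg_control.h,
 * redeclared here (an assumption of this sketch) so it builds standalone.
 */
typedef enum DBState
{
    DB_STARTUP = 0,
    DB_SHUTDOWNED,
    DB_SHUTDOWNED_IN_RECOVERY,
    DB_SHUTDOWNING,
    DB_IN_CRASH_RECOVERY,
    DB_IN_ARCHIVE_RECOVERY,
    DB_IN_PRODUCTION
} DBState;

/*
 * Hypothetical helper, not part of pg_rewind: returns true for the two
 * control file states that the patched sanityChecks() accepts, i.e. a
 * clean shutdown of a primary or of a server that was in recovery.
 */
bool
cluster_is_cleanly_shut_down(DBState state)
{
    return state == DB_SHUTDOWNED ||
           state == DB_SHUTDOWNED_IN_RECOVERY;
}
```

With such a helper, each check in the hunks above would amount to calling
pg_fatal() when cluster_is_cleanly_shut_down() returns false for the
relevant control file state.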

If I apply these, without any of the rest of e50cda78, everything seems
fine.  I'm going to go ahead and push that in the interests of getting
some buildfarm cycles on it; but if someone could confirm that this
is not an insane thing to do, it'd help.

            regards, tom lane