Re: Assertion failure when promoting node by deleting recovery.conf and restart node - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Assertion failure when promoting node by deleting recovery.conf and restart node
Msg-id 5150A231.30702@vmware.com
In response to Assertion failure when promoting node by deleting recovery.conf and restart node  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Assertion failure when promoting node by deleting recovery.conf and restart node  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Assertion failure when promoting node by deleting recovery.conf and restart node  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 15.03.2013 04:25, Michael Paquier wrote:
> Hi,
>
> When trying to *promote* a slave as master by removing recovery.conf and
> restarting node, I found an assertion failure on master branch:
> LOG:  database system was shut down in recovery at 2013-03-15 10:22:27 JST
> TRAP: FailedAssertion("!(ControlFile->minRecoveryPointTLI != 1)", File:
> "xlog.c", Line: 4954)
> (gdb) bt
> #0  0x00007f95af03b2c5 in raise () from /usr/lib/libc.so.6
> #1  0x00007f95af03c748 in abort () from /usr/lib/libc.so.6
> #2  0x000000000086ce71 in ExceptionalCondition (conditionName=0x8f2af0
> "!(ControlFile->minRecoveryPointTLI != 1)", errorType=0x8f0813
> "FailedAssertion", fileName=0x8f076b "xlog.c",
>      lineNumber=4954) at assert.c:54
> #3  0x00000000004fe499 in StartupXLOG () at xlog.c:4954
> #4  0x00000000006f9d34 in StartupProcessMain () at startup.c:224
> #5  0x000000000050ef92 in AuxiliaryProcessMain (argc=2,
> argv=0x7fffa6fc3d20) at bootstrap.c:423
> #6  0x00000000006f8816 in StartChildProcess (type=StartupProcess) at
> postmaster.c:4956
> #7  0x00000000006f39e9 in PostmasterMain (argc=6, argv=0x1c950a0) at
> postmaster.c:1237
> #8  0x000000000065d59b in main (argc=6, argv=0x1c950a0) at main.c:197
>
> OK, this is not the cleanest way to promote a node, as it doesn't do any
> safety checks related to promotion, but 9.2 and previous versions allowed
> doing that properly.
>
> The assertion was introduced by commit 3f0ab05 in order to properly
> record minRecoveryPointTLI in the control file at the end of recovery in
> the case of a crash.
> However, when a slave node that was cleanly shut down in recovery is
> then restarted as a master, the code path of this assertion is taken.
> What do you think of the attached patch? It avoids updating
> recoveryTargetTLI and recoveryTargetIsLatest if the node was shut down
> while in recovery.
> Another possibility would be to add some conditions to the assertion based
> on the state of ControlFile, but I think it is more consistent simply not to
> update those fields.

Simon, can you comment on this? ISTM we could just remove the assertion 
and update the comment to mention that this can happen. If there is a 
min recovery point, surely we always need to recover to the timeline 
containing that point, so setting recoveryTargetTLI to 
minRecoveryPointTLI seems sensible.
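
To make the suggestion concrete, here is a minimal sketch of the logic with
the assertion gone. It is not the actual xlog.c code: SketchControlFile and
sketch_choose_target_tli are hypothetical names made up for illustration,
and the struct is a cut-down stand-in holding only the two control file
fields that matter here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

/* Hypothetical, cut-down stand-in for the relevant control file fields. */
typedef struct SketchControlFile
{
    XLogRecPtr  minRecoveryPoint;     /* 0 means no min recovery point */
    TimeLineID  minRecoveryPointTLI;  /* timeline containing that point */
} SketchControlFile;

/*
 * Pick the timeline to recover to at startup (illustrative only).
 *
 * If a min recovery point is recorded, recovery has to reach it, so the
 * target is the timeline containing it. Note that this can leave the
 * target unchanged, e.g. when a standby was shut down cleanly in recovery
 * and is restarted without recovery.conf, so nothing is asserted about
 * the previous value.
 */
TimeLineID
sketch_choose_target_tli(const SketchControlFile *cf, TimeLineID currentTarget)
{
    if (cf->minRecoveryPoint != 0)
        return cf->minRecoveryPointTLI;

    return currentTarget;
}
```

In the case reported above, the recorded timeline and the current target are
both 1, so the function simply returns 1 rather than trapping.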

- Heikki


