Re: Assertion failure when promoting node by deleting recovery.conf and restart node - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Assertion failure when promoting node by deleting recovery.conf and restart node
Msg-id CA+U5nM+O+30Y9=+e42dAB1Pef1WP8dSUsp_C41-sMVMe5F3NdQ@mail.gmail.com
In response to Re: Assertion failure when promoting node by deleting recovery.conf and restart node  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On 25 March 2013 19:14, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 15.03.2013 04:25, Michael Paquier wrote:
>>
>> Hi,
>>
>> When trying to *promote* a slave to master by removing recovery.conf and
>> restarting the node, I found an assertion failure on the master branch:
>> LOG:  database system was shut down in recovery at 2013-03-15 10:22:27 JST
>> TRAP: FailedAssertion("!(ControlFile->minRecoveryPointTLI != 1)", File: "xlog.c", Line: 4954)
>> (gdb) bt
>> #0  0x00007f95af03b2c5 in raise () from /usr/lib/libc.so.6
>> #1  0x00007f95af03c748 in abort () from /usr/lib/libc.so.6
>> #2  0x000000000086ce71 in ExceptionalCondition (conditionName=0x8f2af0 "!(ControlFile->minRecoveryPointTLI != 1)", errorType=0x8f0813 "FailedAssertion", fileName=0x8f076b "xlog.c",
>>      lineNumber=4954) at assert.c:54
>> #3  0x00000000004fe499 in StartupXLOG () at xlog.c:4954
>> #4  0x00000000006f9d34 in StartupProcessMain () at startup.c:224
>> #5  0x000000000050ef92 in AuxiliaryProcessMain (argc=2, argv=0x7fffa6fc3d20) at bootstrap.c:423
>> #6  0x00000000006f8816 in StartChildProcess (type=StartupProcess) at postmaster.c:4956
>> #7  0x00000000006f39e9 in PostmasterMain (argc=6, argv=0x1c950a0) at postmaster.c:1237
>> #8  0x000000000065d59b in main (argc=6, argv=0x1c950a0) at main.c:197
>>
>> OK, this is not the cleanest way to promote a node, as it skips the
>> safety checks related to promotion, but 9.2 and earlier versions handled
>> it properly.
>>
>> The assertion was introduced by commit 3f0ab05 in order to properly
>> record minRecoveryPointTLI in the control file at the end of recovery in
>> the case of a crash.
>> However, when a slave node that was properly shut down in recovery is
>> then restarted as a master, the code path of this assertion is taken.
>> What do you think of the attached patch? It avoids updating
>> recoveryTargetTLI and recoveryTargetIsLatest if the node was shut down
>> while in recovery.
>> Another possibility would be to add to the assertion some conditions based
>> on the state of controlFile, but I think it is more consistent simply not
>> to update those fields.
>
>
> Simon, can you comment on this? ISTM we could just remove the assertion and
> update the comment to mention that this can happen. If there is a min
> recovery point, surely we always need to recover to the timeline containing
> that point, so setting recoveryTargetTLI to minRecoveryPointTLI seems
> sensible.

Fixed by using the latest TLI available and removing the assertion.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
