Re: Fast promotion failure - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Fast promotion failure
Date
Msg-id 519A619A.7070609@vmware.com
Whole thread Raw
In response to Re: Fast promotion failure  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Fast promotion failure  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 19.05.2013 17:25, Simon Riggs wrote:
> However, there is a call to RecoveryInProgress() at the top of the
> main loop of the checkpointer, which does explicitly state that it
> "initializes TimeLineID if it's not set yet." The checkpointer makes
> the decision about whether to run a restartpoint or a checkpoint
> directly from the answer to that, modified only in the case of an
> CHECKPOINT_END_OF_RECOVERY.
>
> So the appropiate call is made and I don't agree with the statement
> that this "doesn't get executed with fast promotion", or the title of
> the thread.

Hmm, I see.

> So while I believe that the checkpointer might have an incorrect TLI
> and that you've seen a bug, what isn't clear is that the checkpointer
> is the only process that would see an incorrect TLI, or why such
> processes see an incorrect TLI. It seems equally likely at this point
> that the TLI may be set incorrectly somehow and that is why it is
> being read incorrectly.

Yeah. I spent some more time debugging this, and found the culprit. In 
CreateRestartPoint, we do this:

>         /*
>          * Update ThisTimeLineID to the timeline we're currently replaying,
>          * so that we install any recycled segments on that timeline.
>          *
>          * There is no guarantee that the WAL segments will be useful on the
>          * current timeline; if recovery proceeds to a new timeline right
>          * after this, the pre-allocated WAL segments on this timeline will
>          * not be used, and will go wasted until recycled on the next
>          * restartpoint. We'll live with that.
>          */
>         (void) GetXLogReplayRecPtr(&ThisTimeLineID);

Here's what happens:

0. System is in recovery
1. checkpointer process starts creating a restartpoint.
2. Startup process ends recovery, creates end-of-recovery record, sets 
the shared ThisTimeLineID value to 2, and exits.
3. checkpointer process calls RecoveryInProgess() while performing the 
restartpoint (there are RecoveryInProgress() calls in many places, e.g 
in XLogNeedsFlush()). It sets LocalRecoveryInProgress = true, and 
initializes ThisTimeLineID to 2.
4. At the end of restartpoint, the checkpointer process does the above 
GetXLogReplayRecPtr(&ThisTimeLineID). That overwrites ThisTimeLineID 
back to 1.
5. Checkpointer calls RecoveryInProgress, which returns true 
immediately, as LocalRecoveryInProgress is already set.
5. Checkpointer performs the first checkpoint, with ThisTimeLineID set 
incorrectly to 1.

Not sure what the best fix would be. Perhaps change the code in 
CreateRestartPoint() to do something like this instead:

GetXLogReplayRecPtr(&replayTLI);
if (RecoveryInProgress())  ThisTimeLineID = replayTLI;


> I see that the comment in InitXLOGAccess() is incorrect
> "ThisTimeLineID doesn't change so we need no lock to copy it", which
> seems worth correcting since there's no need to save time in a once
> only function.

Hmm, the code seems right to me, XLogCtl->ThisTimeLineID indeed doesn't 
change, once it's set. And InitXLOGAccess() isn't called until it is 
set. The comment could explain that better, though.

- Heikki



pgsql-hackers by date:

Previous
From: 李海龙
Date:
Subject: Re: I s this a bug of spgist index in a heavy write condition?
Next
From: Fabien COELHO
Date:
Subject: Re: pgbench vs. SERIALIZABLE