Re: Skip checkpoint on promoting from streaming replication - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Skip checkpoint on promoting from streaming replication
Date
Msg-id CAHGQGwH8-Ju018zo6KNakqFZCDans_jU5AjF+JUTR7YsjUjD0g@mail.gmail.com
In response to Re: Skip checkpoint on promoting from streaming replication  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Skip checkpoint on promoting from streaming replication  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On Tue, Jun 19, 2012 at 5:30 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Thank you.
>
>> What happens if the server skips an end-of-recovery checkpoint,
>> is promoted to the master, runs some write transactions,
>> crashes and restarts automatically before it completes
>> checkpoint? In this case, the server needs to do crash recovery
>> from the last checkpoint record with old timeline ID to the
>> latest WAL record with new timeline ID. How does crash recovery
>> do recovery beyond timeline?
>
> Basically the same way as archive recovery, as far as I can see.
> It is already implemented to work that way.
>
> With this patch applied, StartupXLOG() gets its
> recoveryTargetTLI from the new latestTLI field in the control
> file instead of from the latest checkpoint. The latest checkpoint
> record still reports its TLI and WAL location as before, but its
> TLI no longer has any significant meaning in the recovery sequence.
>
> Consider the following case:
>
>       |seg 1     | seg 2    |
> TLI 1 |...c......|....000000|
>           C           P  X
> TLI 2            |........00|
>
> * C - checkpoint, P - promotion, X - crash just after here
>
> This shows the situation where the latest checkpoint (restartpoint)
> was taken at TLI=1/SEG=1/OFF=4, the server was promoted at
> TLI=1/SEG=2/OFF=5, and it then crashed just after TLI=2/SEG=2/OFF=8.
> Promotion itself inserts no WAL records but creates a copy of the
> current segment for the new TLI. The file for TLI=2/SEG=1 should
> not exist. (Who would create it?)
>
> The control file will look as follows:
>
> latest checkpoint : TLI=1/SEG=1/OFF=4
> latest TLI        : 2
>
> So the crash recovery sequence starts from SEG=1/OFF=4.
> expectedTLIs will be (2, 1), so XLogFileReadAnyTLI() will
> naturally select TLI 1 for SEG 1 and TLI 2 for SEG 2.
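
The selection described in the quoted paragraph can be modelled roughly as follows. This is an illustrative Python sketch, not PostgreSQL source: the function name pick_tli_for_segment and the (tli, seg) tuples are hypothetical, and it assumes expectedTLIs is ordered newest first and that only files already in pg_xlog are considered.

```python
# Illustrative model only; not PostgreSQL source code.
def pick_tli_for_segment(expected_tlis, seg, files_in_pg_xlog):
    """Return the first (newest) expected TLI that has a file for `seg`."""
    for tli in expected_tlis:  # assumed ordered newest first, e.g. (2, 1)
        if (tli, seg) in files_in_pg_xlog:
            return tli
    return None  # no readable file for this segment


# Files from the example: TLI 1 has SEG 1 and SEG 2, TLI 2 has only SEG 2.
files = {(1, 1), (1, 2), (2, 2)}
expected = (2, 1)

chosen = {seg: pick_tli_for_segment(expected, seg, files) for seg in (1, 2)}
print(chosen)  # {1: 1, 2: 2}
```

Because no file exists for TLI=2/SEG=1, the lookup falls back to TLI 1 for SEG 1, while SEG 2 is read from the TLI 2 copy.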
>
> In more detail, startup constructs expectedTLIs by reading the
> timeline history file that corresponds to recoveryTargetTLI. It
> then runs the recovery sequence from the redo point of the latest
> checkpoint, using in XLogPageRead() the WAL files with the largest
> TLI (distinguished by file name, not by header) among the
> expectedTLIs. The only difference from archive recovery is that
> XLogFileReadAnyTLI() reads only the WAL files already sitting in
> the pg_xlog directory and does not reach into the archive. The
> pages with the new TLI will naturally be picked up in this
> sequence, as mentioned above, and recovery will then stop at the
> last readable record.
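
As a rough illustration of how expectedTLIs could be derived from a timeline history file, here is a minimal Python sketch. It is not the server's code: the function name is hypothetical, and the sample line (parent TLI, switch point, reason, tab-separated) is an assumed layout used only for the example. The only thing the sketch relies on is that each non-comment line begins with a parent TLI and that ancestors are listed oldest first.

```python
# Illustrative model only; not PostgreSQL source code.
def expected_tlis_from_history(target_tli, history_text):
    """Build the list of expected TLIs, newest first, for `target_tli`."""
    parents = []
    for line in history_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parents.append(int(line.split()[0]))  # first field: parent TLI
    # Ancestors are assumed to be listed oldest first, so reverse them.
    return [target_tli] + list(reversed(parents))


# Assumed contents of 00000002.history for the example in this thread.
sample = "1\t000000010000000000000002\tno recovery target specified\n"
print(expected_tlis_from_history(2, sample))  # [2, 1]
```

With recoveryTargetTLI taken from the control file (2 in the example), this yields the (2, 1) list that the segment lookup then walks.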
>
> The latestTLI field in the control file is updated just after the
> TLI is incremented and the new WAL file with the new TLI is
> created. So the crash recovery sequence won't stop before
> reaching the WAL with the new TLI designated in the control file.

Is it guaranteed that all the files (e.g., the latest timeline history file)
required for such crash recovery exist in pg_xlog? If yes, your
approach might work well.
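
To make the question concrete, here is a rough Python sketch of the kind of check I have in mind. It is only a model, not server code: the function names are hypothetical, expectedTLIs is passed in as already known, and the file names follow the usual patterns (eight hex digits of TLI plus ".history" for history files, and TLI, log and segment as three eight-hex-digit fields for WAL files). Timeline 1 has no history file, so none is required for it.

```python
# Illustrative model only; not PostgreSQL source code.
def history_file_name(tli):
    return f"{tli:08X}.history"


def wal_file_name(tli, log, seg):
    return f"{tli:08X}{log:08X}{seg:08X}"


def missing_for_crash_recovery(expected_tlis, segments, listing):
    """Return descriptions of files that recovery would need but lacks."""
    missing = []
    latest = expected_tlis[0]  # assumed ordered newest first
    if latest > 1 and history_file_name(latest) not in listing:
        missing.append(history_file_name(latest))
    for log, seg in segments:
        candidates = [wal_file_name(t, log, seg) for t in expected_tlis]
        if not any(name in listing for name in candidates):
            missing.append(f"WAL segment {log:08X}{seg:08X} (any expected TLI)")
    return missing


# pg_xlog contents from the example, without the history file.
listing = {
    "000000010000000000000001",
    "000000010000000000000002",
    "000000020000000000000002",
}
print(missing_for_crash_recovery((2, 1), [(0, 1), (0, 2)], listing))
# ['00000002.history']
```

In this model the WAL segments are all present, and the only thing that can be missing is the history file for the new timeline.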

Regards,

--
Fujii Masao

