Re: Point in Time Recovery - Mailing list pgsql-hackers

From Mark Kirkwood
Subject Re: Point in Time Recovery
Msg-id 40F7176E.4000001@coretech.co.nz
In response to Re: Point in Time Recovery  (markw@osdl.org)
Responses Re: Point in Time Recovery  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Simon Riggs wrote:

>
>So far:
>
>I've tried to re-create the problem as exactly as I can, but it works
>for me. 
>
>This is clearly an important case to chase down.
>
>I assume that this is the very first time you tried recovery? Second and
>subsequent recoveries using the same set have a potential loophole,
>which we have been discussing.
>
>Right now, I'm thinking that the "exactly 2 logs worth" of data has
>brought you very close to the end of the log file (FFFFE0) ending with 1
>and the shutdown checkpoint that is then subsequently written is
>failing.
>
>Can you repeat this your end?
>
>  
>
It is repeatable at my end. It is actually fairly easy to recreate the 
example I am using. Download

http://sourceforge.net/projects/benchw

and generate the dataset for Pg, but trim the large "fact0.dat" dump 
file using head -100000.
Thus step 7 consists of creating the 4 tables and COPYing in the data 
for them.
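
For anyone without head to hand, the trimming step can be sketched in 
Python roughly as below. The function name and the output file name are 
made up for illustration; only "fact0.dat" and the 100000 line limit 
come from the steps above.

```python
from itertools import islice


def trim_dump(src_path, dst_path, max_lines=100000):
    """Copy at most max_lines lines from src_path to dst_path (like head -N).

    Returns the number of lines actually written.
    """
    written = 0
    # Binary mode so the dump is copied byte for byte, whatever its encoding.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in islice(src, max_lines):
            dst.write(line)
            written += 1
    return written


# Example call ("fact0.trimmed.dat" is a hypothetical output name):
# trim_dump("fact0.dat", "fact0.trimmed.dat")
```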

>The nearest I can get to the exact record pointers you show are to start
>recovery at A4807C and to end at with FFFF88.
>
>Overall, PITR changes the recovery process very little, if at all. The
>main areas of effect are to do with sequencing of actions and matching
>up the right logs with the right backup. I'm not looking for bugs in the
>code but in subtle side-effects and "edge" cases. Everything you can
>tell me will help me greatly in chasing that down. 
>
>  
>
I agree - I will try this sort of example again, but will change the 
number of rows I am COPYing (currently 100000) and see if that helps.

>Best Regards, Simon Riggs
>
>  
>

By way of contrast, using the *same* procedure (1-11), but generating 2 
logs worth of INSERTS/UPDATES using 10 concurrent processes, *works fine*, 
e.g.:

LOG:  database system was interrupted at 2004-07-16 11:17:52 NZST
LOG:  recovery command file found...
LOG:  restore_program = cp %s/%s %s
LOG:  recovery_target_inclusive = true
LOG:  recovery_debug_log = true
LOG:  starting archive recovery
LOG:  restored log file "0000000000000000" from archive
LOG:  checkpoint record is at 0/A4803C
LOG:  redo record is at 0/A4803C; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 496; next OID: 25419
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/A4807C
postmaster starting
[postgres@shroudeater 7.5]$ LOG:  restored log file "0000000000000001" from archive
cp: cannot stat `/data1/pgdata/7.5-archive/0000000000000002': No such file or directory
LOG:  could not restore "0000000000000002" from archive
LOG:  could not open file "/data1/pgdata/7.5/pg_xlog/0000000000000002" (log file 0, segment 2): No such file or directory
LOG:  redo done at 0/1FFFFD4
LOG:  archive recovery complete
LOG:  database system is ready
LOG:  archiver started
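
To see how close that final redo record sits to the end of a log file, 
the record pointers above can be split into log id, segment and offset. 
This is only a rough sketch: it assumes the default 16MB segment size 
and the 16 hex digit file naming visible in the log above, and the 
function names are made up for illustration.

```python
SEGMENT_SIZE = 0x1000000  # assumption: default 16MB WAL segment size


def split_pointer(ptr, seg_size=SEGMENT_SIZE):
    """Split a 'logid/offset' pointer such as '0/1FFFFD4' into
    (log id, segment number, byte offset within that segment)."""
    log_hex, off_hex = ptr.split("/")
    log_id = int(log_hex, 16)
    seg, offset = divmod(int(off_hex, 16), seg_size)
    return log_id, seg, offset


def segment_file_name(log_id, seg):
    """Segment file name as it appears in the log: 8 hex digits each."""
    return "%08X%08X" % (log_id, seg)


log_id, seg, offset = split_pointer("0/1FFFFD4")
print(segment_file_name(log_id, seg), hex(offset), SEGMENT_SIZE - offset)
# prints: 0000000000000001 0xffffd4 44
```

Under those assumptions the last replayed record in the working run 
starts only 44 bytes before the end of segment 0000000000000001, so this 
run also finishes right at the end of the second log file.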



