I'd like to take a crack at implementing point-in-time recovery. My
plan is to do this as gently as possible in a number of small
self-contained steps. I'd appreciate lots of critical feedback.
Alternatively, if someone else is looking into this please let me know
so I can either back off or help out.
Stage 1
Add hooks for begin_backup and end_backup at a data file level. Between
the calls begin_backup(myfile) and end_backup(myfile), writes to myfile
will be disabled allowing the file to be safely copied.
Unfortunately, it seems that this approach is fundamentally in conflict
with the principle of a checkpoint, which, as I understand it, is
supposed to flush all unwritten changes to disk.
My solution is to allow the checkpoint to complete without flushing
myfile's buffers, and then on end_backup(myfile) either perform another
checkpoint or just flush the buffers for myfile. As long as the database
cannot be shut down until end_backup has completed its checkpoint (or
flush), I think all should be well.
So, have I missed something? Would it instead be better to block the
checkpoint until end_backup(myfile) is complete? Any other ideas?
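To make the Stage 1 idea concrete, here is a minimal toy model in Python. It is only a sketch of the proposed behaviour, not how the real buffer manager works: the BufferPool class, its dictionaries, and the can_shutdown check are all hypothetical names invented for illustration. It shows a checkpoint skipping the buffers of a file that is being backed up, and end_backup flushing them afterwards.

```python
class BufferPool:
    """Toy model of the proposal; not PostgreSQL's actual buffer manager."""

    def __init__(self):
        self.dirty = {}          # file name -> pages changed in memory only
        self.disk = {}           # file name -> pages that have reached disk
        self.in_backup = set()   # files between begin_backup and end_backup

    def write(self, name, page):
        # Changes always go to memory first, backup or not.
        self.dirty.setdefault(name, []).append(page)

    def flush(self, name):
        # Move all pending pages for one file to "disk".
        self.disk.setdefault(name, []).extend(self.dirty.pop(name, []))

    def begin_backup(self, name):
        self.in_backup.add(name)

    def checkpoint(self):
        # Flush everything except files under backup; return the deferred files.
        deferred = []
        for name in list(self.dirty):
            if name in self.in_backup:
                deferred.append(name)
            else:
                self.flush(name)
        return deferred

    def end_backup(self, name):
        # Re-enable writes and flush whatever the checkpoint skipped.
        self.in_backup.discard(name)
        self.flush(name)

    def can_shutdown(self):
        # Shutdown must wait until no backup is outstanding.
        return not self.in_backup


pool = BufferPool()
pool.begin_backup("myfile")
pool.write("myfile", "page-1")
pool.write("otherfile", "page-2")
deferred = pool.checkpoint()        # myfile's buffers stay in memory
blocked = not pool.can_shutdown()   # shutdown is refused during the backup
pool.end_backup("myfile")           # deferred buffers reach disk here
```

The alternative raised above, blocking the checkpoint until end_backup completes, would correspond to checkpoint() waiting while in_backup is non-empty instead of returning a list of deferred files.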
Stage 2
Add a move_file function. This will use the begin/end_backup hooks to
make a copy of a datafile, create a symlink to the copy where the
original file used to be, and close and re-open the file without
altering any buffers.
There have been a few mutterings on the mailing lists recently that
suggest this would be useful, and it's a relatively easy test case for
the begin/end_backup hooks.
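The file-system side of move_file might look roughly like the following Python sketch. It assumes the begin/end_backup hooks are passed in as callables (here they default to no-ops, which is purely a placeholder), and it leaves out the server-side step of closing and re-opening the file. The temporary ".lnk" name is also just an assumption for illustration.

```python
import os
import shutil
import tempfile


def move_file(path, new_dir, begin_backup=lambda p: None, end_backup=lambda p: None):
    """Copy path into new_dir and leave a symlink at the old location."""
    target = os.path.join(new_dir, os.path.basename(path))
    begin_backup(path)                 # hypothetical hook: writes disabled
    try:
        shutil.copy2(path, target)     # safe to copy while writes are held off
        tmp_link = path + ".lnk"       # hypothetical temporary name
        os.symlink(os.path.abspath(target), tmp_link)
        os.replace(tmp_link, path)     # swap the symlink in over the original
    finally:
        end_backup(path)               # hypothetical hook: writes re-enabled
    return target


# Small demonstration in a scratch directory.
base = tempfile.mkdtemp()
old_dir = os.path.join(base, "old")
new_dir = os.path.join(base, "new")
os.mkdir(old_dir)
os.mkdir(new_dir)
datafile = os.path.join(old_dir, "myfile")
with open(datafile, "w") as f:
    f.write("table data")

target = move_file(datafile, new_dir)
with open(datafile) as f:              # the old path still works, via the symlink
    contents = f.read()
```

Creating the symlink under a temporary name and renaming it over the original means the old path never disappears, so anything opening it by name sees either the original file or the link.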
Stage 3
Add a pgtar executable that will create a tar of all data files using
the begin/end_backup hooks.
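As a rough illustration of what pgtar would do, here is a Python sketch using the standard tarfile module. The function name backup_to_tar and the hook arguments are hypothetical, and the hooks default to no-ops. Each file is bracketed by its own begin/end_backup pair, so writes are held off for only one file at a time.

```python
import os
import tarfile
import tempfile


def backup_to_tar(data_dir, tar_path, begin_backup=lambda p: None, end_backup=lambda p: None):
    """Archive every file under data_dir, one begin/end_backup pair per file."""
    archived = []
    with tarfile.open(tar_path, "w") as tar:
        for root, _dirs, files in os.walk(data_dir):
            for fname in sorted(files):
                full = os.path.join(root, fname)
                rel = os.path.relpath(full, data_dir)
                begin_backup(full)         # hypothetical hook: writes disabled
                try:
                    tar.add(full, arcname=rel)
                finally:
                    end_backup(full)       # hypothetical hook: writes re-enabled
                archived.append(rel)
    return archived


# Small demonstration; the tar file lives outside the directory being archived.
base = tempfile.mkdtemp()
data_dir = os.path.join(base, "data")
os.makedirs(os.path.join(data_dir, "sub"))
for rel_name in ("a", os.path.join("sub", "b")):
    with open(os.path.join(data_dir, rel_name), "w") as f:
        f.write("contents of " + rel_name)

calls = []
tar_path = os.path.join(base, "backup.tar")
archived = backup_to_tar(
    data_dir,
    tar_path,
    begin_backup=lambda p: calls.append("begin"),
    end_backup=lambda p: calls.append("end"),
)
with tarfile.open(tar_path) as tar:
    names = sorted(tar.getnames())
```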
Stage 4
Provide some sort of archiving mechanism for WAL files.
Stage 5
Provide some control over the recovery process. This has to deal with
recovery using both up-to-date control files and control files from a
backup set.
Sorry for cross-posting this to pgsql-general, but my last post to
hackers never made it.
--
Marc marc@bloodnok.com