Re: Point in Time Recovery - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Point in Time Recovery |
Date | |
Msg-id | 1090015265.17493.10536.camel@stromboli |
In response to | Re: Point in Time Recovery (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
On Fri, 2004-07-16 at 19:30, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Fri, 2004-07-16 at 16:58, Zeugswetter Andreas SB SD wrote:
> > > > >> Do we need a checkpoint after the archiving
> > > > >> starts but before the backup begins?
> > > > >
> > > > > No.
> > > >
> > > > Actually yes.
> > >
> > > Sorry, I did incorrectly not connect 'archiving' with the backed up xlogs :-(
> > > So yes, you need one checkpoint after archiving starts. Imho turning on xlog
> > > archiving should issue such a checkpoint just to be sure.
> > >
> >
> > By agreement, archive_mode can only be turned on at postmaster startup,
> > which means you always have a checkpoint - either because you shut it
> > down cleanly, or you didn't and it recovers, then writes one.
> >
> > There is always something to start the rollforward.
> >
> > So, non-issue.
>

I was discussing the claim that there might not be a checkpoint to begin
the rollforward from. There always is: if you are in archive_mode=true
then you will always have a checkpoint that can be used for recovery. It
may be "a long way in the past" if there has been no write activity, but
the rollforward will be very, very quick, since there will be no log
records.

> I don't think so. I can imagine many cases where you want to do a
> nightly tar backup without turning archiving on/off or restarting the
> postmaster.

This is a misunderstanding. I strongly agree with what you say: the
whole system has been designed to avoid any benefit from turning
archiving on/off, and there is no requirement to restart postmaster to
take backups.

> In those cases, a manual checkpoint would have to be issued
> before the backup begins.

A manual checkpoint doesn't HAVE TO be issued. Presumably most systems
will be running a checkpoint every few minutes. Wherever the last one
was is where the rollforward would start from.
But you can issue one, if that's the way you want to do things; just
wait long enough for the checkpoint to have completed, otherwise your
objective of reducing rollforward time will not be met. (Please note my
earlier reported rollforward performance of approximately x10 rate of
recovery v elapsed time - this will require testing on your own
systems.)

> Imagine a system that is up for a month, and they don't have enough
> archive space to keep a months worth of WAL files. They would probably
> do nightly or weekend tar backups, and then discard the WAL archives.
>

Yes, that would be normal practice. I would recommend keeping at least
the last 3 full backups and all of the WAL logs to cover that period.

> What procedure would they use? I assume they would copy all their old
> WAL files to a save directory, issue a checkpoint, do a tar backup, then
> they can delete the saved WAL files. Is that correct?

PITR is designed to interface with a wide range of systems, through the
extensible archive/recovery program interface. We shouldn't focus on
just tar backups - if you do, then the whole thing seems less
feature-rich. The current design allows interfacing with tape, remote
backup, internet backup providers, automated standby servers and the
dozen major storage/archive vendors' solutions.

Writing a procedure to back up, assign filenames and keep track of stuff
isn't too difficult if you're a competent DBA with a mild knowledge of
shell or perl scripting. But if data is important, people will want to
invest the time and trouble to adopt one of the open source or licenced
vendors that provide solutions in this area.

Systems management is a discipline and procedures should be in place for
everything. I fully agree with the "automate everything" dictum, but
just don't want to constrain people too much to a particular way of
doing things.

-o-o-

Overall, for first release, I think the complexity of this design is
acceptable.
PITR is similar to Oracle7 Backup/Recovery, and easily recognisable to
any DBA with current experience of SQL Server, DB2 (MVS, UDB) or
Teradata systems. [I can't comment much on Ingres, Informix, Sybase etc]

My main areas of concern are:
- the formal correctness of the recovery process
  As a result of this concern, PITR makes ZERO alterations to the
  recovery code itself. The trick is to feed it the right xlog files and
  to stop, if required, at the right place and allow normal work to
  resume.
- the robustness and quality of my implementation
  This requires quality checking of the code and full beta testing.

-o-o-

We've raised a couple of valid points on the lists in the last few days:
- it's probably a desirable feature (but not essential) to implement a
  write suspend feature on the bgwriter; if nothing else it will be a
  confidence building feature... As said previously, for many people
  this will not be required, but people will no doubt keep asking
- there is a small window of risk around the possibility that a recovery
  target might be set by the user that doesn't rollforward all the way
  past the end of the backup. That is real, but in general, people
  aren't likely to be performing archive recovery within minutes of a
  backup being taken - and if they are, they can always start from the
  previous one to that. This is a gap we should close, but it's just
  something to be aware of... just like pg_dump not sorting things in
  the correct order in its first release.

Not for now, but soon, I would propose:
- a command to suspend/resume bgwriter to allow backups.
- use the suspend/resume feature to write a log record "backup end
  marker" which shows when this took place. Ensure that any rollforward
  goes through AT LEAST ONE "backup end marker" on its way. (If a Point
  in Time is specified too early, we can check this immediately against
  the checkpoint record. We can then refuse to stop at any point in time
  earlier than the backup end marker.)
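
The "backup end marker" rule proposed above can be sketched roughly as
follows. This is only an illustration in Python, not PostgreSQL code:
the LogRecord type, the "backup_end" record kind and the rollforward()
function are all hypothetical names invented for this sketch, and real
recovery works on xlog records rather than on timestamps alone.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical, simplified stand-in for a WAL record."""
    timestamp: datetime
    kind: str  # "data" or "backup_end" (assumed record kinds)


def rollforward(records: Iterable[LogRecord],
                target: Optional[datetime] = None) -> int:
    """Replay records in order and return how many were applied.

    Replay stops at the first record later than `target`, if one is
    given, but refuses to stop before at least one backup end marker
    has been replayed.
    """
    applied = 0
    seen_backup_end = False
    for rec in records:
        if target is not None and rec.timestamp > target:
            if not seen_backup_end:
                raise ValueError(
                    "recovery target is earlier than the backup end marker")
            break
        if rec.kind == "backup_end":
            seen_backup_end = True
        applied += 1
    if not seen_backup_end:
        raise ValueError("no backup end marker found in the log")
    return applied


def hour(h: int) -> datetime:
    """Helper for building example timestamps on one day."""
    return datetime(2004, 7, 16, h)


example_log = [
    LogRecord(hour(1), "data"),
    LogRecord(hour(2), "data"),
    LogRecord(hour(3), "backup_end"),
    LogRecord(hour(4), "data"),
    LogRecord(hour(5), "data"),
]
```

With this example log, rollforward(example_log, target=hour(4)) stops
after the fourth record, while a target of hour(2) is rejected because
it would stop before the backup end marker has been passed.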

I've written a todo list and will post this again soon.

Best Regards, Simon Riggs