Archival API - Mailing list pgsql-hackers-pitr
From | Simon Riggs |
---|---|
Subject | Archival API |
Date | |
Msg-id | 003401c3f5a7$2887fc50$d9f887d9@LaptopDellXP |
In response to | Re: Proposals for PITR (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers-pitr |
>Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > Simon Riggs wrote
> > - Write application to archive WAL files to tape, disk, or network
> > Probably need to do first part, but I'm arguing not to do the copy to
> > tape..
>
> I'd like to somehow see this handled by a user-supplied program or
> script. What we mainly need is to define a good API that lets the
> archiver program understand which WAL segment files to archive when.
>
> > B - Backing up WAL log files
> > -Ordinarily, when old log segment files are no longer needed, they are
> > recycled (renamed to become the next segments in the numbered sequence).
> > This means that the data within them must be copied from there to
> > another location
> > AFTER postgres has closed that file
> > BEFORE it is renamed and recycled
>
> My inclination would be to change the backend code so that as soon as a
> WAL segment is completed, it is flagged as being ready to dump to tape
> (or wherever). Possibly the easiest way to do this is to rename the
> segment file somehow, perhaps "nnn" becomes "nnn.full". Then, after the
> archiver process has properly dumped the file, reflag it as being dumped
> (perhaps rename to "nnn.done"). Obviously there are any number of ways
> we could do this flagging, and depending on an OS rename facility might
> not be the best.

Yes, that would be the correct time to begin archiving.

The way the code is currently written, there is a slot in MoveOfflineLogs which looks to see if XLOG_archive_dir is set before entering a section which is empty apart from a message. That routine doesn't get called until we're about to recycle the files, which means we've lost our window of opportunity to archive them. Making the number of files larger doesn't affect that being called last... I'm going to ignore that "hint", and any patch will include deletion of that code to avoid later confusion.
The log switch and close occur during XLogWrite, when it is established that there is no more room in the current log file for the current segment.

The file-flagging mechanism only allows a single archiver program to operate, so I'll structure it as a new function XLogArchiveNotify() so we can add in extra stuff later to improve/change things. That way we have a home for the API.

> A segment then can be recycled when it is both (a) older than the latest
> checkpoint and (b) flagged as dumped. Note that this approach allows
> dumping of a file to start before the first time at which it could be
> recycled. In the event of a crash and restart, WAL replay has to be
> able to find the flagged segments, so the flagging mechanism can't be
> one that would make this impossible.

The number of WAL logs is effectively tunable anyway because it depends on the number of checkpoint segments, so we can increase that if there are issues with archival speed v txn rate.

The rename is always safe because the log file names never wrap.

However, I'm loath to touch the files, in case something crashes somewhere and we are left with recovery failing because of an unlocatable file. (To paraphrase one of the existing code comments, only the truly paranoid survive).

A similar way is to have a "buddy" file, which indicates whether the log file is full and ready for archival, i.e. when we close file "nnn" we also write an empty (or nearly empty) file called "nnn.full". That file can then be deleted later BY THE archiver once archival has finished, allowing the log file to be recycled by InstallXLogFileSegment(). (Would require at least 6 more file descriptors, but I'm not sure if that's an issue).

InstallXLogFileSegment() can check XLogArchiveBusy() to see whether it is allowed to reuse the file or must allocate a new one. In the initial implementation this would just test whether "nnn.full" still exists.
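The buddy-file handshake described above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: the three functions are stand-ins named after the proposed XLogArchiveNotify() and XLogArchiveBusy() plus a hypothetical one-pass archiver, and the directory arguments are assumptions. Only the ".full" suffix comes from the message itself.

```python
import os
import shutil

READY_SUFFIX = ".full"  # buddy-file suffix from the message; all other names are illustrative


def xlog_archive_notify(xlog_dir, segment):
    """Backend side (stand-in): flag a just-closed segment by creating its buddy file."""
    marker = os.path.join(xlog_dir, segment + READY_SUFFIX)
    # Mode "x" raises FileExistsError if the marker is already there,
    # so notifying twice for the same segment does not go unnoticed.
    with open(marker, "x"):
        pass
    return marker


def xlog_archive_busy(xlog_dir, segment):
    """Backend side (stand-in): the segment may not be recycled while its buddy file exists."""
    return os.path.exists(os.path.join(xlog_dir, segment + READY_SUFFIX))


def archive_ready_segments(xlog_dir, archive_dir):
    """Archiver side (hypothetical): copy each flagged segment, then delete its buddy file."""
    archived = []
    for name in sorted(os.listdir(xlog_dir)):
        if not name.endswith(READY_SUFFIX):
            continue
        segment = name[: -len(READY_SUFFIX)]
        shutil.copy2(os.path.join(xlog_dir, segment),
                     os.path.join(archive_dir, segment))
        # Remove the marker only after the copy has succeeded; if the copy
        # raises, the segment stays flagged and is retried on the next pass.
        os.remove(os.path.join(xlog_dir, name))
        archived.append(segment)
    return archived
```

In this sketch the log file itself is never renamed or modified, which is the point of the buddy-file variant: a crash at any step leaves the segment where recovery expects it, and at worst a stale marker delays recycling.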
This will allow a range of behaviour to be catered for, such as long waits while manual tape mounts are requested by the archiver etc.

So in summary, the API is:

Archiver initialises and waits on notify
PostgreSQL initialises
...then
PostgreSQL fills log, switches and closes it, then calls XLogArchiveNotify()
Archiver moves log somewhere safe, then sets state such that...
...sometime later
PostgreSQL checks XLogArchiveBusy() to see if it's safe to recycle the file and discovers the state set by the archiver

The API is completely unintrusive on current tried and tested operation, and leaves the archiver(s) free to act as they choose, outside of the address space of PostgreSQL. That way we don't have to update regression tests with some destructive non-manual crash tests to show that it works.

Clearly, we wouldn't want WAL logs to hang around too long, so we need an initiation method for the archival process. Otherwise, we'll be writing "nnn.full" notifications without anybody ever deleting them. Either this could be set at startup with an archive_log_mode parameter (OK, the name's been used before, but if the cap fits, wear it) or by setting a maximum limit on the number of archive logs, and a few other ideas, none of which I like.

Hmmmm...any listeners got any ideas here? How do we want this to work?

Anybody want to write a more complex archiver process to act as more than just a test harness?

Best regards,
Simon Riggs