Archival API - Mailing list pgsql-hackers-pitr

From Simon Riggs
Subject Archival API
Date
Msg-id 003401c3f5a7$2887fc50$d9f887d9@LaptopDellXP
In response to Re: Proposals for PITR  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers-pitr
>Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > Simon Riggs wrote
> > - Write application to archive WAL files to tape, disk, or network
> > Probably need to do first part, but I'm arguing not to do the copy to
> > tape..
>
> I'd like to somehow see this handled by a user-supplied program or
> script.  What we mainly need is to define a good API that lets the
> archiver program understand which WAL segment files to archive when.
>
> > B - Backing up WAL log files
> > -Ordinarily, when old log segment files are no longer needed, they are
> > recycled (renamed to become the next segments in the numbered sequence).
> > This means that the data within them must be copied from there to
> > another location
> >     AFTER postgres has closed that file
> >     BEFORE it is renamed and recycled
>
> My inclination would be to change the backend code so that as soon as a
> WAL segment is completed, it is flagged as being ready to dump to tape
> (or wherever).  Possibly the easiest way to do this is to rename the
> segment file somehow, perhaps "nnn" becomes "nnn.full".  Then, after the
> archiver process has properly dumped the file, reflag it as being dumped
> (perhaps rename to "nnn.done").  Obviously there are any number of ways
> we could do this flagging, and depending on an OS rename facility might
> not be the best.

Yes, that would be the correct time to begin archiving.

As the code is currently written, there is a slot in MoveOfflineLogs
that checks whether XLOG_archive_dir is set before entering a section
that is empty apart from a message. That routine isn't called until
we're about to recycle the files, which means we've lost our window of
opportunity to archive them. Increasing the number of files doesn't
affect that, since the routine is still called last. I'm going to ignore
that "hint", and any patch will include deletion of that code to avoid
later confusion.

The log switch and close occur during XLogWrite, when it is established
that there is no more room in the current log file for the current
segment.

The file-flagging mechanism only allows a single archiver program to
operate, so I'll structure it as a new function XLogArchiveNotify() so
we can add in extra stuff later to improve/change things. That way we
have a home for the API.

> A segment then can be recycled when it is both (a) older than the latest
> checkpoint and (b) flagged as dumped.  Note that this approach allows
> dumping of a file to start before the first time at which it could be
> recycled.  In the event of a crash and restart, WAL replay has to be
> able to find the flagged segments, so the flagging mechanism can't be
> one that would make this impossible.

The number of WAL logs is effectively tunable anyway because it depends
on the number of checkpoint segments, so we can increase that if there
are issues with archival speed v txn rate.

The rename is always safe because the log file names never wrap.

However, I'm loath to touch the files, in case something crashes
somewhere and we are left with recovery failing because of an
unlocatable file. (To paraphrase one of the existing code comments, only
the truly paranoid survive.) A similar approach is to have a "buddy"
file that indicates whether the segment is full and ready for archival,
i.e. when we close file "nnn" we also write an empty or nearly empty
file called "nnn.full". That file can then be deleted later by the
archiver once archival has finished, allowing the segment to be recycled
by InstallXLogFileSegment(). (This would require at least 6 more file
descriptors, but I'm not sure if that's an issue.)
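
To make the "buddy" file idea concrete, here is a minimal sketch in
Python of what the notify step could look like. It only models the
proposal: the real XLogArchiveNotify() would be backend C code, and the
function name xlog_archive_notify, its arguments, and the directory
layout are assumptions made for illustration.

```python
import os


def xlog_archive_notify(xlog_dir, segment_name):
    """Model of the proposed XLogArchiveNotify() (hypothetical signature).

    Creates an empty marker "<segment>.full" next to the closed segment.
    The segment file itself is never renamed or modified.
    """
    marker = os.path.join(xlog_dir, segment_name + ".full")
    # Create (or truncate) the marker; it carries no data.
    fd = os.open(marker, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Flush so the marker is more likely to survive a crash.
        os.fsync(fd)
    finally:
        os.close(fd)
    return marker
```

Because the segment is left alone, a crash between closing the segment
and writing the marker can at worst lose the notification, never the log
file that recovery needs.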

InstallXLogFileSegment() can call XLogArchiveBusy() to see whether
it is allowed to reuse a segment or must allocate a new one. In the
initial implementation this would just test whether "nnn.full" still
exists. This will allow a range of behaviour to be catered for, such as
long waits while the archiver requests manual tape mounts.
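
As a rough illustration, the busy check and the recycle decision could
be modelled like this in Python. The names xlog_archive_busy and
can_recycle and the older_than_checkpoint flag are hypothetical; the
flag stands in for whatever test the backend already uses to decide that
a segment precedes the latest checkpoint.

```python
import os


def xlog_archive_busy(xlog_dir, segment_name):
    """Model of the proposed XLogArchiveBusy(): the segment is busy
    while its "<segment>.full" marker still exists."""
    return os.path.exists(os.path.join(xlog_dir, segment_name + ".full"))


def can_recycle(xlog_dir, segment_name, older_than_checkpoint):
    """A segment may be reused only if it is (a) older than the latest
    checkpoint and (b) no longer waiting on the archiver."""
    return older_than_checkpoint and not xlog_archive_busy(xlog_dir, segment_name)
```

If the check says "busy", the caller would allocate a new segment
instead of reusing the old one, which is how a slow archiver translates
into more files on disk rather than lost log data.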

So in summary, the API is:

Archiver initialises and waits on notify
PostgreSQL initialises
...then
PostgreSQL fills log, switches and closes it, then calls
XLogArchiveNotify()
Archiver moves log somewhere safe, then sets state such that...
...sometime later
PostgreSQL checks XLogArchiveBusy() to see if it's safe to recycle file
and discovers the state set by the archiver
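
The archiver side of this sequence might look something like the
following Python sketch. It is one polling pass rather than a real
daemon, since how the archiver "waits on notify" is not settled here;
the function name archive_ready_segments and both directory arguments
are assumptions. It copies rather than moves the segment, because the
original has to stay in place for PostgreSQL to recycle.

```python
import os
import shutil


def archive_ready_segments(xlog_dir, archive_dir):
    """Archive every segment that has a "<segment>.full" marker.

    Returns the list of segment names archived in this pass.
    """
    archived = []
    for name in sorted(os.listdir(xlog_dir)):
        if not name.endswith(".full"):
            continue
        segment = name[: -len(".full")]
        src = os.path.join(xlog_dir, segment)
        if not os.path.isfile(src):
            continue  # marker without a segment: leave it for inspection
        tmp = os.path.join(archive_dir, segment + ".tmp")
        with open(src, "rb") as fin, open(tmp, "wb") as fout:
            shutil.copyfileobj(fin, fout)
            fout.flush()
            os.fsync(fout.fileno())  # make sure the copy is on disk
        # Publish the copy under its final name only once it is complete.
        os.replace(tmp, os.path.join(archive_dir, segment))
        # Deleting the marker is the "state" that tells PostgreSQL the
        # segment may now be recycled.
        os.remove(os.path.join(xlog_dir, name))
        archived.append(segment)
    return archived
```

The marker is removed last, so if the archiver dies part-way through,
the next pass simply archives the same segment again.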

The API is completely unintrusive on current tried and tested operation,
and leaves the archiver(s) free to act as they choose, outside of the
address space of PostgreSQL. That way we don't have to update the
regression tests with some destructive non-manual crash tests to show
that it works.

Clearly, we wouldn't want WAL logs to hang around too long, so we need
an initiation method for the archival process. Otherwise, we'll be
writing "nnn.full" notifications without anybody ever deleting them.
Either this could be set at startup with an archive_log_mode parameter
(OK, the name's been used before, but if the cap fits, wear it), or we
could set a maximum limit on the number of archive logs, plus a few
other ideas, none of which I like.

Hmmmm...any listeners got any ideas here? How do we want this to work?

Anybody want to write a more complex archiver process to act as more
than just a test harness?

Best regards,

Simon Riggs


