PITR Phase 1 - Code Overview (1) - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | PITR Phase 1 - Code Overview (1) |
Date | |
Msg-id | 1083013886.3018.218.camel@stromboli |
In response to | PITR Phase 1 - Test results (Simon Riggs <simon@2ndquadrant.com>) |
List | pgsql-hackers |
On Mon, 2004-04-26 at 16:37, Simon Riggs wrote:
> I've now completed the coding of Phase 1 of PITR.
>
> This allows a backup to be recovered and then rolled forward (all the
> way) on transaction logs. This proves the code and the design works, but
> also validates a lot of the earlier assumptions that were the subject of
> much earlier debate.
>
> As noted in the previous designs, PostgreSQL talks to an external
> archiver using the XLogArchive API.
> I've now completed:
> - changes to PostgreSQL
> - written a simple archiving utility, pg_arch

This will be on HACKERS not PATCHES for a while...

OVERVIEW:
Various code changes. Not all included here...but I want to prove this
is real, rather than have you waiting for my patch release skills to
improve.

PostgreSQL changes include:
============================
- guc.c
New GUC called wal_archive to control archival logging/not.

- xlog.h
GUC added here

- xlog.c
The most critical parts of the code live here. The way things currently
work can be thought of as a circular set of logs, with the current log
position sweeping around the circle like a clock. In order to archive an
xlog, you must start just AFTER the file has been closed and BEFORE the
pointer sweeps round again.

The code here tries to spot the right moment to notify the archive that
it's time to archive. That point is critical: too early and the archive
may yet be incomplete, too late and a window of failure creeps into the
system.

Finding that point is more complicated than it seems because every
backend has the same file open and decides to close it at different
times - nearly the same time if you're running pgbench, but it could
vary considerably otherwise. That timing difference is the source of
Bug#1. My solution is to use the piece of code that first updates
pg_control, since there is a similar need to only-do-it-once.
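To make the only-do-it-once idea concrete, here is a minimal Python
sketch. It is not the xlog.c code: the names ArchiveNotifier,
close_segment and simulate are hypothetical, and a plain lock stands in
for whatever protects the pg_control update. Every backend may report
that the same segment has closed, but only the first report results in a
notification to the archiver.

```python
import threading


class ArchiveNotifier:
    """Toy model: many backends close the same xlog segment, one notification results."""

    def __init__(self):
        self._lock = threading.Lock()   # stand-in for the lock guarding the pg_control update
        self._notified = set()          # segments already handed to the archiver
        self.notifications = []         # what a (hypothetical) archiver would see, in order

    def close_segment(self, segment):
        """Called by every backend when it notices the segment is finished.

        Returns True only for the single caller that actually notified.
        """
        with self._lock:
            if segment in self._notified:
                return False            # another backend got here first
            self._notified.add(segment)
            self.notifications.append(segment)
            return True


def simulate(backends, segment):
    """Let `backends` threads all report the same closed segment."""
    notifier = ArchiveNotifier()
    results = []
    results_lock = threading.Lock()

    def backend():
        won = notifier.close_segment(segment)
        with results_lock:
            results.append(won)

    threads = [threading.Thread(target=backend) for _ in range(backends)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return notifier, results
```

Calling simulate(8, "segment_0003") models eight backends noticing the
same closed file. Whichever thread takes the lock first does the
notifying, and the other seven return False without side effects.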
My understanding is that the other backends eventually discover they are
supposed to be looking at a different file now and reset themselves - so
that the xlog gets fsynced only once. It's taken me a week to consider
the alternatives...this point is critical, so please suggest if you
know/think differently.

When the pointer sweeps round again, if we are still archiving, we
simply increase the number of logs in the cycle to defer when we can
recycle the xlog.

The code doesn't yet handle a failure condition we discussed previously:
running out of disk space and how we handle that (there was detailed
debate, noted for future implementation).

New utility aimed at being located in src/bin/pg_arch
=======================================================
- pg_arch.c
The idea of pg_arch is that it is a functioning archival tool and at the
same time is the reference implementation of the XLogArchive API. The
API is all wrapped up in the same file currently, to make it easier to
implement, but I envisage separating these out into two parts after it
passes initial inspection - shouldn't take too much work given that was
its design goal. This will then allow the API to be used for wider
applications that want to back up PostgreSQL.

- src/bin/Makefile
has been updated to include pg_arch, so that this then gets made as part
of the full system rather than an add-on. I'm sure somebody has feelings
on this...my thinking was that it ought to be available without too much
effort.

What's NOT included (YET!)
==========================
-changes to initdb
-changes to postgresql.conf
-changes to wal_debug
-related changes
-user documentation

- changes to initdb
XLogArchive API implementation relies on the existence of
$PGDATA/pg_rlog

That would be relatively simple to add to initdb, but it's also a
no-brainer to add without it, so I thought I'd leave it for discussion
in case anybody has good reasons to put it elsewhere/rename it etc.
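As an illustration of how little is needed to provide the directory
without touching initdb, here is a minimal Python sketch. The function
names are hypothetical, the directory name is simply the one used above,
and the owner-only mode is an assumption, since the permissions are
exactly what is still open for discussion.

```python
import os
import tempfile


def ensure_rlog_dir(pgdata, name="pg_rlog"):
    """Create <pgdata>/<name> if it is missing; return (path, created)."""
    path = os.path.join(pgdata, name)
    if os.path.isdir(path):
        return path, False
    # Owner-only access is an assumption, not a settled decision.
    os.mkdir(path, 0o700)
    return path, True


def demo():
    """Run the check twice against a throwaway stand-in for $PGDATA."""
    with tempfile.TemporaryDirectory() as pgdata:
        path, first = ensure_rlog_dir(pgdata)
        _, second = ensure_rlog_dir(pgdata)
        return first, second, os.path.basename(path)
```

The second call finds the directory already present and leaves it alone,
so the check is safe to repeat at every startup.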
More importantly, this affects the security model used by XLogArchive.
The way I had originally envisaged this, the directory permissions would
be opened up for group level read/write thus:

pg_xlog    rwxr-x---
pg_rlog    rwxrwx---

though this of course relies on $PGDATA being opened up also. That then
would allow the archiving tool to be in its own account also, yet with a
shared group. (Thinking that a standard Legato install (for instance) is
unlikely to recommend sharing a UNIX userid with PostgreSQL).

I was unaware that PostgreSQL checks the permissions of PGDATA before it
starts and does not allow you to proceed if group permissions exist.

We have two options:
i) alter all things that rely on security being userlevel-only
   - initdb
   - startup
   - most other security features?
ii) encourage (i.e. force) people using XLogArchive API to run as the
PostgreSQL owning-user (postgres).

I've avoided this issue in the general implementation, thinking that
there'll be some strong feelings either way, or an alternative that I
haven't thought of yet (please...)

-changes to postgresql.conf
The parameter setting wal_archive=true needs to be added to turn
XLogArchive on (or off). I've not added this to the install template
(yet), in case we had some further suggestions for what this might be
called.

-changes to wal_debug
The XLOG_DEBUG flag is set as a value between 1 and 16, though the code
only ever treats this as a boolean. For my development, I partially
implemented an earlier suggestion of mine: set the flag to 1 in the
config file, then set the more verbose portions of debug output to
trigger when it's set to 16. That affected a couple of places in xlog.c.
That may not be needed, so that's not included either.

-user documentation
Not yet...but it will be.

> Bugs
> - two bugs currently occur during some tests:
> 1. the notification mechanism as originally designed causes ALL backends
> to report that a log file has closed. That works most of the time,
> though does give rise to occasional timing errors - nothing too
> serious, but this inexactness could lead to later errors.
> 2. After restore, the notification system doesn't recover fully - this
> is a straightforward one
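For reference, the permissions question raised under "changes to initdb"
comes down to a handful of mode bits. The Python sketch below is a
minimal illustration, assuming the startup check simply rejects any
group or other bit on the data directory. It is not the server's actual
code, and the function names are hypothetical.

```python
import stat


def mode_from_string(perms):
    """Convert an ls-style string such as 'rwxr-x---' into numeric mode bits."""
    if len(perms) != 9:
        raise ValueError("expected nine characters, e.g. 'rwxr-x---'")
    bits = 0
    for ch, expected in zip(perms, "rwxrwxrwx"):
        bits <<= 1
        if ch == expected:
            bits |= 1
        elif ch != "-":
            raise ValueError("unexpected character %r" % ch)
    return bits


def group_or_other_access(mode):
    """True if any group or other permission bit is set (assumed rejection rule)."""
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))
```

Under this rule both proposed settings, rwxr-x--- (0750) and rwxrwx---
(0770), would be refused if applied to $PGDATA itself, while rwx------
(0700) passes. That is why option i) touches the startup check as well
as initdb.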