PITR Phase 1 - Code Overview (1) - Mailing list pgsql-hackers

From Simon Riggs
Subject PITR Phase 1 - Code Overview (1)
Date
Msg-id 1083013886.3018.218.camel@stromboli
In response to PITR Phase 1 - Test results  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: PITR Phase 1 - Code Overview (1)
Re: PITR Phase 1 - Code Overview (1)
List pgsql-hackers
On Mon, 2004-04-26 at 16:37, Simon Riggs wrote:
> I've now completed the coding of Phase 1 of PITR.
>
> This allows a backup to be recovered and then rolled forward (all the
> way) on transaction logs. This proves the code and the design works, but
> also validates a lot of the earlier assumptions that were the subject of
> much earlier debate.
>
> As noted in the previous designs, PostgreSQL talks to an external
> archiver using the XLogArchive API.
> I've now completed:
> - changes to PostgreSQL
> - written a simple archiving utility, pg_arch
>
This will be on HACKERS not PATCHES for a while...


OVERVIEW :

Various code changes. Not all included here...but I want to prove this
is real, rather than have you waiting for my patch release skills to
improve.

PostgreSQL changes include:
============================
- guc.c
New GUC called wal_archive to control whether archival logging is enabled.

- xlog.h
GUC added here

- xlog.c
The most critical parts of the code live here. The way things currently
work can be thought of as a circular set of logs, with the current log
position sweeping around the circle like a clock. In order to archive an
xlog, you must start just AFTER the file has been closed and BEFORE the
pointer sweeps round again.
The code here tries to spot the right moment to notify the archiver that
it's time to archive. That point is critical: too early and the archive
may yet be incomplete; too late and a window of failure creeps into the
system.
Finding that point is more complicated than it seems because every
backend has the same file open and decides to close it at different
times - nearly the same time if you're running pgbench, but could vary
considerably otherwise. That timing difference is the source of Bug#1.
My solution is to use the piece of code that first updates pg_control,
since there is a similar need to only-do-it-once. My understanding is
that the other backends eventually discover they are supposed to be
looking at a different file now and reset themselves - so that the xlog
gets fsynced only once.
It's taken me a week to consider the alternatives...this point is
critical, so please suggest if you know/think differently.
When the pointer sweeps round again, if we are still archiving, we
simply increase the number of logs in the cycle to defer when we can
recycle the xlog. The code doesn't yet handle a failure condition we
discussed previously: running out of disk space and how we handle that
(there was detailed debate, noted for future implementation).
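
To make the clock analogy concrete, here is a minimal sketch of the two
ideas above: notify the archiver only once per closed log, and grow the
cycle rather than recycle a log that is still waiting to be archived.
This is not the code in xlog.c. XLogRing, ring_switch and the other names
are hypothetical, the fixed MAX_SLOTS limit is an assumption, and running
out of space is deliberately left unhandled, as in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SLOTS 16            /* hypothetical upper bound for this sketch */

typedef struct
{
    bool awaiting_archive[MAX_SLOTS]; /* closed, but not yet archived */
    int  nslots;                      /* number of logs in the cycle */
    int  cur_slot;                    /* slot currently being written */
    long cur_segno;                   /* ever-increasing id of current log */
    int  notifications;               /* times the archiver was notified */
} XLogRing;

static void
ring_init(XLogRing *r, int nslots)
{
    memset(r, 0, sizeof(*r));
    r->nslots = nslots;
}

/*
 * Called by any backend that notices log `segno` is full.  Only the first
 * caller for a given segno performs the switch and sends the notification;
 * later callers find that the current log has already moved on.
 */
static bool
ring_switch(XLogRing *r, long segno)
{
    int next;

    if (segno != r->cur_segno)
        return false;           /* another backend already switched */

    r->awaiting_archive[r->cur_slot] = true;
    r->notifications++;         /* notify the archiver exactly once */

    next = (r->cur_slot + 1) % r->nslots;
    if (r->awaiting_archive[next])
    {
        /* The pointer has swept round to an unarchived log: add a new
         * slot right after the current one instead of recycling. */
        assert(r->nslots < MAX_SLOTS);  /* out-of-space case not handled */
        next = r->cur_slot + 1;
        for (int i = r->nslots; i > next; i--)
            r->awaiting_archive[i] = r->awaiting_archive[i - 1];
        r->awaiting_archive[next] = false;
        r->nslots++;
    }
    r->cur_slot = next;
    r->cur_segno++;
    return true;
}

/* The archiver reports that the log in `slot` has been safely copied. */
static void
ring_archive_done(XLogRing *r, int slot)
{
    if (slot >= 0 && slot < r->nslots)
        r->awaiting_archive[slot] = false;
}
```

With three logs in the cycle and no archiving completed, the third switch
finds slot 0 still waiting and the cycle grows to four. Once slot 0 has
been archived, the next sweep reuses it and the cycle stays at four.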

New utility aimed at being located in src/bin/pg_arch
=======================================================
- pg_arch.c
The idea of pg_arch is that it is a functioning archival tool and at the
same time is the reference implementation of the XLogArchive API. The
API is all wrapped up in the same file currently, to make it easier to
implement, but I envisage separating these out into two parts after it
passes initial inspection - shouldn't take too much work given that was
its design goal. This will then allow the API to be used by other
applications that want to back up PostgreSQL.

- src/bin/Makefile has been updated to include pg_arch, so that this
then gets made as part of the full system rather than an add-on. I'm
sure somebody has feelings on this...my thinking was that it ought to be
available without too much effort.

What's NOT included (YET!)
==========================
-changes to initdb
-changes to postgresql.conf
-changes to wal_debug
-related changes
-user documentation

- changes to initdb
XLogArchive API implementation relies on the existence of
    $PGDATA/pg_rlog

That would be relatively simple to add to initdb, but it's also trivial
to create the directory by hand without it, so I thought I'd leave it for
discussion in case anybody has good reasons to put it elsewhere, rename
it, etc.

More importantly, this affects the security model used by XLogArchive.
The way I had originally envisaged this, the directory permissions would
be opened up for group level read/write thus:
    pg_xlog        rwxr-x---
    pg_rlog        rwxrwx---
though this of course relies on $PGDATA being opened up also. That then
would allow the archiving tool to be in its own account also, yet with a
shared group. (Thinking that a standard Legato install (for instance) is
unlikely to recommend sharing a UNIX userid with PostgreSQL). I was
unaware that PostgreSQL checks the permissions of PGDATA before it
starts and does not allow you to proceed if group permissions exist.
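
To illustrate the conflict, here is a minimal sketch of the kind of check
described above, applied to the two modes proposed for pg_xlog and
pg_rlog. It is not PostgreSQL's actual startup code. The function and
macro names are hypothetical; only the POSIX stat() call and the S_IRWXG
and S_IRWXO mode masks are standard.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Modes proposed in the text (hypothetical macro names). */
#define PG_XLOG_MODE 0750       /* rwxr-x--- */
#define PG_RLOG_MODE 0770       /* rwxrwx--- */

/* True if the mode grants any access at all to group or others. */
static bool
mode_has_group_or_world_access(mode_t mode)
{
    return (mode & (S_IRWXG | S_IRWXO)) != 0;
}

/*
 * Returns 0 if `path` is a directory accessible by its owner only,
 * -1 if it is missing, is not a directory, or is open to group/others.
 */
static int
check_owner_only_dir(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISDIR(st.st_mode))
        return -1;
    return mode_has_group_or_world_access(st.st_mode) ? -1 : 0;
}
```

Both proposed modes fail a check of this shape, while 0700 passes, which
is why a shared-group archiver account cannot work as things stand.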

We have two options:

i) alter all things that rely on security being userlevel-only
- initdb
- startup
- most other security features?
ii) encourage (i.e. force) people using XLogArchive API to run as the
PostgreSQL owning-user (postgres).

I've avoided this issue in the general implementation, thinking that
there'll be some strong feelings either way, or an alternative that I
haven't thought of yet (please...)

-changes to postgresql.conf
The parameter setting
    wal_archive=true
needs to be added to enable XLogArchive.
I've not added this to the install template (yet), in case we had some
further suggestions for what this might be called.

-changes to wal_debug
The XLOG_DEBUG flag is set as a value between 1 and 16, though the code
only ever treats this as a boolean. For my development, I partially
implemented an earlier suggestion of mine: set the flag to 1 in the
config file, then set the more verbose portions of debug output to
trigger when it's set to 16. That affected a couple of places in xlog.c.
That may not be needed, so that's not included either.
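
The two-level idea amounts to something like the sketch below. The macro
and function names are hypothetical and are not taken from xlog.c, and
treating 0 as "off" is an assumption.

```c
#include <stdbool.h>

#define XLOG_DEBUG_BASIC    1   /* hypothetical: ordinary debug output */
#define XLOG_DEBUG_VERBOSE 16   /* hypothetical: verbose output threshold */

/* Boolean treatment: any non-zero value turns debug output on. */
static bool
xlog_debug_enabled(int flag)
{
    return flag != 0;
}

/* The more verbose portions fire only at the top setting. */
static bool
xlog_debug_verbose(int flag)
{
    return flag >= XLOG_DEBUG_VERBOSE;
}
```

Setting the flag to 1 gives the ordinary output only; setting it to 16
also turns on the verbose portions.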

-user documentation
Not yet...but it will be.

> Bugs
> - two bugs currently occur during some tests:
> 1. the notification mechanism as originally designed causes ALL backends
> to report that a log file has closed. That works most of the time,
> though does give rise to occasional timing errors - nothing too
> serious, but this inexactness could lead to later errors.
> 2. After restore, the notification system doesn't recover fully - this
> is a straightforward one

