Re: PITR logging control program - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: PITR logging control program
Date
Msg-id 1083707404.3100.457.camel@stromboli
In response to Re: PITR logging control program  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
On Fri, 2004-04-30 at 04:02, Bruce Momjian wrote:
> Simon Riggs wrote:
> > > Agreed we want to allow the superuser control over writing of the
> > > archive logs.  The question is how do they get access to that.  Is it by
> > > running a client program continuously or calling an interface script
> > > from the backend?
> > > 
> > > My point was that having the backend call the program has improved
> > > reliability and control over when to write, and easier administration.
> > > 
> > 
> > Agreed. We've both suggested ways that can occur, though I suggest this
> > is much less of a priority, for now. Not "no", just not "now".
> > 
> > > Another case is server start/stop.  You want to start/stop the archive
> > > logger to match the database server, particularly if you reboot the
> > > server.  I know Informix used a client program for logging, and it was a
> > > pain to administer.
> > > 
> > 
> > pg_arch is just icing on top of the API. The API is the real deal here.
> > I'm not bothered if pg_arch is not accepted, as long as we can adopt the
> > API. As noted previously, my original intent was to split the API away
> > from the pg_arch application to make it clearer what was what. Once that
> > has been done, I encourage others to improve pg_arch - but also to use
> > the API to interface with other BAR products.
> > 
> > If you're using PostgreSQL for serious business then you will be using a
> > serious BAR product as well. There are many FOSS alternatives...
> > 
> > The API's purpose is to allow larger, pre-existing BAR products to know
> > when and how to retrieve data from PostgreSQL. Those products don't and
> > won't run underneath postmaster, so although I agree with Peter's
> > original train of thought, I also agree with Tom's suggestion that we
> > need an API more than we need an archiver process. 
> > 
> > > I would be happy with an external program if it was started/stopped
> > > by the postmaster (or via GUC change) and received a signal when a
> > > WAL file was written.
> > 
> > That is exactly what has been written.
> > 
> > The PostgreSQL side of the API is written directly into the backend, in
> > xlog.c, and is therefore activated by postmaster-controlled code. That
> > then sends "a signal" to the process that will do the archiving - the
> > archiver side of the XLogArchive API is provided as an in-process
> > library. (The "signal" is, in fact, a zero-length file written to disk,
> > because there are many reasons why an external archiver may not be
> > ready to archive, or may not even be up and running to receive a
> > signal.)
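As a concrete illustration of the zero-length "signal" file idea, the notification can amount to nothing more than creating an empty marker file that the archiver looks for whenever it next scans the directory. The directory name (`archive_status`) and the `.full` suffix below are assumptions for illustration, not the committed API:

```shell
# Sketch only: tell an archiver that a WAL segment is complete by
# creating a zero-length marker file.  The archiver need not be up and
# running at this moment; the marker persists on disk until it is seen.
# The archive_status directory and ".full" suffix are hypothetical.
notify_wal_full() {
    xlogdir="$1"      # e.g. pg_xlog
    segment="$2"      # e.g. 000000010000000000000004
    : > "$xlogdir/archive_status/$segment.full"
}
```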
> > 
> > The only difference is that there is some confusion as to the role and
> > importance of pg_arch.
> 
> OK, I have finalized my thinking on this.
> 
> We both agree that a pg_arch client-side program certainly works for
> PITR logging.  The big question in my mind is whether a client-side
> program is what we want to use long-term, and whether we want to release
> a 7.5 that uses it and then change it in 7.6 to something more
> integrated into the backend.
> 
> Let me add this is a little different from pg_autovacuum.  With that,
> you could put it in cron and be done with it.  With pg_arch, there is a
> routine that has to be used to do PITR, and if we change the process in
> 7.6, I am afraid there will be confusion.
> 
> Let me also add that I am not terribly worried about having the feature
> to restore to an arbitrary point in time in 7.5.  I would much rather
> have a good PITR solution that works cleanly in 7.5 and add
> arbitrary-point restore in 7.6, than have it now with a strained
> implementation that we have to revisit for 7.6.
> 
> Here are my ideas.  (I talked to Tom about this and am including his
> ideas too.)  Basically, the archiver that scans the xlog directory to
> identify files to be archived should be a subprocess of the postmaster. 
> You already have that code and it can be moved into the backend.
> 
> Here is my implementation idea.  First, your pg_arch code runs in the
> backend and is started just like the statistics process.  It has to be
> started whether PITR is being used or not, but will be inactive if PITR
> isn't enabled.  This must be done because we can't have a backend start
> this process later if PITR is turned on after server start.
> 
> The process id of the archive process is stored in shared memory.  When
> PITR is turned on, each backend that completes a WAL file sends a signal
> to the archiver process.  The archiver wakes up on the signal and scans
> the directory, finds files that need archiving, and either does a 'cp'
> or runs a user-defined program (like scp) to transfer the file to the
> archive location.
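The wake-and-scan step described above might look roughly like the sketch below. The directory layout, the `.full` marker convention, and the function name are assumptions for illustration only, not the proposed code:

```shell
# Sketch only: on each wakeup, find WAL segments flagged as ready to
# archive, hand each one to a transfer command (cp, scp, ...), and
# clear the marker once the transfer succeeds so it is not re-sent.
archive_ready_segments() {
    xlogdir="$1"      # e.g. pg_xlog
    transfer="$2"     # e.g. cp or scp
    dest="$3"         # the archive location
    for marker in "$xlogdir"/archive_status/*.full; do
        [ -e "$marker" ] || continue          # glob matched nothing
        seg=$(basename "$marker" .full)
        "$transfer" "$xlogdir/$seg" "$dest" && rm -f "$marker"
    done
}
```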
> 
> In GUC we add:
> 
>     pitr = true/false
>     pitr_location = 'directory, user@host:/dir, etc'
>     pitr_transfer = 'cp, scp, etc'
> 
> The archiver process updates its config values when someone changes
> these values in postgresql.conf (and runs pg_ctl reload).  These can
> only be modified from postgresql.conf.  Changing them via SET has to be
> disabled because they are cluster-level settings, not per-session ones,
> like port number or checkpoint_segments.
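For instance, a postgresql.conf fragment using the proposed settings might read as follows (the values are examples only, not defaults):

```
pitr = true
pitr_location = 'backup@vault:/var/pitr/archive'
pitr_transfer = 'scp'
```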
> 
> Basically, I think that we need to push user-level control of this
> process down beyond the directory scanning code (that is pretty
> standard), and allow them to call an arbitrary program to transfer the
> logs.  My idea is that the pitr_transfer program will get $1=WAL file
> name and $2=pitr_location and the program can use those arguments to do
> the transfer.  We can even put a pitr_transfer.sample program in share
> and document $1 and $2.
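A minimal pitr_transfer.sample along those lines, following the proposed convention ($1 = WAL file name, $2 = pitr_location), might be no more than the sketch below; it is written as a shell function here for illustration, while the installed sample would be a standalone script:

```shell
# Sketch only: copy one completed WAL file ($1) to the archive
# location ($2).  A real sample would also need to report failure
# clearly so the server knows the segment must be retried.
pitr_transfer() {
    wal_file="$1"
    pitr_location="$2"
    cp "$wal_file" "$pitr_location"/
}
```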

...Bruce and I have just discussed this in some detail and reached a
good understanding of the design proposals as a whole. It looks like all
of this can happen in the next few weeks, with a worst case time
estimate of mid-June. TGFT!

I'll write this up and post it shortly, with a rough roadmap for
further development of recovery-related features.

Best Regards,

Simon Riggs
2nd Quadrant



