Re: PITR logging control program - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: PITR logging control program
Date:
Msg-id: 1083707404.3100.457.camel@stromboli
In response to: Re: PITR logging control program (Bruce Momjian <pgman@candle.pha.pa.us>)
List: pgsql-hackers
On Fri, 2004-04-30 at 04:02, Bruce Momjian wrote:
> Simon Riggs wrote:
> > > Agreed we want to allow the superuser control over writing of the
> > > archive logs.  The question is how do they get access to that.  Is
> > > it by running a client program continuously or calling an interface
> > > script from the backend?
> > >
> > > My point was that having the backend call the program has improved
> > > reliability and control over when to write, and easier
> > > administration.
> >
> > Agreed. We've both suggested ways that can occur, though I suggest
> > this is much less of a priority, for now. Not "no", just not "now".
> >
> > > Another case is server start/stop.  You want to start/stop the
> > > archive logger to match the database server, particularly if you
> > > reboot the server.  I know Informix used a client program for
> > > logging, and it was a pain to administer.
> >
> > pg_arch is just icing on top of the API. The API is the real deal
> > here. I'm not bothered if pg_arch is not accepted, as long as we can
> > adopt the API. As noted previously, my original intention was to
> > split the API away from the pg_arch application to make it clearer
> > what was what. Once that has been done, I encourage others to improve
> > pg_arch - but also to use the API to interface with other BAR
> > products.
> >
> > If you're using PostgreSQL for serious business then you will be
> > using a serious BAR product as well. There are many FOSS
> > alternatives...
> >
> > The API's purpose is to allow larger, pre-existing BAR products to
> > know when and how to retrieve data from PostgreSQL. Those products
> > don't and won't run underneath postmaster, so although I agree with
> > Peter's original train of thought, I also agree with Tom's suggestion
> > that we need an API more than we need an archiver process.
> >
> > > I would be happy with an external program if it was started/stopped
> > > by the postmaster (or via GUC change) and received a signal when a
> > > WAL file was written.
> >
> > That is exactly what has been written.
> >
> > The PostgreSQL side of the API is written directly into the backend,
> > in xlog.c, and is therefore activated by postmaster-controlled code.
> > That then sends "a signal" to the process that will do the archiving
> > - the Archiver side of the XLogArchive API is provided to it as an
> > in-process library. (The "signal" is, in fact, a zero-length file
> > written to disk, because there are many reasons why an external
> > archiver may not be ready to archive or even up and running to
> > receive a signal.)
> >
> > The only difference is that there is some confusion as to the role
> > and importance of pg_arch.
>
> OK, I have finalized my thinking on this.
>
> We both agree that a pg_arch client-side program certainly works for
> PITR logging.  The big question in my mind is whether a client-side
> program is what we want to use long-term, and whether we want to
> release a 7.5 that uses it and then change it in 7.6 to something more
> integrated into the backend.
>
> Let me add this is a little different from pg_autovacuum.  With that,
> you could put it in cron and be done with it.  With pg_arch, there is a
> routine that has to be used to do PITR, and if we change the process in
> 7.6, I am afraid there will be confusion.
>
> Let me also add that I am not terribly worried about having the feature
> to restore to an arbitrary point in time for 7.5.  I would much rather
> have a good PITR solution that works cleanly in 7.5 and add it to 7.6,
> than to have restore to an arbitrary point but have a strained
> implementation that we have to revisit for 7.6.
>
> Here are my ideas.  (I talked to Tom about this and am including his
> ideas too.)  Basically, the archiver that scans the xlog directory to
> identify files to be archived should be a subprocess of the postmaster.
> You already have that code and it can be moved into the backend.
>
> Here is my implementation idea.  First, your pg_arch code runs in the
> backend and is started just like the statistics process.  It has to be
> started whether PITR is being used or not, but will be inactive if PITR
> isn't enabled.  This must be done because we can't have a backend start
> this process later in case they turn on PITR after server start.
>
> The process id of the archive process is stored in shared memory.  When
> PITR is turned on, each backend that completes a WAL file sends a
> signal to the archiver process.  The archiver wakes up on the signal
> and scans the directory, finds files that need archiving, and either
> does a 'cp' or runs a user-defined program (like scp) to transfer the
> file to the archive location.
>
> In GUC we add:
>
>     pitr = true/false
>     pitr_location = 'directory, user@host:/dir, etc'
>     pitr_transfer = 'cp, scp, etc'
>
> The archiver program updates its config values when someone changes
> these values via postgresql.conf (and uses pg_ctl reload).  These can
> only be modified from postgresql.conf.  Changing them via SET has to be
> disabled because they are cluster-level settings like port number or
> checkpoint_segments, not per-session ones.
>
> Basically, I think that we need to push user-level control of this
> process down beyond the directory scanning code (that is pretty
> standard), and allow them to call an arbitrary program to transfer the
> logs.  My idea is that the pitr_transfer program will get $1=WAL file
> name and $2=pitr_location and the program can use those arguments to do
> the transfer.  We can even put a pitr_transfer.sample program in share
> and document $1 and $2.

...Bruce and I have just discussed this in some detail and reached a good
understanding of the design proposals as a whole.

It looks like all of this can happen in the next few weeks, with a
worst-case time estimate of mid-June. TGFT!

I'll write this up and post it shortly, with a rough roadmap for further
development of recovery-related features.

Best Regards,

Simon Riggs
2nd Quadrant
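To make the zero-length "signal" file described in the quoted XLogArchive
discussion above a little more concrete: an external archiver that may not
always be running can simply poll for those markers. The sketch below is
illustrative only - the directory names, the assumption that each marker is
named after its WAL segment, and the cleanup convention are placeholders,
not part of the posted API.

    #!/bin/sh
    # Illustrative polling loop for an external archiver (assumptions only).
    NOTIFY_DIR=/usr/local/pgsql/data/pg_xlog/archive_notify   # hypothetical
    WAL_DIR=/usr/local/pgsql/data/pg_xlog
    ARCHIVE_DIR=/mnt/backup/wal

    while true
    do
        for marker in "$NOTIFY_DIR"/*
        do
            [ -f "$marker" ] || continue       # no signal files yet
            seg=`basename "$marker"`           # assume marker carries the segment name
            if cp "$WAL_DIR/$seg" "$ARCHIVE_DIR/$seg"
            then
                rm "$marker"                   # acknowledge only after a successful copy
            fi
        done
        sleep 60
    done

A real BAR product would replace the plain cp with its own transfer and
cataloguing logic; the point is only that a file-based signal lets it pick
up pending work whenever it happens to be running.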
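For the GUC sketch in Bruce's mail above, the postgresql.conf entries might
look roughly as follows; the names are as proposed (not final) and the
example values are placeholders:

    # Proposed PITR settings (names not final)
    pitr          = true
    pitr_location = 'backup@archivehost:/var/backups/wal'
    pitr_transfer = '/usr/local/pgsql/share/pitr_transfer.sample'

As described above, changes would take effect via pg_ctl reload; SET would
not be allowed, since these are cluster-level settings.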
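And to illustrate the $1/$2 convention for the user-defined transfer
program, a pitr_transfer.sample could be as simple as the sketch below.
Error handling, logging and the choice of transfer tool are left to the
site, and the remote-vs-local test is just a crude placeholder:

    #!/bin/sh
    # pitr_transfer.sample - illustrative sketch only
    # $1 = name of the completed WAL file to archive
    # $2 = pitr_location (a local directory or user@host:/dir)
    WALFILE="$1"
    DEST="$2"

    case "$DEST" in
        *:*) scp "$WALFILE" "$DEST" ;;   # looks remote, use scp
        *)   cp  "$WALFILE" "$DEST" ;;   # otherwise a plain local copy
    esac

    # Exit with the transfer command's status so the caller can see failures.
    exit $?

The archiver would then run something like
"pitr_transfer <WAL file name> <pitr_location>" for each completed segment.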