Re: will PITR in 8.0 be usable for "hot spare"/"log - Mailing list pgsql-hackers
From | Eric Kerin |
---|---|
Subject | Re: will PITR in 8.0 be usable for "hot spare"/"log |
Date | |
Msg-id | 1092523844.8485.36.camel@auh5-0478 |
In response to | Re: will PITR in 8.0 be usable for "hot spare"/"log shipping" type of replication (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: will PITR in 8.0 be usable for "hot spare"/"log
|
List | pgsql-hackers |
On Sat, 2004-08-14 at 01:11, Tom Lane wrote:
> Eric Kerin <eric@bootseg.com> writes:
> > The issues I've seen are:
> > 1. Knowing when the master has finished the file transfer to
> > the backup.
>
> The "standard" solution to this is you write to a temporary file name
> (generated off your process PID, or some other convenient
> reasonably-unique random name) and rename() into place only after
> you've finished the transfer.

Yup, much easier this way. Done.

> > 2. Handling the meta-files, (.history, .backup) (eg: not sleeping if
> > they don't exist)
>
> Yeah, this is an area that needs more thought. At the moment I believe
> both of these will only be asked for during the initial microseconds of
> slave-postmaster start. If they are not there I don't think you need to
> wait for them. It's only plain ol' WAL segments that you want to wait
> for. (Anyone see a hole in that analysis?)

Seems to be working fine this way. I'm now just returning ENOENT if they don't exist.

> > 3. Keeping the backup from coming online before the replay has fully
> > finished in the event of a failure to copy a file, or other strange
> > errors (out of memory, etc).
>
> Right, also an area that needs thought. Some other people opined that
> they want the switchover to occur only on manual command. I'd go with
> that too if you have anything close to 24x7 availability of admins.
> If you *must* have automatic switchover, what's the safest criterion?
> Dunno, but let's think ...

I'm not even really talking about automatic startup on failover. Right now, if the recovery_command returns anything but 0, the database will finish recovery and come online. That forces you to rebuild your backup system from a copy of the master unnecessarily. Sounds kinda messy to me, especially if it's a false trigger (temporary I/O error, out of memory).

What I think might be a better long-term approach (but probably more of an 8.1 thing): have the database go into a read-only/replay mode and accept only read-only commands from users. A replay program opens a connection to the backup system's postmaster and tells it to replay a given file when it becomes available. Once you want the system to come online, the DBA calls a different function that instructs the system to come fully online and start accepting updates from users.

This could be quite complex, but provides two things: proper log shipping with status (without the false fail->db online possibility), and a read-only replicated backup system(s), which would also be good for a reporting database.

Thoughts?

Anyway, here's a re-written program for my implementation of log shipping:
http://www.bootseg.com/log_ship.c

It operates mostly the same, but most of the stupid bugs are fixed. The old one was renamed to http://www.bootseg.com/log_ship.c.ver1 if you really want it.

Thanks,
Eric
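For illustration only, points 1 and 2 above could be sketched in C roughly as follows: write under a PID-based temporary name and rename() into place, return ENOENT at once for a missing .history/.backup file, and wait for plain WAL segments. This is not the code from log_ship.c. The names `copy_file`, `ship_segment`, `is_meta_file`, and `fetch_segment` are hypothetical, and the sketch assumes the archive directory is a mounted path on a single filesystem, so that rename() replaces the target atomically.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Copy all of src to dst and fsync it. Returns 0, or -1 with errno set.
 * (Sketch: EINTR is not retried.) */
static int copy_file(const char *src, const char *dst)
{
    char buf[8192];
    ssize_t n;
    int saved, ok = 0;
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        saved = errno;
        close(in);
        errno = saved;
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t) (n - off));
            if (w < 0)
                goto done;
            off += w;
        }
    }
    if (n == 0 && fsync(out) == 0)      /* n == 0 means clean end of file */
        ok = 1;
done:
    saved = errno;
    close(in);
    if (close(out) != 0 && ok) {
        ok = 0;
        saved = errno;
    }
    errno = saved;
    return ok ? 0 : -1;
}

/* Master side: deliver src as dir/name. The final name only ever appears
 * once the copy is complete, so the backup never sees a partial file. */
int ship_segment(const char *src, const char *dir, const char *name)
{
    char tmp[1024], final[1024];
    assert(src && dir && name);
    if ((size_t) snprintf(tmp, sizeof tmp, "%s/%s.tmp.%ld", dir, name,
                          (long) getpid()) >= sizeof tmp ||
        (size_t) snprintf(final, sizeof final, "%s/%s", dir, name) >= sizeof final) {
        errno = ENAMETOOLONG;
        return -1;
    }
    if (copy_file(src, tmp) != 0 || rename(tmp, final) != 0) {
        int saved = errno;
        unlink(tmp);                    /* do not leave temp files behind */
        errno = saved;
        return -1;
    }
    return 0;
}

/* .history and .backup files are optional, so they are never waited for. */
static int is_meta_file(const char *name)
{
    const char *dot = strrchr(name, '.');
    return dot && (strcmp(dot, ".history") == 0 || strcmp(dot, ".backup") == 0);
}

/* Backup side: copy dir/name to dest. Returns 0 on success, ENOENT for a
 * missing meta-file, or another errno value. A missing WAL segment is
 * polled for every poll_secs seconds, with no timeout. */
int fetch_segment(const char *dir, const char *name, const char *dest,
                  unsigned poll_secs)
{
    char path[1024];
    if ((size_t) snprintf(path, sizeof path, "%s/%s", dir, name) >= sizeof path)
        return ENAMETOOLONG;
    while (access(path, R_OK) != 0) {
        if (errno != ENOENT)
            return errno;
        if (is_meta_file(name))
            return ENOENT;
        sleep(poll_secs);
    }
    return copy_file(path, dest) == 0 ? 0 : errno;
}
```

A wrapper program would call `fetch_segment()` and exit with its return value. The sketch does nothing about point 3: any non-zero result, including a transient I/O error, would still end recovery as described above.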