Re: will PITR in 8.0 be usable for "hot spare"/"log - Mailing list pgsql-hackers

From Eric Kerin
Subject Re: will PITR in 8.0 be usable for "hot spare"/"log
Date
Msg-id 1092523844.8485.36.camel@auh5-0478
In response to Re: will PITR in 8.0 be usable for "hot spare"/"log shipping" type of replication  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: will PITR in 8.0 be usable for "hot spare"/"log
List pgsql-hackers
On Sat, 2004-08-14 at 01:11, Tom Lane wrote:
> Eric Kerin <eric@bootseg.com> writes:
> > The issues I've seen are:
> > 1. Knowing when the master has finished the file transfer to
> > the backup.
> 
> The "standard" solution to this is you write to a temporary file name
> (generated off your process PID, or some other convenient reasonably-
> unique random name) and rename() into place only after you've finished
> the transfer.  
Yup, much easier this way.  Done.
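
For anyone following along, the write-then-rename pattern Tom describes
can be sketched roughly as below.  This is not the code from log_ship.c;
the function name copy_then_rename and the ".tmp.<pid>" suffix are
made up for illustration, and it assumes a POSIX system where rename()
replaces the target atomically within one filesystem.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy src to dst so that dst only ever appears
 * under its final name once it is complete.
 * Returns 0 on success, -1 on any failure. */
int copy_then_rename(const char *src, const char *dst)
{
    char tmp[4096];
    char buf[8192];
    size_t n;
    FILE *in, *out;
    int ok = 1;
    int len;

    /* Temporary name next to dst (same directory, so same filesystem),
     * made reasonably unique by the process PID. */
    len = snprintf(tmp, sizeof tmp, "%s.tmp.%ld", dst, (long) getpid());
    if (len < 0 || (size_t) len >= sizeof tmp)
        return -1;

    in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    out = fopen(tmp, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);

    /* Push the data to disk before the file becomes visible. */
    if (fflush(out) != 0)
        ok = 0;
    if (ok && fsync(fileno(out)) != 0)
        ok = 0;
    if (fclose(out) != 0)
        ok = 0;

    /* Only now give the file its real name. */
    if (ok && rename(tmp, dst) != 0)
        ok = 0;

    if (!ok) {
        remove(tmp);    /* do not leave a partial file behind */
        return -1;
    }
    return 0;
}
```

The reader on the backup side then never sees a half-written segment:
either the final name does not exist yet, or the file is complete.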

> > 2. Handling the meta-files (.history, .backup), e.g. not sleeping if
> > they don't exist.
> 
> Yeah, this is an area that needs more thought.  At the moment I believe
> both of these will only be asked for during the initial microseconds of
> slave-postmaster start.  If they are not there I don't think you need to
> wait for them.  It's only plain ol' WAL segments that you want to wait
> for.  (Anyone see a hole in that analysis?)
> 
Seems to be working fine this way; I'm now just returning ENOENT if they
don't exist.  
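
A minimal sketch of that policy, for illustration only (again, not the
actual log_ship.c code): meta-files get an immediate ENOENT, plain WAL
segments are polled for.  The names is_meta_file and wait_for_file, and
the poll_seconds/max_polls parameters, are hypothetical; a negative
max_polls is assumed to mean "wait forever".

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if name ends with suffix, else 0. */
static int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name);
    size_t s = strlen(suffix);

    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Meta-files are the .history and .backup files; everything else is
 * treated as a plain WAL segment. */
int is_meta_file(const char *name)
{
    return has_suffix(name, ".history") || has_suffix(name, ".backup");
}

/* Returns 0 once path is readable, otherwise an errno value.
 * Meta-files are never waited for: a missing one yields ENOENT at once.
 * Missing WAL segments are polled every poll_seconds, up to max_polls
 * retries (negative max_polls: no limit). */
int wait_for_file(const char *path, unsigned poll_seconds, int max_polls)
{
    int polls = 0;

    for (;;) {
        if (access(path, R_OK) == 0)
            return 0;
        if (errno != ENOENT)
            return errno;           /* real error, do not keep waiting */
        if (is_meta_file(path))
            return ENOENT;          /* meta-file: report and move on */
        if (max_polls >= 0 && polls >= max_polls)
            return ENOENT;          /* gave up waiting for the segment */
        sleep(poll_seconds);
        polls++;
    }
}
```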

> > 3. Keeping the backup from coming online before the replay has fully
> > finished in the event of a failure to copy a file, or other strange
> > errors (out of memory, etc).
> 
> Right, also an area that needs thought.  Some other people opined that
> they want the switchover to occur only on manual command.  I'd go with
> that too if you have anything close to 24x7 availability of admins.
> If you *must* have automatic switchover, what's the safest criterion?
> Dunno, but let's think ...

I'm not even really talking about automatic startup on failover.  Right
now, if the recovery_command returns anything but 0, the database will
finish recovery and come online.  You would then have to rebuild your
backup system from a copy of the master unnecessarily.  Sounds kinda
messy to me, especially if it's a false trigger (temporary I/O error,
out of memory).
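
One way to reduce (not eliminate) the false triggers from inside the
shipping program itself is to retry errors that look temporary before
ever returning non-zero.  A rough sketch under stated assumptions:
fetch_with_retry, is_transient, and the choice of which errno values
count as transient are all mine, and flaky_fetch is only a stand-in for
the real copy routine.

```c
#include <errno.h>
#include <unistd.h>

/* A fetch routine returns 0 on success or an errno value on failure. */
typedef int (*fetch_fn)(const char *name);

/* Assumed set of "probably temporary" errors; adjust to taste. */
static int is_transient(int err)
{
    return err == EIO || err == ENOMEM || err == EAGAIN || err == EINTR;
}

/* Call fetch up to max_attempts times, sleeping delay_seconds between
 * tries.  Stops early on success or on a non-transient error.
 * Returns 0 on success, otherwise the last errno value seen
 * (EINVAL if max_attempts is less than 1). */
int fetch_with_retry(fetch_fn fetch, const char *name,
                     int max_attempts, unsigned delay_seconds)
{
    int attempt;
    int err = EINVAL;

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        err = fetch(name);
        if (err == 0)
            return 0;
        if (!is_transient(err))
            return err;
        if (attempt < max_attempts)
            sleep(delay_seconds);
    }
    return err;
}

/* Demo stub only: fails with EIO on the first two calls, then succeeds. */
int flaky_calls = 0;

int flaky_fetch(const char *name)
{
    (void) name;
    return ++flaky_calls < 3 ? EIO : 0;
}
```

This only papers over short glitches; once the retries run out the
program still has to return non-zero, and the database still comes
online.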


What I think might be a better long-term approach (but probably more of
an 8.1 thing): have the database go into a read-only/replay mode and
accept only read-only commands from users.  A replay program opens a
connection to the backup system's postmaster and tells it to replay a
given file when it becomes available.  Once you want the system to come
online, the DBA calls a different function that instructs the system to
come fully online and start accepting updates from users.

This could be quite complex, but it provides two things: proper log
shipping with status (without the false fail->db online possibility),
and read-only replicated backup system(s), which would also be good
for a reporting database.

Thoughts?


Anyway, here's a rewritten program for my implementation of log
shipping: http://www.bootseg.com/log_ship.c
It operates mostly the same, but most of the stupid bugs are fixed.  The
old one was renamed to http://www.bootseg.com/log_ship.c.ver1 if you
really want it.

Thanks, 
Eric
