Re: will PITR in 8.0 be usable for "hot spare"/"log shipping" type of replication - Mailing list pgsql-hackers

From Simon@2ndquadrant.com
Subject Re: will PITR in 8.0 be usable for "hot spare"/"log shipping" type of replication
Msg-id NOEFLCFHBPDAFHEIPGBOKEHPCCAA.simon@2ndquadrant.com
In response to Re: will PITR in 8.0 be usable for "hot spare"/"log shipping" type of replication  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: will PITR in 8.0 be usable for "hot spare"/"log shipping" type of replication  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> Tom Lane
> Eric Kerin <eric@bootseg.com> writes:
> > The issues I've seen are:
> > 1. Knowing when the master has finished the file transfer to
> > the backup.
>
> The "standard" solution to this is you write to a temporary file name
> (generated off your process PID, or some other convenient reasonably-
> unique random name) and rename() into place only after you've finished
> the transfer.  If you are paranoid you can try to fsync the file before
> renaming, too.  File rename is a reasonably atomic process on all modern
> OSes.
>
> > 2. Handling the meta-files (.history, .backup), e.g. not sleeping if
> > they don't exist.
>
> Yeah, this is an area that needs more thought.  At the moment I believe
> both of these will only be asked for during the initial microseconds of
> slave-postmaster start.  If they are not there I don't think you need to
> wait for them.  It's only plain ol' WAL segments that you want to wait
> for.  (Anyone see a hole in that analysis?)

Agreed.
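A minimal sketch of the temp-file-and-rename transfer from point 1, in Python and for illustration only. The function name `archive_segment` and the directory layout are hypothetical; `tempfile.mkstemp` stands in for the PID-based temporary name, and `os.replace` does the rename, which is atomic on POSIX systems as long as the temporary file and the final name are on the same filesystem.

```python
import os
import shutil
import tempfile


def archive_segment(src_path: str, dest_dir: str) -> str:
    """Copy a finished WAL segment into dest_dir (hypothetical helper).

    Readers of dest_dir never see a partial file: the data is written under a
    temporary name in the same directory, fsync'd, then renamed into place.
    """
    final_path = os.path.join(dest_dir, os.path.basename(src_path))
    # A unique temporary name in the destination directory, so the final
    # rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(prefix=".tmp-", dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as out, open(src_path, "rb") as src:
            shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())  # the "paranoid" step: data on disk first
        os.replace(tmp_path, final_path)  # atomic rename into place
    except BaseException:
        # Do not leave a half-written temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Because the final name only appears once the copy is complete, the receiving side can treat "file exists" as "file is fully transferred".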

> > 3. Keeping the backup from coming online before the replay has fully
> > finished in the event of a failure to copy a file, or other strange
> > errors (out of memory, etc).
>
> Right, also an area that needs thought.  Some other people opined that
> they want the switchover to occur only on manual command.  I'd go with
> that too if you have anything close to 24x7 availability of admins.
> If you *must* have automatic switchover, what's the safest criterion?
> Dunno, but let's think ...
>

That's fairly straightforward.

You use a recovery_command that sleeps when it discovers that a full log file
isn't available, i.e. when it has requested the "last", master-current WAL
file. The program wakes up when the switchover decision is taken, or when the
operator issues the command to switch over.

That way, when switchover occurs, you're up immediately. No code changes
needed...
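A sketch of such a sleeping recovery command, in Python for illustration, assuming the archive directory is filled by an atomic rename so that any file that exists is complete. The function `restore_file`, its parameters, and the use of a trigger file as the operator's switchover command are hypothetical. A return value of 1 stands for the command failing, which ends recovery; in released 8.0 the recovery.conf setting that runs such a command is `restore_command`, with `%f` replaced by the requested file name and `%p` by the path to copy it to.

```python
import os
import re
import shutil
import time

# Plain WAL segment names are 24 uppercase hex digits
# (timeline, log id, segment). Anything else is treated as a meta-file.
WAL_SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def restore_file(archive_dir: str, wanted: str, target_path: str,
                 trigger_path: str, poll_seconds: float = 1.0) -> int:
    """Copy archive_dir/wanted to target_path (hypothetical helper).

    Returns 0 on success and 1 on failure. Meta-files such as .history and
    .backup are never waited for; plain WAL segments are waited for until
    they appear or the trigger file is created.
    """
    source = os.path.join(archive_dir, wanted)
    while True:
        # Check for the file first, so every segment already shipped is
        # replayed before a switchover is honoured.
        if os.path.exists(source):
            shutil.copyfile(source, target_path)
            return 0
        if not WAL_SEGMENT_RE.fullmatch(wanted):
            return 1  # absent .history/.backup file: fail at once, no sleep
        if os.path.exists(trigger_path):
            return 1  # switchover requested: let recovery end
        time.sleep(poll_seconds)  # "last" segment not shipped yet: wait
```

A thin wrapper script would pass the archive directory, `%f`, `%p` and the trigger path to `restore_file` and use the result as its exit status. Checking for the file before the trigger is one way to keep the standby from coming up before replay of the shipped segments has finished.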

This is important because it will allow us to test recovery for many systems
by creating a continuously rolling copy. Implementing this will be the best
way to stress-test the recovery code.

I'm not hugely in favour of copying partially filled log files, but if
that's what people want, fine, as long as we don't change the basic code to
implement it. Otherwise we'll have just created another code path, and PITR
will remain untested for most people.

[I discussed all of this before as Automatic Standby Database functionality]

Best Regards, Simon Riggs
