Re: will PITR in 8.0 be usable for "hot spare"/"log - Mailing list pgsql-hackers

From Gaetano Mendola
Subject Re: will PITR in 8.0 be usable for "hot spare"/"log
Date
Msg-id 411FEC60.6000607@bigfoot.com
In response to Re: will PITR in 8.0 be usable for "hot spare"/"log  (Eric Kerin <eric@bootseg.com>)
List pgsql-hackers
Eric Kerin wrote:

> On Sun, 2004-08-15 at 16:22, Gaetano Mendola wrote:
> 
>>Eric Kerin wrote:
>>
>>>On Sat, 2004-08-14 at 01:11, Tom Lane wrote:
>>>
>>>
>>>>Eric Kerin <eric@bootseg.com> writes:
>>>>
>>>>
>>>>>The issues I've seen are:
>>>>>1. Knowing when the master has finished the file transfer to
>>>>>the backup.
>>>>
>>>>The "standard" solution to this is you write to a temporary file name
>>>>(generated off your process PID, or some other convenient reasonably-
>>>>unique random name) and rename() into place only after you've finished
>>>>the transfer.  
>>>
>>>Yup, much easier this way.  Done.
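
The temp-name-then-rename() approach described above can be sketched as follows. This is only a minimal illustration, not the script Eric actually wrote: the function name deliver_atomically and the ".incoming-" prefix are made up for the example, and it assumes the temporary file lives in the same directory (and so the same filesystem) as the final name, since rename() is only atomic within one filesystem.

```python
import os
import shutil
import tempfile


def deliver_atomically(src_path, dest_dir):
    """Copy src_path into dest_dir so that readers never see a partial file.

    Hypothetical helper for illustration only.
    """
    final_path = os.path.join(dest_dir, os.path.basename(src_path))
    # Create the temporary file inside dest_dir: rename() is only atomic
    # when source and target are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(prefix=".incoming-", dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # data on disk before the name appears
        # os.replace() is rename() with overwrite semantics on every platform.
        os.replace(tmp_path, final_path)
    except BaseException:
        # The final name was never created; just drop the partial copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

A reader that only ever looks for the final file name either finds the complete file or finds nothing, so "has the transfer finished?" reduces to "does the file exist?".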
>>>
>>>
>>>
>>>>>2. Handling the meta-files, (.history, .backup) (eg: not sleeping if
>>>>>they don't exist)
>>>>
>>>>Yeah, this is an area that needs more thought.  At the moment I believe
>>>>both of these will only be asked for during the initial microseconds of
>>>>slave-postmaster start.  If they are not there I don't think you need to
>>>>wait for them.  It's only plain ol' WAL segments that you want to wait
>>>>for.  (Anyone see a hole in that analysis?)
>>>>
>>>
>>>Seems to be working fine this way, I'm now just returning ENOENT if they
>>>don't exist.  
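
A rough sketch of that rule on the slave side: fail immediately for a missing .history or .backup file, but keep waiting for a missing WAL segment. The function name fetch, the directory layout, and the polling interval are assumptions for the example, and returning errno.ENOENT simply mirrors what Eric describes (any non-zero status would presumably do).

```python
import errno
import os
import shutil
import time

# Meta-files that are only requested at slave start-up and may not exist.
META_SUFFIXES = (".history", ".backup")


def fetch(archive_dir, file_name, dest_path, poll_seconds=5.0, sleep=time.sleep):
    """Copy file_name from archive_dir to dest_path.

    Returns 0 on success, or errno.ENOENT if a meta-file is absent.
    Hypothetical helper for illustration only.
    """
    src = os.path.join(archive_dir, file_name)
    while not os.path.exists(src):
        if file_name.endswith(META_SUFFIXES):
            # Missing meta-file: report "not found" right away, never sleep.
            return errno.ENOENT
        # Plain WAL segment: wait for the master to ship it.
        sleep(poll_seconds)
    shutil.copyfile(src, dest_path)
    return 0
```

This relies on files appearing in the archive directory only once they are complete, which is what the rename-into-place trick provides.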
>>>
>>>
>>>
>>>>>3. Keeping the backup from coming online before the replay has fully
>>>>>finished in the event of a failure to copy a file, or other strange
>>>>>errors (out of memory, etc).
>>>>
>>>>Right, also an area that needs thought.  Some other people opined that
>>>>they want the switchover to occur only on manual command.  I'd go with
>>>>that too if you have anything close to 24x7 availability of admins.
>>>>If you *must* have automatic switchover, what's the safest criterion?
>>>>Dunno, but let's think ...
>>>
>>>
>>>I'm not even really talking about automatic startup on fail over.  Right
>>>now, if the recovery_command returns anything but 0, the database will
>>>finish recovery, and come online.  This would cause you to have to
>>>re-build your backup system from a copy of master unnecessarily.  Sounds
>>>kinda messy to me, especially if it's a false trigger (temporary io
>>>error, out of memory)
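
One way to reduce false triggers of this kind is for the script itself to retry a failed copy a few times before returning non-zero. The sketch below is only an illustration under stated assumptions: run_with_retries is a made-up name, the retry count and delay are arbitrary, and treating OSError and MemoryError as "possibly transient" is a guess about which failures are worth retrying.

```python
import time


def run_with_retries(copy_once, attempts=5, delay_seconds=10.0, sleep=time.sleep):
    """Call copy_once() until it succeeds.

    Returns 0 on success, 1 once all attempts have failed.
    Hypothetical helper for illustration only.
    """
    for attempt in range(1, attempts + 1):
        try:
            copy_once()
            return 0
        except (OSError, MemoryError):
            # Possibly transient (I/O hiccup, memory pressure): try again,
            # but do not sleep after the final attempt.
            if attempt < attempts:
                sleep(delay_seconds)
    return 1
```

A persistent failure still ends in a non-zero status, so this only narrows the window for false triggers; it does not address the question of when the backup should be allowed to come online.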
>>
>>Well, this is the way most HA cluster solutions work. In my experience,
>>the RH cluster solution relies on a common partition between the two nodes
>>and on a serial connection between them.
>>For a 24x7 service it is certainly a compulsory requirement to have an automatic
>>procedure that handles failures without human intervention.
>>
>>
>>Regards
>>Gaetano Mendola
>>
> 
> 
> Already sent this to Gaetano, didn't realize the mail was on list too:
> 
> Redhat's HA stuff is a fail over cluster, not a log shipping cluster.
> Once the Backup detects a failure of the master, it powers the master off,
> and takes over all devices, and network names/IP addresses.
 

We have been using the RH HA stuff for a long time, and it is not necessary
to have the master powered off (our setup doesn't do that).


> In log shipping, you can't even be sure that both nodes will be close
> enough together to have multiple communication methods.  At work, we
> have an Oracle log shipping setup where the backup cluster is a
> thousand or so miles away from the master cluster, separated by a T3
> link.
>
> For a 24x7 zero-downtime type of system, you would have 2 fail over
> clusters, separated by a few miles (or a few thousand). Then set up log
> shipping from the master to the backup.  That keeps the system online
> in case of a single node hardware failure, without having to transfer to
> the backup log shipping system.  The backup is there in case the master
> is completely destroyed (by fire, hardware corruption, etc.), hence the
> reason for the remote location.

I totally agree with you, but not everyone can set up an RH HA cluster or
an equivalent solution (it needs a very expensive dual-ported SAN), and
this software-only version could help in a low-cost setup. The scripts that I
posted perform the failover between master and slave automatically, also
delivering the partial WAL (I could increase the robustness by also checking
a serial connection), without needing expensive hardware.

This way of proceeding (log shipping) will certainly increase
availability in case of a total disaster (currently I transfer a plain dump
to another location every 3 hours :-( ).


Regards
Gaetano Mendola













