Re: will PITR in 8.0 be usable for "hot spare"/"log - Mailing list pgsql-hackers
From | Gaetano Mendola |
---|---|
Subject | Re: will PITR in 8.0 be usable for "hot spare"/"log |
Date | |
Msg-id | 411FEC60.6000607@bigfoot.com |
In response to | Re: will PITR in 8.0 be usable for "hot spare"/"log (Eric Kerin <eric@bootseg.com>) |
List | pgsql-hackers |
Eric Kerin wrote:
> On Sun, 2004-08-15 at 16:22, Gaetano Mendola wrote:
>
>> Eric Kerin wrote:
>>
>>> On Sat, 2004-08-14 at 01:11, Tom Lane wrote:
>>>
>>>> Eric Kerin <eric@bootseg.com> writes:
>>>>
>>>>> The issues I've seen are:
>>>>> 1. Knowing when the master has finished the file transfer to
>>>>> the backup.
>>>>
>>>> The "standard" solution to this is you write to a temporary file name
>>>> (generated off your process PID, or some other convenient reasonably-
>>>> unique random name) and rename() into place only after you've finished
>>>> the transfer.
>>>
>>> Yup, much easier this way. Done.
>>>
>>>>> 2. Handling the meta-files (.history, .backup) (eg: not sleeping if
>>>>> they don't exist)
>>>>
>>>> Yeah, this is an area that needs more thought. At the moment I believe
>>>> both of these will only be asked for during the initial microseconds of
>>>> slave-postmaster start. If they are not there I don't think you need to
>>>> wait for them. It's only plain ol' WAL segments that you want to wait
>>>> for. (Anyone see a hole in that analysis?)
>>>
>>> Seems to be working fine this way, I'm now just returning ENOENT if they
>>> don't exist.
>>>
>>>>> 3. Keeping the backup from coming online before the replay has fully
>>>>> finished in the event of a failure to copy a file, or other strange
>>>>> errors (out of memory, etc).
>>>>
>>>> Right, also an area that needs thought. Some other people opined that
>>>> they want the switchover to occur only on manual command. I'd go with
>>>> that too if you have anything close to 24x7 availability of admins.
>>>> If you *must* have automatic switchover, what's the safest criterion?
>>>> Dunno, but let's think ...
>>>
>>> I'm not even really talking about automatic startup on fail over. Right
>>> now, if the recovery_command returns anything but 0, the database will
>>> finish recovery and come online. This would cause you to have to
>>> re-build your backup system from a copy of the master unnecessarily.
>>> Sounds kinda messy to me, especially if it's a false trigger (temporary
>>> IO error, out of memory).
>>
>> Well, this is the way most HA cluster solutions work; in my experience
>> the RH cluster solution relies on a common partition shared between the
>> two nodes and on a serial connection between them.
>> For sure, for a 24x7 service it is a compulsory requirement to have an
>> automatic procedure that handles failures without human intervention.
>>
>> Regards
>> Gaetano Mendola
>
> Already sent this to Gaetano, didn't realize the mail was on list too:
>
> Redhat's HA stuff is a fail over cluster, not a log shipping cluster.
> Once the backup detects a failure of the master, it powers the master off,
> and takes over all devices and network names/IP addresses.

We have been using the RH HA stuff for a long time, and it is not necessary to
have the master powered off (our setup doesn't).

> In log shipping, you can't even be sure that both nodes will be close
> enough together to have multiple communication methods. At work, we
> have an Oracle log shipping setup where the backup cluster is a
> thousand or so miles away from the master cluster, separated by a T3
> link.
>
> For a 24x7 zero-downtime type of system, you would have 2 fail over
> clusters, separated by a few miles (or a few thousand). Then set up log
> shipping from the master to the backup. That keeps the system online
> in case of a single node hardware failure, without having to transfer to
> the backup log shipping system.
> The backup is there in case the master is completely destroyed (by fire,
> hardware corruption, etc). Hence the reason for the remote location.

I totally agree with you, but not everyone can set up a RH HA cluster or an
equivalent solution (a very expensive dual-ported SAN is needed), and this
software version could help in a low-cost setup. The scripts that I posted do
the failover between master and slave automatically, also delivering the
partial WAL (I could increase the robustness by also checking a serial
connection), without needing expensive hardware (rough sketches follow below).
For sure this way of proceeding (the log shipping activity) will increase
availability in case of total disaster (at the moment I transfer a plain dump
to another location every 3 hours :-( ).

Regards
Gaetano Mendola
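To make the two mechanisms discussed above concrete -- write to a temporary
name and rename() into place on the archiving side, and a restore script that
fails immediately for missing .history/.backup files but waits for plain WAL
segments -- here is a minimal Python sketch. The archive directory, file
naming, exit codes, and polling interval are illustrative assumptions, not the
scripts actually posted in this thread.

```python
#!/usr/bin/env python
# Sketch only: paths, names, and polling interval are assumptions for
# illustration, not the scripts discussed in this thread.
import os
import shutil
import sys
import time

ARCHIVE_DIR = "/mnt/wal_archive"   # assumed directory shipped to / shared with the slave


def archive_segment(src_path, filename):
    """archive_command side: copy under a temporary name, then rename().

    rename() within one filesystem is atomic, so the restoring side can
    never pick up a half-transferred segment.
    """
    tmp = os.path.join(ARCHIVE_DIR, "%s.tmp.%d" % (filename, os.getpid()))
    final = os.path.join(ARCHIVE_DIR, filename)
    shutil.copy(src_path, tmp)
    os.rename(tmp, final)


def restore_segment(filename, dest_path):
    """restore_command side: wait for WAL segments, but not for metadata files.

    .history/.backup files are only requested at slave startup; if they are
    missing, fail at once (ENOENT-style) instead of sleeping.  Plain WAL
    segments are waited for, which is what keeps the slave in standby.
    """
    final = os.path.join(ARCHIVE_DIR, filename)
    if filename.endswith(".history") or filename.endswith(".backup"):
        if not os.path.exists(final):
            return 1                     # non-zero exit: "not found", no waiting
    else:
        while not os.path.exists(final):
            time.sleep(5)                # poll until the master ships the segment
    shutil.copy(final, dest_path)
    return 0


if __name__ == "__main__":
    # Hypothetical wiring in recovery.conf:
    #   restore_command = 'restore_sketch.py %f %p'
    sys.exit(restore_segment(sys.argv[1], sys.argv[2]))
```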
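The failover step itself -- detect master failure, pull over the partial
(still-open) WAL segment, then let recovery finish so the slave comes online --
could look roughly like the sketch below. The host name, WAL path, ping-based
health check, and trigger file are all hypothetical; the real scripts were
posted separately, and a serial-connection check as mentioned above would make
the health test more robust.

```python
#!/usr/bin/env python
# Sketch only: hosts, paths, and the trigger mechanism are hypothetical,
# not the failover scripts posted in this thread.
import os
import subprocess

MASTER = "master.example.com"                 # assumed master host name
MASTER_XLOG = "/var/lib/pgsql/data/pg_xlog"   # assumed WAL directory on the master
ARCHIVE_DIR = "/mnt/wal_archive"
TRIGGER_FILE = os.path.join(ARCHIVE_DIR, "failover.trigger")


def master_alive():
    """Coarse health check; a serial heartbeat link would make this more robust."""
    return subprocess.call(["ping", "-c", "3", "-w", "10", MASTER]) == 0


def pull_partial_wal():
    """Best-effort copy of the still-open WAL segment(s) from the master,
    so the slave can replay as close to the point of failure as possible."""
    subprocess.call(["rsync", "-a", "%s:%s/" % (MASTER, MASTER_XLOG),
                     ARCHIVE_DIR + "/"])


def failover():
    if master_alive():
        return                                # false alarm: do not promote the slave
    pull_partial_wal()
    # Signal the restore script to stop waiting so recovery can finish and the
    # slave can come online; the restore loop would check for this file.
    open(TRIGGER_FILE, "w").close()


if __name__ == "__main__":
    failover()
```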