Re: will PITR in 8.0 be usable for "hot spare"/"log - Mailing list pgsql-hackers

From Gaetano Mendola
Subject Re: will PITR in 8.0 be usable for "hot spare"/"log
Date
Msg-id 411FC5EA.4080209@bigfoot.com
Whole thread Raw
In response to Re: will PITR in 8.0 be usable for "hot spare"/"log  (Eric Kerin <eric@bootseg.com>)
Responses Re: will PITR in 8.0 be usable for "hot spare"/"log
List pgsql-hackers
Eric Kerin wrote:
> On Sat, 2004-08-14 at 01:11, Tom Lane wrote:
> 
>>Eric Kerin <eric@bootseg.com> writes:
>>
>>>The issues I've seen are:
>>>1. Knowing when the master has finished the file transfer to
>>>the backup.
>>
>>The "standard" solution to this is you write to a temporary file name
>>(generated off your process PID, or some other convenient reasonably-
>>unique random name) and rename() into place only after you've finished
>>the transfer.  
> 
> Yup, much easier this way.  Done.
> 
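[The temp-file-plus-rename pattern Tom describes could be sketched as a small shell helper; the function name, argument layout, and comments are illustrative, not either poster's actual script:

```shell
# Sketch of write-to-temp-then-rename: the segment only appears under
# its final name once the copy has fully completed, so a standby
# polling the directory can never pick up a partial file.
# archive_atomically is a hypothetical name.
archive_atomically() {
    src=$1; destdir=$2; name=$3
    tmp="$destdir/.$name.tmp.$$"     # PID keeps the temp name unique
    cp "$src" "$tmp" || return 1     # slow copy runs under the temp name
    mv "$tmp" "$destdir/$name"       # rename() is atomic on one filesystem
}
```

The one requirement is that the temp file and the final name live on the same filesystem, since rename() is not atomic across mount points.]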
> 
>>>2. Handling the meta-files, (.history, .backup) (eg: not sleeping if
>>>they don't exist)
>>
>>Yeah, this is an area that needs more thought.  At the moment I believe
>>both of these will only be asked for during the initial microseconds of
>>slave-postmaster start.  If they are not there I don't think you need to
>>wait for them.  It's only plain ol' WAL segments that you want to wait
>>for.  (Anyone see a hole in that analysis?)
>>
> 
> Seems to be working fine this way, I'm now just returning ENOENT if they
> don't exist.  
> 
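[The behaviour Eric describes — report "not found" immediately for the metadata files, but wait for ordinary WAL segments — might look roughly like this as a restore_command helper. The function name, polling interval, and structure are assumptions for illustration, not his actual code:

```shell
# Sketch of a warm-standby restore helper: .history and .backup files
# are only probed at slave-postmaster startup, so fail fast for them;
# plain WAL segments are worth waiting for, since the master will ship
# them eventually.  restore_wal is a hypothetical name.
restore_wal() {
    archive=$1; name=$2; dest=$3
    case "$name" in
        *.history|*.backup)
            [ -f "$archive/$name" ] || return 1   # act like ENOENT, no sleep
            ;;
        *)
            while [ ! -f "$archive/$name" ]; do   # poll until the segment arrives
                sleep 5
            done
            ;;
    esac
    cp "$archive/$name" "$dest"
}
```

In a real setup this logic would live in a standalone script that recovery invokes with the file name and destination path as arguments.]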
> 
>>>3. Keeping the backup from coming online before the replay has fully
>>>finished in the event of a failure to copy a file, or other strange
>>>errors (out of memory, etc).
>>
>>Right, also an area that needs thought.  Some other people opined that
>>they want the switchover to occur only on manual command.  I'd go with
>>that too if you have anything close to 24x7 availability of admins.
>>If you *must* have automatic switchover, what's the safest criterion?
>>Dunno, but let's think ...
> 
> 
> I'm not even really talking about automatic startup on fail over.  Right
> now, if the recovery_command returns anything but 0, the database will
> finish recovery, and come online.  This would cause you to have to
> re-build your backup system from a copy of master unnecessarily.  Sounds
> kinda messy to me, especially if it's a false trigger (temporary io
> error, out of memory)

Well, this is how most HA cluster solutions work. In my experience, the RH cluster
solution relies on a shared partition between the two nodes and on a serial
connection between them.
For a 24x7 service, an automatic procedure that handles failures without human
intervention is certainly a compulsory requirement.


Regards
Gaetano Mendola
