Re: Sync Rep: First Thoughts on Code - Mailing list pgsql-hackers

From Mark Mielke
Subject Re: Sync Rep: First Thoughts on Code
Msg-id 495110C4.4070109@mark.mielke.cc
In response to Re: Sync Rep: First Thoughts on Code  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Simon Riggs wrote:
> You scare me that you see failover as sufficiently frequent that you are
> worried that being without one of the servers for an extra 60 seconds
> during a failover is a problem. And then say you're not going to add the
> feature after all. I really don't understand. If it's important, add the
> feature, the whole feature that is. If not, don't.
>
> My expectation is that most failovers are serious ones, that the primary
> system is down and not coming back very fast. Your worries seem to come
> from a scenario where the primary system is still up but Postgres
> bounces/crashes, we can diagnose the cause of the crash, decide the
> crashed server is safe and then wish to recommence operations on it
> again as quickly as possible, where seconds count in doing so.
>
> Are failovers going to be common? Why?
>   

Hi Simon:

I agree with most of your criticism of the "failover only" approach, but I
don't agree that failover frequency should really affect expectations for
how quickly the failed system returns to service. I see "soft" failures
(*not* serious ones) as potentially common: somewhere on the network,
something went down or some packet was lost, and the system took a few too
many seconds to respond. My expectation is that the system can quickly
detect that the node is out of service and remove it from the pool; then,
when the situation is resolved (often automatically, outside of my
control), the node should automatically "catch up" and be put back into
the pool. Having to run some other process such as rsync seems unreliable,
as we already have a mechanism for streaming the data. All that is missing
is streaming from an earlier point in time to catch up efficiently and
reliably.

I think I'm talking more about the complete solution, though, which is in
line with what you are saying? :-)

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>
