Re: streaming replication breaks horribly if master crashes - Mailing list pgsql-hackers

From Greg Stark
Subject Re: streaming replication breaks horribly if master crashes
Msg-id AANLkTinbiu_r0cYp1BNsdRNQXnwcEPWbymQQUrB_ZLWk@mail.gmail.com
In response to Re: streaming replication breaks horribly if master crashes  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
On Thu, Jun 17, 2010 at 12:22 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>
>> It sounds like it behaves just fine except for not detecting a
>> broken connection.
>
> Of course I meant in terms of the slave's attempts at retrieving
> more WAL, not in terms of it applying a second time line.  TCP
> keepalive timeouts don't help with that part of it, just the failure
> to recognize the broken connection.  I suppose someone could argue
> that's a *feature*, since it gives you two hours to manually
> intervene before it does something stupid, but that hardly seems
> like a solution....

It's certainly a design goal of TCP that you should be able to
disconnect the network, reconnect it, and have everything recover. If
no data was sent, the connection should be able to withstand
arbitrarily long disconnections. TCP keepalives break that, but they
should only break it in the case where the network connection has
definitely exceeded the retry timeouts, not when it merely hasn't
responded fast enough for the application's requirements.
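
For illustration only, here is a rough sketch of how the keepalive timers
are usually set per socket, and roughly how long a dead peer can go
unnoticed as a result. The function names and the example values are
made up for this sketch and are not anything from PostgreSQL. The three
tuning options are platform-specific (Linux defines them), so the code
skips any the platform lacks. With the usual Linux defaults of 7200s
idle, 75s interval, and 9 probes, the idle time alone accounts for the
"two hours" mentioned above.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive and tune the timers where the platform allows.

    Hypothetical helper for illustration; the default values are arbitrary.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These option names exist on Linux; other platforms may lack some of
    # them, in which case the system-wide defaults stay in effect.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


def approx_detection_s(idle_s, interval_s, probes):
    """Approximate time for keepalive to declare an idle, dead peer gone."""
    return idle_s + interval_s * probes
```

The point of the arithmetic is that these timers describe when the
connection is considered dead at the transport level. Shrinking them to
meet an application's latency expectations is a different thing from
detecting a connection that is really gone.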


--
greg

