Re: streaming replication breaks horribly if master crashes - Mailing list pgsql-hackers

From Greg Stark
Subject Re: streaming replication breaks horribly if master crashes
Msg-id AANLkTinPCrNGbdxhxpqyyDthOyS4Na3UKDPYdyFX70BY@mail.gmail.com
In response to Re: streaming replication breaks horribly if master crashes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: streaming replication breaks horribly if master crashes
List pgsql-hackers
On Wed, Jun 16, 2010 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> The first problem I noticed is that the slave never seems to realize
>> that the master has gone away.  Every time I crashed the master, I had
>> to kill the wal receiver process on the slave to get it to reconnect;
>> otherwise it just sat there waiting, either forever or at least for
>> longer than I was willing to wait.
>
> TCP timeout is the answer there.

If you mean TCP keepalives, I disagree quite strongly. If you want the
application to guarantee any particular timing constraints, then you
have to implement that in the application using timers and data
packets. TCP keepalives are for detecting broken network connections,
not enforcing application rules. Using TCP timeouts would have a
number of problems: on many systems they are difficult or impossible
to adjust and, worse, relying on them would make it impossible to
distinguish a postgres master crash from a transient or permanent
network outage.
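
A minimal sketch of what "timers and data packets" in the application
could look like, assuming the master sends some packet (data or an empty
heartbeat) at a regular interval. The class and method names here are
hypothetical illustrations, not PostgreSQL code, and the clock is
injectable only so the logic can be exercised without sleeping.

```python
import time


class HeartbeatMonitor:
    """Tracks when the peer was last heard from (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s      # silence tolerated before giving up
        self.clock = clock              # monotonic clock, injectable for tests
        self.last_seen = clock()        # treat connection setup as first contact

    def on_message(self):
        # Call for every data or heartbeat packet received from the peer.
        self.last_seen = self.clock()

    def peer_timed_out(self):
        # True once the peer has been silent for longer than the timeout.
        return self.clock() - self.last_seen > self.timeout_s
```

The receiver would check peer_timed_out() from its main loop and drop
the connection and reconnect when it returns true, independently of
whatever keepalive settings the operating system uses.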


>> More seriously, I was able to demonstrate that the problem linked in
>> the thread above is real: if the master crashes after streaming WAL
>> that it hasn't yet fsync'd, then on recovery the slave's xlog position
>> is ahead of the master.
>
> So indeed we'd better change walsender to not get ahead of the fsync'd
> position.  And probably also warn people to not disable fsync on the
> master, unless they're willing to write it off and fail over at any
> system crash.
>
>> I don't know what to do about this, but I'm pretty sure we can't ship it as-is.
>
> Doesn't seem tremendously insoluble from here ...

For the case of fsync=off I can't get terribly excited about the slave
being ahead of the master after a crash. After all, the master is toast
anyway. It seems to me that in this situation the slave should detect
that the master has failed and automatically come up in master mode. Or
perhaps it should just shut down and then refuse to come up as a slave
again, on the basis that doing so would be unsafe precisely because it
might be ahead of the (corrupt) master. At some point we should consider
having a server set to fsync=off refuse to come back up unless it was
shut down cleanly anyway. Perhaps we should put in a strongly worded
warning now.

For the case of fsync=on it does seem to me to be terribly obvious
that the master should never send records to the slave that aren't
fsynced on the master. For 9.1 the other proposed option would work as
well but would be more complex: send and store records immediately,
but do not replay them on the slave until they're either fsynced on
the master or failover occurs.
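
The two options can be sketched as follows, assuming WAL positions are
plain increasing integers. The function and parameter names are
hypothetical and this is not the actual walsender or recovery code; it
only illustrates where the limit would be applied in each case.

```python
def send_limit(sent_lsn, master_flush_lsn):
    """Option 1: the master streams only what it has already fsynced."""
    # Never advance the send position past the master's flushed position.
    # If nothing new has been flushed, the limit stays at sent_lsn.
    return max(sent_lsn, master_flush_lsn)


def replay_limit(received_lsn, master_flush_lsn, failover):
    """Option 2: the slave stores everything but holds back replay."""
    if failover:
        # On failover the slave becomes authoritative, so it may replay
        # everything it has received and stored.
        return received_lsn
    # Otherwise replay only up to what the master reports as fsynced.
    return min(received_lsn, master_flush_lsn)
```

Option 2 needs the master to report its flushed position to the slave
in addition to the WAL data itself, which is part of why it is the more
complex of the two.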

--
greg

