Re: streaming replication breaks horribly if master crashes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: streaming replication breaks horribly if master crashes
Date
Msg-id AANLkTilzOkAH35ViRb_2-YHrdKQLevx9p9biFVxzKen3@mail.gmail.com
In response to Re: streaming replication breaks horribly if master crashes  (Josh Berkus <josh@agliodbs.com>)
Responses Re: streaming replication breaks horribly if master crashes  (Magnus Hagander <magnus@hagander.net>)
Re: streaming replication breaks horribly if master crashes  (Josh Berkus <josh@agliodbs.com>)
Re: streaming replication breaks horribly if master crashes  ("Pierre C" <lists@peufeu.com>)
Re: streaming replication breaks horribly if master crashes  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Wed, Jun 16, 2010 at 4:14 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> The first problem I noticed is that the slave never seems to realize
>> that the master has gone away.  Every time I crashed the master, I had
>> to kill the wal receiver process on the slave to get it to reconnect;
>> otherwise it just sat there waiting, either forever or at least for
>> longer than I was willing to wait.
>
> Yes, I've noticed this.  That was the reason for forcing walreceiver to
> shut down on a restart per prior discussion and patches.  This needs to
> be on the open items list ... possibly it'll be fixed by Simon's
> keepalive patch?  Or is it just a tcp_keepalive issue?

I think a TCP keepalive might be enough, but I have not tried to code
or test it.
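
For illustration only, here is a minimal sketch of what enabling TCP
keepalive on a client socket looks like. It is written in Python rather
than in the walreceiver's C code, the helper name enable_keepalive and
the timing values are assumptions, and the per-socket tuning options
are only set where the platform exposes them:

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing socket (hypothetical helper)."""
    # Ask the kernel to probe an idle connection so a dead peer is noticed.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs are platform-specific; skip any that are missing.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With settings like these, a peer that vanishes without closing the
connection should eventually produce an error on the socket instead of
an indefinite wait, which is the behavior the slave is missing here.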

>> More seriously, I was able to demonstrate that the problem linked in
>> the thread above is real: if the master crashes after streaming WAL
>> that it hasn't yet fsync'd, then on recovery the slave's xlog position
>> is ahead of the master.  So far I've only been able to reproduce this
>> with fsync=off, but I believe it's possible anyway,
>
> ... and some users will turn fsync off.  This is, in fact, one of the
> primary uses for streaming replication: Durability via replicas.

Yep.

>> and this just
>> makes it more likely.  After the most recent crash, the master thought
>> pg_current_xlog_location() was 1/86CD4000; the slave thought
>> pg_last_xlog_receive_location() was 1/8733C000.  After reconnecting to
>> the master, the slave then thought that
>> pg_last_xlog_receive_location() was 1/87000000.
>
> So, *in this case*, detecting out-of-sequence xlogs (and PANICing) would
> have actually prevented the slave from being corrupted.
>
> My question, though, is detecting out-of-sequence xlogs *enough*?  Are
> there any crash conditions on the master which would cause the master to
> reuse the same locations for different records, for example?  I don't
> think so, but I'd like to be certain.

The real problem here is that we're sending records to the slave which
might cease to exist on the master if it unexpectedly reboots.  I
believe that what we need to do is make sure that the master only
sends WAL it has already fsync'd (Tom suggested on another thread that
this might be necessary, and I think it's now clear that it is 100%
necessary).  But I'm not sure how this will play with fsync=off - if
we never fsync, then we can't ever really send any WAL without risking
this failure mode.  Similarly with synchronous_commit=off, I believe
that the next checkpoint will still fsync WAL, but the lag might be
long.
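
A rough sketch of the rule, not the actual walsender logic: positions
are modelled as plain integers, and the function and parameter names
are hypothetical. The point is only that the send limit follows the
flushed position rather than the written one:

```python
def next_send_range(sent_upto, flushed_upto):
    """Return the (start, end) range of WAL that is safe to stream, or None.

    sent_upto    -- position already streamed to the slave
    flushed_upto -- position the master has fsync'd to disk
    """
    if flushed_upto <= sent_upto:
        # Nothing durable beyond what the slave already has.
        return None
    return (sent_upto, flushed_upto)
```

If flushed_upto never advances, as with fsync=off, this always returns
None, which is exactly the difficulty described above.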

I think we should also change the slave to panic and shut down
immediately if its xlog position is ahead of the master.  That can
never be a watertight solution because you can always advance the xlog
position on the master and mask the problem.  But I think we should
do it anyway, so that we at least have a chance of noticing that we're
hosed.  I wish I could think of something a little more watertight...
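
As a sketch of the check, assuming locations in the textual form the
functions above return (two hexadecimal numbers separated by a slash);
the helper names are made up for illustration:

```python
def parse_xlog_location(text):
    """Split a location such as '1/86CD4000' into a comparable (hi, lo) pair."""
    hi, lo = text.split("/")
    return (int(hi, 16), int(lo, 16))

def standby_is_ahead(standby_receive_location, master_current_location):
    """True if the slave has received WAL beyond the master's current position."""
    return (parse_xlog_location(standby_receive_location)
            > parse_xlog_location(master_current_location))
```

With the values from the crash above,
standby_is_ahead("1/8733C000", "1/86CD4000") is true, so the slave
would have stopped.  Once the master has written past 1/8733C000 the
same check passes again, which is why it is not watertight.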

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

