Re: streaming replication breaks horribly if master crashes - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: streaming replication breaks horribly if master crashes
Date
Msg-id 4C19309C.1090703@agliodbs.com
In response to streaming replication breaks horribly if master crashes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: streaming replication breaks horribly if master crashes  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> The first problem I noticed is that the slave never seems to realize
> that the master has gone away.  Every time I crashed the master, I had
> to kill the wal receiver process on the slave to get it to reconnect;
> otherwise it just sat there waiting, either forever or at least for
> longer than I was willing to wait.

Yes, I've noticed this.  That was the reason for forcing walreceiver to
shut down on a restart per prior discussion and patches.  This needs to
be on the open items list ... possibly it'll be fixed by Simon's
keepalive patch?  Or is it just a tcp_keepalive issue?
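
For reference, here is roughly what enabling TCP keepalive on a client socket looks like. This is only a generic sketch in Python, not walreceiver or libpq code. The function name enable_keepalive and the timing values are made up for illustration, and the TCP_KEEP* options are platform-dependent, so they are applied only where the socket module exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing socket.

    Hypothetical helper for illustration; the timing values are arbitrary.
    With these settings a silent peer would be declared dead after roughly
    idle + interval * count seconds, on platforms that honor all three.
    """
    # SO_KEEPALIVE is the portable on/off switch.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The per-socket tuning knobs are not available everywhere,
    # so set each one only if this platform defines it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

Without something like this (or an application-level keepalive), a receiver blocked on a read has no way to notice that the other end vanished without closing the connection, which would match the "just sat there waiting" behavior described above.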

> More seriously, I was able to demonstrate that the problem linked in
> the thread above is real: if the master crashes after streaming WAL
> that it hasn't yet fsync'd, then on recovery the slave's xlog position
> is ahead of the master.  So far I've only been able to reproduce this
> with fsync=off, but I believe it's possible anyway, 

... and some users will turn fsync off.  This is, in fact, one of the
primary uses for streaming replication: Durability via replicas.

> and this just
> makes it more likely.  After the most recent crash, the master thought
> pg_current_xlog_location() was 1/86CD4000; the slave thought
> pg_last_xlog_receive_location() was 1/8733C000.  After reconnecting to
> the master, the slave then thought that
> pg_last_xlog_receive_location() was 1/87000000.  

So, *in this case*, detecting out-of-sequence xlogs (and PANICing) would
have actually prevented the slave from being corrupted.
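
To make the comparison concrete, here is a minimal sketch of that check in Python, using the locations quoted above. The helper names parse_xlog_location and standby_is_ahead are hypothetical; this is not how the server implements anything, just the ordering test applied to the "logid/offset" hex strings that the xlog location functions report.

```python
def parse_xlog_location(text):
    """Parse a 'logid/offset' string (both parts hex) into a tuple.

    Tuples compare element by element, so (logid, offset) orders
    locations correctly without assuming anything about file sizes.
    """
    logid_hex, offset_hex = text.strip().split("/")
    return (int(logid_hex, 16), int(offset_hex, 16))


def standby_is_ahead(master_location, standby_location):
    """Return True if the standby has received WAL past the master's position."""
    return parse_xlog_location(standby_location) > parse_xlog_location(master_location)


# Values reported in this thread after the crash:
master = "1/86CD4000"   # pg_current_xlog_location() on the master
standby = "1/8733C000"  # pg_last_xlog_receive_location() on the slave

if standby_is_ahead(master, standby):
    print("standby is ahead of master: refuse to continue streaming")
```

With the values from this crash the check fires, since 0x8733C000 is greater than 0x86CD4000 within the same log id.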

My question, though: is detecting out-of-sequence xlogs *enough*?  Are
there any crash conditions on the master which would cause the master to
reuse the same locations for different records, for example?  I don't
think so, but I'd like to be certain.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
 

