Re: streaming replication breaks horribly if master crashes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: streaming replication breaks horribly if master crashes
Msg-id 14597.1276721813@sss.pgh.pa.us
In response to streaming replication breaks horribly if master crashes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: streaming replication breaks horribly if master crashes  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> The first problem I noticed is that the slave never seems to realize
> that the master has gone away.  Every time I crashed the master, I had
> to kill the wal receiver process on the slave to get it to reconnect;
> otherwise it just sat there waiting, either forever or at least for
> longer than I was willing to wait.

TCP timeout is the answer there.
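As an illustration of the timeout idea, here is a minimal sketch of turning on TCP keepalive probes for a client socket so that a vanished peer is noticed without manual intervention. This is not walreceiver code: the helper name `enable_keepalive` and the timing values are assumptions, and the `TCP_KEEP*` options are platform-dependent, so the sketch sets them only where Python's `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive on sock (hypothetical helper, assumed values)."""
    # Ask the kernel to probe an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Fine-grained knobs: seconds idle before the first probe, seconds
    # between probes, and unanswered probes tolerated before giving up.
    # Not every platform defines these constants, so look them up lazily.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With the assumed values, and on a platform that honors all three options, a dead peer would be reported after roughly idle + interval * count seconds of silence, instead of the connection sitting there indefinitely.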

> More seriously, I was able to demonstrate that the problem linked in
> the thread above is real: if the master crashes after streaming WAL
> that it hasn't yet fsync'd, then on recovery the slave's xlog position
> is ahead of the master.

So indeed we'd better change walsender to not get ahead of the fsync'd
position.  And probably also warn people to not disable fsync on the
master, unless they're willing to write it off and fail over at any
system crash.
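The rule proposed above can be sketched as a small function that caps what a sender may stream at the fsync'd position. This is only an illustration under assumptions, not the walsender implementation: the function name, its parameters, and the use of plain integer byte offsets for WAL positions are all hypothetical.

```python
def next_send_range(sent_upto, written_upto, flushed_upto, max_chunk=8192):
    """Return (start, end) of WAL bytes that are safe to stream, or None.

    All names are hypothetical; positions are plain byte offsets.
    """
    # Never stream past what the master has fsync'd, even if more has
    # already been written to the OS.
    limit = min(written_upto, flushed_upto)
    if sent_upto >= limit:
        return None  # nothing durable left to send yet
    return sent_upto, min(limit, sent_upto + max_chunk)
```

For example, `next_send_range(100, 500, 300)` yields `(100, 300)`: bytes 300 to 500 are written but not yet flushed, so they are held back. If the master crashed at that point and lost them, the slave's position would still not be ahead of the master's after recovery.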

> I don't know what to do about this, but I'm pretty sure we can't ship it as-is.

Doesn't seem tremendously insoluble from here ...

        regards, tom lane

