Re: Replication failure, slave requesting old segments - Mailing list pgsql-general

From	Stephen Frost
Subject	Re: Replication failure, slave requesting old segments
Date	August 13, 2018 20:11:54
Msg-id	20180813141154.GM3326@tamriel.snowman.net
In response to	Re: Replication failure, slave requesting old segments (Adrian Klaver <adrian.klaver@aklaver.com>)
List	pgsql-general

Greetings,

* Adrian Klaver (adrian.klaver@aklaver.com) wrote:
> On 08/13/2018 05:08 AM, Phil Endecott wrote:
> >Adrian Klaver wrote:
> >Really?  I thought the intention was that the system should be
> >able to recover reliably when the slave reconnects after a
> >period of downtime, subject only to there being sufficient
> >network/CPU/disk bandwidth etc. for it to eventually catch up.

That's correct.

> See also my reply to Stephen earlier. Basically you are trying to coordinate
> two different operations. They start from the same source pg_xlog(pg_wal
> 10+) but arrive on a different time scale and from different locations.
> Without sufficient sanity checks it is possible they diverge enough on one
> or both paths to render the process unstable.

This isn't what's happening.  We're not talking about a timeline change
here or a replica being promoted to be a primary in general.  There's no
divergence happening; it's the same consistent WAL stream, just coming
from two different sources, which PG is specifically designed to handle
and should be handling seamlessly.
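
To make that concrete, here is a toy Python sketch (not PostgreSQL code;
the function name and the two dicts are made up for illustration) of a
standby replaying segments from the archive first and from the primary's
stream once the archive has nothing further.  Both sources hold copies of
the same WAL, so the replayed sequence doesn't depend on which source
supplied a given segment.

```python
def replay(start, archive, primary):
    """Replay segments in order, preferring the archive, else the stream.

    `archive` and `primary` are hypothetical dicts mapping a segment
    number to that segment's contents.
    """
    replayed = []
    seg = start
    while True:
        if seg in archive:
            replayed.append((seg, "archive", archive[seg]))
        elif seg in primary:
            replayed.append((seg, "stream", primary[seg]))
        else:
            break  # nothing more available from either source yet
        seg += 1
    return replayed


# Segment 2 exists in both places with identical contents.
archive = {1: "wal-1", 2: "wal-2"}
primary = {2: "wal-2", 3: "wal-3"}
result = replay(1, archive, primary)
```

Here segments 1 and 2 come from the archive and segment 3 from the
stream, with no gap or conflict at the hand-over.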

> I would say that:
>
> "If you set up a WAL archive that's accessible from the standby, these
> solutions are not required, since the standby can always use the archive to
> catch up provided it retains enough segments."
>
> should be more like:
>
> "If you set up a WAL archive that's accessible from the standby, these
> solutions are not required, since the standby can always use the archive to
> catch up provided it retains enough segments. *This is dependent on
> verification that the archiving is working properly. A belt and suspenders
> approach would be to set wal_keep_segments to a value > 0 in the event
> archiving is not properly functioning*"

I don't think I can disagree more with this additional wording, and I
*really* don't think we should be encouraging people to set a high
wal_keep_segments.  The specific case here looks like it just needs to be
set to, exactly, '1', to ensure that the primary hasn't removed the last
WAL file that it archived.
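
As a rough illustration of why '1' would be enough, here's a toy Python
model.  It is only a sketch under assumptions of mine: the function names
are invented, the retention rule is heavily simplified compared with what
the server really does, and it assumes the standby asks the primary for
the last segment it restored from the archive.

```python
def retained_on_primary(current, last_archived, wal_keep_segments):
    """Segments a toy primary still holds locally.

    Assumption: archived segments are removed unless they fall within
    `wal_keep_segments` segments before the current one; segments that
    are not yet archived are always kept.
    """
    kept = set(range(max(current - wal_keep_segments, 0), current + 1))
    unarchived = set(range(last_archived + 1, current + 1))
    return kept | unarchived


def standby_can_stream(requested, current, last_archived, wal_keep_segments):
    """True if the segment the standby asks for is still on the primary."""
    return requested in retained_on_primary(
        current, last_archived, wal_keep_segments
    )


# The primary is writing segment 10 and has archived everything through 9.
# The standby restored through 9 from the archive, then asks for 9 again.
with_zero = standby_can_stream(9, current=10, last_archived=9, wal_keep_segments=0)
with_one = standby_can_stream(9, current=10, last_archived=9, wal_keep_segments=1)
```

In this model `with_zero` is False and `with_one` is True: keeping a
single extra segment covers the last archived file, and nothing larger
is needed for this case.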

Thanks!

Stephen
