Re: [HACKERS] Replication origins and timelines - Mailing list pgsql-hackers

From	Craig Ringer
Subject	Re: [HACKERS] Replication origins and timelines
Date	June 1, 2017 07:36:22
Msg-id	CAMsr+YE1SZL4oSB9V5U=tS=U6AxXNH8N8xFx-0hNq-BcOCSXeg@mail.gmail.com
In response to	Re: [HACKERS] Replication origins and timelines (Andres Freund <andres@anarazel.de>)
List	pgsql-hackers

On 1 June 2017 at 09:23, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2017-06-01 09:12:04 +0800, Craig Ringer wrote:
>> TL;DR: replication origins track LSN without timeline. This is
>> ambiguous when physical failover is present since XXXXXXXX/XXXXXXXX
>> can now represent more than one state due to timeline forks with
>> promotions. Replication origins should track timelines so we can tell
>> the difference; I propose to patch them accordingly for pg11.
>
> I'm not quite convinced that this should be tracked at the origin level.
> If you fail over physically, shouldn't we also reconfigure logical
> replication?
>
> Even if we decide this is necessary, I *strongly* suggest trying to get
> the existing standby decoding etc wrapped up before starting something
> nontrivial afresh.

Yeah, I'm not thinking of leaping straight to a patch before we've got
the rep on standby stuff nailed down. I just wanted to raise early
discussion to make sure it's not entirely the wrong path and/or
totally hopeless for core.

Logical decoding output plugins would need to keep track of the
timeline and send an extra message informing the downstream of a
timeline change whenever they see a new timeline, or else include the
timeline in all messages (at the cost of extra overhead). That's
needed because we don't stop a decoding session and force
re-connection when we hit a timeline boundary. (Nor can we, since at
some point our restart_lsn will be on the old timeline but the first
commits will be on the new timeline.) I'll need to think about if/how
the decoding plugin can reliably do that.
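
As a rough illustration of the first option (an extra message only when
the timeline changes), here is a minimal Python sketch. It is not how
an output plugin would actually be written, since those are C code
inside the server; the ("timeline", ...) and ("change", ...) message
shapes and the function name are hypothetical, chosen only to show the
idea of tagging the stream at timeline boundaries instead of on every
message.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical input: (timeline_id, lsn, payload) for each decoded change.
Change = Tuple[int, int, str]


def with_timeline_markers(changes: Iterable[Change]) -> Iterator[tuple]:
    """Yield a ("timeline", tli, lsn) marker whenever the timeline differs
    from the previous change's timeline, followed by the change itself.

    The message shapes are assumptions for illustration, not a real
    PostgreSQL protocol message.
    """
    last_tli = None
    for tli, lsn, payload in changes:
        if tli != last_tli:
            # First change seen on this timeline: tell the downstream once.
            yield ("timeline", tli, lsn)
            last_tli = tli
        yield ("change", lsn, payload)
```

The downstream only pays for one extra message per timeline switch,
rather than carrying a timeline ID in every message.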

>> Take master A, its physical replica B, and logical decoding client C
>> streaming changes from A. B is lagging. A is at lsn 1/1000, B is only
>> at 1/500. C has replicated from A up to 1/1000, when A fails. We
>> promote B to replace A. Now C connects to B, and requests to resume at
>> LSN 1/1000.
>
> Wouldn't it be better to solve this by querying the new master's
> timeline history, and checking whether the current replay point is
> pre/post fork?

That could work.

The decoding client would need to track the last-commit timeline in
its own metadata if we're not letting it put it in the replication
origin. Manageable, if awkward.

Clients would need to know how to fetch and parse timeline history
files, which is an irritating thing for every decoding client that
wants to support failover to have to support. But I guess it's
manageable, if not friendly. And non-Pg-based downstreams would have
to do it anyway.
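
To make the client-side check concrete, here is a minimal Python
sketch, assuming the client has already fetched the new master's
timeline history file as text and has stored the timeline of its last
confirmed commit in its own metadata. The function names are
hypothetical. The file format assumed is the usual one: one line per
ancestor timeline, holding the parent timeline ID, the switch-point
LSN, and a free-text reason, with '#' starting a comment line.

```python
from typing import List, Tuple


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'X/Y' (two hex numbers) to a 64-bit int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_timeline_history(content: str) -> List[Tuple[int, int]]:
    """Return (parent_timeline_id, switch_lsn) pairs from a history file."""
    entries = []
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 2)  # tli, switch point, optional reason
        entries.append((int(fields[0]), parse_lsn(fields[1])))
    return entries


def position_is_on_history(entries: List[Tuple[int, int]], current_tli: int,
                           client_tli: int, client_lsn: int) -> bool:
    """True if (client_tli, client_lsn) lies on the server's history.

    A position on an ancestor timeline is valid only at or before the
    point where the server's history switched away from that timeline.
    """
    if client_tli == current_tli:
        return True
    for tli, switch_lsn in entries:
        if tli == client_tli:
            return client_lsn <= switch_lsn
    return False  # the client's timeline is not an ancestor at all
```

Applied to the scenario above, assuming B was promoted to timeline 2
at 1/500:

```python
history = parse_timeline_history("1\t1/500\tno recovery target specified\n")
# C replayed up to 1/1000 on timeline 1, past the fork at 1/500.
position_is_on_history(history, 2, 1, parse_lsn("1/1000"))  # False
# A client that had only reached 1/400 could resume safely.
position_is_on_history(history, 2, 1, parse_lsn("1/400"))   # True
```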

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
