Re: [HACKERS] Logical decoding on standby - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: [HACKERS] Logical decoding on standby
Date
Msg-id CAMsr+YFnLvWH9zF0Z4SmrMzOnsOCsJC2tA=JrdsrB478OnfmuA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Logical decoding on standby  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Logical decoding on standby
List pgsql-hackers
On 23 March 2017 at 09:39, Andres Freund <andres@anarazel.de> wrote:

> We can't just assume that snapbuild is going to work correctly when it's
> prerequisites - pinned xmin horizon - isn't working.

Makes sense.

>> What do _you_ see as the minimum acceptable way to achieve the ability
>> for a logical decoding client to follow failover of an upstream to a
>> physical standby? In the end, you're one of the main people whose view
>> carries weight in this area, and I don't want to develop yet another
>
> I think your approach here wasn't that bad? There's a lot of cleaning
> up/shoring up needed, and we probably need a smarter feedback system.  I
> don't think anybody here has objected to the fundamental approach?

That's useful, thanks.

I'm not arguing that the patch as it stands is ready, and appreciate
the input re the general design.


> I still think decoding-on-standby is simply not the right approach as
> the basic/first HA approach for logical rep.  It's a nice later-on
> feature.  But that's an irrelevant aside.

I don't really agree that it's irrelevant.

Right now Pg has no HA capability for logical decoding clients. We've
now added logical replication, but it has no way to provide for
upstream node failure and ensure a consistent switch-over, whether to
a logical or physical replica. Since real world servers fail or need
maintenance, this is kind of a problem for practical production use.

Because of transaction serialization for commit-time order replay,
logical replication experiences saw-tooth replication lag, where large
or long xacts such as batch jobs effectively stall all later xacts
until they are fully replicated. We cannot currently start replicating
a big xact until it commits on the upstream, so that lag can easily be
~2x the runtime on the upstream.

So while you can do sync rep on a logical standby, it tends to result
in big delays on COMMITs relative to physical rep, even if app are
careful to keep transactions small. When the app DR planning people
come and ask you what the max data loss window / max sync rep lag is,
you have to say ".... dunno? depends on what else was running on the
server at the time."

AFAICT, changing those things will require the ability to begin
streaming reorder buffers for big xacts before commit, which as the
logical decoding on 2PC thread shows is not exactly trivial. We'll
also need to be able to apply them concurrently with other xacts on
the other end. Those are both big and complex things IMO, and I'll be
surprised if we can do either in Pg11 given that AFAIK nobody has even
started work on either of them or has a detailed plan.

Presuming we get some kind of failover to logical replica upstreams
into Pg11, it'll have significant limitations relative to what we can
deliver to users by using physical replication. Especially when it
comes to bounded-time lag for HA, sync rep, etc. And I haven't seen a
design for it, though Petr and I have discussed some with regards to
pglogical.

That's why I think we need to do HA on the physical side first.
Because it's going to take a long time to get equivalent functionality
for logical rep based upstreams, and when it is we'll still have to
teach management tools and other non-logical-rep logical decoding
clients about the new way of doing things.  Wheras for physical HA
setups to support logical downstreams requires only relatively minor
changes and gets us all the physical HA features _now_.

That's why we pursued failover slots - as a simple, minimal solution
to allowing logical decoding clients to inter-operate with Pg in a
physical HA configuration. TBH, I still think we should just add them.
Sure, they don't help us achieve decoding on standby, but they're a
lot simpler and they help Pg's behaviour with slots match user
expectations for how the rest of Pg behaves, i.e. if it's on the
master it'll be on the replica too.  And as you've said, decoding on
standby is a nice-to-have, wheras I think some kind of HA support is
rather more important.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: [HACKERS] Hash support for grouping sets
Next
From: Craig Ringer
Date:
Subject: Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)