
From Robert Haas
Subject Re: Detecting skipped data from logical slots (data silently skipped)
Date
Msg-id CA+TgmobPk2C4zNaS6+qf9RodbA=CpkrrU=3x9FF_5KfR6jix6A@mail.gmail.com
In response to Detecting skipped data from logical slots (data silently skipped)  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Detecting skipped data from logical slots (data silently skipped)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Aug 3, 2016 at 9:39 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I think we have a bit of a problem with the behaviour specified for logical
> slots, one that makes it hard for an outdated snapshot or backup of a
> logical-slot-using downstream to know that it's missing a chunk of data
> that's been consumed from a slot. That's not great, since slots are supposed
> to ensure a continuous, gapless data stream.
>
> If the downstream requests that logical decoding restarts at an LSN older
> than the slot's confirmed_flush_lsn, we silently ignore the client's request
> and start replay at the confirmed_flush_lsn. That's by design and fine
> normally, since we know the gap LSNs contained no transactions of interest
> to the downstream.

Wow, that sucks.
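
For concreteness, the silent skip is easy to reproduce with
pg_recvlogical; the slot name, plugin, and LSNs below are illustrative:

    $ pg_recvlogical -d postgres --slot=test_slot --create-slot \
          --plugin=test_decoding
    $ pg_recvlogical -d postgres --slot=test_slot --start -f -
      (consume and confirm changes up to, say, 0/16B3B40, then stop)
    $ pg_recvlogical -d postgres --slot=test_slot --start \
          --startpos=0/16B2000 -f -
      (no error: streaming resumes at the slot's confirmed_flush_lsn,
       0/16B3B40, and anything between the two LSNs is never re-sent)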

> The cause is an optimisation intended to allow the downstream to avoid
> having to do local writes and flushes when the upstream's activity isn't of
> interest to it and doesn't result in replicated rows. When the upstream does
> a bunch of writes to another database or otherwise produces WAL not of
> interest to the downstream, we send the downstream keepalive messages that
> include the upstream's current xlog position, and the client replies to
> acknowledge it's seen the new LSN. But, so that we can avoid disk flushes on
> the downstream, we permit it to skip advancing its replication origin in
> response to those keepalives. We continue to advance the confirmed_flush_lsn
> and restart_lsn in the replication slot on the upstream so we can free WAL
> that's not needed and move the catalog_xmin up. The replication origin on
> the downstream falls behind the confirmed_flush_lsn on the upstream.

This seems entirely too clever.  The upstream could safely remember
that if the downstream asks for WAL position X it's safe to begin
streaming from WAL position Y because nothing in the middle is
interesting, but it can hardly decide to unilaterally ignore the
requested position.
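
As things stand, the only way for a downstream restored from an older
snapshot or backup to even suspect a gap is to compare positions by
hand, e.g. (slot and origin names illustrative):

    -- upstream: the position the slot believes the downstream confirmed
    SELECT slot_name, confirmed_flush_lsn
      FROM pg_replication_slots
     WHERE slot_name = 'test_slot';

    -- downstream: the position its replication origin last flushed
    SELECT external_id, remote_lsn
      FROM pg_replication_origin_status
     WHERE external_id = 'upstream_node';

And with the optimisation in place, remote_lsn lagging
confirmed_flush_lsn is normal, so this comparison can't distinguish
benign lag from genuinely skipped data, which is the whole problem.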

> The simplest fix would be to require downstreams to flush their replication
> origin when they get a hot standby feedback message, before they send a
> reply with confirmation. That could be somewhat painful for performance, but
> it can be alleviated by waiting for the downstream postgres to get around to
> doing a flush anyway and only forcing one if we're getting close to
> the walsender timeout. That's pretty much what BDR and pglogical do when
> applying transactions to avoid having to do a disk flush for each and every
> applied xact. Then we change START_REPLICATION ... LOGICAL so it ERRORs if
> you ask for a too-old LSN rather than silently ignoring it.

That's basically proposing to revert this broken optimization, IIUC,
and instead try not to flush too often on the standby.
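
To make the ordering concrete, here is a sketch of that flush-before-ack
discipline in terms of the SQL-level origin functions (origin name and
LSN illustrative; a real apply worker would use the C equivalents and
batch its flushes as described above):

    -- one-time setup on the downstream
    SELECT pg_replication_origin_create('upstream_node');

    -- on receiving a keepalive for 0/16B3B40:
    -- 1. record the position in the local replication origin
    SELECT pg_replication_origin_advance('upstream_node', '0/16B3B40');
    -- 2. make it durable; origin progress is persisted at checkpoint
    --    time, so an apply worker would force a flush before the
    --    walsender timeout expires rather than issuing CHECKPOINT
    CHECKPOINT;
    -- 3. only then send the feedback reply confirming 0/16B3B40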

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


