Re: Detecting skipped data from logical slots (data silently skipped) - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Detecting skipped data from logical slots (data silently skipped)
Date
Msg-id CAMsr+YFiEWw1qZg8rxXnhHGZsWiTxajDZHPhBKiLmU-q+t=EmQ@mail.gmail.com
In response to Re: Detecting skipped data from logical slots (data silently skipped)  (Andres Freund <andres@anarazel.de>)
Responses Re: Detecting skipped data from logical slots (data silently skipped)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 5 August 2016 at 14:07, Andres Freund <andres@anarazel.de> wrote:

> > The simplest fix would be to require downstreams to flush their replication
> > origin when they get a hot standby feedback message, before they send a
> > reply with confirmation. That could be somewhat painful for performance, but
> > can be alleviated somewhat by waiting for the downstream postgres to get
> > around to doing a flush anyway and only forcing it if we're getting close to
> > the walsender timeout. That's pretty much what BDR and pglogical do when
> > applying transactions to avoid having to do a disk flush for each and every
> > applied xact. Then we change START_REPLICATION ... LOGICAL so it ERRORs if
> > you ask for a too-old LSN rather than silently ignoring it.
>
> That's basically just proposing to revert this broken optimization,
> IIUC, and instead just try not to flush too often on the standby.

> The effect of the optimization is *massive* if you are replicating a
> less active database, or a less active subset of a database, in a
> cluster with lots of other activity. I don't think that can just be
> disregarded, to protect against something with plenty of other failure
> scenarios.


Right. Though if we flush lazily I'm surprised the effect is that big; you're the one who did the work and know its significance.

All I'm trying to say is that I think the current behaviour is too dangerous. It doesn't just lead to failure, but to easy, undetectable, silent failure when users perform common and simple tasks like taking a snapshot or a filesystem-level pg_start_backup() copy of a DB. The only reason it can't happen for pg_basebackup too is that we omit slots during pg_basebackup. That inconsistency between snapshot/fs-level copies and pg_basebackup is unfortunate but understandable.

So I'm not saying "this whole idea must go". I'm saying I think it's too permissive and needs to be stricter about what it's allowed to skip, so we can differentiate between "nothing interesting here" and "um, I think someone else consumed data I needed, I'd better bail out now". I've never been comfortable with the skipping behaviour and found it confusing right from the start, but now I have definite cases where it can cause silent inconsistencies and really think it needs to be modified.

Robert's point that we could keep track of the skippable range is IMO a good one. An extra slot attribute with the last LSN that resulted in the output plugin doing a write to the client would be sufficient, at least at this point. To anticipate future needs where we might want to allow output plugins to ignore some things, control could be handed to the output plugin by allowing it to also make a function call for the position to be explicitly advanced even if it performs no writes.

That way we can safely skip ahead if the client asks us for an LSN equal to or after the last real data we sent them, but refuse to skip if we sent them data after the LSN they're asking for.
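To make that rule concrete, here's a minimal sketch in C. This is not actual PostgreSQL code: XLogRecPtr is simplified to a plain integer, and start_lsn_is_safe() and last_sent_lsn are hypothetical names for the proposed check and slot attribute.

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t XLogRecPtr;

/*
 * Hypothetical check: may we honour a client's request to start
 * replication at requested_lsn by fast-forwarding the slot?
 *
 * last_sent_lsn is the imagined new slot attribute: the last LSN at
 * which the output plugin actually wrote data to the client.
 *
 * Skipping is safe only if the client is asking for an LSN at or
 * after the last real data we sent it; if we sent data after the
 * requested LSN, someone else must have consumed it, and we should
 * ERROR rather than silently skip.
 */
static bool
start_lsn_is_safe(XLogRecPtr requested_lsn, XLogRecPtr last_sent_lsn)
{
    return requested_lsn >= last_sent_lsn;
}
```

With a check like this, START_REPLICATION ... LOGICAL could raise an ERROR instead of silently skipping whenever start_lsn_is_safe() returns false, while the harmless "nothing interesting here" fast-forward keeps working.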

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
