Re: Logical decoding slots can go backwards when used from SQL, docs are wrong - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Logical decoding slots can go backwards when used from SQL, docs are wrong
Date
Msg-id CAMsr+YHHPL=qRUeti+Yu0ax6FF4xKRyMVZ-o+QOR=uoPKSDamg@mail.gmail.com
In response to Re: Logical decoding slots can go backwards when used from SQL, docs are wrong  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Logical decoding slots can go backwards when used from SQL, docs are wrong
List pgsql-hackers
On 11 March 2016 at 20:15, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Craig Ringer wrote:
> > Hi all
> >
> > I think I found a couple of logical decoding issues while writing tests for
> > failover slots.
> >
> > Despite the docs' claim that a logical slot will replay data "exactly
> > once", a slot's confirmed_lsn can go backwards and the SQL functions can
> > replay the same data more than once. We don't mark a slot as dirty if only
> > its confirmed_lsn is advanced, so it isn't flushed to disk. For failover
> > slots this means it also doesn't get replicated via WAL. After a master
> > crash, or for failover slots after a promote event, the confirmed_lsn will
> > go backwards.  Users of the SQL interface must keep track of the safely
> > locally flushed slot position themselves and throw the repeated data away.
> > Unlike with the walsender protocol it has no way to ask the server to skip
> > that data.
> >
> > Worse, because we don't dirty the slot, even a *clean shutdown* causes slot
> > confirmed_lsn to go backwards. That's a bug IMO. We should force a flush of
> > all slots at the shutdown checkpoint, whether dirty or not, to address it.
>
> Why don't we mark the slot dirty when confirmed_lsn advances?  If we fix
> that, doesn't it fix the other problems too?

Yes, it does.

That'll cause slots to be written out at checkpoints when they otherwise wouldn't have to be, but I'd rather do a little more work in this case than have the slot position go backwards. Compared to the disk activity from WAL decoding etc., the effect should be undetectable anyway.
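
To make the mechanism concrete, here is a toy model of the behaviour described above. It is only a sketch and not PostgreSQL's actual slot code: the `ToySlot` class, its fields, and `position_after_restart` are names invented for illustration. It assumes, as the report states, that only dirty slots are written out at a checkpoint, and it ignores every other event that might dirty a real slot.

```python
from dataclasses import dataclass


@dataclass
class ToySlot:
    """Toy stand-in for a replication slot (hypothetical, for illustration only)."""
    confirmed_lsn: int = 0   # position held in memory
    on_disk_lsn: int = 0     # position last written to disk
    dirty: bool = False      # whether the next checkpoint should write the slot

    def confirm(self, lsn: int, mark_dirty: bool) -> None:
        # Advance the in-memory position; optionally flag the slot for flushing.
        if lsn > self.confirmed_lsn:
            self.confirmed_lsn = lsn
            if mark_dirty:
                self.dirty = True

    def checkpoint(self) -> None:
        # Only dirty slots are written out.
        if self.dirty:
            self.on_disk_lsn = self.confirmed_lsn
            self.dirty = False

    def restart(self) -> None:
        # After a crash or shutdown, in-memory state is reloaded from disk.
        self.confirmed_lsn = self.on_disk_lsn
        self.dirty = False


def position_after_restart(mark_dirty: bool) -> int:
    """Confirm up to LSN 100, checkpoint, restart, and report the slot position."""
    slot = ToySlot()
    slot.confirm(100, mark_dirty)
    slot.checkpoint()
    slot.restart()
    return slot.confirmed_lsn
```

In this model, `position_after_restart(False)` comes back at the old on-disk position, while `position_after_restart(True)` keeps the confirmed position, at the cost of one extra slot write at the checkpoint.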

Andres? Any objection to dirtying a slot when its confirmed_lsn advances, so we write it out at the next checkpoint?
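
Until that changes, the client-side workaround mentioned in the original report (tracking the safely flushed position locally and throwing repeated data away) might look roughly like the sketch below. The function names are hypothetical, and it assumes the client has already grouped the decoded rows into transactions, each tagged with its commit LSN in the usual `X/Y` hex text form. The comparison is done per transaction rather than per row because transactions are decoded in commit order.

```python
def parse_lsn(text):
    """Convert a pg_lsn string such as '0/16B3748' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def skip_replayed(transactions, last_applied_commit_lsn):
    """Yield only transactions committed after the client's own flushed position.

    transactions: iterable of (commit_lsn_text, changes) pairs in commit order
                  (an assumed shape; the client builds it from the decoded rows).
    last_applied_commit_lsn: pg_lsn string the client stored durably together
                  with the applied data, or None if nothing has been applied yet.
    """
    if last_applied_commit_lsn is None:
        floor = -1
    else:
        floor = parse_lsn(last_applied_commit_lsn)
    for commit_lsn, changes in transactions:
        # Anything at or before the stored position was already applied.
        if parse_lsn(commit_lsn) > floor:
            yield commit_lsn, changes
```

The stored position only helps if the client writes it atomically with the data it applies; otherwise the client has the same "position can go backwards" problem as the slot.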

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
