On 14/03/16 08:08, Craig Ringer wrote:
> On 11 March 2016 at 20:15, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Craig Ringer wrote:
> > Hi all
> >
> > I think I found a couple of logical decoding issues while writing tests for
> > failover slots.
> >
> > Despite the docs' claim that a logical slot will replay data "exactly
> > once", a slot's confirmed_lsn can go backwards and the SQL functions can
> > replay the same data more than once. We don't mark a slot as dirty if only
> > its confirmed_lsn is advanced, so it isn't flushed to disk. For failover
> > slots this means it also doesn't get replicated via WAL. After a master
> > crash, or for failover slots after a promote event, the confirmed_lsn will
> > go backwards. Users of the SQL interface must keep track of the safely
> > locally flushed slot position themselves and throw the repeated data away.
> > Unlike with the walsender protocol it has no way to ask the server to skip
> > that data.
> >
> > Worse, because we don't dirty the slot even a *clean shutdown* causes slot
> > confirmed_lsn to go backwards. That's a bug IMO. We should force a flush of
> > all slots at the shutdown checkpoint, whether dirty or not, to address it.
>
> Why don't we mark the slot dirty when confirmed_lsn advances? If we fix
> that, doesn't it fix the other problems too?
>
>
> Yes, it does.
>
It will not change the fact that a slot can go backwards, however, even
after a clean shutdown of the server: in walsender the confirmed_lsn only
changes after a feedback message, so if the client crashes it won't get
updated (for obvious reasons).
BTW, you can emulate asking for a specific LSN in the SQL interface by
first calling the pg_logical_slot_get_changes function with upto_lsn set
to whatever LSN you expect to start at, but it's ugly.
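
To make the upto_lsn trick concrete, here is a rough client-side sketch in
Python. It is only an illustration under assumptions: the helper names
skip_to_lsn and read_from_lsn are made up, "cur" is assumed to be a DB-API
cursor using the %s parameter style (as psycopg2 does), and the skip is only
as precise as upto_lsn itself, so the client should still be prepared to see
some repeated data.

```python
def skip_to_lsn(cur, slot_name, start_lsn):
    """Consume and discard changes up to start_lsn.

    Hypothetical helper: 'cur' is assumed to be a DB-API cursor with the
    %s parameter style. Returns the number of rows that were thrown away.
    """
    cur.execute(
        "SELECT count(*) "
        "FROM pg_logical_slot_get_changes(%s, %s::pg_lsn, NULL)",
        (slot_name, start_lsn),
    )
    return cur.fetchone()[0]


def read_from_lsn(cur, slot_name, start_lsn):
    """Emulate 'start at this LSN' on the SQL interface.

    First discard whatever the slot would replay before start_lsn,
    then fetch the remaining changes normally.
    """
    skip_to_lsn(cur, slot_name, start_lsn)
    cur.execute(
        "SELECT lsn, xid, data "
        "FROM pg_logical_slot_get_changes(%s, NULL, NULL)",
        (slot_name,),
    )
    return cur.fetchall()
```

The client would pass the last position it knows it flushed locally as
start_lsn, e.g. read_from_lsn(cur, 'my_slot', '0/16B2D80'), where both the
slot name and the LSN are placeholder values.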
--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services