Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Date
Msg-id 8f334886-d69a-4e68-acb5-961390f3ed44@vondra.me
List pgsql-hackers
Hi,

I kept investigating this, but I haven't made much progress. I still
don't understand why it would be OK to move any of the LSN fields
backwards - certainly not fields like confirmed_flush or restart_lsn.

I did a simple experiment - added asserts to the couple of places in
logical.c that update the LSN fields, checking the value only increases.
But then I simply ran make check-world instead of the stress test.

And that actually fails too: 040_standby_failover_slots_sync.pl triggers
this assert

    {
        SpinLockAcquire(&MyReplicationSlot->mutex);
        Assert(MyReplicationSlot->data.confirmed_flush <= lsn);
        MyReplicationSlot->data.confirmed_flush = lsn;
        SpinLockRelease(&MyReplicationSlot->mutex);
    }

So this moves confirmed_flush back, albeit only by a tiny amount (I've
seen a ~56 byte difference). I don't have an example of this causing an
issue in practice, but I note that CheckPointReplicationSlots does this:

    if (is_shutdown && SlotIsLogical(s))
    {
        SpinLockAcquire(&s->mutex);

        if (s->data.invalidated == RS_INVAL_NONE &&
            s->data.confirmed_flush > s->last_saved_confirmed_flush)
        {
            s->just_dirtied = true;
            s->dirty = true;
        }
        SpinLockRelease(&s->mutex);
    }

to determine if a slot needs to be flushed to disk during a shutdown
checkpoint. So I guess it's possible we save a slot to disk at some LSN,
then confirmed_flush moves backwards, and we then fail to sync the slot
to disk.

But I don't have a reproducer for this ...


I also noticed a strange difference between LogicalIncreaseXminForSlot
and LogicalIncreaseRestartDecodingForSlot.

The structure of LogicalIncreaseXminForSlot looks like this:

    if (TransactionIdPrecedesOrEquals(xmin, slot->data.catalog_xmin))
    {
    }
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        ... update candidate fields ...
    }
    else if (slot->candidate_xmin_lsn == InvalidXLogRecPtr)
    {
        ... update candidate fields ...
    }

while LogicalIncreaseRestartDecodingForSlot looks like this:

    if (restart_lsn <= slot->data.restart_lsn)
    {
    }
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        ... update candidate fields ...
    }

    if (slot->candidate_restart_valid == InvalidXLogRecPtr)
    {
        ... update candidate fields ...
    }

Notice that LogicalIncreaseXminForSlot has the third block guarded by
"else if", while LogicalIncreaseRestartDecodingForSlot has a plain "if".
Isn't that a bit suspicious, considering the functions do the same
thing, just for different fields? I don't know if this is dangerous; the
comments suggest it may just waste extra effort after a reconnect.


regards

-- 
Tomas Vondra



