Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Date
Msg-id 8f334886-d69a-4e68-acb5-961390f3ed44@vondra.me
List pgsql-hackers
Hi,

I kept investigating this, but I haven't made much progress. I still
don't understand why it would be OK to move any of the LSN fields
backwards - certainly not fields like confirmed_flush or restart_lsn.

I did a simple experiment - added asserts to the couple of places in
logical.c that update the LSN fields, checking the value only increases.
But then I simply ran make check-world instead of the stress test.

And that actually fails too: 040_standby_failover_slots_sync.pl triggers
this assert

    {
        SpinLockAcquire(&MyReplicationSlot->mutex);
        Assert(MyReplicationSlot->data.confirmed_flush <= lsn);
        MyReplicationSlot->data.confirmed_flush = lsn;
        SpinLockRelease(&MyReplicationSlot->mutex);
    }

So this moves confirmed_flush back, albeit only by a tiny amount (I've
seen a ~56 byte difference). I don't have an example of this causing an
issue in practice, but I note that CheckPointReplicationSlots does this:

    if (is_shutdown && SlotIsLogical(s))
    {
        SpinLockAcquire(&s->mutex);

        if (s->data.invalidated == RS_INVAL_NONE &&
            s->data.confirmed_flush > s->last_saved_confirmed_flush)
        {
            s->just_dirtied = true;
            s->dirty = true;
        }
        SpinLockRelease(&s->mutex);
    }

to determine if a slot needs to be flushed to disk during a shutdown
checkpoint. So I guess it's possible we save a slot to disk at some LSN,
then confirmed_flush moves backwards, and we then fail to sync the slot
to disk.

But I don't have a reproducer for this ...


I also noticed a strange difference between LogicalIncreaseXminForSlot
and LogicalIncreaseRestartDecodingForSlot.

The structure of LogicalIncreaseXminForSlot looks like this:

    if (TransactionIdPrecedesOrEquals(xmin, slot->data.catalog_xmin))
    {
    }
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        ... update candidate fields ...
    }
    else if (slot->candidate_xmin_lsn == InvalidXLogRecPtr)
    {
        ... update candidate fields ...
    }

while LogicalIncreaseRestartDecodingForSlot looks like this:

    if (restart_lsn <= slot->data.restart_lsn)
    {
    }
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        ... update candidate fields ...
    }

    if (slot->candidate_restart_valid == InvalidXLogRecPtr)
    {
        ... update candidate fields ...
    }

Notice that LogicalIncreaseXminForSlot has the third block guarded by
"else if", while LogicalIncreaseRestartDecodingForSlot has a plain "if".
Isn't that a bit suspicious, considering the functions do the same
thing, just for different fields? I don't know if this is dangerous; the
comments suggest it may just waste extra effort after a reconnect.


regards

-- 
Tomas Vondra



