Attached is a patch to mark a logical replication slot as dirty if its confirmed lsn is changed.
Aesthetically I'm not sure if it's better to do it per this patch and call ReplicationSlotMarkDirty after we release the spinlock, add a new ReplicationSlotMarkDirtyLocked() that skips the spinlock acquisition, or add a bool is_locked arg to ReplicationSlotMarkDirty and update all call sites. All will have the same result.
I've confirmed this works as expected as part of the failover slots test suite but it'd be pretty seriously cumbersome to extract. If anyone feels strongly about it I can add a test that shows that
- start master
- create slot
- do some stuff
- replay from slot
- fast-stop master
- start master
- replay from slot
doesn't see the same data a second time, but if we repeat it using immediate stop it will see the same data when replaying again.
Also attached is another patch to amend the docs to reflect the fact that a slot can actually replay the same change more than once. I avoided the strong temptation to update other parts of the docs nearby.
Both these are fixes that should IMO be applied to 9.6.