Hi,
On 2021-08-31 11:37:08 +0900, Masahiko Sawada wrote:
> At step #2, wal sender waits for another transaction started at step
> #1 to complete after creating the replication slot. When the server is
> stopping, wal sender process drops the slot on releasing the slot
> since it's still RS_EPHEMERAL. Then, after dropping the slot we report
> the message for dropping the slot (see ReplicationSlotDropPtr()).
> These are executed in ReplicationSlotRelease() called by ProcKill()
> which is called during calling on_shmem_exit callbacks, which is after
> shutting down pgstats during before_shmem_exit callbacks. I’ve not
> tested yet but I think this can potentially happen also when dropping
> a temporary slot. ProcKill() also calls ReplicationSlotCleanup() to
> clean up temporary slots.
>
> There are some ideas to fix this issue but I don’t think it’s a good
> idea to move either ProcKill() or the slot releasing code to
> before_shmem_exit in this case, like we did for other similar
> issues[1][2].
Yea, that's clearly not an option.
I wonder why the replication slot stuff is in ProcKill() rather than its
own callback. That's probably my fault, but I don't remember what lead
to that.
> Reporting the slot dropping message on dropping the slot
> isn’t necessarily essential actually since autovacuums periodically
> check already-dropped slots and report to drop the stats. So another
> idea would be to move pgstat_report_replslot_drop() to a higher layer
> such as ReplicationSlotDrop() and ReplicationSlotsDropDBSlots() that
> are not called during callbacks. The replication slot stats are
> dropped when it’s dropped via commands such as
> pg_drop_replication_slot() and DROP_REPLICATION_SLOT. On the other
> hand, for temporary slots and ephemeral slots, we rely on autovacuums
> to drop their stats. Even if we delay to drop the stats for those
> slots, pg_stat_replication_slots don’t show the stats for
> already-dropped slots.
Yea, we could do that, but I think it'd be nicer to find a bit more
principled solution...
Perhaps moving this stuff out from ProcKill() into its own
before_shmem_exit() callback would do the trick?
Greetings,
Andres Freund