Re: Replication slot drop message is sent after pgstats shutdown. - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Replication slot drop message is sent after pgstats shutdown.
Date
Msg-id 20210831034530.notpifrvnlouhoti@alap3.anarazel.de
Whole thread Raw
In response to Replication slot drop message is sent after pgstats shutdown.  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Replication slot drop message is sent after pgstats shutdown.
List pgsql-hackers
Hi,

On 2021-08-31 11:37:08 +0900, Masahiko Sawada wrote:
> At step #2, wal sender waits for another transaction started at step
> #1 to complete after creating the replication slot. When the server is
> stopping, wal sender process drops the slot on releasing the slot
> since it's still RS_EPHEMERAL. Then, after dropping the slot we report
> the message for dropping the slot (see ReplicationSlotDropPtr()).
> These are executed in ReplicationSlotRelease() called by ProcKill()
> which is called during calling on_shmem_exit callbacks, which is after
> shutting down pgstats during before_shmem_exit callbacks. I’ve not
> tested yet but I think this can potentially happen also when dropping
> a temporary slot. ProcKill() also calls ReplicationSlotCleanup() to
> clean up temporary slots.
> 
> There are some ideas to fix this issue but I don’t think it’s a good
> idea to move either ProcKill() or the slot releasing code to
> before_shmem_exit in this case, like we did for other similar
> issues[1][2].

Yea, that's clearly not an option.

I wonder why the replication slot stuff is in ProcKill() rather than its
own callback. That's probably my fault, but I don't remember what lead
to that.


> Reporting the slot dropping message on dropping the slot
> isn’t necessarily essential actually since autovacuums periodically
> check already-dropped slots and report to drop the stats. So another
> idea would be to move pgstat_report_replslot_drop() to a higher layer
> such as ReplicationSlotDrop() and ReplicationSlotsDropDBSlots() that
> are not called during callbacks. The replication slot stats are
> dropped when it’s dropped via commands such as
> pg_drop_replication_slot() and DROP_REPLICATION_SLOT. On the other
> hand, for temporary slots and ephemeral slots, we rely on autovacuums
> to drop their stats. Even if we delay to drop the stats for those
> slots, pg_stat_replication_slots don’t show the stats for
> already-dropped slots.

Yea, we could do that, but I think it'd be nicer to find a bit more
principled solution...

Perhaps moving this stuff out from ProcKill() into its own
before_shmem_exit() callback would do the trick?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Added schema level support for publication.
Next
From: vignesh C
Date:
Subject: Re: Added schema level support for publication.