Re: Replication slot drop message is sent after pgstats shutdown. - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Replication slot drop message is sent after pgstats shutdown.
Date
Msg-id CAD21AoDHQgSpr11epL44qS=u8pJbBpx6G0E8SKewjDJpsEJtnA@mail.gmail.com
Whole thread Raw
In response to Re: Replication slot drop message is sent after pgstats shutdown.  (Andres Freund <andres@anarazel.de>)
Responses Re: Replication slot drop message is sent after pgstats shutdown.  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Aug 31, 2021 at 12:45 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-08-31 11:37:08 +0900, Masahiko Sawada wrote:
> > At step #2, wal sender waits for another transaction started at step
> > #1 to complete after creating the replication slot. When the server is
> > stopping, wal sender process drops the slot on releasing the slot
> > since it's still RS_EPHEMERAL. Then, after dropping the slot we report
> > the message for dropping the slot (see ReplicationSlotDropPtr()).
> > These are executed in ReplicationSlotRelease() called by ProcKill()
> > which is called during calling on_shmem_exit callbacks, which is after
> > shutting down pgstats during before_shmem_exit callbacks. I’ve not
> > tested yet but I think this can potentially happen also when dropping
> > a temporary slot. ProcKill() also calls ReplicationSlotCleanup() to
> > clean up temporary slots.
> >
> > There are some ideas to fix this issue but I don’t think it’s a good
> > idea to move either ProcKill() or the slot releasing code to
> > before_shmem_exit in this case, like we did for other similar
> > issues[1][2].
>
> Yea, that's clearly not an option.
>
> I wonder why the replication slot stuff is in ProcKill() rather than its
> own callback. That's probably my fault, but I don't remember what lead
> to that.
>
>
> > Reporting the slot dropping message on dropping the slot
> > isn’t necessarily essential actually since autovacuums periodically
> > check already-dropped slots and report to drop the stats. So another
> > idea would be to move pgstat_report_replslot_drop() to a higher layer
> > such as ReplicationSlotDrop() and ReplicationSlotsDropDBSlots() that
> > are not called during callbacks. The replication slot stats are
> > dropped when it’s dropped via commands such as
> > pg_drop_replication_slot() and DROP_REPLICATION_SLOT. On the other
> > hand, for temporary slots and ephemeral slots, we rely on autovacuums
> > to drop their stats. Even if we delay to drop the stats for those
> > slots, pg_stat_replication_slots don’t show the stats for
> > already-dropped slots.
>
> Yea, we could do that, but I think it'd be nicer to find a bit more
> principled solution...
>
> Perhaps moving this stuff out from ProcKill() into its own
> before_shmem_exit() callback would do the trick?

You mean to move only the part of sending the message to its own
before_shmem_exit() callback? or move ReplicationSlotRelease() and
ReplicationSlotCleanup() from ProcKill() to it?

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Greg Nancarrow
Date:
Subject: Re: Added schema level support for publication.
Next
From: Yugo NAGATA
Date:
Subject: Re: Fix around conn_duration in pgbench