Re: Replication slot stats misgivings - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Replication slot stats misgivings |
Date | |
Msg-id | CAA4eK1Jt40-gHsFkinREKMKU+0C4RPgzYu3SwFezcZFc9srnfA@mail.gmail.com Whole thread Raw |
In response to | Re: Replication slot stats misgivings (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Replication slot stats misgivings
|
List | pgsql-hackers |
On Sat, Apr 10, 2021 at 7:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > IIUC there are two problems in the case where the drop message is lost: > > 1. Writing beyond the end of replSlotStats. > This can happen if after restarting the number of slots whose stats > are stored in the stats file exceeds max_replication_slots. Vignesh's > patch addresses this problem. > > 2. The stats for the new slot are not recorded. > If the stats for already-dropped slots remain in replSlotStats, the > stats for the new slot cannot be registered due to the full of > replSlotStats. This can happen even when after restarting the number > of slots whose stats are stored in the stat file does NOT exceed > max_replication_slots as well as even during the server running. The > patch doesn’t address this problem. (If this happens, we will have to > reset all slot stats since pg_stat_reset_replication_slot() cannot > remove the slot stats with the non-existing name). > > I think we can use HTAB to store slot stats and have > pg_stat_get_replication_slot() inquire about stats by the slot name, > resolving both problems. By using HTAB we're no longer concerned about > the problem of writing stats beyond the end of the replSlotStats > array. Instead, we have to consider how and when to clean up the stats > for already-dropped slots. We can have the startup process send slot > names at startup time, which borrows the idea proposed by Amit. But > maybe we need to consider the case again where the message from the > startup process is lost? Another idea would be to have > pgstat_vacuum_stat() check the existing slots and call > pgstat_report_replslot_drop() if the slot in the stats file doesn't > exist. That way, we can continuously check the stats for > already-dropped slots. > Agreed, I think checking periodically via pgstat_vacuum_stat is a better idea then sending once at start up time. I also think using slot_name is better than using 'idx' (index in ReplicationSlotCtl->replication_slots) in this scheme because even after startup 'idx' changes we will be able to drop the dead slot. > I've written a PoC patch for the above idea; using HTAB and cleaning > up slot stats at pgstat_vacuum_stat(). The patch can be applied on top > of 0001 patch Vignesh proposed before[1]. > It seems Vignesh has changed patches based on the latest set of comments so you might want to rebase. > Please note that this cannot resolve the problem of ending up > accumulating the stats to the old slot if the slot is re-created with > the same name and the drop message is lost. To deal with this problem > I think we would need to use something unique identifier for each slot > instead of slot name. > Right, we can probably write it in comments and or docs about this caveat and the user can probably use pg_stat_reset_replication_slot for such slots. -- With Regards, Amit Kapila.
pgsql-hackers by date: