Re: Resetting spilled txn statistics in pg_stat_replication - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Resetting spilled txn statistics in pg_stat_replication |
Date | |
Msg-id | CA+fd4k75dsw3hYy=ow_vwOBZBj9g64yhVHUaXUkm9=31OqGCgw@mail.gmail.com Whole thread Raw |
In response to | Re: Resetting spilled txn statistics in pg_stat_replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Resetting spilled txn statistics in pg_stat_replication
Re: Resetting spilled txn statistics in pg_stat_replication |
List | pgsql-hackers |
On Tue, 30 Jun 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 30, 2020 at 6:38 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 29 Jun 2020 at 20:37, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Jun 29, 2020 at 10:26 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > I agree that it would not be a common case that the user sets > > > > different values for different processes. Based on that assumption, I > > > > also think having the stats at slot-level is a good idea. > > > > > > > > > > Okay. > > > > > > > But I might > > > > want to have the reset function. > > > > > > > > > > I don't mind but lets fist see how the patch for the basic feature > > > looks and what is required to implement it? Are you interested in > > > writing the patch for this work? > > > > Yes, I'll write the draft patch. > > > > Great, thanks. One thing we can consider is that instead of storing > the stats directly in the slot we can consider sending it to stats > collector as suggested by Tomas. Basically that can avoid contention > around slots (See discussion in email [1]). I have not evaluated any > of the approaches in detail so you can let us know the advantage of > one over another. Now, you might be already considering this but I > thought it is better to share what I have in mind rather than saying > that later once you have the draft patch ready. Thanks! Yes, I'm working on this patch while considering to send the stats to stats collector. I've attached PoC patch that implements a simple approach. I'd like to discuss how we collect the replication slot statistics in the stats collector before I bring the patch to completion. In this PoC patch, we have the array of PgStat_ReplSlotStats struct which has max_replication_slots entries. The backend and wal sender send the slot statistics to the stats collector when decoding a commit WAL record. typedef struct PgStat_ReplSlotStats { char slotname[NAMEDATALEN]; PgStat_Counter spill_txns; PgStat_Counter spill_count; PgStat_Counter spill_bytes; } PgStat_ReplSlotStats; What I'd like to discuss are: Since the unique identifier of replication slots is the name, the process sends slot statistics along with slot name to stats collector. I'm concerned about the amount of data sent to the stats collector and the cost of searching the statistics within the statistics array (it’s O(N) where N is max_replication_slots). Since the maximum length of slot name is NAMEDATALEN (64 bytes default) and max_replication_slots is unlikely a large number, I might be too worrying but it seems like it’s better to avoid that if we can do that easily. An idea I came up with is to use the index of slots (i.g., the index of ReplicationSlotCtl->replication_slots[]) as the index of statistics of slot in the stats collector. But since the index of slots could change after the restart we need to synchronize the index of slots on both array somehow. So I thought that we can determine the index of the statistics of slots at ReplicationSlotAcquire() or ReplicationSlotCreate(), but it will in turn need to read stats file while holding ReplicationSlotControlLock to prevent the same index of the statistics being used by the concurrent process who creating a slot. I might be missing something though. Second, as long as the unique identifier is the slot name there is no convenient way to distinguish between the same name old and new replication slots, so the backend process or wal sender process sends a message to the stats collector to drop the replication slot at ReplicationSlotDropPtr(). This strategy differs from what we do for table, index, and function statistics. It might not be a problem but I’m thinking a better way. The new view name is also an open question. I prefer pg_stat_replication_slots and to add stats of physical replication slots to the same view in the future, rather than a separate view. Aside from the above, this patch will change the most of the changes introduced by commit 9290ad198b1 and introduce new code much. I’m concerned whether such changes are acceptable at the time of beta 2. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: