Re: Resetting spilled txn statistics in pg_stat_replication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Resetting spilled txn statistics in pg_stat_replication |
Date | |
Msg-id | CAA4eK1K2xHOZziGpQKTO_8uFJo2NdwBZTYw3bJpEaBcJOXVWuA@mail.gmail.com Whole thread Raw |
In response to | Re: Resetting spilled txn statistics in pg_stat_replication (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
Responses |
Re: Resetting spilled txn statistics in pg_stat_replication
|
List | pgsql-hackers |
On Thu, Jul 16, 2020 at 1:45 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 16 Jul 2020 at 15:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jul 10, 2020 at 2:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Jul 10, 2020 at 7:23 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Thu, 9 Jul 2020 at 12:11, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Wed, Jul 8, 2020 at 1:14 PM Masahiko Sawada > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > > > > > > > I think that using oids has another benefit that we don't need to send > > > > > > slot name to the stats collector along with the stats. Since the > > > > > > maximum size of slot name is NAMEDATALEN and we don't support the > > > > > > pgstat message larger than PGSTAT_MAX_MSG_SIZE (1000 bytes), if the > > > > > > user wants to increase NAMEDATALEN they might not be able to build. > > > > > > > > > > > > > > > > I think NAMEDATALEN is used for many other objects as well and I don't > > > > > think we want to change it in foreseeable future, so that doesn't > > > > > sound to be a good reason to invent OIDs for slots. OTOH, I do > > > > > understand it would be better to send OIDs than names for slots but I > > > > > am just not sure if it is a good idea to invent a new way to generate > > > > > OIDs (which is different from how we do it for other objects in the > > > > > system) for this purpose. > > > > > > > > I'm concerned that there might be users who are using custom > > > > PostgreSQL that increased NAMEDATALEN for some reason. But indeed, I > > > > also agree with your concerns. So perhaps we can go with the current > > > > PoC patch approach as the first version (i.g., sending slot drop > > > > message to stats collector). When we need such a unique identifier > > > > also for other purposes, we will be able to change this feature so > > > > that it uses that identifier for this statistics reporting purpose. > > > > > > > > > > Okay, feel to submit the version atop my revert patch. > > > > > > > Attached, please find the rebased version. I have kept prorows as 10 > > instead of 100 for pg_stat_get_replication_slots because I don't see > > much reason for keeping the value more than the default value of > > max_replication_slots. > > > > Thank you for rebasing the patch! Agreed. > > > As we are targeting this patch for PG14, so I think we can now add the > > functionality to reset the stats as well. What do you think? > > > > Yeah, I was also updating the patch while adding the reset functions. > > However, I'm concerned about the following part: > > +static int > +pgstat_replslot_index(const char *name) > +{ > + int i; > + > + Assert(nReplSlotStats <= max_replication_slots); > + for (i = 0; i < nReplSlotStats; i++) > + { > + if (strcmp(replSlotStats[i].slotname, name) == 0) > + return i; /* found */ > + } > + > + /* not found, register new slot */ > + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats)); > + memcpy(&replSlotStats[nReplSlotStats].slotname, name, NAMEDATALEN); > + return nReplSlotStats++; > +} > > +static void > +pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len) > +{ > + int idx; > + > + idx = pgstat_replslot_index(msg->m_slotname); > + Assert(idx >= 0 && idx < max_replication_slots); > > As long as we cannot rely on message ordering, the above assertion > could be false. For example, suppose that there is no unused > replication slots and the user: > > 1. drops the existing slot. > 2. creates a new slot. > > If the stats messages arrive in order of 2 and 1, the above assertion > is false or leads to memory corruption when assertions are not > enabled. > > A possible solution would be to add an in-use flag to > PgStat_ReplSlotStats indicating whether the stats for slot is used or > not. When receiving a drop message for a slot, the stats collector > just marks the corresponding stats as unused. When receiving the stats > report for a new slot but there is no unused stats slot, ignore it. > What do you think? > As of now, you have a boolean flag msg.m_drop to distinguish the drop message but we don't have a similar way to distinguish the 'create' message. What if have a way to distinguish 'create' message (we can probably keep some sort of flag to indicate the type of message (create, drop, update)) and then if the slot with the same name already exists, we ignore such a message. Now, we also need a way to create the entry for a slot for a normal stats update message as well to accommodate for the lost 'create' message. Does that make sense? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: