Re: Resetting spilled txn statistics in pg_stat_replication - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Resetting spilled txn statistics in pg_stat_replication
Date
Msg-id CA+fd4k75dsw3hYy=ow_vwOBZBj9g64yhVHUaXUkm9=31OqGCgw@mail.gmail.com
Whole thread Raw
In response to Re: Resetting spilled txn statistics in pg_stat_replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Resetting spilled txn statistics in pg_stat_replication
Re: Resetting spilled txn statistics in pg_stat_replication
List pgsql-hackers
On Tue, 30 Jun 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jun 30, 2020 at 6:38 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 29 Jun 2020 at 20:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Jun 29, 2020 at 10:26 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > I agree that it would not be a common case that the user sets
> > > > different values for different processes. Based on that assumption, I
> > > > also think having the stats at slot-level is a good idea.
> > > >
> > >
> > > Okay.
> > >
> > > > But I might
> > > > want to have the reset function.
> > > >
> > >
> > > I don't mind but lets fist see how the patch for the basic feature
> > > looks and what is required to implement it?  Are you interested in
> > > writing the patch for this work?
> >
> > Yes, I'll write the draft patch.
> >
>
> Great, thanks.  One thing we can consider is that instead of storing
> the stats directly in the slot we can consider sending it to stats
> collector as suggested by Tomas.  Basically that can avoid contention
> around slots (See discussion in email [1]).  I have not evaluated any
> of the approaches in detail so you can let us know the advantage of
> one over another.  Now, you might be already considering this but I
> thought it is better to share what I have in mind rather than saying
> that later once you have the draft patch ready.

Thanks! Yes, I'm working on this patch while considering to send the
stats to stats collector.

I've attached PoC patch that implements a simple approach. I'd like to
discuss how we collect the replication slot statistics in the stats
collector before I bring the patch to completion.

In this PoC patch, we have the array of PgStat_ReplSlotStats struct
which has max_replication_slots entries. The backend and wal sender
send the slot statistics to the stats collector when decoding a commit
WAL record.

typedef struct PgStat_ReplSlotStats
{
    char            slotname[NAMEDATALEN];
    PgStat_Counter  spill_txns;
    PgStat_Counter  spill_count;
    PgStat_Counter  spill_bytes;
} PgStat_ReplSlotStats;

What I'd like to discuss are:

Since the unique identifier of replication slots is the name, the
process sends slot statistics along with slot name to stats collector.
I'm concerned about the amount of data sent to the stats collector and
the cost of searching the statistics within the statistics array (it’s
O(N) where N is max_replication_slots). Since the maximum length of
slot name is NAMEDATALEN (64 bytes default) and max_replication_slots
is unlikely a large number, I might be too worrying but it seems like
it’s better to avoid that if we can do that easily. An idea I came up
with is to use the index of slots (i.g., the index of
ReplicationSlotCtl->replication_slots[]) as the index of statistics of
slot in the stats collector. But since the index of slots could change
after the restart we need to synchronize the index of slots on both
array somehow. So I thought that we can determine the index of the
statistics of slots at ReplicationSlotAcquire() or
ReplicationSlotCreate(), but it will in turn need to read stats file
while holding ReplicationSlotControlLock to prevent the same index of
the statistics being used by the concurrent process who creating a
slot. I might be missing something though.

Second, as long as the unique identifier is the slot name there is no
convenient way to distinguish between the same name old and new
replication slots, so the backend process or wal sender process sends
a message to the stats collector to drop the replication slot at
ReplicationSlotDropPtr(). This strategy differs from what we do for
table, index, and function statistics. It might not be a problem but
I’m thinking a better way.

The new view name is also an open question. I prefer
pg_stat_replication_slots and to add stats of physical replication
slots to the same view in the future, rather than a separate view.

Aside from the above, this patch will change the most of the changes
introduced by commit 9290ad198b1 and introduce new code much. I’m
concerned whether such changes are acceptable at the time of beta 2.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Etsuro Fujita
Date:
Subject: Re: Asynchronous Append on postgres_fdw nodes.
Next
From: Amit Kapila
Date:
Subject: Re: Backpatch b61d161c14 (Introduce vacuum errcontext ...)