Re: Resetting spilled txn statistics in pg_stat_replication - Mailing list pgsql-hackers
From | Magnus Hagander |
---|---|
Subject | Re: Resetting spilled txn statistics in pg_stat_replication |
Date | |
Msg-id | CABUevEz7jXQLSAzgFAb2AGBNJhL=ishkJOsfEFghpOy_12kRhg@mail.gmail.com Whole thread Raw |
In response to | Re: Resetting spilled txn statistics in pg_stat_replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Resetting spilled txn statistics in pg_stat_replication
Re: Resetting spilled txn statistics in pg_stat_replication |
List | pgsql-hackers |
On Tue, Jul 7, 2020 at 5:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 6 Jul 2020 at 20:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > Second, as long as the unique identifier is the slot name there is no
> > > convenient way to distinguish between the same name old and new
> > > replication slots, so the backend process or wal sender process sends
> > > a message to the stats collector to drop the replication slot at
> > > ReplicationSlotDropPtr(). This strategy differs from what we do for
> > > table, index, and function statistics. It might not be a problem but
> > > I’m thinking a better way.
> > >
> >
> > Can we rely on message ordering in the transmission mechanism (UDP)
> > for stats? The wiki suggests [1] we can't. If so, then this might
> > not work.
>
> Yeah, I'm also concerned about this. Another idea would be to have
> another unique identifier to distinguish old and new replication slots
> with the same name. For example, creation timestamp. And then we
> reclaim the stats of unused slots later like table and function
> statistics.
>
So, we need to have 'creation timestamp' as persistent data for slots
to achieve this? I am not sure of adding creation_time as a parameter
to identify for this case because users can change timings on systems
so it might not be a bullet-proof method but I agree that it can work
in general.
If we need them to be persistent across time like that, perhaps we simply need to assign oids to replication slots? That might simplify this problem quite a bit?
> On the other hand, if the ordering were to be reversed, we would miss
> that stats but the next stat reporting would create the new entry. If
> the problem is unlikely to happen in common case we can live with
> that.
>
Yeah, that is a valid point and I think otherwise also some UDP
packets can be lost so maybe we don't need to worry too much about
this. I guess we can add a comment in the code for such a case.
The fact that we may in theory lose some packages over UDP is the main reason we're using UDP in the first place, I believe :) But it's highly unlikely to happen in the real world I believe (and I think on some platforms impossible).
> > > Aside from the above, this patch will change the most of the changes
> > > introduced by commit 9290ad198b1 and introduce new code much. I’m
> > > concerned whether such changes are acceptable at the time of beta 2.
> > >
> >
> > I think it depends on the final patch. My initial thought was that we
> > should do this for PG14 but if you are suggesting removing the changes
> > done by commit 9290ad198b1 then we need to think over it. I could
> > think of below options:
> > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We
> > were anyway having spilling without any work in PG13 but didn’t have
> > stats.
> > b. Try to get your patch in PG13 if we can, otherwise, revert the
> > feature 9290ad198b1.
> > c. Get whatever we have in commit 9290ad198b1 for PG13 and
> > additionally have what we are discussing here for PG14. This means
> > that spilled stats at slot level will be available in PG14 via
> > pg_stat_replication_slots and for individual WAL senders it will be
> > available via pg_stat_replication both in PG13 and PG14. Even if we
> > can get your patch in PG13, we can still keep those in
> > pg_stat_replication.
> > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it
> > for PG14. I don't think this will be a popular approach.
>
> I was thinking option (a) or (b). I'm inclined to option (a) since the
> PoC patch added a certain amount of new codes. I agree with you that
> it depends on the final patch.
>
Magnus, Tomas, others, do you have any suggestions on the above
options or let us know if you have any other option in mind?
I have a feeling it's far too late for (b) at this time. Regardless of the size of the patch, it feels that this can end up being a rushed and not thought-through-all-the-way one, in which case we may end up in an even worse position.
Much as I would like to have these stats earlier, I'm also leaning towards (a).
//Magnus
pgsql-hackers by date: