Re: Resetting spilled txn statistics in pg_stat_replication - Mailing list pgsql-hackers

From: Magnus Hagander
Subject: Re: Resetting spilled txn statistics in pg_stat_replication
Msg-id: CABUevEzYHwsTu-19Kz81YDg2W6jVOZ4AYt4GzD6D_1awdN4-SA@mail.gmail.com
In response to: Re: Resetting spilled txn statistics in pg_stat_replication (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On Tue, Jun 23, 2020 at 12:18 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On Tue, Jun 23, 2020 at 10:58:18AM +0530, Amit Kapila wrote:
> >On Tue, Jun 23, 2020 at 9:32 AM Masahiko Sawada
> ><masahiko.sawada@2ndquadrant.com> wrote:
> >>
> >> On Sun, 21 Jun 2020 at 06:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >> >
> >> > >
> >> > >What if the decoding has been performed by multiple backends using the
> >> > >same slot? In that case, it will be difficult to judge the right value
> >> > >of logical_decoding_work_mem based on the stats. It would make sense
> >> > >if we provided a way to set logical_decoding_work_mem per slot, but I
> >> > >am not sure that is better than what we have now.
> >> > >
> >>
> >> I thought that the stats relate to the logical_decoding_work_mem value,
> >> not to who performed the logical decoding. So even if multiple backends
> >> perform logical decoding using the same slot, the user can use the stats
> >> directly as long as the logical_decoding_work_mem value doesn't change.
> >>
> >
> >I think if you maintain these stats at the slot level, you probably
> >need to use a spinlock or atomic ops to update them, as slots can be
> >used from multiple backends, whereas currently we don't need that.
> IMHO storing the stats in the slot itself is a bad idea. We have the
> statistics collector for exactly this purpose, and it's receiving data
> over UDP without any extra locking etc.
Yeah, that seems much more appropriate. Of course, where they are exposed is a different question.
> >> > >What problems do we see in displaying these for each process? I think
> >> > >users might want to see the stats for the exited processes or after a
> >> > >server restart, but I think both of those are not even possible today.
> >> > >I think the stats are available only as long as the corresponding
> >> > >WALSender process is active.
> >>
> >> I might want to see the stats for exited processes or after a server
> >> restart. But I'm inclined to agree with displaying the stats per
> >> process if the stats are displayed in a separate view (e.g.
> >> pg_stat_replication_slots).
> >>
> >
> >Yeah, as told previously, this makes more sense to me.
> >
> >Do you think we should try to write a POC patch using a per-process
> >entry approach, to see what difficulties we run into and whether it
> >gives the stats in the way we are imagining? OTOH, we can wait some
> >more to see if a clear winner emerges among the approaches here.
> >
> I may be missing something obvious, but I still see no point in tracking
> per-process stats. We don't have that for other stats, and I'm not sure
> how common the scenario is in which a given slot is decoded by many
> backends. I'd say the vast majority of cases are simply running decoding
> from a walsender, which may occasionally restart, but I doubt the users
> are interested in per-pid data - they probably want aggregated data.
Well, technically we do -- we have the pg_stat_xact_* views. However, those are only viewable from *inside* the session itself (which can sometimes be quite annoying).
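For anyone not familiar with them, a quick illustration of that session-local behaviour (the table name is just made up for the example):

  CREATE TABLE demo_tab (i int);

  BEGIN;
  INSERT INTO demo_tab SELECT generate_series(1, 1000);

  -- Counts actions taken so far in the *current* transaction only; they
  -- are not yet folded into pg_stat_user_tables, and no other session
  -- can see them.
  SELECT relname, n_tup_ins
  FROM pg_stat_xact_user_tables
  WHERE relname = 'demo_tab';

  COMMIT;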
This does somewhat apply here, in that normal transactions send their stats batches at transaction end. If this is data we'd be interested in viewing before that point, a more direct exposure would be needed -- such as the way we do it with the LSNs in pg_stat_replication, or something like that.
For long-term monitoring, people definitely want aggregate data, I'd say. The "realtime data", if we call it that, is in my experience mostly interesting if you want to define alerts etc. ("replication standby is too far behind" is alertable through that, whereas something like "total amount of replication traffic over the past hour" is more of a trend alert, which is typically handled by a separate system pulling the aggregate stats).
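To make that split concrete: the point-in-time kind of check I mean is something along these lines (the 64MB threshold is of course just made up for the example):

  SELECT application_name,
         pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
  FROM pg_stat_replication
  WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) > 64 * 1024 * 1024;

whereas the "traffic over the past hour" kind of question needs cumulative counters sampled over time by the monitoring system.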
> Can someone explain a plausible scenario for which tracking per-process
> stats would be needed, and simply computing deltas would not work? How
> will you know which old PID is which, what will you do when a PID is
> reused, and so on?
I fail to see that one as well, in a real-world scenario. Maybe if you want to do a one-off point-tuning of one tiny piece of a system? But you will then also need the long-term statistics to follow up on whether what you did was correct, anyway...
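Just to spell out what I mean by computing deltas being enough, a rough sketch assuming we end up with a per-slot view with cumulative counters along the lines proposed in this thread (the view and column names below are taken from the proposal, nothing that exists today, and it assumes the counters aren't reset within the window):

  -- sampled periodically by the monitoring system
  CREATE TABLE slot_spill_samples (
      sampled_at  timestamptz NOT NULL DEFAULT now(),
      slot_name   name        NOT NULL,
      spill_txns  bigint,
      spill_count bigint,
      spill_bytes bigint
  );

  INSERT INTO slot_spill_samples (slot_name, spill_txns, spill_count, spill_bytes)
  SELECT slot_name, spill_txns, spill_count, spill_bytes
  FROM pg_stat_replication_slots;

  -- spill activity per slot over the past hour, regardless of which
  -- backend(s) did the decoding or how often the walsender restarted
  SELECT slot_name,
         max(spill_bytes) - min(spill_bytes) AS spill_bytes_last_hour
  FROM slot_spill_samples
  WHERE sampled_at >= now() - interval '1 hour'
  GROUP BY slot_name;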