Re: Resetting spilled txn statistics in pg_stat_replication - Mailing list pgsql-hackers
From | Magnus Hagander |
---|---|
Subject | Re: Resetting spilled txn statistics in pg_stat_replication |
Date | |
Msg-id | CABUevEwtv=a-yQ3LmvGZ1xmnSzDPsfMUTf048HfCXsdNgKnMDg@mail.gmail.com Whole thread Raw |
In response to | Re: Resetting spilled txn statistics in pg_stat_replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Resetting spilled txn statistics in pg_stat_replication
|
List | pgsql-hackers |
On Fri, Jun 12, 2020 at 10:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jun 12, 2020 at 11:20 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 12 Jun 2020 at 12:21, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > Since the logical decoding intermediate files are written at per slots
> > > directory, I thought that corresponding these statistics to
> > > replication slots is also understandable for users.
> > >
> >
> > What I wanted to know is how will it help users to tune
> > logical_decoding_work_mem? Different backends can process from the
> > same slot, so it is not clear how user will be able to make any
> > meaning out of those stats.
>
> I thought that the user needs to constantly monitor them during one
> process is executing logical decoding and to see the increments. I
> might not fully understand but I guess the same is true for displaying
> them w.r.t. process. Since a process can do logical decoding several
> times using the same slot with a different setting, the user will need
> to monitor them several times.
>
Yeah, I think we might not be able to get exact measure but if we
divide total_size spilled by exec_count, we will get some rough idea
of what should be the logical_decoding_work_mem for that particular
session. For ex. consider the logical_decoding_work_mem is 100bytes
for a particular backend and the size spilled by that backend is 100
then I think you can roughly keep it to 200bytes if you want to avoid
spilling. Similarly one can compute its average value over multiple
executions. Does this make sense to you?
The thing that becomes really interesting is to analyze this across time. For example to identify patterns where it always spills at the same time as certain other things are happening. For that usecase, having a "predictable persistence" is important. You may not be able to afford setting logical_decoding_work_mem high enough to cover every possible scenario (if you did, then we would basically not need the spilling..), so you want to track down in relation to the rest of your application exactly when and how this is happening.
> > OTOH, it is easier to see how to make
> > meaning of these stats if we display them w.r.t process. Basically,
> > we have spill_count and spill_size which can be used to tune
> > logical_decoding_work_mem and also the activity of spilling happens at
> > process level, so it sounds like one-to-one mapping.
>
> Displaying them w.r.t process also seems a good idea but I'm still
> unclear what to display and how long these values are valid.
>
I feel till the lifetime of a process if we want to display the values
at process level but I am open to hear others (including yours) views
on this.
The problem with "lifetime of a process" is that it's not predictable. A replication process might "bounce" for any reason, and it is normally not a problem. But if you suddenly lose your stats when you do that, it starts to matter a lot more. Especially when you don't know if it bounced. (Sure you can look at the backend_start time, but that adds a whole different sets of complexitites).
> > The view
> will have the following columns for example?
>
> * pid
> * slot_name
> * spill_txns
> * spill_count
> * spill_bytes
> * exec_count
>
Yeah, these appear to be what I have in mind. Note that we can have
multiple entries of the same pid here because of slotname, there is
some value to display slotname but I am not completely sure if that is
a good idea but I am fine if you have a reason to include slotname?
Well, it's a general view so you can always GROUP BY that away if you want at reading point?
pgsql-hackers by date: