Re: Resetting spilled txn statistics in pg_stat_replication - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Resetting spilled txn statistics in pg_stat_replication
Msg-id 20200620214836.7ncmxorvdkmvzepb@development
In response to Re: Resetting spilled txn statistics in pg_stat_replication  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers
Hi,

Sorry for neglecting this thread for the last couple days ...

In general, I agree it's somewhat unfortunate that the stats are reset
when the walsender exits. This was mostly fine for tuning the spilling
(change value -> restart -> see stats), but for proper monitoring it
is problematic. I simply considered these fields similar to the lag
fields, and did not think about them from the "monitoring" POV.


On Thu, Jun 11, 2020 at 11:09:00PM +0900, Masahiko Sawada wrote:
>
> ...
>
>Since the logical decoding intermediate files are written at per slots
>directory, I thought that corresponding these statistics to
>replication slots is also understandable for users. I was thinking
>something like pg_stat_logical_replication_slot view which shows
>slot_name and statistics of only logical replication slots. The view
>always shows rows as many as existing replication slots regardless of
>logical decoding being running. I think there is no big difference in
>how users use these statistics values between maintaining at slot
>level and at logical decoding level.
>
>In logical replication case, since we generally don’t support setting
>different logical_decoding_work_mem per wal senders, every wal sender
>will decode the same WAL stream with the same setting, meaning they
>will similarly spill intermediate files. Maybe the same is true
>statistics of streaming. So having these statistics per logical
>replication might not help as of now.
>

I think tracking these stats per replication slot (rather than per
walsender) is the right approach. We should extend the statistics
collector to keep one entry per replication slot, and add a new stats
view called e.g. pg_stat_replication_slots, which could be reset just
like other stats in the collector.
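
To make the proposal a bit more concrete, here is a minimal Python sketch
of the bookkeeping, not PostgreSQL code. All names in it (SlotStats,
StatsCollector, report_spill, reset, view) are hypothetical, and the
counter names spill_txns / spill_count / spill_bytes are assumed for
illustration. The point is only that entries are keyed by slot name, so
they survive a walsender exit and go away only on an explicit reset.

```python
from dataclasses import dataclass


@dataclass
class SlotStats:
    """Cumulative spill counters for one replication slot (field names assumed)."""
    spill_txns: int = 0
    spill_count: int = 0
    spill_bytes: int = 0


class StatsCollector:
    """Toy stand-in for the statistics collector: one entry per slot name."""

    def __init__(self):
        self._slots = {}

    def report_spill(self, slot_name, txns, count, nbytes):
        # A walsender reports deltas. The entry is keyed by slot, not by
        # walsender, so it outlives the process that reported it.
        stats = self._slots.setdefault(slot_name, SlotStats())
        stats.spill_txns += txns
        stats.spill_count += count
        stats.spill_bytes += nbytes

    def reset(self, slot_name):
        # Explicit reset, analogous to resetting other collector stats.
        self._slots[slot_name] = SlotStats()

    def view(self):
        # Rows as a hypothetical pg_stat_replication_slots might expose them.
        return {name: (s.spill_txns, s.spill_count, s.spill_bytes)
                for name, s in sorted(self._slots.items())}


collector = StatsCollector()
collector.report_spill("slot_a", txns=1, count=3, nbytes=8192)   # first walsender
collector.report_spill("slot_a", txns=2, count=5, nbytes=16384)  # after a restart
```

In this sketch both reports accumulate into the same "slot_a" entry even
though they would come from different walsender processes, and only
collector.reset("slot_a") zeroes the counters.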

I don't quite understand the discussion about different backends using
different logical_decoding_work_mem values - why would this be an issue?
We have this exact issue elsewhere, e.g. with tracking index vs.
sequential scans and GUCs like random_page_cost. Those can change over
time too, different backends may use different values, and yet we don't
worry about resetting the number of index scans for a table etc.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


