From: Andres Freund
Subject: Re: shared-memory based stats collector - v66
Msg-id: 20220405204019.6yj7ocmpw352c2u5@alap3.anarazel.de
In response to: Re: shared-memory based stats collector - v66 (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

On 2022-04-02 01:16:48 -0700, Andres Freund wrote:
> I just noticed that the code doesn't appear to actually work like that right
> now. Whenever the timeout is reached, pgstat_report_stat() is called with
> force = true.
> 
> And even if the backend is busy running queries, once there's contention, the
> next invocation of pgstat_report_stat() will return the timeout relative to
> pending_since, which then will trigger a force report via a very short timeout
> soon.
> 
> It might actually make sense to only ever return PGSTAT_RETRY_MIN_INTERVAL
> (with a slightly different name) from pgstat_report_stat() when blocked
> (limiting the max reporting delay for an idle connection) and to continue
> calling pgstat_report_stat(force = true).  But to only trigger force
> "internally" in pgstat_report_stat() when PGSTAT_MAX_INTERVAL is reached.
> 
> I think that'd mean we'd report after max PGSTAT_RETRY_MIN_INTERVAL in an idle
> connection, and try reporting every PGSTAT_RETRY_MIN_INTERVAL (increasing up
> to PGSTAT_MAX_INTERVAL when blocked) on busy connections.
> 
> Makes sense?
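
Just to make the quoted scheme a bit more concrete, here's a toy standalone
sketch of how the suggested retry interval could escalate while a non-forced
flush keeps getting blocked. The doubling rule and the helper name are mine,
for illustration only, not what the patch actually does:

#include <stdint.h>
#include <stdio.h>

#define PGSTAT_RETRY_MIN_INTERVAL_MS 1000
#define PGSTAT_MAX_INTERVAL_MS       60000

/*
 * Illustrative only: grow the suggested retry interval each time a
 * non-forced flush is blocked by contention, capped at PGSTAT_MAX_INTERVAL.
 */
static int64_t
next_retry_interval(int64_t current)
{
    if (current <= 0)
        return PGSTAT_RETRY_MIN_INTERVAL_MS;
    current *= 2;
    return current > PGSTAT_MAX_INTERVAL_MS ? PGSTAT_MAX_INTERVAL_MS : current;
}

int
main(void)
{
    int64_t interval = 0;

    /* simulate a busy connection whose flush attempts keep being blocked */
    for (int i = 0; i < 8; i++)
    {
        interval = next_retry_interval(interval);
        printf("attempt %d blocked, retry in %lld ms\n",
               i + 1, (long long) interval);
    }
    return 0;
}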

I tried to come up with a workload producing a *lot* of stats (multiple
function calls within a transaction, multiple transactions pipelined) and ran
it with 1000 clients (on a machine with 2 x (10 cores / 20 threads)). To
reduce overhead I set
  default_transaction_isolation=repeatable read
  track_activities=false
MVCC Snapshot acquisition is the clear bottleneck otherwise, followed by
pgstat_report_activity() (which, as confusing as it may sound, is independent
of this patch).

I do see a *small* amount of contention if I lower PGSTAT_MIN_INTERVAL to
1ms. Too small to ever be captured in pg_stat_activity.wait_event, but just
about visible in a profiler.


Which leads me to conclude we can simplify the logic significantly. Here's my
current comment explaining the logic:

 * Unless called with 'force', pending stats updates are flushed once
 * per PGSTAT_MIN_INTERVAL (1000ms). When not forced, stats flushes do not
 * block on lock acquisition, except if stats updates have been pending for
 * longer than PGSTAT_MAX_INTERVAL (60000ms).
 *
 * Whenever pending stats updates remain at the end of pgstat_report_stat(), a
 * suggested idle timeout is returned. Currently this is always
 * PGSTAT_IDLE_INTERVAL (10000ms). Callers can use the returned time to set up
 * a timeout after which to call pgstat_report_stat(true), but are not
 * required to do so.
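
To illustrate, here's a trivial standalone sketch of the decision logic the
comment describes. The helpers (pending_stats_exist(), flush_pending_stats())
and the backend-local variables are made up for illustration, they're not the
actual pgstat code:

#define _POSIX_C_SOURCE 200809L

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define PGSTAT_MIN_INTERVAL_MS   1000    /* regular flush interval */
#define PGSTAT_MAX_INTERVAL_MS   60000   /* flush even if blocked after this */
#define PGSTAT_IDLE_INTERVAL_MS  10000   /* suggested wakeup for idle backends */

/* stand-ins for backend-local pgstat state */
static int64_t last_flush_ms = 0;       /* when we last flushed */
static int64_t pending_since_ms = 0;    /* oldest pending update, 0 if none */

static int64_t
now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* made-up helpers; the real code flushes the per-kind pending entries */
static bool pending_stats_exist(void) { return pending_since_ms != 0; }
static bool flush_pending_stats(bool nowait) { (void) nowait; return true; }

/*
 * Flush pending stats following the policy in the comment above: at most
 * once per PGSTAT_MIN_INTERVAL unless forced, never blocking on the lock
 * unless updates have been pending longer than PGSTAT_MAX_INTERVAL, and
 * returning a suggested idle timeout whenever something remains pending.
 */
static int64_t
report_stat_sketch(bool force)
{
    int64_t now = now_ms();

    if (!pending_stats_exist())
        return 0;

    if (!force)
    {
        /* flush even without 'force' once updates have been pending too long */
        if (now - pending_since_ms >= PGSTAT_MAX_INTERVAL_MS)
            force = true;
        /* otherwise rate-limit to one flush per PGSTAT_MIN_INTERVAL */
        else if (now - last_flush_ms < PGSTAT_MIN_INTERVAL_MS)
            return PGSTAT_IDLE_INTERVAL_MS;
    }

    /* when not forced, don't block on lock acquisition */
    if (!flush_pending_stats(/* nowait */ !force))
        return PGSTAT_IDLE_INTERVAL_MS;

    last_flush_ms = now;
    pending_since_ms = 0;
    return 0;
}

int
main(void)
{
    pending_since_ms = now_ms();
    printf("suggested idle timeout: %lld ms\n",
           (long long) report_stat_sketch(false));
    return 0;
}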

Comments?

Greetings,

Andres Freund


