Re: shared-memory based stats collector - v66 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: shared-memory based stats collector - v66
Date
Msg-id 20220402081648.kbapqdxi2rr3ha3w@alap3.anarazel.de
Whole thread Raw
In response to Re: shared-memory based stats collector - v66  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Hi,

On 2022-03-25 17:24:18 +0900, Kyotaro Horiguchi wrote:
> > AFIXME: Isn't PGSTAT_MIN_INTERVAL way too long? What is the justification
> > for increasing it?
> 
> It is 1000ms in the comment just above but actually 10000ms. The
> number came from a discussion that if we have 1000 clients and each
> backend writes stats once per 0.5 seconds, totally we flush pending
> data to shared area at 2000 times per second which is too frequent. I
> raised it to 5000ms, then 10000ms.  So the expected maximum flush
> frequency is reduces to 100 times per second.  Of course it is
> assuming the worst case and the 10000ms is apparently too long for the
> average cases.
> 
> The current implement of pgstat postpones flushing if lock collision
> happens then postpone by at most 60s.  This is a kind of
> auto-averaging mechanishm.  It might be enough and we can reduce the
> PGSTAT_MIN_INTERVAL to 500ms or so.

I just noticed that the code doesn't appear to actually work like that right
now. Whenever the timeout is reached, pgstat_report_stat() is called with
force = true.

And even if the backend is busy running queries, once there's contention, the
next invocation of pgstat_report_stat() will return the timeout relative to
pendin_since, which then will trigger a force report via a very short timeout
soon.

It might actually make sense to only ever return PGSTAT_RETRY_MIN_INTERVAL
(with a slightly different name) from pgstat_report_stat() when blocked
(limiting the max reporting delay for an idle connection) and to continue
calling pgstat_report_stat(force = true).  But to only trigger force
"internally" in pgstat_report_stat() when PGSTAT_MAX_INTERVAL is reached.

I think that'd mean we'd report after max PGSTAT_RETRY_MIN_INTERVAL in an idle
connection, and try reporting every PGSTAT_RETRY_MIN_INTERVAL (increasing up
to PGSTAT_MAX_INTERVAL when blocked) on busy connections.

Makes sense?


I think we need to do something with the pgstat_report_stat() calls outside of
postgres.c. Otherwise there's nothing limiting their reporting delay, because
they don't have the timeout logic postgres.c has.  None of them is ever hot
enough to be problematic, so I think we should just make them pass force=true?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Skipping logical replication transactions on subscriber side
Next
From: Andres Freund
Date:
Subject: Re: shared-memory based stats collector - v66