Re: Enhancing Memory Context Statistics Reporting - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Enhancing Memory Context Statistics Reporting
Date
Msg-id 2eb8fd76-89f0-42e8-bc79-152d50c63016@vondra.me
Whole thread Raw
In response to Re: Enhancing Memory Context Statistics Reporting  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Enhancing Memory Context Statistics Reporting
List pgsql-hackers

On 1/24/25 14:47, Rahila Syed wrote:
> 
> Hi,
> 
> 
>     Just idea; as an another option, how about blocking new requests to
>     the target process (e.g., causing them to fail with an error or
>     returning NULL with a warning) if a previous request is still pending?
>     Users can simply retry the request if it fails. IMO failing quickly
>     seems preferable to getting stuck for a while in cases with concurrent
>     requests.
> 
> Thank you for the suggestion. I agree that it is better to fail 
> early and avoid waiting for a timeout in such cases. I will add a 
> "pending request" tracker for this in shared memory. This approach 
> will help prevent sending a concurrent request if a request for the
> same backend is still being processed.
> 

AFAIK these failures should be extremely rare - we're only talking about
that because the workload I used for testing is highly concurrent, i.e.
it requests memory context info extremely often. I doubt anyone sane is
going to do that in practice ...

> IMO, one downside of throwing an error in such cases is that the
> users might wonder if they need to take a corrective action, even
> though the issue is actually going to solve itself and they just
> need to retry. Therefore, issuing a warning or displaying previously
> updated statistics might be a better alternative to throwing an
> error.
> 

Wouldn't this be mostly mitigated by adding proper detail/hint to the
error message? Sure, the user can always ignore that (especially when
calling this from a script), but well ... we can only do so much.

All this makes me think about how we shared pgstat data before the shmem
approach was introduced in PG15. Until then the process signaled pgstat
collector, and the collector wrote the statistics into a file, with a
timestamp. And the process used the timestamp to decide if it's fresh
enough ... Wouldn't the same approach work here?

I imagined it would work something like this:

requesting backend:
-------------------
* set request_ts to current timestamp
* signal the target process, to generate memory context info
* wait until the DSA gets filled with stats_ts > request_ts
* return the data, don't erase anything

target backend
--------------
* clear the signal
* generate the statistics
* set stats_ts to current timestamp
* wait all the backends waiting for the stats (through CV)

I see v11 does almost this, except that it accepts somewhat stale data.
But why would that be necessary? I don't think it's needed, and I don't
think we should accept data from before the process sends the signal.


regards

-- 
Tomas Vondra




pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Convert sepgsql tests to TAP
Next
From: Tom Lane
Date:
Subject: Re: Convert sepgsql tests to TAP