Re: Enhancing Memory Context Statistics Reporting - Mailing list pgsql-hackers
From: Rahila Syed
Subject: Re: Enhancing Memory Context Statistics Reporting
Msg-id: CAH2L28vAxcePsqV1AbjYeU4QAojyXS5M39d5MVGCSgAcoNybkQ@mail.gmail.com
In response to: Re: Enhancing Memory Context Statistics Reporting (Tomas Vondra <tomas@vondra.me>)
List: pgsql-hackers
Hi,
On Sat, Jan 25, 2025 at 3:50 AM Tomas Vondra <tomas@vondra.me> wrote:
On 1/24/25 14:47, Rahila Syed wrote:
>
> Hi,
>
>
> Just an idea: as another option, how about blocking new requests to
> the target process (e.g., causing them to fail with an error or
> returning NULL with a warning) if a previous request is still pending?
> Users can simply retry the request if it fails. IMO failing quickly
> seems preferable to getting stuck for a while in cases with concurrent
> requests.
>
> Thank you for the suggestion. I agree that it is better to fail
> early and avoid waiting for a timeout in such cases. I will add a
> "pending request" tracker for this in shared memory. This approach
> will help prevent sending a concurrent request if a request for the
> same backend is still being processed.
>
AFAIK these failures should be extremely rare - we're only talking about
that because the workload I used for testing is highly concurrent, i.e.
it requests memory context info extremely often. I doubt anyone sane is
going to do that in practice ...
Yes, that makes sense.
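To make that concrete, here is a rough sketch of the pending-request tracker I have
in mind: one slot per backend in shared memory holding an atomic flag, claimed with a
test-and-set so that a second concurrent request fails fast. The struct, field and
function names below are placeholders of mine, not what the patch will actually use.

    /*
     * Illustrative sketch only.  The flag must be initialized with
     * pg_atomic_init_flag() when the shared memory is created.
     */
    #include "postgres.h"
    #include "port/atomics.h"

    typedef struct MemCtxReportSlot
    {
        pg_atomic_flag  request_pending;    /* set while a request is in flight */
    } MemCtxReportSlot;

    /*
     * Requesting backend: try to claim the slot.  pg_atomic_test_set_flag()
     * returns true only if the flag was previously clear, so a second
     * concurrent request fails fast instead of queueing behind the first.
     */
    static bool
    memctx_try_start_request(MemCtxReportSlot *slot)
    {
        return pg_atomic_test_set_flag(&slot->request_pending);
    }

    /*
     * Target backend (or the requester on timeout/cleanup): release the slot
     * so the next request can proceed.
     */
    static void
    memctx_finish_request(MemCtxReportSlot *slot)
    {
        pg_atomic_clear_flag(&slot->request_pending);
    }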
> IMO, one downside of throwing an error in such cases is that
> users might wonder whether they need to take corrective action, even
> though the issue will resolve itself and they just
> need to retry. Therefore, issuing a warning or displaying previously
> updated statistics might be a better alternative to throwing an
> error.
>
Wouldn't this be mostly mitigated by adding proper detail/hint to the
error message? Sure, the user can always ignore that (especially when
calling this from a script), but well ... we can only do so much.
OK.
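Something along the following lines (the wording and the target_pid variable are only
illustrative) should make it obvious that a simple retry is all that's needed:

    /* Illustrative only; target_pid is the PID passed to the SQL function. */
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_IN_USE),
             errmsg("memory context statistics are already being collected for the backend with PID %d",
                    (int) target_pid),
             errdetail("Another session has requested statistics for this backend and that request is still being processed."),
             errhint("Retry the function call after a short delay.")));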
All this makes me think about how we shared pgstat data before the shmem
approach was introduced in PG15. Until then the process signaled the pgstat
collector, the collector wrote the statistics into a file with a
timestamp, and the process used the timestamp to decide whether it was fresh
enough ... Wouldn't the same approach work here?
I imagined it would work something like this:
requesting backend:
-------------------
* set request_ts to current timestamp
* signal the target process, to generate memory context info
* wait until the DSA gets filled with stats_ts > request_ts
* return the data, don't erase anything
target backend
--------------
* clear the signal
* generate the statistics
* set stats_ts to current timestamp
* wake up all the backends waiting for the stats (through CV)
I see v11 does almost this, except that it accepts somewhat stale data.
That's correct.
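For reference, this is roughly how I picture the publishing side of that handshake.
The shared-state layout and names below are just a sketch, and locking around
stats_ts is omitted for brevity:

    #include "postgres.h"
    #include "storage/condition_variable.h"
    #include "utils/timestamp.h"

    /*
     * Hypothetical shared state for one target backend.  memctx_cv must be
     * set up with ConditionVariableInit() when the shared memory is created.
     */
    typedef struct MemCtxSharedState
    {
        ConditionVariable   memctx_cv;  /* wakes requesters when stats are ready */
        TimestampTz         stats_ts;   /* when the statistics were published */
        /* the statistics themselves live in a DSA segment */
    } MemCtxSharedState;

    /*
     * Target backend, on receiving the signal: publish fresh statistics into
     * the DSA, stamp them, and wake every waiting requester.
     */
    static void
    memctx_publish_statistics(MemCtxSharedState *state)
    {
        /* ... walk the memory contexts and copy the stats into the DSA ... */

        state->stats_ts = GetCurrentTimestamp();
        ConditionVariableBroadcast(&state->memctx_cv);
    }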
But why would that be necessary? I don't think it's needed, and I don't
think we should accept data from before the process sends the signal.
This is done in an attempt to prevent concurrent requests from timing out.
In such cases, the data produced in response to another request is likely to already be in
dynamic shared memory. Hence, instead of waiting for the latest data and risking a
timeout, the approach displays available statistics that are newer than a defined
threshold. Additionally, since we can't distinguish between sequential and
concurrent requests, we accept somewhat stale data for all requests.
I realize this approach has some issues, mainly regarding how to determine
an appropriate threshold value or a limit for old data.
Therefore, I agree that it makes sense to display the data that is published
after the request is made. If such data can't be published due to concurrent
requests or other delays, the function should detect this and return as soon as
possible.
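On the requesting side, I am picturing a wait loop roughly like the one below,
reusing the MemCtxSharedState sketched earlier. The timeout value and the missing
wait-event identifier are placeholders, and a real implementation would also account
for the time already spent when re-entering the sleep:

    #define MEMCTX_STATS_TIMEOUT_MS 5000    /* illustrative timeout */

    /*
     * Wait until statistics newer than request_ts are published, or give up
     * after a timeout so a stuck target backend cannot block us forever.
     * Returns true if fresh statistics were published, false on timeout.
     */
    static bool
    memctx_wait_for_statistics(MemCtxSharedState *state, TimestampTz request_ts)
    {
        bool        result = false;

        ConditionVariablePrepareToSleep(&state->memctx_cv);

        for (;;)
        {
            if (state->stats_ts > request_ts)
            {
                result = true;      /* data published after our request */
                break;
            }

            /* 0 stands in for a proper wait-event identifier. */
            if (ConditionVariableTimedSleep(&state->memctx_cv,
                                            MEMCTX_STATS_TIMEOUT_MS,
                                            0))
                break;              /* timed out; return without fresh data */
        }

        ConditionVariableCancelSleep();

        return result;
    }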
Thank you,
Rahila Syed