Re: per backend I/O statistics - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: per backend I/O statistics |
Date | |
Msg-id | ZwOvVnmVt0PjWDjh@ip-10-97-1-34.eu-west-3.compute.internal |
In response to | Re: per backend I/O statistics (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
List | pgsql-hackers |
Hi,

On Fri, Sep 20, 2024 at 12:53:43PM +0900, Michael Paquier wrote:
> On Wed, Sep 04, 2024 at 04:45:24AM +0000, Bertrand Drouvot wrote:
> > On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:
> >> As an additional benefit of this approach, the client can set a
> >> connection variable, for example, no_backend_iostats to true, or set
> >> its inverse variable to false, to restrict memory usage to only the
> >> required backends.
> >
> > Thanks for the feedback!
> >
> > If we were to add an on/off switch button, I think I'd vote for a global one
> > instead. Indeed, I see this feature more like an "Administrator" one, where
> > the administrator wants to be able to find out which session is responsible for
> > what (from an I/O point of view): like being able to answer "which session is
> > generating this massive amount of reads"?
> >
> > If we allow each session to disable the feature then the administrator
> > would lose this ability.
>
> Hmm, I've been studying this patch,

Thanks for looking at it!

> and I am not completely sure to
> agree with this feeling of using fixed-numbered stats, actually, after
> reading the whole and seeing the structure of the patch
> (FLEXIBLE_ARRAY_MEMBER is a new way to handle the fact that we don't
> know exactly the number of slots we need to know for the
> fixed-numbered stats as MaxBackends may change).

Right, that's a new way of dealing with an "unknown" number of slots (and it
has cons, as you mentioned in [1]).

> If we make these
> kind of stats variable-numbered, does it have to actually involve many
> creations or removals of the stats entries, though? One point is that
> the number of entries to know about is capped by max_connections,
> which is a PGC_POSTMASTER. That's the same kind of control as
> replication slots. So one approach would be to reuse entries in the
> dshash and use in the hashing key the number in the procarrays. If a
> new connection spawns and reuses a slot that was used in the past,
> then reset all the existing fields and assign its PID.

Yeah, like it's done currently with the "fixed-numbered" stats proposal. That
sounds reasonable to me. I'll look at this proposed approach and come back
with a new patch version, thanks!

> Another thing is the consistency of the data that we'd like to keep at
> shutdown. If the connections have a balanced amount of stats shared
> among them, doing decision-making based on them is kind of easy. But
> that may cause confusion if the activity is unbalanced across the
> sessions. We could also not flush them to disk as an option, but it
> still seems more useful to me to save this data across restarts if one
> takes frequent snapshots of the new system view reporting everything,
> so as it is possible to get an idea of the deltas across the snapshots
> for each connection slot.

The idea that has been implemented so far in this patch is that we still
maintain an aggregated version of the stats (visible through pg_stat_io) and
that only the aggregated stats are flushed/read to/from disk (meaning we don't
flush the per-backend stats).

I think that it makes sense that way. The way I see it, the per-backend I/O
stats are more for current activity instrumentation. So it's not clear to me
what the benefits of restoring the per-backend stats at startup would be,
knowing that: 1) we restored the aggregated stats and 2) the sessions that
were responsible for the restored stats are gone.

[1]: https://www.postgresql.org/message-id/Zuz5iQ4AjcuOMx_w%40paquier.xyz

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
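
To make the slot-reuse idea discussed above a bit more concrete, here is a minimal standalone sketch in plain C. It is not taken from the patch and does not use any PostgreSQL API: a small array indexed by a slot number stands in for the dshash keyed by the backend's number in the proc array, and every name in it (MAX_BACKENDS, IOCounters, BackendIOStats, IOStatsState, backend_slot_attach, backend_count_io, stats_to_persist) is hypothetical. It only illustrates three points from the thread: a reused slot is reset and gets the new PID, I/O is counted both per backend and in an aggregate, and only the aggregate would be written out at shutdown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cap; stands in for a limit derived from max_connections. */
#define MAX_BACKENDS 4

/* Hypothetical counters; the real stats track far more detail. */
typedef struct IOCounters
{
    uint64_t reads;
    uint64_t writes;
} IOCounters;

typedef struct BackendIOStats
{
    int        pid;         /* 0 means the slot has never been used */
    IOCounters io;
} BackendIOStats;

typedef struct IOStatsState
{
    BackendIOStats per_backend[MAX_BACKENDS];   /* indexed by slot number */
    IOCounters     aggregated;                  /* the pg_stat_io-like total */
} IOStatsState;

/* A new connection takes over a slot: wipe old counters, record its PID. */
void
backend_slot_attach(IOStatsState *st, int procno, int pid)
{
    assert(procno >= 0 && procno < MAX_BACKENDS);
    memset(&st->per_backend[procno], 0, sizeof(BackendIOStats));
    st->per_backend[procno].pid = pid;
}

/* Count I/O for the backend's slot and for the aggregate at the same time. */
void
backend_count_io(IOStatsState *st, int procno, uint64_t reads, uint64_t writes)
{
    assert(procno >= 0 && procno < MAX_BACKENDS);
    st->per_backend[procno].io.reads += reads;
    st->per_backend[procno].io.writes += writes;
    st->aggregated.reads += reads;
    st->aggregated.writes += writes;
}

/* Only the aggregate would be persisted; per-backend data is dropped. */
IOCounters
stats_to_persist(const IOStatsState *st)
{
    return st->aggregated;
}
```

With a zero-initialized IOStatsState, attaching PID 1001 to slot 2, counting some I/O, and then attaching PID 1002 to the same slot leaves the slot's counters at zero under the new PID, while the aggregate still includes what the first session did. That is the reason restoring per-backend numbers at startup would add little: the aggregate already carries the history, and the sessions behind it are gone.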