Re: per backend I/O statistics - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: per backend I/O statistics |
Date | |
Msg-id | ZyMRJIbUpNPoCXUe@ip-10-97-1-34.eu-west-3.compute.internal |
In response to | Re: per backend I/O statistics (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
List | pgsql-hackers |
Hi,

On Tue, Oct 08, 2024 at 04:28:39PM +0000, Bertrand Drouvot wrote:
> > > On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote:
> > > > Okay, per the above and the persistency of the stats.
> > Great, I'll work on an updated patch version then.
>

I spent some time on this during the last 2 days and I think we have 3 design
options.

=== GOALS ===

But first let's sum up the goals that I think we agreed on:

- Keep pg_stat_io as it is today: give the whole server picture and serialize
  the stats to disk.

- Introduce per-backend IO stats and 2 new APIs to:

  1. Provide the IO stats for "my backend" (through say pg_my_stat_io); this
     would take care of the stats_fetch_consistency.

  2. Retrieve the IO stats for another backend (through say
     pg_stat_get_backend_io(pid)); this would _not_ take care of
     stats_fetch_consistency, as:

     2.1/ I think that there is no use case (there is no need to get other
     backends' I/O statistics while taking care of the stats_fetch_consistency)

     2.2/ It could be memory expensive to store a snapshot for all the
     backends (depending on the number of backends created)

- There is no need to serialize the per-backend IO stats to disk (there is no
  point in seeing stats for backends that do not exist anymore after a restart).

- The per-backend IO stats should be variable-numbered (not fixed), as per
  up-thread discussion.

=== OPTIONS ===

So, based on this, I think that we could:

Option 1: "move" the existing PGSTAT_KIND_IO to variable-numbered and let this
KIND take care of the aggregated view (pg_stat_io) and the per-backend stats.

Option 2: leave PGSTAT_KIND_IO as it is and introduce a new
PGSTAT_KIND_BACKEND_IO that would be variable-numbered.

Option 3: Remove PGSTAT_KIND_IO, introduce a new PGSTAT_KIND_BACKEND_IO that
would be variable-numbered and store the "aggregated stats aka pg_stat_io" in
shared memory (not part of the variable-numbered hash). Per-backend stats could
be aggregated into "pg_stat_io" during the flush_pending_cb call for example.
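To make the shape of Option 2 concrete, here is a minimal, self-contained toy
model in C. It is only a sketch under stated assumptions and not PostgreSQL
code: every name prefixed with Sketch/sketch_ is hypothetical, a small array
stands in for the variable-numbered hash, and updating both the server-wide
entry and the per-backend entry in the same flush step is one possible
arrangement that this message does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; none of these are PostgreSQL's real types. */
#define SKETCH_MAX_BACKENDS 8

typedef struct SketchIOCounts
{
    uint64_t reads;
    uint64_t writes;
} SketchIOCounts;

/* Models the existing fixed-numbered kind: one server-wide entry (pg_stat_io). */
static SketchIOCounts sketch_server_io;

/* Models the new variable-numbered kind: one entry per live backend. */
typedef struct SketchBackendEntry
{
    int            pid;     /* 0 marks a free slot */
    SketchIOCounts io;
} SketchBackendEntry;

static SketchBackendEntry sketch_backend_io[SKETCH_MAX_BACKENDS];

/* Find the entry for pid; if create is set, claim a free slot when missing. */
static SketchBackendEntry *
sketch_lookup(int pid, int create)
{
    SketchBackendEntry *free_slot = NULL;

    assert(pid > 0);
    for (size_t i = 0; i < SKETCH_MAX_BACKENDS; i++)
    {
        if (sketch_backend_io[i].pid == pid)
            return &sketch_backend_io[i];
        if (sketch_backend_io[i].pid == 0 && free_slot == NULL)
            free_slot = &sketch_backend_io[i];
    }
    if (create && free_slot != NULL)
    {
        free_slot->pid = pid;
        free_slot->io.reads = 0;
        free_slot->io.writes = 0;
    }
    return create ? free_slot : NULL;
}

/* Assumed flush step: add pending counts to both the server-wide and backend entries. */
static int
sketch_flush_pending(int pid, const SketchIOCounts *pending)
{
    SketchBackendEntry *entry = sketch_lookup(pid, 1);

    if (entry == NULL)
        return 0;               /* table full in this toy model */
    entry->io.reads += pending->reads;
    entry->io.writes += pending->writes;
    sketch_server_io.reads += pending->reads;
    sketch_server_io.writes += pending->writes;
    return 1;
}

/* Direct read of another backend's entry; no snapshot is kept. */
static int
sketch_get_backend_io(int pid, SketchIOCounts *out)
{
    SketchBackendEntry *entry = sketch_lookup(pid, 0);

    if (entry == NULL)
        return 0;
    *out = entry->io;
    return 1;
}

/* On backend exit the per-backend entry is dropped; server totals remain. */
static void
sketch_backend_exit(int pid)
{
    SketchBackendEntry *entry = sketch_lookup(pid, 0);

    if (entry != NULL)
        entry->pid = 0;
}
```

In this model, flushing counts for two backends and then calling
sketch_backend_exit() for one of them leaves sketch_server_io untouched while
sketch_get_backend_io() stops returning the exited backend. That mirrors the
goals above: the server-wide picture persists, the per-backend one does not.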
=== BEST OPTION? ===

I would opt for Option 2 as:

- The stats system is currently not designed for Option 1 and our goals (for
  example the shared_data_len is used to serialize but also to fetch the
  entries, see pgstat_fetch_entry()), so that would need some hack to serialize
  only a part of them and still be able to fetch them all.

- Mixing "fixed" and "variable" in the same KIND does not sound like a good
  idea (though that might be possible with some hacks, I don't think that would
  be easy to maintain).

- Having the per-backend stats as "variable" in their own dedicated kind looks
  more reasonable and less error-prone.

- I don't think there is a stats design similar to Option 3 currently, so I'm
  not sure there is a need to develop something new while Option 2 could be
  done.

- Option 3 would need some hack for (at least) the "pg_stat_io"
  [de]serialization part.

- Option 2 seems to offer more flexibility (as compared to Options 1 and 3).

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com