Re: per backend I/O statistics - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: per backend I/O statistics |
Date | |
Msg-id | 20240903.153749.1724225439695895017.horikyota.ntt@gmail.com |
Responses |
Re: per backend I/O statistics
Re: per backend I/O statistics |
List | pgsql-hackers |
At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in
> Hi hackers,
>
> Please find attached a patch to implement $SUBJECT.
>
> While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new
> pg_my_stat_io view to display "my" backend I/O statistics and a new
> pg_stat_get_backend_io() function to retrieve the I/O statistics for a given
> backend pid.
>
> By having the per backend level of granularity, one could for example identify
> which running backend is responsible for most of the reads, most of the extends
> and so on... The pg_my_stat_io view could also be useful to check the
> impact on the I/O made by some operations, queries,... in the current session.
>
> Some remarks:
>
> - it is split in 2 sub patches: 0001 introducing the necessary changes to provide
> the pg_my_stat_io view and 0002 to add the pg_stat_get_backend_io() function.
> - the idea of having per backend I/O statistics has already been mentioned in
> [1] by Andres.
>
> Some implementation choices:
>
> - The KIND_IO stats are still "fixed amount" ones as the maximum number of
> backend is fixed.
> - The statistics snapshot is made for the global stats (the aggregated ones) and
> for my backend stats. The snapshot is not build for all the backend stats (that
> could be memory expensive depending on the number of max connections and given
> the fact that PgStat_IO is 16KB long).
> - The above point means that pg_stat_get_backend_io() behaves as if
> stats_fetch_consistency is set to none (each execution re-fetches counters
> from shared memory).
> - The above 2 points are also the reasons why the pg_my_stat_io view has been
> added (as its results takes care of the stats_fetch_consistency setting). I think
> that makes sense to rely on it in that case, while I'm not sure that would make
> a lot of sense to retrieve other's backend I/O stats and taking care of
> stats_fetch_consistency.
>
>
> [1]: https://www.postgresql.org/message-id/20230309003438.rectf7xo7pw5t5cj%40awork3.anarazel.de

I'm not sure about the usefulness of having the stats only available from the
current session. Since they are stored in shared memory, shouldn't we make them
accessible to all backends? However, this would introduce permission
considerations and could become complex.

When I first looked at this patch, my initial thought was whether we should let
these stats stay "fixed." The reason why the current PGSTAT_KIND_IO is fixed is
that there is only one global statistics storage for the entire database. If we
have stats for a flexible number of backends, it would need to be non-fixed,
perhaps with the entry for INVALID_PROC_NUMBER storing the global I/O stats, I
suppose. However, one concern with that approach would be the impact on
performance due to the frequent creation and deletion of stats entries caused
by high turnover of backends.

Just to be clear, the above comments are not meant to oppose the current
implementation approach. They are purely for the sake of discussing comparisons
with other possible approaches.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
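
For illustration only, the "fixed amount" layout discussed above could be
sketched roughly as below. This is not code from the patch: every name
(SketchIOCounters, SketchFixedIOStats, SKETCH_MAX_BACKENDS, SKETCH_GLOBAL_SLOT
and the sketch_* functions) is hypothetical, and the counters are reduced to
three fields. The sketch assumes one preallocated slot per possible backend
plus one aggregate slot, and borrows the idea of addressing the aggregate
through a sentinel in the role of INVALID_PROC_NUMBER. A non-fixed variant
would instead create and drop entries as backends come and go, which is where
the turnover cost mentioned above would arise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names here are hypothetical; none are PostgreSQL identifiers. */
#define SKETCH_MAX_BACKENDS 8     /* assumed fixed upper bound on backends */
#define SKETCH_GLOBAL_SLOT  (-1)  /* plays the role of INVALID_PROC_NUMBER */

/* Heavily reduced stand-in for a per-entry I/O counter struct. */
typedef struct SketchIOCounters
{
    uint64_t reads;
    uint64_t writes;
    uint64_t extends;
} SketchIOCounters;

/* Fixed layout: one aggregate slot plus one slot per possible backend. */
typedef struct SketchFixedIOStats
{
    SketchIOCounters global;
    SketchIOCounters backend[SKETCH_MAX_BACKENDS];
} SketchFixedIOStats;

/* Count n reads for a backend and for the cluster-wide aggregate. */
static void
sketch_count_reads(SketchFixedIOStats *stats, int procno, uint64_t n)
{
    assert(stats != NULL);
    assert(procno >= 0 && procno < SKETCH_MAX_BACKENDS);
    stats->global.reads += n;
    stats->backend[procno].reads += n;
}

/* Look up a slot; the sentinel selects the aggregate, bad numbers give NULL. */
static const SketchIOCounters *
sketch_lookup(const SketchFixedIOStats *stats, int procno)
{
    assert(stats != NULL);
    if (procno == SKETCH_GLOBAL_SLOT)
        return &stats->global;
    if (procno < 0 || procno >= SKETCH_MAX_BACKENDS)
        return NULL;
    return &stats->backend[procno];
}

/* On backend exit the slot is only reset for reuse; nothing is freed. */
static void
sketch_backend_exit(SketchFixedIOStats *stats, int procno)
{
    assert(stats != NULL);
    assert(procno >= 0 && procno < SKETCH_MAX_BACKENDS);
    memset(&stats->backend[procno], 0, sizeof(SketchIOCounters));
}

/* Memory needed to snapshot every backend entry at once. */
static size_t
sketch_snapshot_bytes(size_t nbackends, size_t entry_bytes)
{
    return nbackends * entry_bytes;
}
```

Taking the 16KB per-entry figure quoted above at face value, a snapshot
covering 100 backends would need sketch_snapshot_bytes(100, 16 * 1024) bytes,
roughly 1.6 MB, which is the memory concern raised for not snapshotting all
backend stats.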