Thread: Re: per backend I/O statistics
At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in > Hi hackers, > > Please find attached a patch to implement $SUBJECT. > > While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new > pg_my_stat_io view to display "my" backend I/O statistics and a new > pg_stat_get_backend_io() function to retrieve the I/O statistics for a given > backend pid. > > By having the per backend level of granularity, one could for example identify > which running backend is responsible for most of the reads, most of the extends > and so on... The pg_my_stat_io view could also be useful to check the > impact on the I/O made by some operations, queries,... in the current session. > > Some remarks: > > - it is split in 2 sub patches: 0001 introducing the necessary changes to provide > the pg_my_stat_io view and 0002 to add the pg_stat_get_backend_io() function. > - the idea of having per backend I/O statistics has already been mentioned in > [1] by Andres. > > Some implementation choices: > > - The KIND_IO stats are still "fixed amount" ones as the maximum number of > backend is fixed. > - The statistics snapshot is made for the global stats (the aggregated ones) and > for my backend stats. The snapshot is not build for all the backend stats (that > could be memory expensive depending on the number of max connections and given > the fact that PgStat_IO is 16KB long). > - The above point means that pg_stat_get_backend_io() behaves as if > stats_fetch_consistency is set to none (each execution re-fetches counters > from shared memory). > - The above 2 points are also the reasons why the pg_my_stat_io view has been > added (as its results takes care of the stats_fetch_consistency setting). I think > that makes sense to rely on it in that case, while I'm not sure that would make > a lot of sense to retrieve other's backend I/O stats and taking care of > stats_fetch_consistency. > > > [1]: https://www.postgresql.org/message-id/20230309003438.rectf7xo7pw5t5cj%40awork3.anarazel.de I'm not sure about the usefulness of having the stats only available from the current session. Since they are stored in shared memory, shouldn't we make them accessible to all backends? However, this would introduce permission considerations and could become complex. When I first looked at this patch, my initial thought was whether we should let these stats stay "fixed." The reason why the current PGSTAT_KIND_IO is fixed is that there is only one global statistics storage for the entire database. If we have stats for a flexible number of backends, it would need to be non-fixed, perhaps with the entry for INVALID_PROC_NUMBER storing the global I/O stats, I suppose. However, one concern with that approach would be the impact on performance due to the frequent creation and deletion of stats entries caused by high turnover of backends. Just to be clear, the above comments are not meant to oppose the current implementation approach. They are purely for the sake of discussing comparisons with other possible approaches. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > When I first looked at this patch, my initial thought was whether we > should let these stats stay "fixed." The reason why the current > PGSTAT_KIND_IO is fixed is that there is only one global statistics > storage for the entire database. If we have stats for a flexible > number of backends, it would need to be non-fixed, perhaps with the > entry for INVALID_PROC_NUMBER storing the global I/O stats, I > suppose. However, one concern with that approach would be the impact > on performance due to the frequent creation and deletion of stats > entries caused by high turnover of backends. As an additional benefit of this approach, the client can set a connection variable, for example, no_backend_iostats to true, or set its inverse variable to false, to restrict memory usage to only the required backends. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, On Tue, Sep 03, 2024 at 03:37:49PM +0900, Kyotaro Horiguchi wrote: > At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in > > Hi hackers, > > > > Please find attached a patch to implement $SUBJECT. > > > > While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new > > pg_my_stat_io view to display "my" backend I/O statistics and a new > > pg_stat_get_backend_io() function to retrieve the I/O statistics for a given > > backend pid. > > > > I'm not sure about the usefulness of having the stats only available > from the current session. Since they are stored in shared memory, > shouldn't we make them accessible to all backends? Thanks for the feedback! The stats are accessible to all backends thanks to 0002 and the introduction of the pg_stat_get_backend_io() function. > However, this would > introduce permission considerations and could become complex. Not sure that the data exposed here is sensible enough to consider permission restriction. > When I first looked at this patch, my initial thought was whether we > should let these stats stay "fixed." The reason why the current > PGSTAT_KIND_IO is fixed is that there is only one global statistics > storage for the entire database. If we have stats for a flexible > number of backends, it would need to be non-fixed, perhaps with the > entry for INVALID_PROC_NUMBER storing the global I/O stats, I > suppose. However, one concern with that approach would be the impact > on performance due to the frequent creation and deletion of stats > entries caused by high turnover of backends. > The pros of using the fixed amount are: - less code change (I think as I did not write the non fixed counterpart) - probably better performance and less scalabilty impact (in case of high rate of backends creation/ deletion) Cons is probably allocating shared memory space that might not be used ( sizeof(PgStat_IO) is 16392 so that could be a concern for a high number of unused connection). OTOH, if a high number of connections is not used that might be worth to reduce the max_connections setting. "Conceptually" speaking, we know what the maximum number of backend is, so I think that using the fixed amount approach makes sense (somehow I think it can be compared to PGSTAT_KIND_SLRU which relies on SLRU_NUM_ELEMENTS). > Just to be clear, the above comments are not meant to oppose the > current implementation approach. They are purely for the sake of > discussing comparisons with other possible approaches. No problem at all, thanks for your feedback and sharing your thoughts! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote: > At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > When I first looked at this patch, my initial thought was whether we > > should let these stats stay "fixed." The reason why the current > > PGSTAT_KIND_IO is fixed is that there is only one global statistics > > storage for the entire database. If we have stats for a flexible > > number of backends, it would need to be non-fixed, perhaps with the > > entry for INVALID_PROC_NUMBER storing the global I/O stats, I > > suppose. However, one concern with that approach would be the impact > > on performance due to the frequent creation and deletion of stats > > entries caused by high turnover of backends. > > As an additional benefit of this approach, the client can set a > connection variable, for example, no_backend_iostats to true, or set > its inverse variable to false, to restrict memory usage to only the > required backends. Thanks for the feedback! If we were to add an on/off switch button, I think I'd vote for a global one instead. Indeed, I see this feature more like an "Administrator" one, where the administrator wants to be able to find out which session is reponsible of what (from an I/O point of view): like being able to anwser "which session is generating this massive amount of reads"? If we allow each session to disable the feature then the administrator would lost this ability. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 2024-Sep-03, Bertrand Drouvot wrote: > Cons is probably allocating shared memory space that might not be used ( > sizeof(PgStat_IO) is 16392 so that could be a concern for a high number of > unused connection). OTOH, if a high number of connections is not used that might > be worth to reduce the max_connections setting. I was surprised by the size you mention so I went to look for the definition of that struct: typedef struct PgStat_IO { TimestampTz stat_reset_timestamp; PgStat_BktypeIO stats[BACKEND_NUM_TYPES]; } PgStat_IO; (It would be good to have more than zero comments here BTW) So what's happening is that struct PgStat_IO stores the data for every single process type, be it regular backends, backend-like auxiliary processes, and all other potential postmaster children. So it seems to me that storing one of these struct for "my backend stats" is wrong: I think you should be using PgStat_BktypeIO instead (or maybe another struct which has a reset timestamp plus BktypeIO, if you care about the reset time). That struct is only 1024 bytes long, so it seems much more reasonable to have one of those per backend. Another way to think about this might be to say that the B_BACKEND element of the PgStat_Snapshot.io array should really be spread out to MaxBackends separate elements. This would make it more difficult to obtain a total value accumulating ops done by all backends (since it's require to sum the values of each backend), but it allows you to obtain backend-specific values, including those of remote backends rather than only your own, as Kyotaro suggests. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Hi, On Thu, Sep 05, 2024 at 03:03:32PM +0200, Alvaro Herrera wrote: > On 2024-Sep-03, Bertrand Drouvot wrote: > > > Cons is probably allocating shared memory space that might not be used ( > > sizeof(PgStat_IO) is 16392 so that could be a concern for a high number of > > unused connection). OTOH, if a high number of connections is not used that might > > be worth to reduce the max_connections setting. > > I was surprised by the size you mention so I went to look for the > definition of that struct: > Thanks for looking at it! > typedef struct PgStat_IO > { > TimestampTz stat_reset_timestamp; > PgStat_BktypeIO stats[BACKEND_NUM_TYPES]; > } PgStat_IO; > > (It would be good to have more than zero comments here BTW) > > So what's happening is that struct PgStat_IO stores the data for every > single process type, be it regular backends, backend-like auxiliary > processes, and all other potential postmaster children. Yeap. > So it seems to > me that storing one of these struct for "my backend stats" is wrong: I > think you should be using PgStat_BktypeIO instead (or maybe another > struct which has a reset timestamp plus BktypeIO, if you care about the > reset time). That struct is only 1024 bytes long, so it seems much more > reasonable to have one of those per backend. Yeah, that could be an area of improvement. Thanks, I'll look at it. Currently the filtering is done when retrieving the per backend stats but better to do it when storing the stats. > Another way to think about this might be to say that the B_BACKEND > element of the PgStat_Snapshot.io array should really be spread out to > MaxBackends separate elements. This would make it more difficult to > obtain a total value accumulating ops done by all backends (since it's > require to sum the values of each backend), but it allows you to obtain > backend-specific values, including those of remote backends rather than > only your own, as Kyotaro suggests. > One backend can already see the stats of the other backends thanks to the pg_stat_get_backend_io() function (that takes a backend pid as parameter) that is introduced in the 0002 sub-patch. I'll ensure that's still the case with the next version of the patch. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Sep 06, 2024 at 03:03:17PM +0000, Bertrand Drouvot wrote: > The struct size is 1040 Bytes and that's much more reasonable than the size > needed for per backend I/O stats in v1 (about 16KB). One could think that's a high increase of shared memory usage with a high number of connections (due to the fact that it's implemented as "fixed amount" stats based on the max number of backends). So out of curiosity, I did some measurements with: dynamic_shared_memory_type=sysv shared_memory_type=sysv max_connections=20000 On my lab, ipcs -m gives me: - on master key shmid owner perms bytes nattch status 0x00113a04 51347487 postgres 600 1149394944 6 0x4bc9f2fa 51347488 postgres 600 4006976 6 0x46790800 51347489 postgres 600 1048576 2 - with v3 applied key shmid owner perms bytes nattch status 0x00113a04 51347477 postgres 600 1170227200 6 0x08e04b0a 51347478 postgres 600 4006976 6 0x74688c9c 51347479 postgres 600 1048576 2 So, with 20000 max_connections (not advocating that's a reasonable number in all the cases), that's a 1170227200 - 1149394944 = 20 MB increase of shared memory (expected with 20K max_connections and the new struct size of 1040 Bytes) as compare to master which is 1096 MB. It means that v3 produces about 2% shared memory increase with 20000 max_connections. Out of curiosity with max_connections=100000, then: - on master: ------ Shared Memory Segments -------- key shmid owner perms bytes nattch status 0x00113a04 52953134 postgres 600 5053915136 6 0x37abf5ce 52953135 postgres 600 20006976 6 0x71c07d5c 52953136 postgres 600 1048576 2 - with v3: ------ Shared Memory Segments -------- key shmid owner perms bytes nattch status 0x00113a04 52953144 postgres 600 5157945344 6 0x7afb24de 52953145 postgres 600 20006976 6 0x2695af58 52953146 postgres 600 1048576 2 So, with 100000 max_connections (not advocating that's a reasonable number in all the cases), that's a 5157945344 - 5053915136 = 100 MB increase of shared memory (expected with 100K max_connections and the new struct size of 1040 Bytes) as compare to master which is about 4800 MB. It means that v3 produces about 2% shared memory increase with 100000 max_connections. Then, based on those numbers, I think that the "fixed amount" approach is a good one as 1/ the amount of shared memory increase is "relatively" small and 2/ most probably provides performance benefits as compare to the "non fixed" approach. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, Thanks for working on this! Your patch applies and builds cleanly. On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > - As stated up-thread, the pg_stat_get_backend_io() behaves as if > stats_fetch_consistency is set to none (each execution re-fetches counters > from shared memory). Indeed, the snapshot is not build in each backend to copy > all the others backends stats, as 1/ there is no use case (there is no need to > get others backends I/O statistics while taking care of the stats_fetch_consistency) > and 2/ that could be memory expensive depending of the number of max connections. I believe this is the correct approach. I manually tested your patches, and they work as expected. Here is some feedback: - The stats_reset column is NULL in both pg_my_stat_io and pg_stat_get_backend_io() until the first call to reset io statistics. While I'm not sure if it's necessary, it might be useful to display the more recent of the two times in the stats_reset column: the statistics reset time or the backend creation time. - The pgstat_reset_io_counter_internal() is called in the pgstat_shutdown_hook(). This causes the stats_reset column showing the termination time of the old backend when its proc num is reassigned to a new backend. -- Regards, Nazir Bilal Yavuz Microsoft
Hi, On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote: > > - The pgstat_reset_io_counter_internal() is called in the > > pgstat_shutdown_hook(). This causes the stats_reset column showing the > > termination time of the old backend when its proc num is reassigned to > > a new backend. > > doh! Nice catch, thanks! > > And also new backends that are not re-using a previous "existing" process slot > are getting the last reset time (if any). So the right place to fix this is in > pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to > 0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to be > right "conceptually" speaking). Thanks, I confirm that it is fixed. You mentioned some refactoring upthread: On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > - If we agree on the current proposal then I'll do some refactoring in > pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid > duplicated code (it's not done yet to ease the review). Could we remove pg_stat_get_my_io() completely and use pg_stat_get_backend_io() with the current backend's pid to get the current backend's stats? If you meant the same thing, please don't mind it. -- Regards, Nazir Bilal Yavuz Microsoft
Hi, On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote: > Hi, > > On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote: > > > - The pgstat_reset_io_counter_internal() is called in the > > > pgstat_shutdown_hook(). This causes the stats_reset column showing the > > > termination time of the old backend when its proc num is reassigned to > > > a new backend. > > > > doh! Nice catch, thanks! > > > > And also new backends that are not re-using a previous "existing" process slot > > are getting the last reset time (if any). So the right place to fix this is in > > pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to > > 0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to be > > right "conceptually" speaking). > > Thanks, I confirm that it is fixed. Thanks! > You mentioned some refactoring upthread: > > On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > - If we agree on the current proposal then I'll do some refactoring in > > pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid > > duplicated code (it's not done yet to ease the review). > > Could we remove pg_stat_get_my_io() completely and use > pg_stat_get_backend_io() with the current backend's pid to get the > current backend's stats? The reason why I keep pg_stat_get_my_io() is because (as mentioned in [1]), the statistics snapshot is build for "my backend stats" (means it depends of the stats_fetch_consistency value). I can see use cases for that. OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency is set to none (each execution re-fetches counters from shared memory) (see [2]). Indeed, the snapshot is not build in each backend to copy all the others backends stats, as 1/ I think that there is no use case (there is no need to get others backends I/O statistics while taking care of the stats_fetch_consistency) and 2/ that could be memory expensive depending of the number of max connections. So I think it's better to keep both functions as they behave differently. Thoughts? > If you meant the same thing, please don't > mind it. What I meant to say is to try to remove the duplicate code that we can find in those 3 functions (say creating a new function that would contain the duplicate code and make use of this new function in the 3 others). Did not look at it in details yet. [1]: https://www.postgresql.org/message-id/ZtXR%2BCtkEVVE/LHF%40ip-10-97-1-34.eu-west-3.compute.internal [2]: https://www.postgresql.org/message-id/ZtsZtaRza9bFFeF8%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote: > > Could we remove pg_stat_get_my_io() completely and use > > pg_stat_get_backend_io() with the current backend's pid to get the > > current backend's stats? > > The reason why I keep pg_stat_get_my_io() is because (as mentioned in [1]), the > statistics snapshot is build for "my backend stats" (means it depends of the > stats_fetch_consistency value). I can see use cases for that. > > OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency is set to > none (each execution re-fetches counters from shared memory) (see [2]). Indeed, > the snapshot is not build in each backend to copy all the others backends stats, > as 1/ I think that there is no use case (there is no need to get others backends > I/O statistics while taking care of the stats_fetch_consistency) and 2/ that > could be memory expensive depending of the number of max connections. > > So I think it's better to keep both functions as they behave differently. > > Thoughts? Yes, that is correct. Sorry, you already had explained it and I had agreed with it but I forgot. > > If you meant the same thing, please don't > > mind it. > > What I meant to say is to try to remove the duplicate code that we can find in > those 3 functions (say creating a new function that would contain the duplicate > code and make use of this new function in the 3 others). Did not look at it in > details yet. I got it, thanks for the explanation. -- Regards, Nazir Bilal Yavuz Microsoft
Hi, On Tue, Sep 17, 2024 at 04:47:51PM +0300, Nazir Bilal Yavuz wrote: > Hi, > > On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > So I think it's better to keep both functions as they behave differently. > > > > Thoughts? > > Yes, that is correct. Sorry, you already had explained it and I had > agreed with it but I forgot. No problem at all! (I re-explained because I'm not always 100% sure that my explanations are crystal clear ;-) ) Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com