Thread: Re: per backend I/O statistics
At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in > Hi hackers, > > Please find attached a patch to implement $SUBJECT. > > While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new > pg_my_stat_io view to display "my" backend I/O statistics and a new > pg_stat_get_backend_io() function to retrieve the I/O statistics for a given > backend pid. > > By having the per backend level of granularity, one could for example identify > which running backend is responsible for most of the reads, most of the extends > and so on... The pg_my_stat_io view could also be useful to check the > impact on the I/O made by some operations, queries,... in the current session. > > Some remarks: > > - it is split in 2 sub patches: 0001 introducing the necessary changes to provide > the pg_my_stat_io view and 0002 to add the pg_stat_get_backend_io() function. > - the idea of having per backend I/O statistics has already been mentioned in > [1] by Andres. > > Some implementation choices: > > - The KIND_IO stats are still "fixed amount" ones as the maximum number of > backend is fixed. > - The statistics snapshot is made for the global stats (the aggregated ones) and > for my backend stats. The snapshot is not build for all the backend stats (that > could be memory expensive depending on the number of max connections and given > the fact that PgStat_IO is 16KB long). > - The above point means that pg_stat_get_backend_io() behaves as if > stats_fetch_consistency is set to none (each execution re-fetches counters > from shared memory). > - The above 2 points are also the reasons why the pg_my_stat_io view has been > added (as its results takes care of the stats_fetch_consistency setting). I think > that makes sense to rely on it in that case, while I'm not sure that would make > a lot of sense to retrieve other's backend I/O stats and taking care of > stats_fetch_consistency. > > > [1]: https://www.postgresql.org/message-id/20230309003438.rectf7xo7pw5t5cj%40awork3.anarazel.de I'm not sure about the usefulness of having the stats only available from the current session. Since they are stored in shared memory, shouldn't we make them accessible to all backends? However, this would introduce permission considerations and could become complex. When I first looked at this patch, my initial thought was whether we should let these stats stay "fixed." The reason why the current PGSTAT_KIND_IO is fixed is that there is only one global statistics storage for the entire database. If we have stats for a flexible number of backends, it would need to be non-fixed, perhaps with the entry for INVALID_PROC_NUMBER storing the global I/O stats, I suppose. However, one concern with that approach would be the impact on performance due to the frequent creation and deletion of stats entries caused by high turnover of backends. Just to be clear, the above comments are not meant to oppose the current implementation approach. They are purely for the sake of discussing comparisons with other possible approaches. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > When I first looked at this patch, my initial thought was whether we > should let these stats stay "fixed." The reason why the current > PGSTAT_KIND_IO is fixed is that there is only one global statistics > storage for the entire database. If we have stats for a flexible > number of backends, it would need to be non-fixed, perhaps with the > entry for INVALID_PROC_NUMBER storing the global I/O stats, I > suppose. However, one concern with that approach would be the impact > on performance due to the frequent creation and deletion of stats > entries caused by high turnover of backends. As an additional benefit of this approach, the client can set a connection variable, for example, no_backend_iostats to true, or set its inverse variable to false, to restrict memory usage to only the required backends. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, On Tue, Sep 03, 2024 at 03:37:49PM +0900, Kyotaro Horiguchi wrote: > At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in > > Hi hackers, > > > > Please find attached a patch to implement $SUBJECT. > > > > While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new > > pg_my_stat_io view to display "my" backend I/O statistics and a new > > pg_stat_get_backend_io() function to retrieve the I/O statistics for a given > > backend pid. > > > > I'm not sure about the usefulness of having the stats only available > from the current session. Since they are stored in shared memory, > shouldn't we make them accessible to all backends? Thanks for the feedback! The stats are accessible to all backends thanks to 0002 and the introduction of the pg_stat_get_backend_io() function. > However, this would > introduce permission considerations and could become complex. Not sure that the data exposed here is sensible enough to consider permission restriction. > When I first looked at this patch, my initial thought was whether we > should let these stats stay "fixed." The reason why the current > PGSTAT_KIND_IO is fixed is that there is only one global statistics > storage for the entire database. If we have stats for a flexible > number of backends, it would need to be non-fixed, perhaps with the > entry for INVALID_PROC_NUMBER storing the global I/O stats, I > suppose. However, one concern with that approach would be the impact > on performance due to the frequent creation and deletion of stats > entries caused by high turnover of backends. > The pros of using the fixed amount are: - less code change (I think as I did not write the non fixed counterpart) - probably better performance and less scalabilty impact (in case of high rate of backends creation/ deletion) Cons is probably allocating shared memory space that might not be used ( sizeof(PgStat_IO) is 16392 so that could be a concern for a high number of unused connection). OTOH, if a high number of connections is not used that might be worth to reduce the max_connections setting. "Conceptually" speaking, we know what the maximum number of backend is, so I think that using the fixed amount approach makes sense (somehow I think it can be compared to PGSTAT_KIND_SLRU which relies on SLRU_NUM_ELEMENTS). > Just to be clear, the above comments are not meant to oppose the > current implementation approach. They are purely for the sake of > discussing comparisons with other possible approaches. No problem at all, thanks for your feedback and sharing your thoughts! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote: > At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > When I first looked at this patch, my initial thought was whether we > > should let these stats stay "fixed." The reason why the current > > PGSTAT_KIND_IO is fixed is that there is only one global statistics > > storage for the entire database. If we have stats for a flexible > > number of backends, it would need to be non-fixed, perhaps with the > > entry for INVALID_PROC_NUMBER storing the global I/O stats, I > > suppose. However, one concern with that approach would be the impact > > on performance due to the frequent creation and deletion of stats > > entries caused by high turnover of backends. > > As an additional benefit of this approach, the client can set a > connection variable, for example, no_backend_iostats to true, or set > its inverse variable to false, to restrict memory usage to only the > required backends. Thanks for the feedback! If we were to add an on/off switch button, I think I'd vote for a global one instead. Indeed, I see this feature more like an "Administrator" one, where the administrator wants to be able to find out which session is reponsible of what (from an I/O point of view): like being able to anwser "which session is generating this massive amount of reads"? If we allow each session to disable the feature then the administrator would lost this ability. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 2024-Sep-03, Bertrand Drouvot wrote: > Cons is probably allocating shared memory space that might not be used ( > sizeof(PgStat_IO) is 16392 so that could be a concern for a high number of > unused connection). OTOH, if a high number of connections is not used that might > be worth to reduce the max_connections setting. I was surprised by the size you mention so I went to look for the definition of that struct: typedef struct PgStat_IO { TimestampTz stat_reset_timestamp; PgStat_BktypeIO stats[BACKEND_NUM_TYPES]; } PgStat_IO; (It would be good to have more than zero comments here BTW) So what's happening is that struct PgStat_IO stores the data for every single process type, be it regular backends, backend-like auxiliary processes, and all other potential postmaster children. So it seems to me that storing one of these struct for "my backend stats" is wrong: I think you should be using PgStat_BktypeIO instead (or maybe another struct which has a reset timestamp plus BktypeIO, if you care about the reset time). That struct is only 1024 bytes long, so it seems much more reasonable to have one of those per backend. Another way to think about this might be to say that the B_BACKEND element of the PgStat_Snapshot.io array should really be spread out to MaxBackends separate elements. This would make it more difficult to obtain a total value accumulating ops done by all backends (since it's require to sum the values of each backend), but it allows you to obtain backend-specific values, including those of remote backends rather than only your own, as Kyotaro suggests. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Hi, On Thu, Sep 05, 2024 at 03:03:32PM +0200, Alvaro Herrera wrote: > On 2024-Sep-03, Bertrand Drouvot wrote: > > > Cons is probably allocating shared memory space that might not be used ( > > sizeof(PgStat_IO) is 16392 so that could be a concern for a high number of > > unused connection). OTOH, if a high number of connections is not used that might > > be worth to reduce the max_connections setting. > > I was surprised by the size you mention so I went to look for the > definition of that struct: > Thanks for looking at it! > typedef struct PgStat_IO > { > TimestampTz stat_reset_timestamp; > PgStat_BktypeIO stats[BACKEND_NUM_TYPES]; > } PgStat_IO; > > (It would be good to have more than zero comments here BTW) > > So what's happening is that struct PgStat_IO stores the data for every > single process type, be it regular backends, backend-like auxiliary > processes, and all other potential postmaster children. Yeap. > So it seems to > me that storing one of these struct for "my backend stats" is wrong: I > think you should be using PgStat_BktypeIO instead (or maybe another > struct which has a reset timestamp plus BktypeIO, if you care about the > reset time). That struct is only 1024 bytes long, so it seems much more > reasonable to have one of those per backend. Yeah, that could be an area of improvement. Thanks, I'll look at it. Currently the filtering is done when retrieving the per backend stats but better to do it when storing the stats. > Another way to think about this might be to say that the B_BACKEND > element of the PgStat_Snapshot.io array should really be spread out to > MaxBackends separate elements. This would make it more difficult to > obtain a total value accumulating ops done by all backends (since it's > require to sum the values of each backend), but it allows you to obtain > backend-specific values, including those of remote backends rather than > only your own, as Kyotaro suggests. > One backend can already see the stats of the other backends thanks to the pg_stat_get_backend_io() function (that takes a backend pid as parameter) that is introduced in the 0002 sub-patch. I'll ensure that's still the case with the next version of the patch. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Sep 06, 2024 at 03:03:17PM +0000, Bertrand Drouvot wrote: > The struct size is 1040 Bytes and that's much more reasonable than the size > needed for per backend I/O stats in v1 (about 16KB). One could think that's a high increase of shared memory usage with a high number of connections (due to the fact that it's implemented as "fixed amount" stats based on the max number of backends). So out of curiosity, I did some measurements with: dynamic_shared_memory_type=sysv shared_memory_type=sysv max_connections=20000 On my lab, ipcs -m gives me: - on master key shmid owner perms bytes nattch status 0x00113a04 51347487 postgres 600 1149394944 6 0x4bc9f2fa 51347488 postgres 600 4006976 6 0x46790800 51347489 postgres 600 1048576 2 - with v3 applied key shmid owner perms bytes nattch status 0x00113a04 51347477 postgres 600 1170227200 6 0x08e04b0a 51347478 postgres 600 4006976 6 0x74688c9c 51347479 postgres 600 1048576 2 So, with 20000 max_connections (not advocating that's a reasonable number in all the cases), that's a 1170227200 - 1149394944 = 20 MB increase of shared memory (expected with 20K max_connections and the new struct size of 1040 Bytes) as compare to master which is 1096 MB. It means that v3 produces about 2% shared memory increase with 20000 max_connections. Out of curiosity with max_connections=100000, then: - on master: ------ Shared Memory Segments -------- key shmid owner perms bytes nattch status 0x00113a04 52953134 postgres 600 5053915136 6 0x37abf5ce 52953135 postgres 600 20006976 6 0x71c07d5c 52953136 postgres 600 1048576 2 - with v3: ------ Shared Memory Segments -------- key shmid owner perms bytes nattch status 0x00113a04 52953144 postgres 600 5157945344 6 0x7afb24de 52953145 postgres 600 20006976 6 0x2695af58 52953146 postgres 600 1048576 2 So, with 100000 max_connections (not advocating that's a reasonable number in all the cases), that's a 5157945344 - 5053915136 = 100 MB increase of shared memory (expected with 100K max_connections and the new struct size of 1040 Bytes) as compare to master which is about 4800 MB. It means that v3 produces about 2% shared memory increase with 100000 max_connections. Then, based on those numbers, I think that the "fixed amount" approach is a good one as 1/ the amount of shared memory increase is "relatively" small and 2/ most probably provides performance benefits as compare to the "non fixed" approach. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, Thanks for working on this! Your patch applies and builds cleanly. On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > - As stated up-thread, the pg_stat_get_backend_io() behaves as if > stats_fetch_consistency is set to none (each execution re-fetches counters > from shared memory). Indeed, the snapshot is not build in each backend to copy > all the others backends stats, as 1/ there is no use case (there is no need to > get others backends I/O statistics while taking care of the stats_fetch_consistency) > and 2/ that could be memory expensive depending of the number of max connections. I believe this is the correct approach. I manually tested your patches, and they work as expected. Here is some feedback: - The stats_reset column is NULL in both pg_my_stat_io and pg_stat_get_backend_io() until the first call to reset io statistics. While I'm not sure if it's necessary, it might be useful to display the more recent of the two times in the stats_reset column: the statistics reset time or the backend creation time. - The pgstat_reset_io_counter_internal() is called in the pgstat_shutdown_hook(). This causes the stats_reset column showing the termination time of the old backend when its proc num is reassigned to a new backend. -- Regards, Nazir Bilal Yavuz Microsoft
Hi, On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote: > > - The pgstat_reset_io_counter_internal() is called in the > > pgstat_shutdown_hook(). This causes the stats_reset column showing the > > termination time of the old backend when its proc num is reassigned to > > a new backend. > > doh! Nice catch, thanks! > > And also new backends that are not re-using a previous "existing" process slot > are getting the last reset time (if any). So the right place to fix this is in > pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to > 0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to be > right "conceptually" speaking). Thanks, I confirm that it is fixed. You mentioned some refactoring upthread: On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > - If we agree on the current proposal then I'll do some refactoring in > pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid > duplicated code (it's not done yet to ease the review). Could we remove pg_stat_get_my_io() completely and use pg_stat_get_backend_io() with the current backend's pid to get the current backend's stats? If you meant the same thing, please don't mind it. -- Regards, Nazir Bilal Yavuz Microsoft
Hi, On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote: > Hi, > > On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote: > > > - The pgstat_reset_io_counter_internal() is called in the > > > pgstat_shutdown_hook(). This causes the stats_reset column showing the > > > termination time of the old backend when its proc num is reassigned to > > > a new backend. > > > > doh! Nice catch, thanks! > > > > And also new backends that are not re-using a previous "existing" process slot > > are getting the last reset time (if any). So the right place to fix this is in > > pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to > > 0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to be > > right "conceptually" speaking). > > Thanks, I confirm that it is fixed. Thanks! > You mentioned some refactoring upthread: > > On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > > - If we agree on the current proposal then I'll do some refactoring in > > pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid > > duplicated code (it's not done yet to ease the review). > > Could we remove pg_stat_get_my_io() completely and use > pg_stat_get_backend_io() with the current backend's pid to get the > current backend's stats? The reason why I keep pg_stat_get_my_io() is because (as mentioned in [1]), the statistics snapshot is build for "my backend stats" (means it depends of the stats_fetch_consistency value). I can see use cases for that. OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency is set to none (each execution re-fetches counters from shared memory) (see [2]). Indeed, the snapshot is not build in each backend to copy all the others backends stats, as 1/ I think that there is no use case (there is no need to get others backends I/O statistics while taking care of the stats_fetch_consistency) and 2/ that could be memory expensive depending of the number of max connections. So I think it's better to keep both functions as they behave differently. Thoughts? > If you meant the same thing, please don't > mind it. What I meant to say is to try to remove the duplicate code that we can find in those 3 functions (say creating a new function that would contain the duplicate code and make use of this new function in the 3 others). Did not look at it in details yet. [1]: https://www.postgresql.org/message-id/ZtXR%2BCtkEVVE/LHF%40ip-10-97-1-34.eu-west-3.compute.internal [2]: https://www.postgresql.org/message-id/ZtsZtaRza9bFFeF8%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote: > > Could we remove pg_stat_get_my_io() completely and use > > pg_stat_get_backend_io() with the current backend's pid to get the > > current backend's stats? > > The reason why I keep pg_stat_get_my_io() is because (as mentioned in [1]), the > statistics snapshot is build for "my backend stats" (means it depends of the > stats_fetch_consistency value). I can see use cases for that. > > OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency is set to > none (each execution re-fetches counters from shared memory) (see [2]). Indeed, > the snapshot is not build in each backend to copy all the others backends stats, > as 1/ I think that there is no use case (there is no need to get others backends > I/O statistics while taking care of the stats_fetch_consistency) and 2/ that > could be memory expensive depending of the number of max connections. > > So I think it's better to keep both functions as they behave differently. > > Thoughts? Yes, that is correct. Sorry, you already had explained it and I had agreed with it but I forgot. > > If you meant the same thing, please don't > > mind it. > > What I meant to say is to try to remove the duplicate code that we can find in > those 3 functions (say creating a new function that would contain the duplicate > code and make use of this new function in the 3 others). Did not look at it in > details yet. I got it, thanks for the explanation. -- Regards, Nazir Bilal Yavuz Microsoft
Hi, On Tue, Sep 17, 2024 at 04:47:51PM +0300, Nazir Bilal Yavuz wrote: > Hi, > > On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > So I think it's better to keep both functions as they behave differently. > > > > Thoughts? > > Yes, that is correct. Sorry, you already had explained it and I had > agreed with it but I forgot. No problem at all! (I re-explained because I'm not always 100% sure that my explanations are crystal clear ;-) ) Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Sep 20, 2024 at 12:53:43PM +0900, Michael Paquier wrote: > On Wed, Sep 04, 2024 at 04:45:24AM +0000, Bertrand Drouvot wrote: > > On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote: > >> As an additional benefit of this approach, the client can set a > >> connection variable, for example, no_backend_iostats to true, or set > >> its inverse variable to false, to restrict memory usage to only the > >> required backends. > > > > Thanks for the feedback! > > > > If we were to add an on/off switch button, I think I'd vote for a global one > > instead. Indeed, I see this feature more like an "Administrator" one, where > > the administrator wants to be able to find out which session is reponsible of > > what (from an I/O point of view): like being able to anwser "which session is > > generating this massive amount of reads"? > > > > If we allow each session to disable the feature then the administrator > > would lost this ability. > > Hmm, I've been studying this patch, Thanks for looking at it! > and I am not completely sure to > agree with this feeling of using fixed-numbered stats, actually, after > reading the whole and seeing the structure of the patch > (FLEXIBLE_ARRAY_MEMBER is a new way to handle the fact that we don't > know exactly the number of slots we need to know for the > fixed-numbered stats as MaxBackends may change). Right, that's a new way of dealing with "unknown" number of slots (and it has cons as you mentioned in [1]). > If we make these > kind of stats variable-numbered, does it have to actually involve many > creations or removals of the stats entries, though? One point is that > the number of entries to know about is capped by max_connections, > which is a PGC_POSTMASTER. That's the same kind of control as > replication slots. So one approach would be to reuse entries in the > dshash and use in the hashing key the number in the procarrays. If a > new connection spawns and reuses a slot that was used in the past, > then reset all the existing fields and assign its PID. Yeah, like it's done currently with the "fixed-numbered" stats proposal. That sounds reasonable to me, I'll look at this proposed approach and come back with a new patch version, thanks! > Another thing is the consistency of the data that we'd like to keep at > shutdown. If the connections have a balanced amount of stats shared > among them, doing decision-making based on them is kind of easy. But > that may cause confusion if the activity is unbalanced across the > sessions. We could also not flush them to disk as an option, but it > still seems more useful to me to save this data across restarts if one > takes frequent snapshots of the new system view reporting everything, > so as it is possible to get an idea of the deltas across the snapshots > for each connection slot. The idea that has been implemented so far in this patch is that we still maintain an aggregated version of the stats (visible through pg_stat_io) and that only the aggregated stats are flushed/read to/from disk (means we don't flush the per-backend stats). I think that it makes sense that way. The way I see it is that the per-backend I/O stats is more for current activity instrumentation. So it's not clear to me what would be the benefits of restoring the per-backend stats at startup knowing that: 1) we restored the aggregated stats and 2) the sessions that were responsible for the the restored stats are gone. [1]: https://www.postgresql.org/message-id/Zuz5iQ4AjcuOMx_w%40paquier.xyz Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote: > On Tue, Sep 17, 2024 at 01:56:34PM +0000, Bertrand Drouvot wrote: > > No problem at all! (I re-explained because I'm not always 100% sure that my > > explanations are crystal clear ;-) ) > > We've discussed a bit this patch offline, but after studying the patch > I doubt that this part is a good idea: > > + /* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */ > + PgStatShared_IO io; > } PgStat_ShmemControl; > > We are going to be in trouble if we introduce a second member in this > routine that has a FLEXIBLE_ARRAY_MEMBER, because PgStat_ShmemControl > relies on the fact that all its members after deterministic offset > positions in this structure. Agree that it would be an issue should we have to add a new FLEXIBLE_ARRAY_MEMBER. > So this lacks flexibility. This choice > is caused by the fact that we don't exactly know the number of > backends because that's controlled by the PGC_POSTMASTER GUC > max_connections so the size of the structure would be undefined. Right. > There is a parallel with replication slot statistics here, where we > save the replication slot data in the dshash based on their index > number in shmem. Hence, wouldn't it be better to do like replication > slot stats, where we use the dshash and a key based on the procnum of > each backend or auxiliary process (ProcNumber in procnumber.h)? If at > restart max_connections is lower than what was previously used, we > could discard entries that would not fit anymore into the charts. > This is probably not something that happens often, so I'm not really > worried about forcing the removal of these stats depending on how the > upper-bound of ProcNumber evolves. Yeah, I'll look at implementing the dshash based on their procnum and see where it goes. > So, using a new kind of ID and making this kind variable-numbered may > ease the implementation quite a bit, while avoiding any extensibility > issues with the shmem portion of the patch if these are > fixed-numbered. Agree. > This would rely on the fact that we would use the ProcNumber for the > dshash key, and this information is not provided in pg_stat_activity. > Perhaps we should add this information in pg_stat_activity so as it > would be easily possible to do joins with a SQL function that returns > a SRF with all the stats associated with a given connection slot > (auxiliary or backend process)? I'm not sure that's needed. What has been done in the previous versions is to get the stats based on the pid (see pg_stat_get_backend_io()) where the procnumber is retrieved with something like GetNumberFromPGProc(BackendPidGetProc(pid)). Do you see an issue with that approach? > The active PIDs of the live sessions are not stored in the active > stats, why not? That was not needed. We can still retrieve the stats based on the pid thanks to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having to actually store the pid in the stats. I think that's fine because the pid only matters at "display" time (pg_stat_get_backend_io()). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, Oct 08, 2024 at 01:46:23PM +0900, Michael Paquier wrote: > On Mon, Oct 07, 2024 at 09:54:21AM +0000, Bertrand Drouvot wrote: > > On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote: > >> This would rely on the fact that we would use the ProcNumber for the > >> dshash key, and this information is not provided in pg_stat_activity. > >> Perhaps we should add this information in pg_stat_activity so as it > >> would be easily possible to do joins with a SQL function that returns > >> a SRF with all the stats associated with a given connection slot > >> (auxiliary or backend process)? > > > > I'm not sure that's needed. What has been done in the previous versions is > > to get the stats based on the pid (see pg_stat_get_backend_io()) where the > > procnumber is retrieved with something like GetNumberFromPGProc(BackendPidGetProc(pid)). > > Ah, I see. So you could just have the proc number in the key to > control the upper-bound on the number of possible stats entries in the > dshash. Yes. > Assuming that none of this data is persisted to the stats file at > shutdown and that the stats of a single entry are reset each time a > new backend reuses a previous proc slot, that would be OK by me, I > guess. > > >> The active PIDs of the live sessions are not stored in the active > >> stats, why not? > > > > That was not needed. We can still retrieve the stats based on the pid thanks > > to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having > > to actually store the pid in the stats. I think that's fine because the pid > > only matters at "display" time (pg_stat_get_backend_io()). > > Okay, per the above and the persistency of the stats. Great, I'll work on an updated patch version then. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, Oct 08, 2024 at 04:28:39PM +0000, Bertrand Drouvot wrote: > > > On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote: > > > > Okay, per the above and the persistency of the stats. > > Great, I'll work on an updated patch version then. > I spend some time on this during the last 2 days and I think we have 3 design options. === GOALS === But first let's sump up the goals that I think we agreed on: - Keep pg_stat_io as it is today: give the whole server picture and serialize the stats to disk. - Introduce per-backend IO stats and 2 new APIs to: 1. Provide the IO stats for "my backend" (through say pg_my_stat_io), this would take care of the stats_fetch_consistency. 2. Retrieve the IO stats for another backend (through say pg_stat_get_backend_io(pid)) that would _not_ take care of stats_fetch_consistency, as: 2.1/ I think that there is no use case (there is no need to get others backends I/O statistics while taking care of the stats_fetch_consistency) 2.2/ That could be memory expensive to store a snapshot for all the backends (depending of the number of backend created) - There is no need to serialize the per-backend IO stats to disk (no point to see stats for backends that do not exist anymore after a re-start). - The per-backend IO stats should be variable-numbered (not fixed), as per up-thread discussion. === OPTIONS === So, based on this, I think that we could: Option 1: "move" the existing PGSTAT_KIND_IO to variable-numbered and let this KIND take care of the aggregated view (pg_stat_io) and the per-backend stats. Option 2: let PGSTAT_KIND_IO as it is and introduce a new PGSTAT_KIND_BACKEND_IO that would be variable-numbered. Option 3: Remove PGSTAT_KIND_IO, introduce a new PGSTAT_KIND_BACKEND_IO that would be variable-numbered and store the "aggregated stats aka pg_stat_io" in shared memory (not part of the variable-numbered hash). Per-backend stats could be aggregated into "pg_stat_io" during the flush_pending_cb call for example. === BEST OPTION? === I would opt for Option 2 as: - The stats system is currently not designed for Option 1 and our goals (for example the shared_data_len is used to serialize but also to fetch the entries, see pgstat_fetch_entry()) so that would need some hack to serialize only a part of them and still be able to fetch them all). - Mixing "fixed" and "variable" in the same KIND does not sound like a good idea (though that might be possible with some hacks, I don't think that would be easy to maintain). - Having the per-backend as "variable" in its dedicated kind looks more reasonable and less error-prone. - I don't think there is a stats design similar to option 3 currently, so I'm not sure there is a need to develop something new while Option 2 could be done. - Option 3 would need some hack for (at least) the "pg_stat_io" [de]serialization part. - Option 2 seems to offer more flexibility (as compare to Option 1 and 3). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Oct 31, 2024 at 05:09:56AM +0000, Bertrand Drouvot wrote: > === OPTIONS === > > So, based on this, I think that we could: > > Option 1: "move" the existing PGSTAT_KIND_IO to variable-numbered and let this > KIND take care of the aggregated view (pg_stat_io) and the per-backend stats. > > Option 2: let PGSTAT_KIND_IO as it is and introduce a new PGSTAT_KIND_BACKEND_IO > that would be variable-numbered. > > Option 3: Remove PGSTAT_KIND_IO, introduce a new PGSTAT_KIND_BACKEND_IO that > would be variable-numbered and store the "aggregated stats aka pg_stat_io" in > shared memory (not part of the variable-numbered hash). Per-backend stats > could be aggregated into "pg_stat_io" during the flush_pending_cb call for example. > > === BEST OPTION? === > > I would opt for Option 2 as: > > - The stats system is currently not designed for Option 1 and our goals (for > example the shared_data_len is used to serialize but also to fetch the entries, > see pgstat_fetch_entry()) so that would need some hack to serialize only a part > of them and still be able to fetch them all). > > - Mixing "fixed" and "variable" in the same KIND does not sound like a good idea > (though that might be possible with some hacks, I don't think that would be > easy to maintain). > > - Having the per-backend as "variable" in its dedicated kind looks more reasonable > and less error-prone. > > - I don't think there is a stats design similar to option 3 currently, so I'm > not sure there is a need to develop something new while Option 2 could be done. > > - Option 3 would need some hack for (at least) the "pg_stat_io" [de]serialization > part. > > - Option 2 seems to offer more flexibility (as compare to Option 1 and 3). > > Thoughts? And why not add more per-backend stats in the future? (once the I/O part is done). I think that's one more reason to go with option 2 (and implementing a brand new PGSTAT_KIND_BACKEND kind). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Mon, Nov 04, 2024 at 10:01:50AM +0000, Bertrand Drouvot wrote: > And why not add more per-backend stats in the future? (once the I/O part is done). > > I think that's one more reason to go with option 2 (and implementing a brand new > PGSTAT_KIND_BACKEND kind). I'm starting working on option 2, I think it will be easier to discuss with a patch proposal to look at. If in the meantime, one strongly disagree with option 2 (means implement a brand new PGSTAT_KIND_BACKEND and keep PGSTAT_KIND_IO), please let me know. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Wed, Nov 06, 2024 at 08:39:07AM +0900, Michael Paquier wrote: > On Tue, Nov 05, 2024 at 05:37:15PM +0000, Bertrand Drouvot wrote: > > I'm starting working on option 2, I think it will be easier to discuss with > > a patch proposal to look at. > > > > If in the meantime, one strongly disagree with option 2 (means implement a brand > > new PGSTAT_KIND_BACKEND and keep PGSTAT_KIND_IO), please let me know. > > Sorry for the late reply, catching up a bit. No problem at all, thanks for looking at it! > As you are quoting in [1], you do not expect the backend-io stats and > the more global pg_stat_io to achieve the same level of consistency as > the backend stats would be gone at restart, and wiped out when a > backend shuts down. Yes. > So, splitting them with a different stats kind > feels more natural because it would be possible to control how each > stat kind behaves depending on the code shutdown and reset paths > within their own callbacks rather than making the callbacks of > PGSTAT_KIND_IO more complex than they already are. Yeah, thanks for sharing your thoughts. > And pg_stat_io is > a fixed-numbered stats kind because of the way it aggregates its stats > with a number states defined at compile-time. > > Is the structure you have in mind different than PgStat_BktypeIO? Very close. > Perhaps a split is better anyway with that in mind. The in-progress patch (not shared yet) is using the following: " typedef struct PgStat_Backend { TimestampTz stat_reset_timestamp; BackendType bktype; PgStat_BktypeIO stats; } PgStat_Backend; " The bktype is used to be able to filter the stats correctly when we display them. > The amount of memory required to store the snapshots of backend-IO > does not worry me much, TBH, but you are worried about a high turnover > of connections that could cause a lot of bloat in the backend-IO > snapshots because of the persistency that these stats would have, > right? Not only a high turnover but also a high number of entries created in the hash. Furthermore I don't see any use case of relying on stats_fetch_consistency while querying other backend's stats. > If possible, supporting snapshots would be > more consistent with the other stats. I have I mind to support the snapshots _only_ when querying our own stats. I can measure the memory impact if we use them also when querying other backends stats too (though I don't see a use case). > Just to be clear, I am not in favor of making PgStat_HashKey larger > than it already is. That's not needed, the patch I'm working on stores the proc number in the objid field of the key. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Nov 07, 2024 at 09:50:59AM +0900, Michael Paquier wrote: > On Wed, Nov 06, 2024 at 01:51:02PM +0000, Bertrand Drouvot wrote: > > That's not needed, the patch I'm working on stores the proc number in the > > objid field of the key. > > Relying on the procnumber for the object ID gets a +1 here. Thanks! > That > provides an automatic cap on the maximum number of entries that can > exist at once for this new stats kind. Yeah. POC patch is now working, will wear a reviewer hat now and will share in one or 2 days. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Nov 14, 2024 at 03:31:51PM +0900, Michael Paquier wrote: > On Fri, Nov 08, 2024 at 02:09:30PM +0000, Bertrand Drouvot wrote: > > 1. one assertion in pgstat_drop_entry_internal() is not necessary true anymore > > with this new stat kind. So, adding an extra bool as parameter to take care of it. > > Why? I have to admit that the addition of this argument at this level > specific to a new stats kind feels really weird and it looks like a > layer violation, while this happens only when resetting the whole > pgstats state because we have loaded a portion of the stats but the > file loaded was in such a state that we don't have a consistent > picture. Yes. I agree that looks weird and I don't like it that much. But the assertion is not true anymore. If you: - change the arguments to false in the pgstat_drop_entry_internal() call in pgstat_drop_all_entries() - start the engine - kill -9 postgres - restart the engine You'll see the assert failing due to the startup process. I don't think it is doing something wrong though, just populating its backend stats. Do you have another view on it? > > 2. a new struct "PgStat_Backend" is created and does contain the backend type. > > The backend type is used for filtering purpose when the stats are displayed. > > Is that necessary? We can guess that from the PID with a join in > pg_stat_activity. pg_stat_get_backend_io() from 0004 returns a set of > tuples for a specific PID, as well.. How would that work for someone doing "select * from pg_stat_get_backend_io(<pid>)"? Do you mean copy/paste part of the code from pg_stat_get_activity() into pg_stat_get_backend_io() to get the backend type? That sounds like an idea, I'll have a look at it. > + /* save the bktype */ > + if (kind == PGSTAT_KIND_PER_BACKEND) > + bktype = ((PgStatShared_Backend *) header)->stats.bktype; > [...] > + /* restore the bktype */ > + if (kind == PGSTAT_KIND_PER_BACKEND) > + ((PgStatShared_Backend *) header)->stats.bktype = bktype; > > Including the backend type results in these blips in > shared_stat_reset_contents() which should not have anything related > to stats kinds and should remain neutral, as well. Yeah, we can simply get rid of it if we remove the backend type in PgStat_Backend. > pgstat_prep_per_backend_pending() is used only in pgstat_io.c, so I > guess that it should be static in this file rather than declared in > pgstat.h? Good catch, thanks! > +typedef struct PgStat_PendingIO > > Perhaps this part should use a separate structure named > "BackendPendingIO"? The definition of the structure has to be in > pgstat.h as this is the pending_size of the new stats kind. It looks > like it would be cleaner to keep PgStat_PendingIO local to > pgstat_io.c, and define PgStat_PendingIO based on > PgStat_BackendPendingIO? I see what you meean, what about simply "PgStat_BackendPending" in pgstat.h? > + /* > + * Do serialize or not this kind of stats. > + */ > + bool to_serialize:1; > > Not sure that "serialize" is the best term that applies here. For > pgstats entries, serialization refers to the matter of writing their > entries with a "serialized" name because they have an undefined number > when stored locally after a reload. I'd suggest to split this concept > into its own patch, rename the flag as "write_to_file" (or what you > think is suited), and also apply the flag in the fixed-numbered loop > done in pgstat_write_statsfile() before going through the dshash. Makes sense to create its own patch, will have a look. > + <row> > + <entry role="func_table_entry"><para role="func_signature"> > + <indexterm> > + <primary>pg_stat_reset_single_backend_io_counters</primary> > + </indexterm> > + <function>pg_stat_reset_single_backend_io_counters</function> ( <type>int4</type> ) > + <returnvalue>void</returnvalue> > > This should document that the input argument is a PID. Yeap, will add. > > Is pg_my_stat_io() the best name ever? Not 100% sure, but there is already "pg_my_temp_schema" so I thought that pg_my_stat_io would not be that bad. Open to suggestions though. > I'd suggest to just merge 0004 with 0001. Sure. > Structurally, it may be cleaner to keep all the callbacks and the > backend-I/O specific logic into a separate file, perhaps > pgstat_io_backend.c or pgstat_backend_io? Yeah, I was wondering the same. What about pgstat_backend.c (that would contain only one "section" dedicated to IO currently)? > > ==== 0002 > > > > Merge both IO stats flush callback. There is no need to keep both callbacks. > > > > Merging both allows to save O(N^3) while looping on IOOBJECT_NUM_TYPES, > > IOCONTEXT_NUM_TYPES and IOCONTEXT_NUM_TYPES. > > Not sure to be a fan of that, TBH, still I get the performance > argument of the matter. Each stats kind has its own flush path, and > this assumes that we'll never going to diverge in terms of the > information maintained for each backend and the existing pg_stat_io. > Perhaps there could be a divergence at some point? Yeah perhaps, so in case of divergence let's split "again"? I mean for the moment I don't see any reason to keep both and we have pros related to efficiency during flush. > > === Remarks > > > > R1. The key field is still "objid" even if it stores the proc number. I think > > that's fine and there is no need to rename it as it's related comment states: > > Let's not change the object ID and the hash key. The approach you are > using here with a variable-numbered stats kinds and a lookup with the > procnumber is sensible here. Agree, thanks for confirming. > > R2. pg_stat_get_io() and pg_stat_get_backend_io() could probably be merged ( > > duplicated code). That's not mandatory to provide the new per backend I/O stats > > feature. So let's keep it as future work to ease the review. > > Not sure about that. I don't have a strong opinion about this one, so I'm ok to not merge those. > > R3. There is no use case of retrieving other backend's IO stats with > > stats_fetch_consistency set to 'snapshot'. Avoiding this behavior allows us to > > save memory (that could be non negligeable as pgstat_build_snapshot() would add > > _all_ the other backend's stats in the snapshot). I choose to document it and to > > not return any rows: I think that's better than returning some data (that would > > not "ensure" the snapshot consistency) or an error. > > I'm pretty sure that there is a sensible argument about being able to > get a snapshot of the I/O stat of another backend and consider that a > feature. For example, keeping a look at the activity of the > checkpointer while doing a scan at a certain point in time? hm, not sure how the stats_fetch_consistency set to 'snapshot' could help here: - the checkpointer could write stuff for reason completly unrelated to "your" backend - if the idea is to 1. record checkpointer stats, 2. do stuff in the backend and 3. check again the checkpointer stats then I don't think setting stats_fetch_consistency to snapshot is needed at all. In fact it's quite the opposite as 1. and 3. would report the same values with stats_fetch_consistency set to 'snapshot' (if done in the same transaction). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, Nov 19, 2024 at 10:47:53AM +0900, Michael Paquier wrote: > On Thu, Nov 14, 2024 at 01:30:11PM +0000, Bertrand Drouvot wrote: > > - change the arguments to false in the pgstat_drop_entry_internal() call in > > pgstat_drop_all_entries() > > - start the engine > > - kill -9 postgres > > - restart the engine > > > > You'll see the assert failing due to the startup process. I don't think it is > > doing something wrong though, just populating its backend stats. Do you have > > another view on it? > > It feels sad that we have to plug in the internals at this level for > this particular case. Perhaps there is something to do with more > callbacks. Or perhaps there is just no point in tracking the stats of > auxiliary processes because we already have this data in the existing > pg_stat_io already? We can not discard the "per backend" stats collection for auxiliary processes because those are the source for pg_stat_io too (once the 2 flush callbacks are merged). I have another idea: after a bit more of investigation it turns out that only the startup process is generating the failed assertion. So, for the startup process only, what about? - don't call pgstat_create_backend_stat() in pgstat_beinit()... - but call it in StartupXLOG() instead (after the stats are discarded or restored). That way we could get rid of the changes linked to the assertion and still handle the startup process particular case. Thoughts? > > How would that work for someone doing "select * from pg_stat_get_backend_io(<pid>)"? > > > > Do you mean copy/paste part of the code from pg_stat_get_activity() into > > pg_stat_get_backend_io() to get the backend type? That sounds like an idea, > > I'll have a look at it. > > I was under the impression that we should keep PgStat_Backend for the > reset_timestamp part, but drop BackendType as it can be guessed from > pg_stat_activity itself. Removing BackendType sounds doable, I'll look at it. > >> Structurally, it may be cleaner to keep all the callbacks and the > >> backend-I/O specific logic into a separate file, perhaps > >> pgstat_io_backend.c or pgstat_backend_io? > > > > Yeah, I was wondering the same. What about pgstat_backend.c (that would contain > > only one "section" dedicated to IO currently)? > > pgstat_backend.c looks good to me. Could there be other stats than > just IO, actually? Perhaps not, just asking.. Yeah that's the reason why I suggested "pgstat_backend.c". I would not be surprised if we add more "per backend" stats (that are not I/O related) in the future as the main machinery would be there. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Nov 14, 2024 at 03:31:51PM +0900, Michael Paquier wrote: > + /* > + * Do serialize or not this kind of stats. > + */ > + bool to_serialize:1; > > Not sure that "serialize" is the best term that applies here. For > pgstats entries, serialization refers to the matter of writing their > entries with a "serialized" name because they have an undefined number > when stored locally after a reload. I'd suggest to split this concept > into its own patch, rename the flag as "write_to_file" Yeah, also this could useful for custom statistics. So I created a dedicated thread and a patch proposal (see [1]). [1]: https://www.postgresql.org/message-id/Zz3skBqzBncSFIuY%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Wed, Nov 20, 2024 at 09:01:26AM +0900, Michael Paquier wrote: > On Tue, Nov 19, 2024 at 04:28:55PM +0000, Bertrand Drouvot wrote: > > So, for the startup process only, what about? > > > > - don't call pgstat_create_backend_stat() in pgstat_beinit()... > > - but call it in StartupXLOG() instead (after the stats are discarded or restored). > > > > That way we could get rid of the changes linked to the assertion and still handle > > the startup process particular case. Thoughts? > > Hmm. That may prove to be a good idea in the long-term. The startup > process is a specific path kicked in at a very early stage, so it is > also a bit weird that we'd try to insert statistics while we are going > to reset them anyway a bit later. Exactly. > That may also be relevant for > custom statistics, actually, especially if some calls happen in some > hook paths taken before the reset is done. This could happen for > injection point stats when loading it with shared_preload_libraries, > actually, if you load, attach or run a callback at this early stage. Right. I did not had in mind to go that far here (for the per backend stats needs). My idea was "just" to move the new pgstat_create_backend_stat() (which is related to per backend stats only) call at the right place in StartupXLOG() for the startup process only. As that's the startup process that will reset or restore the stats I think that makes sense. It looks like that what you have in mind is much more generic, what about: - Focus on this thread first and then move the call as proposed above - Think about a more generic idea later on (on the per-backend I/O stats is in). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Nov 21, 2024 at 07:18:24AM +0900, Michael Paquier wrote: > On Wed, Nov 20, 2024 at 02:20:18PM +0000, Bertrand Drouvot wrote: > > Right. I did not had in mind to go that far here (for the per backend stats > > needs). My idea was "just" to move the new pgstat_create_backend_stat() (which > > is related to per backend stats only) call at the right place in StartupXLOG() > > for the startup process only. As that's the startup process that will reset > > or restore the stats I think that makes sense. > > > > It looks like that what you have in mind is much more generic, what about: > > > > - Focus on this thread first and then move the call as proposed above > > - Think about a more generic idea later on (on the per-backend I/O stats is > > in). > > Moving pgstat_create_backend_stat() may be OK in the long-term, at > least that would document why we need to care about the startup > process. Yeap. > Still, moving only the call feels incomplete, so how about adding a > boolean field in PgStat_ShmemControl that defaults to false when the > shmem area is initialized in StatsShmemInit(), then switched to true > once we know that the stats have been restored or reset by the startup > process. Something like that should work to control the dshash > inserts, I guess? I did some tests (on a primary and on a standby) and currently there is no call to pgstat_get_entry_ref() at all before we reach the stats reset or restore in StartupXLOG(). The first ones appear after "we really open for business" (quoting process_pm_child_exit()). Now, with the per-backend I/O stats patch in place, there is 3 processes that could call pgstat_get_entry_ref() before we reach the stats reset or restore in StartupXLOG() (observed by setting a breakpoint on the startup process before stats are reset or restored): - checkpointer - background writer - startup process All calls are for the new backend stat kind. It also makes sense to see non "regular" backends as the "regulars" are not allowed to connect yet. Results: they could collect some stats for nothing since a reset or restore will happen after. Now, if we want to prevent new entries to be created before the stats are reset or restored: - then the end results will be the same (means starting the instance with reset or restored stats). The only difference would be that they would not collect stats for "nothing" during this small time window. - it looks like that would lead to some non negligible refactoring: I started the coding and observed the potential impact. The reason is that all the code (at least the one I looked at) does not expect to receive a NULL value from pgstat_get_entry_ref() (when asking for creation) and we would need to take care of all the cases (all the stats, all backend type) for safety reason. So, given that: - the end result would be the same - the code changes would be non negligible (unless we have a better idea than pgstat_get_entry_ref() returning a NULL value). I think that might be valid to try to implement it (to avoid those processes to collect the stats for nothing during this small time window) but I am not sure that's worth it (as the end result would be the same and the code change non negligible). Thoughts? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Fri, Nov 22, 2024 at 10:36:29AM +0900, Michael Paquier wrote: > On Thu, Nov 21, 2024 at 05:23:42PM +0000, Bertrand Drouvot wrote: > > So, given that: > > > > - the end result would be the same > > - the code changes would be non negligible (unless we have a better idea than > > pgstat_get_entry_ref() returning a NULL value). > > Hmm. created_entry only matters for pgstat_init_function_usage(). > All the other callers of pgstat_prep_pending_entry() pass a NULL > value. I meant to say all the calls that passe "create" as true in pgstat_get_entry_ref(). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Mon, Nov 25, 2024 at 10:06:44AM +0900, Michael Paquier wrote: > On Fri, Nov 22, 2024 at 07:49:58AM +0000, Bertrand Drouvot wrote: > > On Fri, Nov 22, 2024 at 10:36:29AM +0900, Michael Paquier wrote: > >> Hmm. created_entry only matters for pgstat_init_function_usage(). > >> All the other callers of pgstat_prep_pending_entry() pass a NULL > >> value. > > > > I meant to say all the calls that passe "create" as true in pgstat_get_entry_ref(). > > Ah, OK, I think that I see your point here. > > I am wondering how much this would matter as well for custom stats, > but we're not there yet without at least one release out and folks try > new things with these APIs and variable-numbered kinds. Not sure here, could custom stats start incrementing before the database system is ready to accept connections? > pgstat_prep_pending_entry() to return NULL even if "create" is true > may be a good thing, at the end, because that's the only way I can see > based on the current APIs where we could say "Sorry, but the stats > have not been loaded yet, so you cannot try to do anything related to > the dshash". Yeah, same here. > From my view having a kind of barrier would be cleaner in the long > run, but it's true that it may not be mandatory, as well. pg_stat_io > is currently OK to be called because the stats are loaded for > auxiliary processes because it uses fixed-numbered stats in shmem. > And it means we already have early calls that add stats getting > overwritten once the stats are loaded from the on-disk file (Am I > getting this part right?). Yeah, we can already see that, for example, the background writer could enter pgstat_io_flush_cb() before the stats are reset or restored. > Anyway, do we really require that for the sake of this thread? We > know that there's only one of each auxiliary process at a time, and > they keep a footprint in pg_stat_io already. So we could just limit > outselves to live database backends, WAL senders and autovacuum > workers, everything that's not auxiliary and spawned on request? I think that's a fair starting point and that we will not lose any informations doing so (as you said there is only one of each auxiliary process at a time, so that one could already see their stats from pg_stat_io). The only cons that I can see is that we will not be able to merge the flush cb but I don't think that's a blocker (the flush are done in shared memory so the impact on performance should not be that much of an issue). I'll come back with a new version implementing the above. [1]: https://www.postgresql.org/message-id/Zz9sno%2BJJbWqdXhQ%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Wed, Nov 27, 2024 at 03:33:38PM +0900, Michael Paquier wrote: > On Mon, Nov 25, 2024 at 03:47:59PM +0000, Bertrand Drouvot wrote: > > It also takes care of most of the comments that you have made in [1], meaning > > that it: > > > > - removes the backend type from PgStat_Backend and look for the backend type > > at "display" time. > > - creates PgStat_BackendPendingIO and PgStat_PendingIO now refers to it (I > > used PgStat_BackendPendingIO and not PgStat_BackendPending because this is what > > it is after all). > > - adds the missing comment related to the PID in the doc. > > - merges 0004 with 0001 (so that pg_stat_get_backend_io() is now part of 0001). > > - creates its own pgstat_backend.c file. > > I have begun studying the patch, and I have one question. > > +void > +pgstat_create_backend_stat(ProcNumber procnum) > [...] > + /* Create the per-backend statistics entry */ > + if (pgstat_tracks_per_backend_bktype(MyBackendType)) > + pgstat_create_backend_stat(MyProcNumber); > > Perhaps that's a very stupid question, but I was looking at this part > of the patch, and wondered why we don't use init_backend_cb? I > vaguely recalled that MyProcNumber and MyBackendType would be set > before we go through the pgstat initialization step. MyDatabaseId is > filled after the pgstat initialization, but we don't care about this > information for such backend stats. Using init_backend_cb (and moving the pgstat_tracks_per_backend_bktype() check in it) is currently producing a failed assertion: TRAP: failed Assert("pgstat_is_initialized && !pgstat_is_shutdown"), File: "pgstat.c", Line: 1567, PID: 2080000 postgres: postgres postgres [local] initializing(ExceptionalCondition+0xbb)[0x5fd0c69afaed] postgres: postgres postgres [local] initializing(pgstat_assert_is_up+0x3f)[0x5fd0c67d1b77] postgres: postgres postgres [local] initializing(pgstat_get_entry_ref+0x95)[0x5fd0c67db068] postgres: postgres postgres [local] initializing(pgstat_prep_pending_entry+0xaa)[0x5fd0c67d1181] postgres: postgres postgres [local] initializing(pgstat_create_backend_stat+0x96)[0x5fd0c67d3986] But even if we, say move the "pgstat_is_initialized = true" before the init_backend_cb() call in pgstat_initialize() (not saying that makes sense, just trying out), then we are back to the restore/reset issue but this time during initdb: TRAP: failed Assert("!pgstat_entry_ref_hash_lookup(pgStatEntryRefHash, shent->key)"), File: "pgstat_shmem.c", Line: 852,PID: 2109086 /home/postgres/postgresql/pg_installed/pg18/bin/postgres(ExceptionalCondition+0xbb)[0x621723499c5a] /home/postgres/postgresql/pg_installed/pg18/bin/postgres(+0x70dd2b)[0x6217232c5d2b] /home/postgres/postgresql/pg_installed/pg18/bin/postgres(pgstat_drop_all_entries+0x61)[0x6217232c60b5] /home/postgres/postgresql/pg_installed/pg18/bin/postgres(+0x705356)[0x6217232bd356] /home/postgres/postgresql/pg_installed/pg18/bin/postgres(+0x70462a)[0x6217232bc62a] /home/postgres/postgresql/pg_installed/pg18/bin/postgres(pgstat_restore_stats+0x1c)[0x6217232b9f01] /home/postgres/postgresql/pg_installed/pg18/bin/postgres(StartupXLOG+0x693)[0x621722da9697] /home/postgres/postgresql/pg_installed/pg18/bin/postgres(InitPostgres+0x1d8)[0x6217234b40df] /home/postgres/postgresql/pg_installed/pg18/bin/postgres(BootstrapModeMain+0x532)[0x621722dd79bf] where "PID: 2109086" is a B_STANDALONE_BACKEND. This is due to the fact that, during the initdb bootstrap, init_backend_cb() is called before StartupXLOG(). One option could be to move B_STANDALONE_BACKEND in the "false" section of pgstat_tracks_per_backend_bktype() and ensure that we set pgstat_is_initialized to true before init_backend_cb() is called (but I don't think that makes that much sense). I'd vote to just keep the pgstat_create_backend_stat() call in pgstat_beinit(). That said, we probably should document that init_backend_cb() should not call pgstat_get_entry_ref(), but that's probably worth another thread. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, Dec 17, 2024 at 03:26:02PM +0900, Michael Paquier wrote: > On Mon, Dec 16, 2024 at 03:42:04PM +0000, Bertrand Drouvot wrote: > >> Perhaps there's an argument for an entirely separate > >> callback that would run before pgstat is plugged off, like a new > >> before_shmem_exit() callback registered after the entry is created? > > > > As the ProcKill() is run in shmem_exit() (and so after before_shmem_exit()), > > I think that the way we currently drop the backend stats entry is fine (unless > > I miss something about your concern). > > Looking closely at this one, I think that you're right as long as the > shutdown callback is registered before the entry is created. Yeah, which is the case. > Anyway, I have put my hands on v9. A couple of notes, while hacking > through it. See v9-0002 for the parts I have modified, that applies > on top of your v9-0001. Thanks! > You may have noticed fee2b3ea2ecdg, which led me to fix two comments > as these paths are not related to only database objects. Yeap ;-) > Added more documentation, tweaked quite a bit the comments, a bit less > the docs (no need for the two linkends) and applied an indentation. Looks good to me. > pg_stat_get_backend_io() can be simpler, and does not need the loop > with the extra LocalPgBackendStatus. It is possible to call once > pgstat_get_beentry_by_proc_number() and retrieve the PID and the > bktype for the two sanity checks we want to do on them. Looked at the changes you've done and agree that's simpler. > s/pgstat_fetch_proc_stat_io/pgstat_fetch_stat_backend/. Not sure about this one as being called in pg_stat_get_backend_io(). > s/pgstat_create_backend_stat/pgstat_create_backend/. LGTM. > I've been wondering for quite a bit about PgStat_BackendPendingIO and > PgStat_PendingIO, and concluded to define both in pgstat.h, with the > former being defined based on the latter to keep the dependency > between both at the same place. That's fine by me. Also: === 1 the changes in pgstat_flush_backend() make sense to me. === 2 - * Create the backend statistics entry for procnum. + * Create backend statistics entry for proc number. Yeah, "proc number" looks better as already used in multiple places. === 3 the changes in stats.sql look good to me. Having said that, v9-0002 looks good to me (except the pgstat_fetch_proc_stat_io renaming). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Tue, Dec 17, 2024 at 06:13:44PM +0900, Michael Paquier wrote: > On Tue, Dec 17, 2024 at 08:13:59AM +0000, Bertrand Drouvot wrote: > > Having said that, v9-0002 looks good to me (except the pgstat_fetch_proc_stat_io > > renaming). > > This routine returns a PgStat_Backend from a pgstats entry of type > PGSTAT_KIND_BACKEND, so the name in v9-0001 would not be true once > more types of stats data are attached to this structure. Fair enough, let's go with the name change done in v9-0002. > The name in > v9-0002 is more consistent long-term. Agree, we may need to add parameters to it but we'll see when the time comes. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Dec 18, 2024 at 08:11:55AM +0000, Bertrand Drouvot wrote: > Yeah, I also had something like this in mind (see R2 in [1]). So, +1 for it. [1] being: https://www.postgresql.org/message-id/Zy4bmvgHqGjcK1pI%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Dec 19, 2024 at 01:21:54PM +0900, Michael Paquier wrote: > > While doing more tests with backends exiting concurrently with their > stats scanned, I have detected one path in pg_stat_get_backend_io() > after calling pgstat_get_beentry_by_proc_number() where we should also > check that it does not return NULL, or we would crash on a pointer > dereference when reading the backend type. > > Fixed that, Oh right, indeed all the others pgstat_get_beentry_by_proc_number() callers are checking for a NULL returned value. > bumped the two version counters, and done. Thanks! I think I'll start a dedicated thread to discuss the stats_fetch_consistency/'snapshot' point (will be easier to follow than resuming the discussion in this thread). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hello Michael,
19.12.2024 06:21, Michael Paquier wrote:
19.12.2024 06:21, Michael Paquier wrote:
Fixed that, bumped the two version counters, and done.
Could you, please, look at recent failures produced by grassquit (which
has fsync = on in it's config), on an added test case? For instance, [1]:
--- /home/bf/bf-build/grassquit/HEAD/pgsql/src/test/regress/expected/stats.out 2024-12-19 04:44:08.779311933 +0000
+++ /home/bf/bf-build/grassquit/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/stats.out 2024-12-19 16:37:41.351784840 +0000
@@ -1333,7 +1333,7 @@
AND :my_io_sum_shared_after_fsyncs= 0);
?column?
----------
- t
+ f
(1 row)
The complete query is:
SELECT current_setting('fsync') = 'off'
OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
AND :my_io_sum_shared_after_fsyncs= 0);
And the corresponding query in 027_stream_regress_primary.log is:
2024-12-19 16:37:39.907 UTC [4027467][client backend][15/1980:0] LOG: statement: SELECT current_setting('fsync') = 'off'
OR (1 = 1
AND 1= 0);
(I can reproduce this locally with an asan-enabled build too.)
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-12-19%2016%3A28%3A58
Best regards,
Alexander
Hi, On Thu, Dec 19, 2024 at 06:12:04AM +0000, Bertrand Drouvot wrote: > On Thu, Dec 19, 2024 at 01:21:54PM +0900, Michael Paquier wrote: > > bumped the two version counters, and done. > > The existing structure could be expanded in the > > future to add more information about other statistics related to > > backends, depending on requirements or ideas. BTW, now that the per backend I/O statistics is done, I'll start working on per backend wal statistics. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com