Thread: Pluggable cumulative statistics
Hi all, While looking at ways to make pg_stat_statements more scalable and dynamically manageable (no more PGC_POSTMASTER for the max number of entries), which came out as using a dshash, Andres has mentioned me off-list (on twitter/X) that we'd better plug in it to the shmem pgstats facility, moving the text file that holds the query strings into memory (with size restrictions for the query strings, for example). This has challenges on its own (query ID is 8 bytes incompatible with the dboid/objid hash key used by pgstats, discard of entries when maximum). Anyway, this won't happen if we don't do one of these two things: 1) Move pg_stat_statements into core, adapting pgstats for its requirements. 2) Make the shmem pgstats pluggable so as it is possible for extensions to register their own stats kinds. 1) may have its advantages, still I am not sure if we want to do that. And 2) is actually something that can be used for more things than just pg_stat_statements, because people love extensions and statistics (spoiler: I do). The idea is simple: any extension defining a custom stats kind would be able to rely on all the in-core facilities we use for the existing in-core kinds: a) Snapshotting and caching of the stats, via stats_fetch_consistency. b) Native handling and persistency of the custom stats data. c) Reuse stats after a crash, pointing at this comment in xlog.c: * TODO: With a bit of extra work we could just start with a pgstat file * associated with the checkpoint redo location we're starting from. This means that we always remove the stats after a crash. That's something I have a patch for, not for this thread, but the idea is that custom stats would also benefit from this property. The implementation is based on the following ideas: * A structure in shared memory that tracks the IDs of the custom stats kinds with their names. These are incremented starting from PGSTAT_KIND_LAST. * Processes use a local array cache that keeps tracks of all the custom PgStat_KindInfos, indexed by (kind_id - PGSTAT_KIND_LAST). * The kind IDs may change across restarts, meaning that any stats data associated to a custom kind is stored with the *name* of the custom stats kind. Depending on the discussion happening here, I'd be open to use the same concept as custom RMGRs, where custom kind IDs are "reserved", fixed in time, and tracked in the Postgres wiki. It is cheaper to store the stats this way, as well, while managing conflicts across extensions available in the community ecosystem. * Custom stats can be added without shared_preload_libraries, loading them from a shmem startup hook with shared_preload_libraries is also possible. * The shmem pgstats defines two types of statistics: the ones in a dshash and what's called a "fixed" type like for archiver, WAL, etc. pointing to areas of shared memory. All the fixed types are linked to structures for snapshotting and shmem tracking. As a matter of simplification and because I could not really see a case where I'd want to plug in a fixed stats kind, the patch forbids this case. This case could be allowed, but I'd rather refactor the structures of pgstat_internal.h so as we don't have traces of the "fixed" stats structures in so many areas. * Making custom stats data persistent is an interesting problem, and there are a couple of approaches I've considered: ** Allow custom kinds to define callbacks to read and write data from a source they'd want, like their own file through a fd. This has the disadvantage to remove the benefit of c) above. ** Store everything in the existing stats file, adding one type of entry like 'S' and 'N' with a "custom" type, where the *name* of the custom stats kind is stored instead of its ID computed from shared memory. A mix of both? The patch attached has used the second approach. If the process reading/writing the stats does not know about the custom stats data, the data is discarded. * pgstat.c has a big array called pgstat_kind_infos to define all the existing stats kinds. Perhaps the code should be refactored to use this new API? That would make the code more consistent with what we do for resource managers, for one, while moving the KindInfos into their own file. With that in mind, storing the kind ID in KindInfos feels intuitive. While thinking about a use case to show what these APIs can do, I have decided to add statistics to the existing module injection_points rather than implement a new test module, gathering data about them and have tests that could use this data (like tracking the number of times a point is taken). This is simple enough that it can be used as a template, as well. There is a TAP test checking the data persistence across restarts, so I did not mess up this part much, hopefully. Please find attached a patch set implementing these ideas: - 0001 switches PgStat_Kind from an enum to a uint32, for the internal counters. - 0002 is some cleanup for the hardcoded S, N and E in pgstat.c. - 0003 introduces the backend-side APIs, with the shmem table counter and the routine to give code paths a way to register their own stats kind (see pgstat_add_kind). - 0004 implements an example of how to use these APIs, see injection_stats.c in src/test/modules/injection_points/. - 0005 adds some docs. - 0006 is an idea of how to make this custom stats data persistent. This will hopefully spark a discussion, and I was looking for answers regarding these questions: - Should the pgstat_kind_infos array in pgstat.c be refactored to use something similar to pgstat_add_kind? - How should the persistence of the custom stats be achieved? Callbacks to give custom stats kinds a way to write/read their data, push everything into a single file, or support both? - Should this do like custom RMGRs and assign to each stats kinds ID that are set in stone rather than dynamic ones? Thanks for reading. -- Michael
Attachment
- 0001-Switch-PgStat_Kind-from-enum-to-uint32.patch
- 0002-Replace-hardcoded-identifiers-in-pgstats-file-by-var.patch
- 0003-Introduce-pluggable-APIs-for-Cumulative-Statistics.patch
- 0004-injection_points-Add-statistics-for-custom-points.patch
- 0005-doc-Add-section-for-Custom-Cumulative-Statistics-API.patch
- 0006-Extend-custom-cumulative-stats-to-be-persistent.patch
- signature.asc
On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote: > - How should the persistence of the custom stats be achieved? > Callbacks to give custom stats kinds a way to write/read their data, > push everything into a single file, or support both? > - Should this do like custom RMGRs and assign to each stats kinds ID > that are set in stone rather than dynamic ones? These two questions have been itching me in terms of how it would work for extension developers, after noticing that custom RMGRs are used more than I thought: https://wiki.postgresql.org/wiki/CustomWALResourceManagers The result is proving to be nicer, shorter by 300 lines in total and much simpler when it comes to think about the way stats are flushed because it is possible to achieve the same result as the first patch set without manipulating any of the code paths doing the read and write of the pgstats file. In terms of implementation, pgstat.c's KindInfo data is divided into two parts, for efficiency: - The exiting in-core stats with designated initializers, renamed as built-in stats kinds. - The custom stats kinds are saved in TopMemoryContext, and can only be registered with shared_preload_libraries. The patch reserves a set of 128 harcoded slots for all the custom kinds making the lookups for the KindInfos quite cheap. Upon registration, a custom stats kind needs to assign a unique ID, with uniqueness on the names and IDs checked at registration. The backend code does ID -> information lookups in the hotter paths, meaning that the code only checks if an ID is built-in or custom, then redirects to the correct array where the information is stored. There is one code path that does a name -> information lookup for the undocumented SQL function pg_stat_have_stats() used in the tests, which is a bit less efficient now, but that does not strike me as an issue. modules/injection_points/ works as previously as a template to show how to use these APIs, with tests for the whole. With that in mind, the patch set is more pleasant to the eye, and the attached v2 consists of: - 0001 and 0002 are some cleanups, same as previously to prepare for the backend-side APIs. - 0003 adds the backend support to plug-in custom stats. - 0004 includes documentation. - 0005 is an example of how to use them, with a TAP test providing coverage. Note that the patch I've proposed to make stats persistent at checkpoint so as we don't discard everything after a crash is able to work with the custom stats proposed on this thread: https://commitfest.postgresql.org/48/5047/ -- Michael
Attachment
- v2-0004-doc-Add-section-for-Custom-Cumulative-Statistics-.patch
- v2-0002-Replace-hardcoded-identifiers-in-pgstats-file-by-.patch
- v2-0003-Introduce-pluggable-APIs-for-Cumulative-Statistic.patch
- v2-0001-Switch-PgStat_Kind-from-enum-to-uint32.patch
- v2-0005-injection_points-Add-statistics-for-custom-points.patch
- signature.asc
Hi, On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote: > Hi all, > > 2) Make the shmem pgstats pluggable so as it is possible for extensions > to register their own stats kinds. Thanks for the patch! I like the idea of having custom stats (it has also been somehow mentioned in [1]). > 2) is actually something that can be used for more things than > just pg_stat_statements, because people love extensions and > statistics (spoiler: I do). +1 > * Making custom stats data persistent is an interesting problem, and > there are a couple of approaches I've considered: > ** Allow custom kinds to define callbacks to read and write data from > a source they'd want, like their own file through a fd. This has the > disadvantage to remove the benefit of c) above. > ** Store everything in the existing stats file, adding one type of > entry like 'S' and 'N' with a "custom" type, where the *name* of the > custom stats kind is stored instead of its ID computed from shared > memory. What about having 2 files? - One is the existing stats file - One "predefined" for all the custom stats (so what you've done minus the in-core stats). This one would not be configurable and the extensions will not need to know about it. Would that remove the benefit from c) that you mentioned up-thread? [1]: https://www.postgresql.org/message-id/20220818195124.c7ipzf6c5v7vxymc%40awork3.anarazel.de Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Thu, Jun 20, 2024 at 09:46:30AM +0900, Michael Paquier wrote: > On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote: > > - How should the persistence of the custom stats be achieved? > > Callbacks to give custom stats kinds a way to write/read their data, > > push everything into a single file, or support both? > > - Should this do like custom RMGRs and assign to each stats kinds ID > > that are set in stone rather than dynamic ones? > These two questions have been itching me in terms of how it would work > for extension developers, after noticing that custom RMGRs are used > more than I thought: > https://wiki.postgresql.org/wiki/CustomWALResourceManagers > > The result is proving to be nicer, shorter by 300 lines in total and > much simpler when it comes to think about the way stats are flushed > because it is possible to achieve the same result as the first patch > set without manipulating any of the code paths doing the read and > write of the pgstats file. I think it makes sense to follow the same "behavior" as the custom wal resource managers. That, indeed, looks much more simpler than v1. > In terms of implementation, pgstat.c's KindInfo data is divided into > two parts, for efficiency: > - The exiting in-core stats with designated initializers, renamed as > built-in stats kinds. > - The custom stats kinds are saved in TopMemoryContext, Agree that a backend lifetime memory area is fine for that purpose. > and can only > be registered with shared_preload_libraries. The patch reserves a set > of 128 harcoded slots for all the custom kinds making the lookups for > the KindInfos quite cheap. + MemoryContextAllocZero(TopMemoryContext, + sizeof(PgStat_KindInfo *) * PGSTAT_KIND_CUSTOM_SIZE); and that's only 8 * PGSTAT_KIND_CUSTOM_SIZE bytes in total. I had a quick look at the patches (have in mind to do more): > With that in mind, the patch set is more pleasant to the eye, and the > attached v2 consists of: > - 0001 and 0002 are some cleanups, same as previously to prepare for > the backend-side APIs. 0001 and 0002 look pretty straightforward at a quick look. > - 0003 adds the backend support to plug-in custom stats. 1 === It looks to me that there is a mix of "in core" and "built-in" to name the non custom stats. Maybe it's worth to just use one? As I can see (and as you said above) this is mainly inspired by the custom resource manager and 2 === and 3 === are probably copy/paste consequences. 2 === + if (pgstat_kind_custom_infos[idx] != NULL && + pgstat_kind_custom_infos[idx]->name != NULL) + ereport(ERROR, + (errmsg("failed to register custom cumulative statistics \"%s\" with ID %u", kind_info->name,kind), + errdetail("Custom resource manager \"%s\" already registered with the same ID.", + pgstat_kind_custom_infos[idx]->name))); s/Custom resource manager/Custom cumulative statistics/ 3 === + ereport(LOG, + (errmsg("registered custom resource manager \"%s\" with ID %u", + kind_info->name, kind))); s/custom resource manager/custom cumulative statistics/ > - 0004 includes documentation. Did not look yet. > - 0005 is an example of how to use them, with a TAP test providing > coverage. Did not look yet. As I said, I've in mind to do a more in depth review. I've noted the above while doing a quick read of the patches so thought it makes sense to share them now while at it. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Thu, Jun 20, 2024 at 01:05:42PM +0000, Bertrand Drouvot wrote: > On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote: >> * Making custom stats data persistent is an interesting problem, and >> there are a couple of approaches I've considered: >> ** Allow custom kinds to define callbacks to read and write data from >> a source they'd want, like their own file through a fd. This has the >> disadvantage to remove the benefit of c) above. >> ** Store everything in the existing stats file, adding one type of >> entry like 'S' and 'N' with a "custom" type, where the *name* of the >> custom stats kind is stored instead of its ID computed from shared >> memory. > > What about having 2 files? > > - One is the existing stats file > - One "predefined" for all the custom stats (so what you've done minus the > in-core stats). This one would not be configurable and the extensions will > not need to know about it. Another thing that can be done here is to add a few callbacks to control how an entry should be written out when the dshash is scanned or read when the dshash is populated depending on the KindInfo. That's not really complicated to do as the populate part could have a cleanup phase if an error is found. I just did not do it yet because this patch set is already covering a lot, just to get the basics in. > Would that remove the benefit from c) that you mentioned up-thread? Yes, that can be slightly annoying. Splitting the stats across multiple files would mean that each stats file would have to store the redo LSN. That's not really complicated to implement, but really easy to miss. Perhaps folks implementing their own stats kinds would be aware anyway because we are going to need a callback to initialize the file to write if we do that, and the redo LSN should be provided in input of it. Giving more control to extension developers here would be OK for me, especially since they could use their own format for their output file(s). -- Michael
Attachment
On Thu, Jun 20, 2024 at 02:27:14PM +0000, Bertrand Drouvot wrote: > On Thu, Jun 20, 2024 at 09:46:30AM +0900, Michael Paquier wrote: > I think it makes sense to follow the same "behavior" as the custom > wal resource managers. That, indeed, looks much more simpler than v1. Thanks for the feedback. >> and can only >> be registered with shared_preload_libraries. The patch reserves a set >> of 128 harcoded slots for all the custom kinds making the lookups for >> the KindInfos quite cheap. > > + MemoryContextAllocZero(TopMemoryContext, > + sizeof(PgStat_KindInfo *) * PGSTAT_KIND_CUSTOM_SIZE); > > and that's only 8 * PGSTAT_KIND_CUSTOM_SIZE bytes in total. Enlarging that does not worry me much. Just not too much. >> With that in mind, the patch set is more pleasant to the eye, and the >> attached v2 consists of: >> - 0001 and 0002 are some cleanups, same as previously to prepare for >> the backend-side APIs. > > 0001 and 0002 look pretty straightforward at a quick look. 0002 is quite independentn. Still, 0001 depends a bit on the rest. Anyway, the Kind is already 4 bytes and it cleans up some APIs that used int for the Kind, so enforcing signedness is just cleaner IMO. >> - 0003 adds the backend support to plug-in custom stats. > > 1 === > > It looks to me that there is a mix of "in core" and "built-in" to name the > non custom stats. Maybe it's worth to just use one? Right. Perhaps better to remove "in core" and stick to "builtin", as I've used the latter for the variables and such. > As I can see (and as you said above) this is mainly inspired by the custom > resource manager and 2 === and 3 === are probably copy/paste consequences. > > 2 === > > + if (pgstat_kind_custom_infos[idx] != NULL && > + pgstat_kind_custom_infos[idx]->name != NULL) > + ereport(ERROR, > + (errmsg("failed to register custom cumulative statistics \"%s\" with ID %u", kind_info->name,kind), > + errdetail("Custom resource manager \"%s\" already registered with the same ID.", > + pgstat_kind_custom_infos[idx]->name))); > > s/Custom resource manager/Custom cumulative statistics/ > > 3 === > > + ereport(LOG, > + (errmsg("registered custom resource manager \"%s\" with ID %u", > + kind_info->name, kind))); > > s/custom resource manager/custom cumulative statistics/ Oops. Will fix. -- Michael
Attachment
At Thu, 13 Jun 2024 16:59:50 +0900, Michael Paquier <michael@paquier.xyz> wrote in > * The kind IDs may change across restarts, meaning that any stats data > associated to a custom kind is stored with the *name* of the custom > stats kind. Depending on the discussion happening here, I'd be open > to use the same concept as custom RMGRs, where custom kind IDs are > "reserved", fixed in time, and tracked in the Postgres wiki. It is > cheaper to store the stats this way, as well, while managing conflicts > across extensions available in the community ecosystem. I prefer to avoid having a central database if possible. If we don't intend to move stats data alone out of a cluster for use in another one, can't we store the relationship between stats names and numeric IDs (or index numbers) in a separate file, which is loaded just before and synced just after extension preloading finishes? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Jun 21, 2024 at 01:09:10PM +0900, Kyotaro Horiguchi wrote: > At Thu, 13 Jun 2024 16:59:50 +0900, Michael Paquier <michael@paquier.xyz> wrote in >> * The kind IDs may change across restarts, meaning that any stats data >> associated to a custom kind is stored with the *name* of the custom >> stats kind. Depending on the discussion happening here, I'd be open >> to use the same concept as custom RMGRs, where custom kind IDs are >> "reserved", fixed in time, and tracked in the Postgres wiki. It is >> cheaper to store the stats this way, as well, while managing conflicts >> across extensions available in the community ecosystem. > > I prefer to avoid having a central database if possible. I was thinking the same originally, but the experience with custom RMGRs has made me change my mind. There are more of these than I thought originally: https://wiki.postgresql.org/wiki/CustomWALResourceManagers > If we don't intend to move stats data alone out of a cluster for use > in another one, can't we store the relationship between stats names > and numeric IDs (or index numbers) in a separate file, which is loaded > just before and synced just after extension preloading finishes? Yeah, I've implemented a prototype that does exactly something like that with a restriction on the stats name to NAMEDATALEN, except that I've added the kind ID <-> kind name mapping at the beginning of the main stats file. At the end, it still felt weird and over-engineered to me, like the v1 prototype of upthread, because we finish with a strange mix when reloading the dshash where the builtin ID are handled with fixed values, with more code paths required when doing the serialize callback dance for stats kinds like replication slots, because the custom kinds need to update their hash keys to the new values based on the ID/name mapping stored at the beginning of the file itself. The equation complicates itself a bit more once you'd try to add more ways to write some stats kinds to other places, depending on what a custom kind wants to achieve. I can see the benefits of both approaches, still fixing the IDs in time leads to a lot of simplicity in this infra, which is very appealing on its own before tackling the next issues where I would rely on the proposed APIs. -- Michael
Attachment
On Fri, Jun 21, 2024 at 08:13:15AM +0900, Michael Paquier wrote: > On Thu, Jun 20, 2024 at 02:27:14PM +0000, Bertrand Drouvot wrote: >> On Thu, Jun 20, 2024 at 09:46:30AM +0900, Michael Paquier wrote: >> I think it makes sense to follow the same "behavior" as the custom >> wal resource managers. That, indeed, looks much more simpler than v1. > > Thanks for the feedback. While looking at a different patch from Tristan in this area at [1], I still got annoyed that this patch set was not able to support the case of custom fixed-numbered stats, so as it is possible to plug in pgstats things similar to the archiver, the checkpointer, WAL, etc. These are plugged in shared memory, and are handled with copies in the stats snapshots. After a good night of sleep, I have come up with a good solution for that, among the following lines: - PgStat_ShmemControl holds an array of void* indexed by PGSTAT_NUM_KINDS, pointing to shared memory areas allocated for each fixed-numbered stats. Each entry is allocated a size corresponding to PgStat_KindInfo->shared_size. - PgStat_Snapshot holds an array of void* also indexed by PGSTAT_NUM_KINDS, pointing to the fixed stats stored in the snapshots. These have a size of PgStat_KindInfo->shared_data_len, set up when stats are initialized at process startup, so this reflects everywhere. - Fixed numbered stats now set shared_size, and we use this number to determine the size to allocate for each fixed-numbered stats in shmem. - A callback is added to initialize the shared memory assigned to each fixed-numbered stats, consisting of LWLock initializations for the current types of stats. So this initialization step is moved out of pgstat.c into each stats kind file. All that has been done in the rebased patch set as of 0001, which is kind of a nice cleanup overall because it removes all the dependencies to the fixed-numbered stats structures from the "main" pgstats code in pgstat.c and pgstat_shmem.c. The remaining patches consist of: - 0002, Switch PgStat_Kind to a uint32. Cleanup. - 0003 introduces the pluggable stats facility. Feeding on the refactoring for the fixed-numbered stats in 0001, it is actually possible to get support for these in the pluggable APIs by just removing the restriction in the registration path. This extends the void* arrays to store references that cover the range of custom kind IDs. - 0004 has some docs. - 0005 includes an example of implementation for variable-numbered stats with the injection_points module. - 0006 is new for this thread, implementing an example for fixed-numbered stats, using again the injection_points module. This stuff gathers stats about the number of times points are run, attached and detached. Perhaps that's useful in itself, I don't know, but it provides the coverage I want for this facility. While on it, I have applied one of the cleanup patches as 9fd02525793f. [1]: https://www.postgresql.org/message-id/ZoNytpoHOzHGBLYi@paquier.xyz -- Michael
Attachment
- v3-0001-Rework-handling-of-fixed-numbered-statistics-in-p.patch
- v3-0002-Switch-PgStat_Kind-from-enum-to-uint32.patch
- v3-0003-Introduce-pluggable-APIs-for-Cumulative-Statistic.patch
- v3-0004-doc-Add-section-for-Custom-Cumulative-Statistics-.patch
- v3-0005-injection_points-Add-statistics-for-custom-points.patch
- v3-0006-Add-support-for-fixed-numbered-statistics-in-plug.patch
- signature.asc
On 6/13/24 14:59, Michael Paquier wrote: > This will hopefully spark a discussion, and I was looking for answers > regarding these questions: > - Should the pgstat_kind_infos array in pgstat.c be refactored to use > something similar to pgstat_add_kind? > - How should the persistence of the custom stats be achieved? > Callbacks to give custom stats kinds a way to write/read their data, > push everything into a single file, or support both? > - Should this do like custom RMGRs and assign to each stats kinds ID > that are set in stone rather than dynamic ones? It is a feature my extensions (which usually change planning behaviour) definitely need. It is a problem to show the user if the extension does something or not because TPS smooths the execution time of a single query and performance cliffs. BTW, we have 'labelled DSM segments', which allowed extensions to be 'lightweight' - not necessarily be loaded on startup, stay backend-local and utilise shared resources. It was a tremendous win for me. Is it possible to design this extension in the same way? -- regards, Andrei Lepikhov
On Thu, Jul 04, 2024 at 10:11:02AM +0700, Andrei Lepikhov wrote: > It is a feature my extensions (which usually change planning behaviour) > definitely need. It is a problem to show the user if the extension does > something or not because TPS smooths the execution time of a single query > and performance cliffs. Yeah, I can get that. pgstat.c is quite good regarding that as it delays stats flushes until commit by holding pending entries (see the pgStatPending business for variable-size stats). Custom stats kinds registered would just rely on these facilities, including snapshot APIs, etc. > BTW, we have 'labelled DSM segments', which allowed extensions to be > 'lightweight' - not necessarily be loaded on startup, stay backend-local and > utilise shared resources. It was a tremendous win for me. > > Is it possible to design this extension in the same way? I am not sure how this would be useful when it comes to cumulative statistics, TBH. These stats are global by design, and especially since these most likely need to be flushed at shutdown (as of HEAD) and read at startup, the simplest way to achieve that to let the checkpointer and the startup process know about them is to restrict the registration of custom stats types via _PG_init when loading shared libraries. That's what we do for custom WAL RMGRs, for example. I would not be against a new flag in KindInfo to state that a given stats type should not be flushed, as much as a set of callbacks that offers the possibility to redirect some stats kinds to somewhere else than pgstat.stat, like pg_stat_statements. That would be a separate patch than what's proposed here. -- Michael
Attachment
Hi, On Wed, Jul 03, 2024 at 06:47:15PM +0900, Michael Paquier wrote: > While looking at a different patch from Tristan in this area at [1], I > still got annoyed that this patch set was not able to support the case > of custom fixed-numbered stats, so as it is possible to plug in > pgstats things similar to the archiver, the checkpointer, WAL, etc. > These are plugged in shared memory, and are handled with copies in the > stats snapshots. After a good night of sleep, I have come up with a > good solution for that, Great! > among the following lines: > - PgStat_ShmemControl holds an array of void* indexed by > PGSTAT_NUM_KINDS, pointing to shared memory areas allocated for each > fixed-numbered stats. Each entry is allocated a size corresponding to > PgStat_KindInfo->shared_size. That makes sense to me, and that's just a 96 bytes overhead (8 * PGSTAT_NUM_KINDS) as compared to now. > - PgStat_Snapshot holds an array of void* also indexed by > PGSTAT_NUM_KINDS, pointing to the fixed stats stored in the > snapshots. Same, that's just a 96 bytes overhead (8 * PGSTAT_NUM_KINDS) as compared to now. > These have a size of PgStat_KindInfo->shared_data_len, set > up when stats are initialized at process startup, so this reflects > everywhere. Yeah. > - Fixed numbered stats now set shared_size, and we use this number to > determine the size to allocate for each fixed-numbered stats in shmem. > - A callback is added to initialize the shared memory assigned to each > fixed-numbered stats, consisting of LWLock initializations for the > current types of stats. So this initialization step is moved out of > pgstat.c into each stats kind file. That looks a reasonable approach to me. > All that has been done in the rebased patch set as of 0001, which is > kind of a nice cleanup overall because it removes all the dependencies > to the fixed-numbered stats structures from the "main" pgstats code in > pgstat.c and pgstat_shmem.c. Looking at 0001: 1 === In the commit message: - Fixed numbered stats now set shared_size, so as Is something missing in that sentence? 2 === @@ -425,14 +427,12 @@ typedef struct PgStat_ShmemControl pg_atomic_uint64 gc_request_count; /* - * Stats data for fixed-numbered objects. + * Stats data for fixed-numbered objects, indexed by PgStat_Kind. + * + * Each entry has a size of PgStat_KindInfo->shared_size. */ - PgStatShared_Archiver archiver; - PgStatShared_BgWriter bgwriter; - PgStatShared_Checkpointer checkpointer; - PgStatShared_IO io; - PgStatShared_SLRU slru; - PgStatShared_Wal wal; + void *fixed_data[PGSTAT_NUM_KINDS]; Can we move from PGSTAT_NUM_KINDS to the exact number of fixed stats? (add a new define PGSTAT_NUM_FIXED_KINDS for example). That's not a big deal but we are allocating some space for pointers that we won't use. Would need to change the "indexing" logic though. 3 === Same as 2 === but for PgStat_Snapshot. 4 === +static void pgstat_init_snapshot(void); what about pgstat_init_snapshot_fixed? (as it is for fixed-numbered statistics only). 5 === + /* Write various stats structs with fixed number of objects */ s/Write various stats/Write the stats/? (not coming from your patch but they all were listed before though). 6 === + for (int kind = PGSTAT_KIND_FIRST_VALID; kind <= PGSTAT_KIND_LAST; kind++) + { + char *ptr; + const PgStat_KindInfo *info = pgstat_get_kind_info(kind); + + if (!info->fixed_amount) + continue; Nit: Move the "ptr" declaration into an extra else? (useless to declare it if it's not a fixed number stat) 7 === + /* prepare snapshot data and write it */ + pgstat_build_snapshot_fixed(kind); What about changing pgstat_build_snapshot_fixed() to accept a PgStat_KindInfo parameter (instead of the current PgStat_Kind one)? Reason is that pgstat_get_kind_info() is already called/known in pgstat_snapshot_fixed(), pgstat_build_snapshot() and pgstat_write_statsfile(). That would avoid pgstat_build_snapshot_fixed() to retrieve (again) the kind_info. 8 === /* * Reads in existing statistics file into the shared stats hash. This comment above pgstat_read_statsfile() is not correct, fixed stats are not going to the hash (was there before your patch though). 9 === +pgstat_archiver_init_shmem_cb(void *stats) +{ + PgStatShared_Archiver *stats_shmem = (PgStatShared_Archiver *) stats; + + LWLockInitialize(&stats_shmem->lock, LWTRANCHE_PGSTATS_DATA); Nit: Almost all the pgstat_XXX_init_shmem_cb() look very similar, I wonder if we could use a macro to avoid code duplication. 10 === Remark not related to this patch: I think we could get rid of the shared_data_off for the fixed stats (by moving the "stats" part at the header of their dedicated struct). That would mean having things like: " typedef struct PgStatShared_Archiver { PgStat_ArchiverStats stats; /* lock protects ->reset_offset as well as stats->stat_reset_timestamp */ LWLock lock; uint32 changecount; PgStat_ArchiverStats reset_offset; } PgStatShared_Archiver; " Not sure that's worth it though. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On 2024-07-03 18:47:15 +0900, Michael Paquier wrote: > While looking at a different patch from Tristan in this area at [1], I > still got annoyed that this patch set was not able to support the case > of custom fixed-numbered stats, so as it is possible to plug in > pgstats things similar to the archiver, the checkpointer, WAL, etc. > These are plugged in shared memory, and are handled with copies in the > stats snapshots. After a good night of sleep, I have come up with a > good solution for that, among the following lines: > - PgStat_ShmemControl holds an array of void* indexed by > PGSTAT_NUM_KINDS, pointing to shared memory areas allocated for each > fixed-numbered stats. Each entry is allocated a size corresponding to > PgStat_KindInfo->shared_size. I am dubious this is a good idea. The more indirection you add, the more expensive it gets to count stuff, the more likely it is that we end up with backend-local "caching" in front of the stats system. IOW, I am against making builtin stats pay the price for pluggable fixed-numbered stats. It also substantially reduces type-safety, making it harder to refactor. Note that you had to add static casts in a good number of additional places. Greetings, Andres Freund
Hi, On 2024-06-13 16:59:50 +0900, Michael Paquier wrote: > * Making custom stats data persistent is an interesting problem, and > there are a couple of approaches I've considered: > ** Allow custom kinds to define callbacks to read and write data from > a source they'd want, like their own file through a fd. This has the > disadvantage to remove the benefit of c) above. I am *strongly* against this. That'll make it much harder to do stuff like not resetting stats after crashes and just generally will make it harder to improve the stats facility further. I think that pluggable users of the stats facility should only have control over how data is stored via quite generic means. Greetings, Andres Freund
Hi, On 2024-07-04 14:00:47 -0700, Andres Freund wrote: > On 2024-06-13 16:59:50 +0900, Michael Paquier wrote: > > * Making custom stats data persistent is an interesting problem, and > > there are a couple of approaches I've considered: > > ** Allow custom kinds to define callbacks to read and write data from > > a source they'd want, like their own file through a fd. This has the > > disadvantage to remove the benefit of c) above. > > I am *strongly* against this. That'll make it much harder to do stuff like not > resetting stats after crashes and just generally will make it harder to > improve the stats facility further. > > I think that pluggable users of the stats facility should only have control > over how data is stored via quite generic means. I forgot to say: In general I am highly supportive of this effort and thankful to Michael for tackling it. The above was just about that one aspect. Greetings, Andres Freund
On Thu, Jul 04, 2024 at 02:00:47PM -0700, Andres Freund wrote: > On 2024-06-13 16:59:50 +0900, Michael Paquier wrote: >> * Making custom stats data persistent is an interesting problem, and >> there are a couple of approaches I've considered: >> ** Allow custom kinds to define callbacks to read and write data from >> a source they'd want, like their own file through a fd. This has the >> disadvantage to remove the benefit of c) above. > > I am *strongly* against this. That'll make it much harder to do stuff like not > resetting stats after crashes and just generally will make it harder to > improve the stats facility further. > > I think that pluggable users of the stats facility should only have control > over how data is stored via quite generic means. I'm pretty much on the same line here, I think. If the redo logic is changed, then any stats kinds pushing their stats into their own file would need to copy/paste the same logic as the main file. And that's more error prone. I can get why some people would get that they don't want some stats kinds to never be flushed at shutdown or even read at startup. Adding more callbacks in this area is a separate discussion. -- Michael
Attachment
On Thu, Jul 04, 2024 at 02:08:25PM -0700, Andres Freund wrote: > I forgot to say: In general I am highly supportive of this effort and thankful > to Michael for tackling it. The above was just about that one aspect. Thanks. Let's discuss how people want this stuff to be shaped, and how much we want to cover. Better to do it one small step at a time. -- Michael
Attachment
On Thu, Jul 04, 2024 at 01:56:52PM -0700, Andres Freund wrote: > On 2024-07-03 18:47:15 +0900, Michael Paquier wrote: >> - PgStat_ShmemControl holds an array of void* indexed by >> PGSTAT_NUM_KINDS, pointing to shared memory areas allocated for each >> fixed-numbered stats. Each entry is allocated a size corresponding to >> PgStat_KindInfo->shared_size. > > I am dubious this is a good idea. The more indirection you add, the more > expensive it gets to count stuff, the more likely it is that we end up with > backend-local "caching" in front of the stats system. > > IOW, I am against making builtin stats pay the price for pluggable > fixed-numbered stats. Okay, noted. So, if I get that right, you would prefer an approach where we add an extra member in the snapshot and shmem control area dedicated only to the custom kind IDs, indexed based on the range of the custom kind IDs, leaving the built-in fixed structures in PgStat_ShmemControl and PgStat_Snapshot? I was feeling a bit uncomfortable with the extra redirection for the built-in fixed kinds, still the temptation of making that more generic was here, so.. Having the custom fixed types point to their own array in the snapshot and ShmemControl adds a couple more null-ness checks depending on if you're dealing with a builtin or custom ID range. That's mostly the path in charge of retrieving the KindInfos. > It also substantially reduces type-safety, making it harder to refactor. Note > that you had to add static casts in a good number of additional places. Not sure on this one, because that's the same issue as variable-numbered stats, no? The central dshash only knows about the size of the shared stats entries for each kind, with an offset to the stats data that gets copied to the snapshots. So I don't quite get the worry here. Separately from that, I think that read/write of the fixed-numbered stats would gain in clarity if we update them to be closer to the variable-numbers by storing entries with a specific character ('F' in 0001). If we keep track of the fixed-numbered structures in PgStat_Snapshot, that means adding an extra field in PgStat_KindInfo to point to the offset in PgStat_Snapshot for the write part. Note that the addition of the init_shmem callback simplifies shmem init, and it is also required for the fixed-numbered pluggable part. -- Michael
Attachment
On Thu, Jul 04, 2024 at 11:30:17AM +0000, Bertrand Drouvot wrote: > On Wed, Jul 03, 2024 at 06:47:15PM +0900, Michael Paquier wrote: >> among the following lines: >> - PgStat_ShmemControl holds an array of void* indexed by >> PGSTAT_NUM_KINDS, pointing to shared memory areas allocated for each >> fixed-numbered stats. Each entry is allocated a size corresponding to >> PgStat_KindInfo->shared_size. > > That makes sense to me, and that's just a 96 bytes overhead (8 * PGSTAT_NUM_KINDS) > as compared to now. pgstat_io.c is by far the largest chunk. >> - PgStat_Snapshot holds an array of void* also indexed by >> PGSTAT_NUM_KINDS, pointing to the fixed stats stored in the >> snapshots. > > Same, that's just a 96 bytes overhead (8 * PGSTAT_NUM_KINDS) as compared to now. Still Andres does not seem to like that much, well ;) > Looking at 0001: > > 1 === > > In the commit message: > > - Fixed numbered stats now set shared_size, so as > > Is something missing in that sentence? Right. This is missing a piece. > - PgStatShared_Archiver archiver; > - PgStatShared_BgWriter bgwriter; > - PgStatShared_Checkpointer checkpointer; > - PgStatShared_IO io; > - PgStatShared_SLRU slru; > - PgStatShared_Wal wal; > + void *fixed_data[PGSTAT_NUM_KINDS]; > > Can we move from PGSTAT_NUM_KINDS to the exact number of fixed stats? (add > a new define PGSTAT_NUM_FIXED_KINDS for example). That's not a big deal but we > are allocating some space for pointers that we won't use. Would need to change > the "indexing" logic though. > > 3 === > > Same as 2 === but for PgStat_Snapshot. > True for both. Based on the first inputs I got from Andres, the built-in fixed stats structures would be kept as they are now, and we could just add an extra member here for the custom fixed stats. That still results in a few bytes wasted as not all custom stats want fixed stats, but that's much cheaper. > 4 === > > +static void pgstat_init_snapshot(void); > > what about pgstat_init_snapshot_fixed? (as it is for fixed-numbered statistics > only). Sure. > 5 === > > + /* Write various stats structs with fixed number of objects */ > > s/Write various stats/Write the stats/? (not coming from your patch but they > all were listed before though). Yes, there are a few more things about that. > 6 === > > + for (int kind = PGSTAT_KIND_FIRST_VALID; kind <= PGSTAT_KIND_LAST; kind++) > + { > + char *ptr; > + const PgStat_KindInfo *info = pgstat_get_kind_info(kind); > + > + if (!info->fixed_amount) > + continue; > > Nit: Move the "ptr" declaration into an extra else? (useless to declare it > if it's not a fixed number stat) Comes down to one's taste. I think that this is OK as-is, but that's my taste. > 7 === > > + /* prepare snapshot data and write it */ > + pgstat_build_snapshot_fixed(kind); > > What about changing pgstat_build_snapshot_fixed() to accept a PgStat_KindInfo > parameter (instead of the current PgStat_Kind one)? Reason is that > pgstat_get_kind_info() is already called/known in pgstat_snapshot_fixed(), > pgstat_build_snapshot() and pgstat_write_statsfile(). That would avoid > pgstat_build_snapshot_fixed() to retrieve (again) the kind_info. pgstat_snapshot_fixed() only calls pgstat_get_kind_info() with assertions enabled. Perhaps we could do that, just that it does not seem that critical to me. > 8 === > > /* > * Reads in existing statistics file into the shared stats hash. > > This comment above pgstat_read_statsfile() is not correct, fixed stats > are not going to the hash (was there before your patch though). Good catch. Let's adjust that separately. > 9 === > > +pgstat_archiver_init_shmem_cb(void *stats) > +{ > + PgStatShared_Archiver *stats_shmem = (PgStatShared_Archiver *) stats; > + > + LWLockInitialize(&stats_shmem->lock, LWTRANCHE_PGSTATS_DATA); > > Nit: Almost all the pgstat_XXX_init_shmem_cb() look very similar, I wonder if we > could use a macro to avoid code duplication. They are very similar, still can do different things like pgstat_io. I am not sure that the macro would bring more readability. > Remark not related to this patch: I think we could get rid of the shared_data_off > for the fixed stats (by moving the "stats" part at the header of their dedicated > struct). That would mean having things like: > > " > typedef struct PgStatShared_Archiver > { > PgStat_ArchiverStats stats; > /* lock protects ->reset_offset as well as stats->stat_reset_timestamp */ > LWLock lock; > uint32 changecount; > PgStat_ArchiverStats reset_offset; > } PgStatShared_Archiver; > " I'm not really convinced that it is a good idea to force the ordering of the members in the shared structures for the fixed-numbered stats, requiring these "stats" fields to always be first. -- Michael
Attachment
> On Fri, Jun 21, 2024 at 01:28:11PM +0900, Michael Paquier wrote: > On Fri, Jun 21, 2024 at 01:09:10PM +0900, Kyotaro Horiguchi wrote: > > At Thu, 13 Jun 2024 16:59:50 +0900, Michael Paquier <michael@paquier.xyz> wrote in > >> * The kind IDs may change across restarts, meaning that any stats data > >> associated to a custom kind is stored with the *name* of the custom > >> stats kind. Depending on the discussion happening here, I'd be open > >> to use the same concept as custom RMGRs, where custom kind IDs are > >> "reserved", fixed in time, and tracked in the Postgres wiki. It is > >> cheaper to store the stats this way, as well, while managing conflicts > >> across extensions available in the community ecosystem. > > > > I prefer to avoid having a central database if possible. > > I was thinking the same originally, but the experience with custom > RMGRs has made me change my mind. There are more of these than I > thought originally: > https://wiki.postgresql.org/wiki/CustomWALResourceManagers From what I understand, coordinating custom RmgrIds via a wiki page was made under the assumption that implementing a table AM with custom WAL requires significant efforts, which limits the demand for ids. This might not be same for custom stats -- I've got an impression it's easier to create one, and there could be multiple kinds of stats per an extension (one per component), right? This would mean more kind Ids to manage and more efforts required to do that. I agree though that it makes sense to start this way, it's just simpler. But maybe it's worth thinking about some other solution in the long term, taking the over-engineered prototype as a sign that more refactoring is needed.
On Sun, Jul 07, 2024 at 12:21:26PM +0200, Dmitry Dolgov wrote: > From what I understand, coordinating custom RmgrIds via a wiki page was > made under the assumption that implementing a table AM with custom WAL > requires significant efforts, which limits the demand for ids. This > might not be same for custom stats -- I've got an impression it's easier > to create one, and there could be multiple kinds of stats per an > extension (one per component), right? This would mean more kind Ids to > manage and more efforts required to do that. A given module will likely have one single RMGR because it is possible to divide the RMGR into multiple records. Yes, this cannot really be said for stats, and a set of stats kinds in one module may want different kinds because these could have different properties. My guess is that a combination of one fixed-numbered to track a global state and one variable-numbered would be the combination most likely to happen. Also, my impression about pg_stat_statements is that we'd need this combination, actually, to track the number of entries in a tighter way because scanning all the partitions of the central dshash for entries with a specific KindInfo would have a high concurrency cost. > I agree though that it makes sense to start this way, it's just simpler. > But maybe it's worth thinking about some other solution in the long > term, taking the over-engineered prototype as a sign that more > refactoring is needed. The three possible methods I can think of here are, knowing that we use a central, unique, file to store the stats (per se the arguments on the redo thread for the stats): - Store the name of the stats kinds with each entry. This is very costly with many entries, and complicates the read-write paths because currently we rely on the KindInfo. - Store a mapping between the stats kind name and the KindInfo in the file at write, then use the mapping at read and compare it reassemble the entries stored. KindInfos are assigned at startup with a unique counter in shmem. As mentioned upthread, I've implemented something like that while making the custom stats being registered in the shmem_startup_hook with requests in shmem_request_hook. That felt over-engineered considering that the startup process needs to know the stats kinds very early anyway, so we need _PG_init() and should encourage its use. - Fix the KindInfos in time and centralize the values assigned. This eases the error control and can force the custom stats kinds to be registered when shared_preload_libraries is loaded. The read is faster as there is no need to re-check the mapping to reassemble the stats entries. At the end, fixing the KindInfos in time is the most reliable method here (debugging could be slightly easier, less complicated than with the mapping stored, still doable for all three methods). -- Michael
Attachment
On Fri, Jul 05, 2024 at 09:35:19AM +0900, Michael Paquier wrote: > On Thu, Jul 04, 2024 at 11:30:17AM +0000, Bertrand Drouvot wrote: >> On Wed, Jul 03, 2024 at 06:47:15PM +0900, Michael Paquier wrote: >>> - PgStat_Snapshot holds an array of void* also indexed by >>> PGSTAT_NUM_KINDS, pointing to the fixed stats stored in the >>> snapshots. >> >> Same, that's just a 96 bytes overhead (8 * PGSTAT_NUM_KINDS) as compared to now. > > Still Andres does not seem to like that much, well ;) Please find attached a rebased patch set labelled v4. Built-in fixed-numbered stats are still attached to the snapshot and shmem control structures, and custom fixed stats kinds are tracked in the same way as v3 with new members tracking data stored in TopMemoryContext for the snapshots and shmem for the control data. So, the custom and built-in stats kinds are separated into separate parts of the structures, including the "valid" flags for the snapshots. And this avoids any redirection when looking at the built-in fixed-numbered stats. I've tried at address all the previous comments (there could be stuff I've missed while rebasing, of course). The first three patches are refactoring pieces to make the rest more edible, while 0004~ implement the main logic with templates in modules/injection_points: - 0001 refactors pgstat_write_statsfile() so as a loop om PgStat_KindInfo is used to write the data. This is done with the addition of snapshot_ctl_off in PgStat_KindInfo, to point to the area in PgStat_Snapshot where the data is located for fixed stats. 9004abf6206e has done the same for the read part. - 0002 adds an init_shmem callback, to let stats kinds initialize states based on what's been allocated. - 0003 refactors the read/write to use a new entry type in the stats file for fixed-numbered stats. - 0004 switches PgStat_Kind from an enum to uint32, adding a better type for pluggability. - 0005 is the main implementation. - 0006 adds some docs. - 0007 (variable-numbered stats) and 0008 (fixed-numbered stats) are the examples demonstrating how to make pluggable stats for both types, with tests of their own. -- Michael
Attachment
- v4-0001-Use-pgstat_kind_infos-to-write-fixed-shared-stati.patch
- v4-0002-Add-PgStat_KindInfo.init_shmem_cb.patch
- v4-0003-Add-a-new-F-type-of-entry-in-pgstats-file.patch
- v4-0004-Switch-PgStat_Kind-from-enum-to-uint32.patch
- v4-0005-Introduce-pluggable-APIs-for-Cumulative-Statistic.patch
- v4-0006-doc-Add-section-for-Custom-Cumulative-Statistics-.patch
- v4-0007-injection_points-Add-statistics-for-custom-points.patch
- v4-0008-injection_points-Add-example-for-fixed-numbered-s.patch
- signature.asc
Hi, On Mon, Jul 08, 2024 at 02:30:23PM +0900, Michael Paquier wrote: > On Fri, Jul 05, 2024 at 09:35:19AM +0900, Michael Paquier wrote: > > On Thu, Jul 04, 2024 at 11:30:17AM +0000, Bertrand Drouvot wrote: > >> On Wed, Jul 03, 2024 at 06:47:15PM +0900, Michael Paquier wrote: > >>> - PgStat_Snapshot holds an array of void* also indexed by > >>> PGSTAT_NUM_KINDS, pointing to the fixed stats stored in the > >>> snapshots. > >> > >> Same, that's just a 96 bytes overhead (8 * PGSTAT_NUM_KINDS) as compared to now. > > > > Still Andres does not seem to like that much, well ;) > > Please find attached a rebased patch set labelled v4. Thanks! > Built-in > fixed-numbered stats are still attached to the snapshot and shmem > control structures, and custom fixed stats kinds are tracked in the > same way as v3 with new members tracking data stored in > TopMemoryContext for the snapshots and shmem for the control data. > So, the custom and built-in stats kinds are separated into separate > parts of the structures, including the "valid" flags for the > snapshots. And this avoids any redirection when looking at the > built-in fixed-numbered stats. Yeap. > I've tried at address all the previous comments (there could be stuff > I've missed while rebasing, of course). Thanks! > The first three patches are refactoring pieces to make the rest more > edible, while 0004~ implement the main logic with templates in > modules/injection_points: > - 0001 refactors pgstat_write_statsfile() so as a loop om > PgStat_KindInfo is used to write the data. This is done with the > addition of snapshot_ctl_off in PgStat_KindInfo, to point to the area > in PgStat_Snapshot where the data is located for fixed stats. > 9004abf6206e has done the same for the read part. Looking at 0001: 1 == + for (int kind = PGSTAT_KIND_FIRST_VALID; kind <= PGSTAT_KIND_LAST; kind++) + { + char *ptr; + const PgStat_KindInfo *info = pgstat_get_kind_info(kind); I wonder if we could avoid going through stats that are not fixed ones. What about doing something like? " for (int kind = <first fixed>; kind <= <last fixed>; kind++); " Would probably need to change the indexing logic though. and then we could replace: + if (!info->fixed_amount) + continue; with an assert instead. Same would apply for the read part added in 9004abf6206e. 2 === + pgstat_build_snapshot_fixed(kind); + ptr = ((char *) &pgStatLocal.snapshot) + info->snapshot_ctl_off; + write_chunk(fpout, ptr, info->shared_data_len); I think that using "shared_data_len" is confusing here (was not the case in the context of 9004abf6206e). I mean it is perfectly correct but the wording "shared" looks weird to me when being used here. What it really is, is the size of the stats. What about renaming shared_data_len with stats_data_len? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Jul 08, 2024 at 06:39:56AM +0000, Bertrand Drouvot wrote: > + for (int kind = PGSTAT_KIND_FIRST_VALID; kind <= PGSTAT_KIND_LAST; kind++) > + { > + char *ptr; > + const PgStat_KindInfo *info = pgstat_get_kind_info(kind); > > I wonder if we could avoid going through stats that are not fixed ones. What about > doing something like? > Same would apply for the read part added in 9004abf6206e. This becomes more relevant when the custom stats are added, as this performs a scan across the full range of IDs supported. So this choice is here for consistency, and to ease the pluggability. > 2 === > > + pgstat_build_snapshot_fixed(kind); > + ptr = ((char *) &pgStatLocal.snapshot) + info->snapshot_ctl_off; > + write_chunk(fpout, ptr, info->shared_data_len); > > I think that using "shared_data_len" is confusing here (was not the case in the > context of 9004abf6206e). I mean it is perfectly correct but the wording "shared" > looks weird to me when being used here. What it really is, is the size of the > stats. What about renaming shared_data_len with stats_data_len? It is the stats data associated to a shared entry. I think that's OK, but perhaps I'm just used to it as I've been staring at this area for days. -- Michael
Attachment
Hi, On Mon, Jul 08, 2024 at 03:49:34PM +0900, Michael Paquier wrote: > On Mon, Jul 08, 2024 at 06:39:56AM +0000, Bertrand Drouvot wrote: > > + for (int kind = PGSTAT_KIND_FIRST_VALID; kind <= PGSTAT_KIND_LAST; kind++) > > + { > > + char *ptr; > > + const PgStat_KindInfo *info = pgstat_get_kind_info(kind); > > > > I wonder if we could avoid going through stats that are not fixed ones. What about > > doing something like? > > Same would apply for the read part added in 9004abf6206e. > > This becomes more relevant when the custom stats are added, as this > performs a scan across the full range of IDs supported. So this > choice is here for consistency, and to ease the pluggability. Gotcha. > > > 2 === > > > > + pgstat_build_snapshot_fixed(kind); > > + ptr = ((char *) &pgStatLocal.snapshot) + info->snapshot_ctl_off; > > + write_chunk(fpout, ptr, info->shared_data_len); > > > > I think that using "shared_data_len" is confusing here (was not the case in the > > context of 9004abf6206e). I mean it is perfectly correct but the wording "shared" > > looks weird to me when being used here. What it really is, is the size of the > > stats. What about renaming shared_data_len with stats_data_len? > > It is the stats data associated to a shared entry. I think that's OK, > but perhaps I'm just used to it as I've been staring at this area for > days. Yeah, what I meant to say is that one could think for example that's the PgStatShared_Archiver size while in fact it's the PgStat_ArchiverStats size. I think it's more confusing when writing the stats. Here we are manipulating "snapshot" and "snapshot" offsets. It was not that confusing when reading as we are manipulating "shmem" and "shared" offsets. As I said, the code is fully correct, that's just the wording here that sounds weird to me in the "snapshot" context. Except the above (which is just a Nit), 0001 LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Mon, Jul 08, 2024 at 07:22:32AM +0000, Bertrand Drouvot wrote: > Except the above (which is just a Nit), 0001 LGTM. > Looking at 0002: It looks pretty straightforward, just one comment: + ptr = ((char *) ctl) + kind_info->shared_ctl_off; + kind_info->init_shmem_cb((void *) ptr); I don't think we need to cast ptr to void when calling init_shmem_cb(). Looking at some examples in the code, it does not look like we cast the argument to void when a function has (void *) as parameter (also there is examples in 0003 where it's not done, see next comments for 0003). So I think removing the cast here would be more consistent. Looking at 0003: It looks pretty straightforward. Also for example, here: + fputc(PGSTAT_FILE_ENTRY_FIXED, fpout); + write_chunk_s(fpout, &kind); write_chunk(fpout, ptr, info->shared_data_len); ptr is not casted to void when calling write_chunk() while its second parameter is a "void *". + ptr = ((char *) shmem) + info->shared_ctl_off + + info->shared_data_off; + + if (!read_chunk(fpin, ptr, Same here for read_chunk(). I think that's perfectly fine and that we should do the same in 0002 when calling init_shmem_cb() for consistency. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Jul 08, 2024 at 07:22:32AM +0000, Bertrand Drouvot wrote: > Yeah, what I meant to say is that one could think for example that's the > PgStatShared_Archiver size while in fact it's the PgStat_ArchiverStats size. > I think it's more confusing when writing the stats. Here we are manipulating > "snapshot" and "snapshot" offsets. It was not that confusing when reading as we > are manipulating "shmem" and "shared" offsets. > > As I said, the code is fully correct, that's just the wording here that sounds > weird to me in the "snapshot" context. After sleeping on it, I can see your point. If we were to do the (shared_data_len -> stats_data_len) switch, could it make sense to rename shared_data_off to stats_data_off to have a better symmetry? This one is the offset of the stats data in a shmem entry, so perhaps shared_data_off is OK, but it feels a bit inconsistent as well. > Except the above (which is just a Nit), 0001 LGTM. Thanks, I've applied 0001 for now to improve the serialization of this code. -- Michael
Attachment
Hi, On Tue, Jul 09, 2024 at 10:45:05AM +0900, Michael Paquier wrote: > On Mon, Jul 08, 2024 at 07:22:32AM +0000, Bertrand Drouvot wrote: > > Yeah, what I meant to say is that one could think for example that's the > > PgStatShared_Archiver size while in fact it's the PgStat_ArchiverStats size. > > I think it's more confusing when writing the stats. Here we are manipulating > > "snapshot" and "snapshot" offsets. It was not that confusing when reading as we > > are manipulating "shmem" and "shared" offsets. > > > > As I said, the code is fully correct, that's just the wording here that sounds > > weird to me in the "snapshot" context. > > After sleeping on it, I can see your point. If we were to do the > (shared_data_len -> stats_data_len) switch, could it make sense to > rename shared_data_off to stats_data_off to have a better symmetry? > This one is the offset of the stats data in a shmem entry, so perhaps > shared_data_off is OK, but it feels a bit inconsistent as well. Agree that if we were to rename one of them then the second one should be renamed to. I gave a second thought on it, and I think that this is the "data" part that lead to the confusion (as too generic), what about? shared_data_len -> shared_stats_len shared_data_off -> shared_stats_off That looks ok to me even in the snapshot context (shared is fine after all because that's where the stats come from). Attached a patch proposal doing so. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Jul 08, 2024 at 02:07:58PM +0000, Bertrand Drouvot wrote: > It looks pretty straightforward, just one comment: > > + ptr = ((char *) ctl) + kind_info->shared_ctl_off; > + kind_info->init_shmem_cb((void *) ptr); > > I don't think we need to cast ptr to void when calling init_shmem_cb(). Looking > at some examples in the code, it does not look like we cast the argument to void > when a function has (void *) as parameter (also there is examples in 0003 where > it's not done, see next comments for 0003). Yep. Fine by me. Please find attached a rebased patch set for now, to make the CF bot happy. -- Michael
Attachment
- v5-0001-Add-PgStat_KindInfo.init_shmem_cb.patch
- v5-0002-Add-a-new-F-type-of-entry-in-pgstats-file.patch
- v5-0003-Switch-PgStat_Kind-from-enum-to-uint32.patch
- v5-0004-Introduce-pluggable-APIs-for-Cumulative-Statistic.patch
- v5-0005-doc-Add-section-for-Custom-Cumulative-Statistics-.patch
- v5-0006-injection_points-Add-statistics-for-custom-points.patch
- v5-0007-injection_points-Add-example-for-fixed-numbered-s.patch
- signature.asc
On Tue, Jul 09, 2024 at 05:23:03AM +0000, Bertrand Drouvot wrote: > I gave a second thought on it, and I think that this is the "data" part that lead > to the confusion (as too generic), what about? > > shared_data_len -> shared_stats_len > shared_data_off -> shared_stats_off > > That looks ok to me even in the snapshot context (shared is fine after all > because that's where the stats come from). I'd tend to prefer the original suggestion because of the snapshot context, actually, as the fixed-numbered stats in a snapshot are a copy of what's in shmem, and that's not shared at all. The rename is not the most important part, still if others have an opinion, feel free. -- Michael
Attachment
Hi, On Tue, Jul 09, 2024 at 03:54:37PM +0900, Michael Paquier wrote: > On Mon, Jul 08, 2024 at 02:07:58PM +0000, Bertrand Drouvot wrote: > > It looks pretty straightforward, just one comment: > > > > + ptr = ((char *) ctl) + kind_info->shared_ctl_off; > > + kind_info->init_shmem_cb((void *) ptr); > > > > I don't think we need to cast ptr to void when calling init_shmem_cb(). Looking > > at some examples in the code, it does not look like we cast the argument to void > > when a function has (void *) as parameter (also there is examples in 0003 where > > it's not done, see next comments for 0003). > > Yep. Fine by me. Thanks! > > Please find attached a rebased patch set for now, to make the > CF bot happy. v5-0001 LGTM. As far v5-0002: + goto error; + info = pgstat_get_kind_info(kind); Nit: add an empty line between the two? Except this Nit, v5-0002 LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Wed, Jul 10, 2024 at 08:28:56AM +0000, Bertrand Drouvot wrote: > Hi, > > On Tue, Jul 09, 2024 at 03:54:37PM +0900, Michael Paquier wrote: > > On Mon, Jul 08, 2024 at 02:07:58PM +0000, Bertrand Drouvot wrote: > > > It looks pretty straightforward, just one comment: > > > > > > + ptr = ((char *) ctl) + kind_info->shared_ctl_off; > > > + kind_info->init_shmem_cb((void *) ptr); > > > > > > I don't think we need to cast ptr to void when calling init_shmem_cb(). Looking > > > at some examples in the code, it does not look like we cast the argument to void > > > when a function has (void *) as parameter (also there is examples in 0003 where > > > it's not done, see next comments for 0003). > > > > Yep. Fine by me. > > Thanks! > > > > > Please find attached a rebased patch set for now, to make the > > CF bot happy. > > v5-0001 LGTM. > > As far v5-0002: > > + goto error; > + info = pgstat_get_kind_info(kind); > > Nit: add an empty line between the two? > > Except this Nit, v5-0002 LGTM. Oh, and also due to this change in 0002: switch (t) { + case PGSTAT_FILE_ENTRY_FIXED: + { Then this comment: /* * We found an existing statistics file. Read it and put all the hash * table entries into place. */ for (;;) { int t = fgetc(fpin); switch (t) { case PGSTAT_FILE_ENTRY_FIXED: { is not correct anymore (as we're not reading the stats only into the hash table anymore). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Jul 10, 2024 at 09:00:31AM +0000, Bertrand Drouvot wrote: > On Wed, Jul 10, 2024 at 08:28:56AM +0000, Bertrand Drouvot wrote: >> v5-0001 LGTM. Thanks. I've applied this refactoring piece. > /* > * We found an existing statistics file. Read it and put all the hash > * table entries into place. > */ Indeed. Reworded that slightly and applied it as well. So we are down to the remaining parts of the patch, and this is going to need a consensus about a few things because this impacts the developer experience when implementing one's own custom stats: - Are folks OK with the point of fixing the kind IDs in time like RMGRs with a control in the wiki? Or should a more artistic approach be used like what I am mentioning at the bottom of [1]. The patch allows a range of IDs to be used, to make the access to the stats faster even if some area of memory may not be used. - The fixed-numbered custom stats kinds are stored in an array in PgStat_Snapshot and PgStat_ShmemControl, so as we have something consistent with the built-in kinds. This makes the tracking of the validity of the data in the snapshots split into parts of the structure for builtin and custom kinds. Perhaps there are better ideas than that? The built-in fixed-numbered kinds have no redirection. - The handling of both built-in and custom kinds touches some areas of pgstat.c and pgstat_shmem.c, which is the minimal I could come up with. Attached is a rebased patch set with the remaining pieces. [1]: https://www.postgresql.org/message-id/ZoshTO9K7O7Z1wrX%40paquier.xyz -- Michael
Attachment
- v6-0001-Switch-PgStat_Kind-from-enum-to-uint32.patch
- v6-0002-Introduce-pluggable-APIs-for-Cumulative-Statistic.patch
- v6-0003-doc-Add-section-for-Custom-Cumulative-Statistics-.patch
- v6-0004-injection_points-Add-statistics-for-custom-points.patch
- v6-0005-injection_points-Add-example-for-fixed-numbered-s.patch
- signature.asc
Hi, On Fri, Jul 05, 2024 at 09:35:19AM +0900, Michael Paquier wrote: > On Thu, Jul 04, 2024 at 11:30:17AM +0000, Bertrand Drouvot wrote: > > > > /* > > * Reads in existing statistics file into the shared stats hash. > > > > This comment above pgstat_read_statsfile() is not correct, fixed stats > > are not going to the hash (was there before your patch though). > > Good catch. Let's adjust that separately. Please find attached a patch to do so (attached as .txt to not perturb the cfbot). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Jul 11, 2024 at 01:29:08PM +0000, Bertrand Drouvot wrote: > Please find attached a patch to do so (attached as .txt to not perturb the > cfbot). + * Reads in existing statistics file into the shared stats hash (for non fixed + * amount stats) or into the fixed amount stats. Thanks. I have applied a simplified version of that, not mentioning the details of what happens depending on the kinds of stats dealt with. -- Michael
Attachment
> On Thu, Jul 11, 2024 at 04:42:22PM GMT, Michael Paquier wrote: > > So we are down to the remaining parts of the patch, and this is going > to need a consensus about a few things because this impacts the > developer experience when implementing one's own custom stats: > - Are folks OK with the point of fixing the kind IDs in time like > RMGRs with a control in the wiki? Or should a more artistic approach > be used like what I am mentioning at the bottom of [1]. The patch > allows a range of IDs to be used, to make the access to the stats > faster even if some area of memory may not be used. I think it's fine. Although this solution feels a bit uncomfortable, after thinking back and forth I don't see any significantly better option. Worth noting that since the main goal is to maintain uniqueness, fixing the kind IDs could be accomplished in more than one way, with varying amount of control over the list of custom IDs: * One coud say "lets keep it in wiki and let the community organize itself somehow", and it's done. * Another way would be to keep it in wiki, and introduce some maintenance rules, e.g. once per release someone is going to cleanup the list from old unmaintained extensions, correct errors if needed, etc. Not sure if such cleanup would be needed, but it's not impossible to image. * Even more closed option would be to keep the kind IDs in some separate git repository, and let committers add new records on demand, expressed via some request form. As far as I understand the current proposal is about the first option, on one side of the spectrum. > - The fixed-numbered custom stats kinds are stored in an array in > PgStat_Snapshot and PgStat_ShmemControl, so as we have something > consistent with the built-in kinds. This makes the tracking of the > validity of the data in the snapshots split into parts of the > structure for builtin and custom kinds. Perhaps there are better > ideas than that? The built-in fixed-numbered kinds have no > redirection. Are you talking about this pattern? if (pgstat_is_kind_builtin(kind)) ptr = // get something from a snapshot/shmem by offset else ptr = // get something from a custom_data by kind Maybe it would be possible to hide it behind some macros or an inlinable function with the offset and kind as arguments (and one of them will not be used)?
On Fri, Jul 12, 2024 at 03:44:26PM +0200, Dmitry Dolgov wrote: > I think it's fine. Although this solution feels a bit uncomfortable, > after thinking back and forth I don't see any significantly better > option. Worth noting that since the main goal is to maintain uniqueness, > fixing the kind IDs could be accomplished in more than one way, with > varying amount of control over the list of custom IDs: > > * One coud say "lets keep it in wiki and let the community organize > itself somehow", and it's done. > * Another way would be to keep it in wiki, and introduce some > maintenance rules, e.g. once per release someone is going to cleanup > the list from old unmaintained extensions, correct errors if needed, > etc. Not sure if such cleanup would be needed, but it's not impossible > to image. > * Even more closed option would be to keep the kind IDs in some separate > git repository, and let committers add new records on demand, > expressed via some request form. RMGRs have been taking the wiki page approach to control the source of truth, that still sounds like the simplest option to me. I'm OK to be outvoted, but this simplifies the read/write pgstats paths a lot, and these would get more complicated if we add more options because of new entry types (more things like serialized names I cannot think of, etc). Extra point is that this makes future entensibility a bit easier to work on. > As far as I understand the current proposal is about the first option, > on one side of the spectrum. Yes. >> - The fixed-numbered custom stats kinds are stored in an array in >> PgStat_Snapshot and PgStat_ShmemControl, so as we have something >> consistent with the built-in kinds. This makes the tracking of the >> validity of the data in the snapshots split into parts of the >> structure for builtin and custom kinds. Perhaps there are better >> ideas than that? The built-in fixed-numbered kinds have no >> redirection. > > Are you talking about this pattern? > > if (pgstat_is_kind_builtin(kind)) > ptr = // get something from a snapshot/shmem by offset > else > ptr = // get something from a custom_data by kind > > Maybe it would be possible to hide it behind some macros or an inlinable > function with the offset and kind as arguments (and one of them will not > be used)? Kind of. All the code paths calling pgstat_is_kind_builtin() in the patch manipulate different areas of the snapshot and/or the shmem control structures, so a macro makes little sense. Perhaps we should have a few more inline functions like pgstat_get_entry_len() to retrieve the parts of the custom data in the snapshot and shmem control structures for fixed-numbered stats. That would limit what extensions need to know about pgStatLocal.shmem->custom_data[] and pgStatLocal.snapshot.custom_data[], which is easy to use incorrectly. They don't need to know about pgStatLocal at all, either. Thinking over the weekend on this patch, splitting injection_stats.c into two separate files to act as two templates for the variable and fixed-numbered cases would be more friendly to developers, as well. -- Michael
Attachment
On Tue, Jul 16, 2024 at 10:27:25AM +0900, Michael Paquier wrote: > Perhaps we should have a few more inline functions like > pgstat_get_entry_len() to retrieve the parts of the custom data in the > snapshot and shmem control structures for fixed-numbered stats. That > would limit what extensions need to know about > pgStatLocal.shmem->custom_data[] and > pgStatLocal.snapshot.custom_data[], which is easy to use incorrectly. > They don't need to know about pgStatLocal at all, either. > > Thinking over the weekend on this patch, splitting injection_stats.c > into two separate files to act as two templates for the variable and > fixed-numbered cases would be more friendly to developers, as well. I've been toying a bit with these two ideas, and the result is actually neater: - The example for fixed-numbered stats is now in its own new file, called injection_stats_fixed.c. - Stats in the dshash are at the same location, injection_stats.c. - pgstat_internal.h gains two inline routines called pgstat_get_custom_shmem_data and pgstat_get_custom_snapshot_data that hide completely the snapshot structure for extensions when it comes to custom fixed-numbered stats, see the new injection_stats_fixed.c that uses them. -- Michael
Attachment
- v7-0001-Switch-PgStat_Kind-from-enum-to-uint32.patch
- v7-0002-Introduce-pluggable-APIs-for-Cumulative-Statistic.patch
- v7-0003-Add-helper-routines-to-retrieve-custom-data-for-f.patch
- v7-0004-doc-Add-section-for-Custom-Cumulative-Statistics-.patch
- v7-0005-injection_points-Add-statistics-for-custom-points.patch
- v7-0006-injection_points-Add-example-for-fixed-numbered-s.patch
- signature.asc
> On Thu, Jul 18, 2024 at 02:56:20PM GMT, Michael Paquier wrote: > On Tue, Jul 16, 2024 at 10:27:25AM +0900, Michael Paquier wrote: > > Perhaps we should have a few more inline functions like > > pgstat_get_entry_len() to retrieve the parts of the custom data in the > > snapshot and shmem control structures for fixed-numbered stats. That > > would limit what extensions need to know about > > pgStatLocal.shmem->custom_data[] and > > pgStatLocal.snapshot.custom_data[], which is easy to use incorrectly. > > They don't need to know about pgStatLocal at all, either. > > > > Thinking over the weekend on this patch, splitting injection_stats.c > > into two separate files to act as two templates for the variable and > > fixed-numbered cases would be more friendly to developers, as well. > > I've been toying a bit with these two ideas, and the result is > actually neater: > - The example for fixed-numbered stats is now in its own new file, > called injection_stats_fixed.c. > - Stats in the dshash are at the same location, injection_stats.c. > - pgstat_internal.h gains two inline routines called > pgstat_get_custom_shmem_data and pgstat_get_custom_snapshot_data that > hide completely the snapshot structure for extensions when it comes to > custom fixed-numbered stats, see the new injection_stats_fixed.c that > uses them. Agree, looks good. I've tried to quickly sketch out such a fixed statistic for some another extension, everything was fine and pretty straightforward. One question, why don't you use pgstat_get_custom_shmem_data & pgstat_get_custom_snapshot_data outside of the injection points code? There seems to be a couple of possible places in pgstats itself.
On Sat, Jul 27, 2024 at 03:49:42PM +0200, Dmitry Dolgov wrote: > Agree, looks good. I've tried to quickly sketch out such a fixed > statistic for some another extension, everything was fine and pretty > straightforward. That's my hope. Thanks a lot for the feedback. > One question, why don't you use > pgstat_get_custom_shmem_data & pgstat_get_custom_snapshot_data outside > of the injection points code? There seems to be a couple of possible > places in pgstats itself. Because these two helper routines are only able to fetch the fixed data area in the snapshot and the control shmem structures for the custom kinds, not the in-core ones. We could, but the current code is OK as well. My point was just to ease the pluggability effort. I would like to apply this new infrastructure stuff and move on to the problems related to the scability of pg_stat_statements. So, are there any objections with all that? -- Michael
Attachment
> On Sun, Jul 28, 2024 at 10:20:45PM GMT, Michael Paquier wrote: > I would like to apply this new infrastructure stuff and move on to the > problems related to the scability of pg_stat_statements. So, are > there any objections with all that? So far I've got nothing against :)
On Sun, Jul 28, 2024 at 10:03:56PM +0200, Dmitry Dolgov wrote: > So far I've got nothing against :) I've looked again at the first patch of this series, and applied the first one. Another last-minute edit I have done is to use more consistently PgStat_Kind in the loops for the stats kinds across all the pgstats code. Attached is a rebased set of the rest, with 0001 now introducing the pluggable core part. -- Michael
Attachment
- v3-0001-Introduce-pluggable-APIs-for-Cumulative-Statistic.patch
- v3-0002-Add-helper-routines-to-retrieve-custom-data-for-f.patch
- v3-0003-doc-Add-section-for-Custom-Cumulative-Statistics-.patch
- v3-0004-injection_points-Add-statistics-for-custom-points.patch
- v3-0005-injection_points-Add-example-for-fixed-numbered-s.patch
- signature.asc
On Fri, Aug 02, 2024 at 05:53:31AM +0900, Michael Paquier wrote: > Attached is a rebased set of the rest, with 0001 now introducing the > pluggable core part. So, I have been able to spend a few more days on all that while travelling across three continents, and I have applied the core patch followed by the template parts after more polishing. The core part has been tweaked a bit more in terms of variable and structure names, to bring the builtin and custom stats parts more consistent with each other. There were also a bunch of loops that did not use the PgStat_Kind, but an int with an index on the custom_data arrays. I have uniformized the whole. I am keeping an eye on the buildfarm and it is currently green. My machines don't seem to have run the new tests with injection points yet, the CI on the CF app is not reporting any failure caused by that, and my CI runs have all been stable. -- Michael