Re: Vacuum statistics - Mailing list pgsql-hackers
From | Alena Rybakina |
---|---|
Subject | Re: Vacuum statistics |
Date | |
Msg-id | 79490c8c-2c35-464d-bb3e-3d2ea5aca9ab@postgrespro.ru Whole thread Raw |
In response to | Re: Vacuum statistics (Alena Rybakina <a.rybakina@postgrespro.ru>) |
List | pgsql-hackers |
To be honest, I haven’t provided extensions for the PostgreSQL [0] to hackers yet, nor have I encountered this situation in general. Just in case, I created an open repository on GitHub with the code and added a description in the README. [0] https://github.com/Alena0704/vacuum_statistics# On 04.09.2025 18:49, Alena Rybakina wrote: > Hi, all! > > On 02.06.2025 19:50, Alena Rybakina wrote: >> >> On 02.06.2025 19:25, Alexander Korotkov wrote: >>> On Tue, May 13, 2025 at 12:49 PM Alena Rybakina >>> <a.rybakina@postgrespro.ru> wrote: >>>> On 12.05.2025 08:30, Amit Kapila wrote: >>>>> On Fri, May 9, 2025 at 5:34 PM Alena Rybakina >>>>> <a.rybakina@postgrespro.ru> wrote: >>>>>> I did a rebase and finished the part with storing statistics >>>>>> separately from the relation statistics - now it is possible to >>>>>> disable the collection of statistics for relationsh using gucs and >>>>>> this allows us to solve the problem with the memory consumed. >>>>>> >>>>> I think this patch is trying to collect data similar to what we do >>>>> for >>>>> pg_stat_statements for SQL statements. So, can't we follow a similar >>>>> idea such that these additional statistics will be collected once >>>>> some >>>>> external module like pg_stat_statements is enabled? That module >>>>> should >>>>> be responsible for accumulating and resetting the data, so we won't >>>>> have this memory consumption issue. >>>> The idea is good, it will require one hook for the >>>> pgstat_report_vacuum >>>> function, the extvac_stats_start and extvac_stats_end functions can be >>>> run if the extension is loaded, so as not to add more hooks. >>> +1 >>> Nice idea of a hook. Given the volume of the patch, it might be a >>> good idea to keep this as an extension. >> Okay, I'll realize it and apply the patch) >>> >>>> But I see a problem here with tracking deleted objects for which >>>> statistics are no longer needed. There are two solutions to this and I >>>> don't like both of them, to be honest. >>>> The first way is to add a background process that will go through the >>>> table with saved statistics and check whether the relation or the >>>> database are relevant now or not and if not, then >>>> delete the vacuum statistics information for it. This may be >>>> resource-intensive. The second way is to add hooks for deleting the >>>> database and relationships (functions dropdb, index_drop, >>>> heap_drop_with_catalog). >>> Can we workaround this with object_access_hook? >> >> I think this could fix the problem. For the OAT-DROP access type, we >> can call a function to reset the vacuum statistics for relations that >> are about to be dropped. >> >> At the moment, I don’t see any limitations to using this approach. >> > I’ve prepared the first working version of the extension. > > I haven’t yet implemented writing the statistics to a file and > reloading them into a hash table and shared memory at instance > startup, and I also haven’t implemented a proper output for > database-level statistics yet. > > I structured the extension as follows: statistics are stored in a hash > table keyed by a composite key - database OID, relation OID, and > object type (index, table, or database). When VACUUM or a worker > processes a table or index, an exclusive lock is taken to update the > corresponding record; a shared lock is taken when reading the > statistics. For database-level output, I plan to compute the totals by > summing table and index statistics on demand. > > To optimize that, I plan to keep entries in the hash table ordered by > database OID. When accessing the first element by the partial key > (database OID), I’ll scan forward and aggregate until the partitial > database key changes. > > Right now this requires adding the extension to > `shared_preload_libraries`. I haven’t found a way to avoid that > because of shared-memory setup, and I’m not sure it’s even possible. > > I’m also unsure whether it’s better to store the statistics in the > cumulative statistics system (as done here) or entirely inside the > extension. Note that the code added to the core to support the > extension executes regardless of whether the extension is enabled.
pgsql-hackers by date: