Re: Vacuum statistics - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: Vacuum statistics
Date
Msg-id 026722b5-6e61-44d8-816d-482ad24423a9@postgrespro.ru
Whole thread Raw
In response to Re: Vacuum statistics  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: Vacuum statistics
List pgsql-hackers
Hi, all!

On 02.06.2025 19:50, Alena Rybakina wrote:
>
> On 02.06.2025 19:25, Alexander Korotkov wrote:
>> On Tue, May 13, 2025 at 12:49 PM Alena Rybakina
>> <a.rybakina@postgrespro.ru> wrote:
>>> On 12.05.2025 08:30, Amit Kapila wrote:
>>>> On Fri, May 9, 2025 at 5:34 PM Alena Rybakina 
>>>> <a.rybakina@postgrespro.ru> wrote:
>>>>> I did a rebase and finished the part with storing statistics 
>>>>> separately from the relation statistics - now it is possible to 
>>>>> disable the collection of statistics for relationsh using gucs and
>>>>> this allows us to solve the problem with the memory consumed.
>>>>>
>>>> I think this patch is trying to collect data similar to what we do for
>>>> pg_stat_statements for SQL statements. So, can't we follow a similar
>>>> idea such that these additional statistics will be collected once some
>>>> external module like pg_stat_statements is enabled? That module should
>>>> be responsible for accumulating and resetting the data, so we won't
>>>> have this memory consumption issue.
>>> The idea is good, it will require one hook for the pgstat_report_vacuum
>>> function, the extvac_stats_start and extvac_stats_end functions can be
>>> run if the extension is loaded, so as not to add more hooks.
>> +1
>> Nice idea of a hook.  Given the volume of the patch, it might be a
>> good idea to keep this as an extension.
> Okay, I'll realize it and apply the patch)
>>
>>> But I see a problem here with tracking deleted objects for which
>>> statistics are no longer needed. There are two solutions to this and I
>>> don't like both of them, to be honest.
>>> The first way is to add a background process that will go through the
>>> table with saved statistics and check whether the relation or the
>>> database are relevant now or not and if not, then
>>> delete the vacuum statistics information for it. This may be
>>> resource-intensive. The second way is to add hooks for deleting the
>>> database and relationships (functions dropdb, index_drop,
>>> heap_drop_with_catalog).
>> Can we workaround this with object_access_hook?
>
> I think this could fix the problem. For the OAT-DROP access type, we 
> can call a function to reset the vacuum statistics for relations that 
> are about to be dropped.
>
> At the moment, I don’t see any limitations to using this approach.
>
I’ve prepared the first working version of the extension.

I haven’t yet implemented writing the statistics to a file and reloading 
them into a hash table and shared memory at instance startup, and I also 
haven’t implemented a proper output for database-level statistics yet.

I structured the extension as follows: statistics are stored in a hash 
table keyed by a composite key - database OID, relation OID, and object 
type (index, table, or database). When VACUUM or a worker processes a 
table or index, an exclusive lock is taken to update the corresponding 
record; a shared lock is taken when reading the statistics. For 
database-level output, I plan to compute the totals by summing table and 
index statistics on demand.

To optimize that, I plan to keep entries in the hash table ordered by 
database OID. When accessing the first element by the partial key 
(database OID), I’ll scan forward and aggregate until the partitial 
database key changes.

Right now this requires adding the extension to 
`shared_preload_libraries`. I haven’t found a way to avoid that because 
of shared-memory setup, and I’m not sure it’s even possible.

I’m also unsure whether it’s better to store the statistics in the 
cumulative statistics system (as done here) or entirely inside the 
extension. Note that the code added to the core to support the extension 
executes regardless of whether the extension is enabled.

Attachment

pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Improve LWLock tranche name visibility across backends
Next
From: Tom Lane
Date:
Subject: Re: [PG19-3 PATCH] Don't ignore passfile