Re: Vacuum statistics - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: Vacuum statistics
Date
Msg-id 79490c8c-2c35-464d-bb3e-3d2ea5aca9ab@postgrespro.ru
Whole thread Raw
In response to Re: Vacuum statistics  (Alena Rybakina <a.rybakina@postgrespro.ru>)
List pgsql-hackers
To be honest, I haven’t provided extensions for the PostgreSQL [0] to
hackers yet, nor have I encountered this situation in general. Just in
case, I created an open repository on GitHub with the code and added a
description in the README.

[0] https://github.com/Alena0704/vacuum_statistics#

On 04.09.2025 18:49, Alena Rybakina wrote:
> Hi, all!
>
> On 02.06.2025 19:50, Alena Rybakina wrote:
>>
>> On 02.06.2025 19:25, Alexander Korotkov wrote:
>>> On Tue, May 13, 2025 at 12:49 PM Alena Rybakina
>>> <a.rybakina@postgrespro.ru> wrote:
>>>> On 12.05.2025 08:30, Amit Kapila wrote:
>>>>> On Fri, May 9, 2025 at 5:34 PM Alena Rybakina
>>>>> <a.rybakina@postgrespro.ru> wrote:
>>>>>> I did a rebase and finished the part with storing statistics
>>>>>> separately from the relation statistics - now it is possible to
>>>>>> disable the collection of statistics for relationsh using gucs and
>>>>>> this allows us to solve the problem with the memory consumed.
>>>>>>
>>>>> I think this patch is trying to collect data similar to what we do
>>>>> for
>>>>> pg_stat_statements for SQL statements. So, can't we follow a similar
>>>>> idea such that these additional statistics will be collected once
>>>>> some
>>>>> external module like pg_stat_statements is enabled? That module
>>>>> should
>>>>> be responsible for accumulating and resetting the data, so we won't
>>>>> have this memory consumption issue.
>>>> The idea is good, it will require one hook for the
>>>> pgstat_report_vacuum
>>>> function, the extvac_stats_start and extvac_stats_end functions can be
>>>> run if the extension is loaded, so as not to add more hooks.
>>> +1
>>> Nice idea of a hook.  Given the volume of the patch, it might be a
>>> good idea to keep this as an extension.
>> Okay, I'll realize it and apply the patch)
>>>
>>>> But I see a problem here with tracking deleted objects for which
>>>> statistics are no longer needed. There are two solutions to this and I
>>>> don't like both of them, to be honest.
>>>> The first way is to add a background process that will go through the
>>>> table with saved statistics and check whether the relation or the
>>>> database are relevant now or not and if not, then
>>>> delete the vacuum statistics information for it. This may be
>>>> resource-intensive. The second way is to add hooks for deleting the
>>>> database and relationships (functions dropdb, index_drop,
>>>> heap_drop_with_catalog).
>>> Can we workaround this with object_access_hook?
>>
>> I think this could fix the problem. For the OAT-DROP access type, we
>> can call a function to reset the vacuum statistics for relations that
>> are about to be dropped.
>>
>> At the moment, I don’t see any limitations to using this approach.
>>
> I’ve prepared the first working version of the extension.
>
> I haven’t yet implemented writing the statistics to a file and
> reloading them into a hash table and shared memory at instance
> startup, and I also haven’t implemented a proper output for
> database-level statistics yet.
>
> I structured the extension as follows: statistics are stored in a hash
> table keyed by a composite key - database OID, relation OID, and
> object type (index, table, or database). When VACUUM or a worker
> processes a table or index, an exclusive lock is taken to update the
> corresponding record; a shared lock is taken when reading the
> statistics. For database-level output, I plan to compute the totals by
> summing table and index statistics on demand.
>
> To optimize that, I plan to keep entries in the hash table ordered by
> database OID. When accessing the first element by the partial key
> (database OID), I’ll scan forward and aggregate until the partitial
> database key changes.
>
> Right now this requires adding the extension to
> `shared_preload_libraries`. I haven’t found a way to avoid that
> because of shared-memory setup, and I’m not sure it’s even possible.
>
> I’m also unsure whether it’s better to store the statistics in the
> cumulative statistics system (as done here) or entirely inside the
> extension. Note that the code added to the core to support the
> extension executes regardless of whether the extension is enabled.



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
Next
From: Tom Lane
Date:
Subject: Re: Solaris compiler status