Re: New criteria for autovacuum - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: New criteria for autovacuum |
Date | |
Msg-id | 39b8c638-0762-4d43-ba9c-68accc902f9f@garret.ru Whole thread Raw |
In response to | Re: New criteria for autovacuum (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: New criteria for autovacuum
Re: New criteria for autovacuum |
List | pgsql-hackers |
On Fri, Apr 4, 2025 at 1:53 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:What is needed to reproduce the problem? 1. Table with populated data 2. Presence of transaction with assigned XID which prevents vacuum from marking pages of this table as all visible 3. Vacuum or autovacuum processed this table (to eliminate dead tuple and reset number of inserted tuples since last vacuum). After this 3 steps autovacuum will never be called for this table (at least until forced vacuum caused by wraparound). And IOS will not be used or be very inefficient fot this table.ISTM, this is more an issue of ins_since_vacuum being reset to 0 in pstat_report_vacuum() even though those inserted tuples weren't necessarily frozen and their pages not set all-visible. I don't know exactly how we could modify that logic, but insert-triggered vacuums are meant to set pages all-visible and freeze tuples, and if they don't do that, it doesn't quite make sense to zero out the counter that could trigger another one. That being said, long-running transactions are a problem for autovacuum in general. Even if you track this stat you are proposing about heap fetches by index only scans, you won't know if the long running transaction is over and thus if it makes sense to try and trigger an autovacuum for that table again anyway. - Melanie
From logical point of view I agree with you: taken in account number of inserted tuples makes sense if it allows to mark page as all-visible.
So `ins_since_vacuum` should be better renamed to `ins_all_visible_since_vacuum` and count only all-visible tuples. If newly inserted tuple is not visible to all, then it should not be accounted in statistic and trigger autovacuum. But I have completely no idea of how to efficiently maintain such counter: we should keep track of xids of all recently inserted tuples and on each transaction commit determine which one of them become all-visible.
And your suggestion just to not reset `ins_since_vacuum` until all of them becomes all-visible may be easier to implement, but under permanent workload it can lead to situation when `ins_since_vacuum` is never reset and at each vacuum iteration cause vacuuming of the table. Which may cause significant degrade of performance. Even without long-living transactions.
So I still think that maintaining count of heap visibility check is the best alternative. It quite easy to implement, adds almost no overhead and this information indicates efficiency of index-only scan. So it seems to be useful even if not used by autovacuum.
Yes, long-living transactions and vacuum are "antagonists". If there is long-living transaction, then forcing autovacuum because of number of visibility checks criteria can also (as in case of not reseting `ins_since_vacuum` counter) cause degrade of performance because of too frequent and useless autovacuum runs for the table. But there is big difference: using `checks_since_vacuum` criteria we trigger autovacuum next time only when this counter exceeds threshold. Which should not happen fast because this counter is reset after each vacuum. Unlike `ins_since_vacuum` counter which you suggested not to reset until pages are marked as all-visible by vacuum. In the last case autovacuum will be invoked for the table each `autovacum_naptime` seconds.
pgsql-hackers by date: