Thread: New criteria for autovacuum
Hi hackers,
Some time ago I investigated a slow-query case for one of our customers and noticed that an index-only scan was making a lot of heap fetches.
->  Index Only Scan using ix_client_objects_vendor_object_id on client_objects client_objects_1  (cost=0.56..2.78 rows=1 width=0) (actual time=0.006..0.006 rows=1 loops=208081)
      Index Cond: (vendor_object_id = vendor_objects.id)
      Heap Fetches: 208081
      Buffers: shared hit=1092452 read=156034
So almost every index entry requires a visibility check, and the index-only scan is effectively a normal index scan.
This certainly has a bad impact on performance.
I do not know what happened in this particular case, why the pages were not marked as all-visible and why the optimizer chose an index-only scan plan.
But the problem can be reproduced quite easily: just populate a table with some data while some other transaction with an assigned XID is active.
Then explicitly vacuum the table or wait until autovacuum processes it.
At this point the table has no dead or newly inserted tuples, so autovacuum will not be triggered for it again. But the heap pages of this table are still not marked as all-visible.
And they will never be marked as all-visible unless the table is updated or explicitly vacuumed.
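For illustration, the symptom can be observed with the pg_visibility contrib module (a minimal check, assuming the extension is available; `client_objects` is the table from the plan above):

```
CREATE EXTENSION IF NOT EXISTS pg_visibility;
-- number of all-visible and all-frozen pages according to the visibility map
SELECT * FROM pg_visibility_map_summary('client_objects');
-- compare with the total number of pages in the table
SELECT relpages FROM pg_class WHERE relname = 'client_objects';
```

In the scenario described above both counters stay near zero even though the table is effectively read-only.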
This is why I think that it may be useful to add more columns to the pg_stat_all_tables and pg_stat_all_indexes views, providing information about the heap visibility checks performed by index-only scans. And, in addition to the number of dead/inserted tuples, to use the number of such visibility checks as a criterion for performing autovacuum on a particular table.
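A rough sketch of how such counters might be consulted (the `ios_heap_fetches` column name here is only an illustrative placeholder; the actual column names are whatever the attached patch defines):

```
-- ios_heap_fetches is a placeholder name for the proposed counter
SELECT relname, n_dead_tup, n_ins_since_vacuum, ios_heap_fetches
  FROM pg_stat_all_tables
 ORDER BY ios_heap_fetches DESC
 LIMIT 10;
```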
The proposed patch is attached.
I am not quite happy with the test: it is intended to check that autovacuum is really triggered by this new criterion, but it depends on the autovacuum activation frequency and may take several seconds.
I will be glad to receive any feedback.
Hi Konstantin,

> But the problem can be reproduced quite easily: just populate a table with some data while some other transaction with an assigned XID is active.
> Then explicitly vacuum the table or wait until autovacuum processes it.
> At this point the table has no dead or newly inserted tuples, so autovacuum will not be triggered for it again. But the heap pages of this table are still not marked as all-visible.
> And they will never be marked as all-visible unless the table is updated or explicitly vacuumed.

I decided to experiment with the scenario you are describing. For those who like to have exact steps to reproduce the issue, as I do, here they are:

Session 1:

```
CREATE TABLE humidity(
    ts TIMESTAMP NOT NULL,
    city TEXT NOT NULL,
    humidity INT NOT NULL);

CREATE INDEX humidity_idx ON humidity (ts, city) INCLUDE (humidity);

INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2010-01-01' :: TIMESTAMP, '2020-12-31', '1 hour') AS ts,
     unnest(array['Moscow', 'Berlin']) AS city;
```

Session 2:

```
BEGIN;

INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2022-01-01' :: TIMESTAMP, '2025-12-31', '1 hour') AS ts,
     unnest(array['Moscow', 'Berlin']) AS city;

-- no COMMIT
```

Session 1:

```
VACUUM humidity;
```

Session 2:

```
COMMIT;
```

Session 1:

```
EXPLAIN (ANALYZE, BUFFERS ON)
SELECT humidity FROM humidity
WHERE ts >= '2022-01-01' AND ts < '2022-05-02' AND city = 'Moscow';
```

The result is:

```
 Index Only Scan using humidity_idx on humidity  (cost=0.42..58.75 rows=67 width=4) (actual time=0.060..7.490 rows=2904.00 loops=1)
   Index Cond: ((ts >= '2022-01-01 00:00:00'::timestamp without time zone) AND (ts < '2022-05-02 00:00:00'::timestamp without time zone) AND (city = 'Moscow'::text))
   Heap Fetches: 2904
   Index Searches: 1
   Buffers: shared hit=123
 Planning:
   Buffers: shared hit=10
 Planning Time: 0.840 ms
 Execution Time: 7.964 ms
```

... and it is claimed that autovacuum will never be triggered in order to set hint bits, assuming we never modify the table again.

I have mixed feelings about addressing this. Although this behavior is somewhat counterintuitive, if the user has a read-only lookup table he/she can always execute VACUUM manually. In order to relieve him of this unbearable burden we are going to need to introduce some overhead that will affect all the users (not to mention the people maintaining the code). This would be convenient for sure, but I'm not entirely sure it's worth it.

Thoughts?

--
Best regards,
Aleksander Alekseev
Hi,

> ... and it is claimed that autovacuum will never be triggered in order
> to set hint bits, assuming we never modify the table again.

Actually I waited a bit and got a better EXPLAIN:

```
 Index Only Scan using humidity_idx on humidity  (cost=0.42..181.36 rows=1970 width=4) (actual time=0.372..16.869 rows=2904.00 loops=1)
   Index Cond: ((ts >= '2022-01-01 00:00:00'::timestamp without time zone) AND (ts < '2022-05-02 00:00:00'::timestamp without time zone) AND (city = 'Moscow'::text))
   Heap Fetches: 0
   Index Searches: 1
   Buffers: shared hit=44
 Planning Time: 0.520 ms
 Execution Time: 17.980 ms
(7 rows)
```

This happens when CHECKPOINT starts:

```
2025-04-03 18:36:23.435 MSK [209952] LOG:  checkpoint starting: time
```

Interestingly it takes unusually long for my toy database:

```
2025-04-03 18:40:53.082 MSK [209952] LOG:  checkpoint complete: wrote 3522 buffers (5.4%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0 removed, 5 recycled; write=269.463 s, sync=0.029 s, total=269.647 s; sync files=32, longest=0.004 s, average=0.001 s; distance=68489 kB, estimate=68489 kB; lsn=0/F4F3870, redo lsn=0/F167DD0
```

There is nothing in between these two lines.

To my humble knowledge, CHECKPOINT shouldn't set hint bits and shouldn't take that long. At this point I don't know what's going on.

This is `master` branch, b82e7eddb023.

--
Best regards,
Aleksander Alekseev
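One way to check whether an autovacuum ran on the table around that time (rather than relying on log correlation) is to look at existing columns of pg_stat_all_tables, for example:

```
SELECT relname, last_vacuum, last_autovacuum, vacuum_count, autovacuum_count
  FROM pg_stat_all_tables
 WHERE relname = 'humidity';
```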
On 03/04/2025 6:29 pm, Aleksander Alekseev wrote:
> I have mixed feelings about addressing this. Although this behavior is
> somewhat counterintuitive, if the user has a read-only lookup table
> he/she can always execute VACUUM manually. In order to relieve him of
> this unbearable burden we are going to need to introduce some overhead
> that will affect all the users (not to mention people maintaining the
> code). This would be convenient for sure but I'm not entirely sure if
> it's worth it.

The overhead seems to be minimal (an update of one statistics field when an index-only scan performs a heap fetch), and information about the visibility checks performed by IOS seems to be useful in any case, even if it is not used by autovacuum.

So I am not sure how frequent this scenario is (an index-only scan having to perform extra heap checks, or simply not being chosen because of the VM examination), but if it can be prevented without much effort or overhead, then why not implement it?
> Interestingly it takes unusually long for my toy database:
> There is nothing in between these two lines.
>
> To my humble knowledge, CHECKPOINT shouldn't set hint bits and shouldn't
> take that long. At this point I don't know what's going on.

From what I can tell in your example, you ran the manual vacuum (session 1) while you had an open transaction (session 2), so vacuum could not remove the dead tuples or update the visibility map. Once you committed session 2, autovacuum came in and did its job after the autovacuum_naptime passed (default 1 minute). Checkpoint does not update the visibility map, only vacuum does.

IMO, I don't think we need this patch for vacuum, as simply making sure autovacuum runs more frequently on a table that is accessed via index-only scans often is a way to deal with this already, i.e. lowering autovacuum_vacuum_scale_factor. Maybe others have a different opinion?

13 also introduced autovacuum_vacuum_insert_scale_factor to deal with append-only tables that only saw their first vacuum for wraparound prevention (200 million transactions by default), which made index-only scans slow because of an outdated visibility map, as well as making the wraparound vacuum more disruptive.

As far as extra metrics go for the scenario in which an index-only scan must visit the table, pg_stat_all_indexes and pg_stat_all_tables do have an idx_tup_fetch counter which increases anytime an index scan visits the table, i.e. an index-only scan with heap fetches or a regular index scan. I think having a counter specifically for heap fetches due to index-only scans could be valuable.

--
Sami Imseih
Amazon Web Services (AWS)
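For concreteness, per-table tuning along the lines suggested above might look like this (the values are arbitrary and only illustrative; `humidity` is the table from the example upthread):

```
ALTER TABLE humidity SET (
    autovacuum_vacuum_insert_scale_factor = 0.01,
    autovacuum_vacuum_insert_threshold = 1000,
    autovacuum_vacuum_scale_factor = 0.05
);
```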
On Thu, Apr 3, 2025 at 4:37 PM Sami Imseih <samimseih@gmail.com> wrote:
>
> From what I can tell in your example, you ran the manual vacuum (session 1)
> while you had an open transaction (session 2), so vacuum could not remove
> the dead tuples or update the visibility map. Once you committed session 2,
> autovacuum came in and did its job after the autovacuum_naptime passed
> (default 1 minute). Checkpoint does not update the visibility map, only
> vacuum does.
>
> IMO, I don't think we need this patch for vacuum, as simply making sure
> autovacuum runs more frequently on a table that is accessed via
> index-only scans often is a way to deal with this already, i.e.
> lowering autovacuum_vacuum_scale_factor. Maybe others have
> a different opinion?

Yea, I mean if you have a table that isn't meeting the threshold for being vacuumed by autovacuum based on inserted/modified tuples that you are querying a lot, then you should probably change the autovacuum table storage options for that table. I don't fully understand if the example provided here is in that situation (i.e. autovac not being triggered because they were below the thresholds).

Historically, we haven't updated the VM when doing on-access pruning because we wanted vacuum to have to vacuum those pages and freeze them before an aggressive vacuum was required. This release, in 052026c9b903, I added functionality to vacuum to scan more all-visible pages during regular vacuums. I think this enables us to update the VM during on-access pruning. This is something I plan to work on in 19. It seems like it would alleviate situations like this.

> As far as extra metrics go for the scenario in which an index-only scan
> must visit the table, pg_stat_all_indexes and pg_stat_all_tables do have
> an idx_tup_fetch counter which increases anytime an index scan visits the
> table, i.e. an index-only scan with heap fetches or a regular index scan.
> I think having a counter specifically for heap fetches due to index-only
> scans could be valuable.

What do you imagine using the heap fetches during index-only scans counter for?

- Melanie
> I think this enables us to update the VM
> during on-access pruning. This is something I plan to work on in 19.
> It seems like it would alleviate situations like this.

IMO, index-only scans hitting the heap have always caught users off guard, especially because scan performance fluctuates between good and bad. Reducing this performance volatility is a great improvement.

> > As far as extra metrics go for the scenario in which an index-only scan
> > must visit the table, pg_stat_all_indexes and pg_stat_all_tables do have
> > an idx_tup_fetch counter which increases anytime an index scan visits the
> > table, i.e. an index-only scan with heap fetches or a regular index scan.
> > I think having a counter specifically for heap fetches due to index-only
> > scans could be valuable.
>
> What do you imagine using the heap fetches during index-only scans counter for?

A user can only determine that they are using index-only scans through EXPLAIN, which limits visibility. It would be beneficial to provide additional metrics to support autovacuum/vacuum tuning for workloads that rely on index-only scans and require them to be as efficient as possible.

Currently, pg_stat_all_tables/indexes displays the number of index scans, but it would also be helpful to show the number of index-only scans specifically, along with "heap fetches". This would help in identifying the "heap fetches" issue and confirming whether vacuum tuning had any impact on reducing "heap fetches". The existing idx_tup_fetch counter combines heap fetches from normal and index-only scans, so it could be useful but not perfect for this task.

--
Sami Imseih
Amazon Web Services (AWS)
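For reference, what is available today mixes heap fetches from regular and index-only scans, for example:

```
SELECT relname, indexrelname, idx_scan, idx_tup_fetch
  FROM pg_stat_all_indexes
 WHERE relname = 'humidity';
```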
On 03/04/2025 6:50 pm, Aleksander Alekseev wrote:
> Hi,
>
>> ... and it is claimed that autovacuum will never be triggered in order
>> to set hint bits, assuming we never modify the table again.
> Actually I waited a bit and got a better EXPLAIN:
>
> ```
>  Index Only Scan using humidity_idx on humidity  (cost=0.42..181.36 rows=1970 width=4) (actual time=0.372..16.869 rows=2904.00 loops=1)
>    Index Cond: ((ts >= '2022-01-01 00:00:00'::timestamp without time zone) AND (ts < '2022-05-02 00:00:00'::timestamp without time zone) AND (city = 'Moscow'::text))
>    Heap Fetches: 0
>    Index Searches: 1
>    Buffers: shared hit=44
>  Planning Time: 0.520 ms
>  Execution Time: 17.980 ms
> (7 rows)
> ```
>
> This happens when CHECKPOINT starts:
>
> ```
> 2025-04-03 18:36:23.435 MSK [209952] LOG:  checkpoint starting: time
> ```
>
> Interestingly it takes unusually long for my toy database:
>
> ```
> 2025-04-03 18:40:53.082 MSK [209952] LOG:  checkpoint complete: wrote 3522 buffers (5.4%), wrote 1 SLRU buffers; 0 WAL file(s) added, 0 removed, 5 recycled; write=269.463 s, sync=0.029 s, total=269.647 s; sync files=32, longest=0.004 s, average=0.001 s; distance=68489 kB, estimate=68489 kB; lsn=0/F4F3870, redo lsn=0/F167DD0
> ```
>
> There is nothing in between these two lines.
>
> To my humble knowledge, CHECKPOINT shouldn't set hint bits and shouldn't
> take that long. At this point I don't know what's going on.
>
> This is `master` branch, b82e7eddb023.

Checkpoint does not set hint bits and does not update the VM; it just writes dirty pages to disk.

My patch includes a simple test reproducing the problem. You can check that the VM is never updated in this case and that an explicit checkpoint doesn't solve the issue. Certainly a manual vacuum will help. But the user would somehow have to notice that the index-only scan is not used (or is used but performs a large number of heap fetches), understand that the problem is the VM not being updated, and realize that a vacuum is needed to fix it.

In your example both sessions are inserting data into the table. A vacuum performed in one session doesn't take into account records created by an uncommitted transaction in another session. So I do not think that the plan in your case was improved because of the checkpoint. Most likely autovacuum was just triggered for this table and updated the VM.

What is needed to reproduce the problem?
1. A table populated with data.
2. The presence of a transaction with an assigned XID which prevents vacuum from marking the pages of this table as all-visible.
3. Vacuum or autovacuum processing this table (to eliminate dead tuples and reset the number of tuples inserted since the last vacuum).

After these three steps autovacuum will never be called for this table (at least until a forced vacuum caused by wraparound). And IOS will not be used, or will be very inefficient, for this table.
On Fri, Apr 4, 2025 at 1:53 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>
> What is needed to reproduce the problem?
> 1. A table populated with data.
> 2. The presence of a transaction with an assigned XID which prevents vacuum
>    from marking the pages of this table as all-visible.
> 3. Vacuum or autovacuum processing this table (to eliminate dead tuples and
>    reset the number of tuples inserted since the last vacuum).
>
> After these three steps autovacuum will never be called for this table (at
> least until a forced vacuum caused by wraparound). And IOS will not be used,
> or will be very inefficient, for this table.

ISTM, this is more an issue of ins_since_vacuum being reset to 0 in pgstat_report_vacuum() even though those inserted tuples weren't necessarily frozen and their pages not set all-visible. I don't know exactly how we could modify that logic, but insert-triggered vacuums are meant to set pages all-visible and freeze tuples, and if they don't do that, it doesn't quite make sense to zero out the counter that could trigger another one.

That being said, long-running transactions are a problem for autovacuum in general. Even if you track this stat you are proposing about heap fetches by index only scans, you won't know if the long running transaction is over and thus if it makes sense to try and trigger an autovacuum for that table again anyway.

- Melanie
On Fri, Apr 4, 2025 at 12:11 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> That being said, long-running transactions are a problem for
> autovacuum in general. Even if you track this stat you are proposing
> about heap fetches by index only scans, you won't know if the long
> running transaction is over and thus if it makes sense to try and
> trigger an autovacuum for that table again anyway.

This. It would be really useful to have some kind of a system for figuring out when -- in terms of XIDs -- we ought to vacuum which table. I think that's a hard problem, but it would help a lot of people.

I do not think the approach in the proposed patch is correct at all. The proposed new check would have exactly the same problem as the existing one -- this could easily trigger vacuuming at a time when the relevant tuples can't yet be made all-visible, in which case we'd just do a lot of VACUUM work for nothing. That's already a problem with autovacuum in some scenarios, and I bet this would be way worse.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Apr 4, 2025 at 12:11 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> ISTM, this is more an issue of ins_since_vacuum being reset to 0 in
> pgstat_report_vacuum() even though those inserted tuples weren't
> necessarily frozen and their pages not set all-visible. I don't know
> exactly how we could modify that logic, but insert-triggered vacuums are
> meant to set pages all-visible and freeze tuples, and if they don't do
> that, it doesn't quite make sense to zero out the counter that could
> trigger another one.
>
> That being said, long-running transactions are a problem for autovacuum
> in general. Even if you track this stat you are proposing about heap
> fetches by index only scans, you won't know if the long running
> transaction is over and thus if it makes sense to try and trigger an
> autovacuum for that table again anyway.
From a logical point of view I agree with you: taking the number of inserted tuples into account makes sense if it allows pages to be marked as all-visible.
So `ins_since_vacuum` would better be renamed to `ins_all_visible_since_vacuum` and count only all-visible tuples. If a newly inserted tuple is not visible to all, then it should not be accounted for in the statistics and should not trigger autovacuum. But I have no idea how to maintain such a counter efficiently: we would have to keep track of the XIDs of all recently inserted tuples and, on each transaction commit, determine which of them have become all-visible.
And your suggestion to simply not reset `ins_since_vacuum` until all of them become all-visible may be easier to implement, but under a permanent workload it can lead to a situation where `ins_since_vacuum` is never reset and the table is vacuumed on every vacuum iteration. That may cause significant performance degradation, even without long-living transactions.
So I still think that maintaining a count of heap visibility checks is the best alternative. It is quite easy to implement, adds almost no overhead, and this information indicates the efficiency of index-only scans. So it seems to be useful even if it is not used by autovacuum.
Yes, long-living transactions and vacuum are "antagonists". If there is a long-living transaction, then forcing autovacuum based on the number-of-visibility-checks criterion can also (as in the case of not resetting the `ins_since_vacuum` counter) degrade performance because of too frequent and useless autovacuum runs for the table. But there is a big difference: with the `checks_since_vacuum` criterion we trigger the next autovacuum only when this counter exceeds the threshold again, which should not happen quickly because the counter is reset after each vacuum. Unlike the `ins_since_vacuum` counter, which you suggested not to reset until the pages are marked as all-visible by vacuum; in that case autovacuum would be invoked for the table every `autovacuum_naptime` seconds.
On Fri, Apr 4, 2025 at 3:27 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>
> From a logical point of view I agree with you: taking the number of inserted
> tuples into account makes sense if it allows pages to be marked as all-visible.
> So `ins_since_vacuum` would better be renamed to `ins_all_visible_since_vacuum`
> and count only all-visible tuples. If a newly inserted tuple is not visible to
> all, then it should not be accounted for in the statistics and should not
> trigger autovacuum. But I have no idea how to maintain such a counter
> efficiently: we would have to keep track of the XIDs of all recently inserted
> tuples and, on each transaction commit, determine which of them have become
> all-visible.
>
> And your suggestion to simply not reset `ins_since_vacuum` until all of them
> become all-visible may be easier to implement, but under a permanent workload
> it can lead to a situation where `ins_since_vacuum` is never reset and the
> table is vacuumed on every vacuum iteration. That may cause significant
> performance degradation, even without long-living transactions.

Right, you definitely can't just never reset it to 0. As it is currently defined, the behavior is correct, that is how many tuples were inserted since the last vacuum.

As for your proposed patch, I agree with Robert that triggering vacuums using the stat you proposed just adds to the problem we know we already have with wasted vacuuming work due to long-running transactions.

A more targeted solution to your specific problem would be to update the visibility map on access. Then, the first time you have to fetch that heap page, you could mark it all-visible (assuming the long running transaction has ended) and then the next index-only scan wouldn't have to do the same heap fetch. It doesn't add any overhead in the case that the long running transaction has not ended, unlike trying to trigger another autovacuum.

- Melanie
> From a logical point of view I agree with you: taking the number of inserted
> tuples into account makes sense if it allows pages to be marked as all-visible.
> So `ins_since_vacuum` would better be renamed to `ins_all_visible_since_vacuum`
> and count only all-visible tuples.
> If a newly inserted tuple is not visible to all, then it should not be
> accounted for in the statistics and should not trigger autovacuum.

cumulative stats are not updated for an in-flight transaction, so effectively, these rows are not accounted for by vacuum. I did just realize however that it is strange that a rolled-back session will accumulate n_tup_ins and n_ins_since_vacuum

# session 1
test=# create table t (id int);
CREATE TABLE
test=# begin;
BEGIN
test=*# insert into t values (1);
INSERT 0 1
test=*# insert into t values (1);
INSERT 0 1
test=*# insert into t values (1);
INSERT 0 1

# session 2
test=# select n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, n_ins_since_vacuum from pg_stat_all_tables where relname = 't';
 n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+--------------------
         0 |         0 |         0 |          0 |          0 |                  0
(1 row)

# session 1
ROLLBACK

# session 2
test=# select n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, n_ins_since_vacuum from pg_stat_all_tables where relname = 't';
 n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+--------------------
         3 |         0 |         0 |          0 |          3 |                  3
(1 row)

ROLLBACKs should not be the norm, but this looks like a bug to me as it may trigger vacuum based on the insert threshold more often. Thoughts?

> So I still think that maintaining a count of heap visibility checks is the
> best alternative. It is quite easy to implement, adds almost no overhead,
> and this information indicates the efficiency of index-only scans. So it
> seems to be useful even if it is not used by autovacuum.

I agree. It could aid a user who wants to schedule some manual vacuum for tables that need very optimal index-only scans, or to tune autovac per-table for those types of tables. I actually think we should add a few columns as mentioned here [0]

>> That being said, long-running transactions are a problem for
>> autovacuum in general. Even if you track this stat you are proposing
>> about heap fetches by index only scans, you won't know if the long
>> running transaction is over and thus if it makes sense to try and
>> trigger an autovacuum for that table again anyway.

> This. It would be really useful to have some kind of a system for
> figuring out when -- in terms of XIDs -- we ought to vacuum which
> table. I think that's a hard problem, but it would help a lot of
> people.

hmm, isn't that what anti-wraparound autovac does? or, I may have missed the point here completely

[0] https://www.postgresql.org/message-id/CAA5RZ0t1U38qtVAmg3epjh5RBbpT4VRB_Myfp0oGm_73w-UNRA%40mail.gmail.com

--
Sami Imseih
Amazon Web Services (AWS)
> Right, you definitely can't just never reset it to 0. As it is
> currently defined, the behavior is correct, that is how many tuples
> were inserted since the last vacuum.
>
> As for your proposed patch, I agree with Robert that triggering
> vacuums using the stat you proposed just adds to the problem we know
> we already have with wasted vacuuming work due to long-running
> transactions.
If we already have a problem with wasted vacuuming, then IMHO it should be addressed (for example by keeping last_autovacuum_xid_horizon). I do not think that the proposed patch adds anything new to the problem. Yes, it will trigger autovacuum after every N visibility checks. And if there is still some long-running transaction, then this vacuum is useless. But the same is true for the criterion based on the number of dead tuples.
And from my point of view additional autovacuum runs are less critical than missed autovacuum runs (when the table contains garbage or the VM is not updated, but vacuum is not launched).
> A more targeted solution to your specific problem would be to update the
> visibility map on access. Then, the first time you have to fetch that heap
> page, you could mark it all-visible (assuming the long running transaction
> has ended) and then the next index-only scan wouldn't have to do the same
> heap fetch. It doesn't add any overhead in the case that the long running
> transaction has not ended, unlike trying to trigger another autovacuum.
I really considered this alternative when thinking about a solution to the problem. It is more consistent with the hint bit approach.
I rejected it in favor of this solution for the following reasons:
1. An index-only scan holds a read-only lock on the heap page. In order to update it, we would need to upgrade this lock to exclusive.
2. We would need to check visibility for all elements on the page (actually do something like `heap_page_is_all_visible`), but if there is a large number of elements on the page this can be quite expensive. And I am afraid that it can slow down the index-only scan. Yes, only in the "slow case", when it has to access the heap to perform a visibility check, but it still may not be acceptable. Also it is not clear how to mark a page as already checked; otherwise we would have to repeat this check for all TIDs referring to this page.
3. `heap_page_is_all_visible` is local to vacuumlazy.c. So to use it in an index-only scan we would either have to make it global or copy&paste its code. Just removing "static" is not possible, because it uses the local `LVRelState`, so some refactoring is needed in any case.
4. We would need to WAL-log the VM page and the heap pages when setting the all-visible bit, which is a quite expensive operation. Doing it inside an index-only scan can significantly increase the time of the select. Certainly Postgres is not a real-time DBMS, but it is still better to provide somewhat predictable query execution time. This is why I think that it is better to do such work in the background (in vacuum).