On Mon, Apr 18, 2022 at 12:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If the planner looks at index reltuples at all, it's doing so
> for cost estimation purposes, where the count including dead
> entries is probably the right thing to use.
Then why does heapam_index_build_range_scan do it the other way around?
I think that it probably doesn't matter that much in practice. The
inconsistency should be noted in update_relstats_all_indexes, though.
> If you want to make this cleaner, maybe there's a case for
> splitting reltuples into two columns. But then index AMs
> would be on the hook to determine how many of their entries
> are live, which is not really an index's concern.
The main concern behind this is that we're using
vacrel->new_rel_tuples for the IndexVacuumInfo.num_heap_tuples value
in amvacuumcleanup (but not in ambulkdelete), which is calculated
towards the end of lazy_scan_heap, like so:
/*
* Also compute the total number of surviving heap entries. In the
* (unlikely) scenario that new_live_tuples is -1, take it as zero.
*/
vacrel->new_rel_tuples =
Max(vacrel->new_live_tuples, 0) + vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples;
I think that this doesn't really belong here; new_rel_tuples should
only be used for VACUUM VERBOSE/server log output, once we return to
heap_vacuum_rel from lazy_scan_heap. We should use
vacrel->new_live_tuples as our IndexVacuumInfo.num_heap_tuples value
in the amvacuumcleanup path (instead of new_rel_tuples). That way the
rule about IndexVacuumInfo.num_heap_tuples is simple: it's always
taken from pg_class.reltuples (for the heap rel). Either the existing
value, or the new value.
--
Peter Geoghegan