On Tue, Feb 2, 2021 at 05:27 Peter Geoghegan <pg@bowt.ie> wrote:
> And now here is the second thing I thought of, which is much better:
>
> Sometimes 1% of the dead tuples in a heap relation will be spread across 90%+ of the pages. With other workloads, 1% of dead tuples might be highly concentrated, appearing in no more than 1% of all heap pages. Obviously the distinction between these two cases/workloads matters a lot, and so the triggering criterion must be quantitative *and* qualitative. It should not be based on counting dead tuples, since that alone won't differentiate these two extreme cases - both of which are probably quite common (in the real world, extremes are actually the normal and common case IME).
>
> I like the idea of basing it on counting *heap blocks*, not dead tuples. We can count heap blocks that have *at least* one dead tuple (of course it doesn't matter how they came to be dead, whether through this VACUUM operation or some earlier opportunistic pruning). Note in particular that it should not matter whether it's a heap block with only one LP_DEAD line pointer or a heap page near the MaxHeapTuplesPerPage limit -- we count either type of page towards the heap-page-based limit used to decide whether index vacuuming goes ahead for all indexes during VACUUM.
I really like this idea!
It resembles the approach used in bottom-up index deletion: block-based
accounting provides a better estimate of the usefulness of the operation.
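
To make sure I read this right, here is a minimal sketch of the check as
I understand it (the function and counter names are hypothetical, and
BlockNumber is typedef'ed locally just to keep the sketch self-contained):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* stands in for PostgreSQL's BlockNumber */

/*
 * Sketch only, not actual PostgreSQL code: pages_with_dead_items counts
 * heap blocks containing at least one dead tuple, regardless of how the
 * tuples died or how many dead items each page holds.
 */
static bool
should_vacuum_all_indexes(BlockNumber rel_pages,
                          BlockNumber pages_with_dead_items)
{
    const double threshold = 0.01;  /* 1% of all heap pages */

    if (rel_pages == 0)
        return false;               /* empty relation, nothing to do */

    /*
     * A page with a single LP_DEAD line pointer counts exactly like a
     * page near MaxHeapTuplesPerPage: only the number of affected pages
     * matters, not the number of dead tuples.
     */
    return pages_with_dead_items > (BlockNumber) (rel_pages * threshold);
}
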
I suppose the 1% threshold should be configurable as a cluster-wide GUC.
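Something like this, sketched with the extension-style
DefineCustomRealVariable() API (a core GUC would be an entry in guc.c's
tables instead, and the GUC name here is made up):

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

/* Hypothetical GUC holding the 1% threshold (name is made up) */
static double vacuum_skip_index_page_fraction = 0.01;

void _PG_init(void);

void
_PG_init(void)
{
    DefineCustomRealVariable("vacuum_skip_index_page_fraction",
                             "Fraction of heap pages with dead tuples "
                             "needed before VACUUM processes all indexes.",
                             NULL,          /* no long description */
                             &vacuum_skip_index_page_fraction,
                             0.01,          /* boot value: 1% */
                             0.0,           /* min */
                             1.0,           /* max */
                             PGC_SIGHUP,    /* cluster-wide, set via reload */
                             0,             /* flags */
                             NULL, NULL, NULL);     /* hooks */
}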