Thread: vacuum -vs reltuples on insert only index
Hello, I've found a behavior change with pg_class.reltuples on btree index. With only insert activity on a table, when an index is processed, its related reltuples is set to 0. Here is a demo script: -- force index cleanup set vacuum_cleanup_index_scale_factor to 0; drop table if exists t; create table t as select i from generate_series(1, 100) i; create index t_i on t(i); -- after index creation its reltuples is correct select reltuples from pg_class where relname = 't_i' -- result: reltuples | 100 -- vacuum set index reltuples to 0 vacuum t; select reltuples from pg_class where relname = 't_i' -- result: reltuples | 0 -- analyze set it back to correct value analyze t; select reltuples from pg_class where relname = 't_i' -- result: reltuples | 100 -- insert + vacuum reset it again to 0 insert into t values(101); vacuum (verbose off, analyze on, index_cleanup on) t; select reltuples from pg_class where relname = 't_i' -- result: reltuples | 0 -- delete + vacuum set it back to correct value delete from t where i=10; vacuum (verbose off, analyze on, index_cleanup on) t; select reltuples from pg_class where relname = 't_i' -- result: reltuples | 100 -- and back to 0 again with insert+vacuum insert into t values(102); vacuum (verbose off, analyze on, index_cleanup on) t; select reltuples from pg_class where relname = 't_i' -- result: reltuples | 0 Before 0d861bbb70, btvacuumpage was adding to relation stats the number of leaving lines in the block using: stats->num_index_tuples += maxoff - minoff + 1; After 0d861bbb70, it is set using new variable nhtidslive: stats->num_index_tuples += nhtidslive However, nhtidslive is only incremented if callback (IndexBulkDeleteCallback) is set, which seems not to be the case on select-only workload. A naive fix might be to use "maxoff - minoff + 1" when callback==NULL. Thoughts? Regards,
On Fri, Oct 23, 2020 at 8:51 AM Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote: > Before 0d861bbb70, btvacuumpage was adding to relation stats the number of > leaving lines in the block using: > > stats->num_index_tuples += maxoff - minoff + 1; > > After 0d861bbb70, it is set using new variable nhtidslive: > > stats->num_index_tuples += nhtidslive > > However, nhtidslive is only incremented if callback (IndexBulkDeleteCallback) > is set, which seems not to be the case on select-only workload. I agree that that's a bug. > A naive fix might be to use "maxoff - minoff + 1" when callback==NULL. The problem with that is that we really should use nhtidslive (or something like it), and we're not really willing to do the work to get that information when callback==NULL. We could use "maxoff - minoff + 1" in the way you suggest, but that will be only ~30% of what nhtidslive would be in pages where deduplication is maximally effective (which is not at all uncommon -- you only need about 10 TIDs per distinct value for the space savings to saturate like this). GIN does this for cleanup (but not for delete, which has a real count available): /* * XXX we always report the heap tuple count as the number of index * entries. This is bogus if the index is partial, but it's real hard to * tell how many distinct heap entries are referenced by a GIN index. */ stats->num_index_tuples = Max(info->num_heap_tuples, 0); stats->estimated_count = info->estimated_count; I suspect that we need to move in this direction within nbtree. I'm a bit concerned about the partial index problem, though. I suppose maybe we could do it the old way (which won't account for posting list tuples) during cleanup as you suggest, but only use the final figure when it turns out to have been a partial indexes. For other indexes we could do what GIN does here. Anybody else have thoughts on this? -- Peter Geoghegan
On Fri, Oct 23, 2020 at 11:10 AM Peter Geoghegan <pg@bowt.ie> wrote: > I suspect that we need to move in this direction within nbtree. I'm a > bit concerned about the partial index problem, though. I suppose maybe > we could do it the old way (which won't account for posting list > tuples) during cleanup as you suggest, but only use the final figure > when it turns out to have been a partial indexes. For other indexes we > could do what GIN does here. Actually, it seems better to always count num_index_tuples the old way during cleanup-only index VACUUMs, despite the inaccuracy that that creates with posting list tuples. The inaccuracy is at least a fixed and relatively small inaccuracy, since nbtree doesn't have posting list compression or a pending list mechanism (unlike GIN). This approach avoids calculating a num_index_tuples value that is less than the number of distinct values in the index, which seems important. Taking a more sophisticated approach seems unnecessary, especially given that we need something that can be backpatched to Postgres 13. Attached is my proposed fix, which takes this approach. I will commit this on Wednesday or Thursday, barring any objections. Thanks -- Peter Geoghegan
Attachment
On Mon, Nov 2, 2020 at 10:03 AM Peter Geoghegan <pg@bowt.ie> wrote: > Attached is my proposed fix, which takes this approach. I will commit > this on Wednesday or Thursday, barring any objections. Just to be clear: I am not proposing that we set 'IndexBulkDeleteResult.estimated_count = false' here, even though there is a certain sense in which we now accept an unreliable figure in Postgres 13. This is not what GIN does. That approach doesn't seem appropriate for nbtree + deduplication, which is much closer to nbtree in Postgres 12 than to GIN. I believe that the final num_index_tuples value (generated during cleanup-only nbtree VACUUM) is in general sufficiently reliable to not be treated as an estimate by vacuumlazy.c -- the pg_class entry for the index should still be updated in update_index_statistics(). In other words, I think that the remaining posting-list related inaccuracies are comparable to the existing inaccuracies caused by concurrent page splits during nbtree vacuuming (I describe the problem right next to an old comment about that issue, in fact). What we have in both cases is an artifact of how the data is physically represented and the difficulty it causes us during vacuuming, in certain cases. There are known error bars. That's why we shouldn't treat num_index_tuples as merely an estimate. -- Peter Geoghegan
Just one more postscript... On Mon, Nov 2, 2020 at 12:06 PM Peter Geoghegan <pg@bowt.ie> wrote: > Just to be clear: I am not proposing that we set > 'IndexBulkDeleteResult.estimated_count = false' here I meant 'IndexBulkDeleteResult.estimated_count = true'. So my patch doesn't touch that field at all. > In other words, I think that the remaining posting-list related > inaccuracies are comparable to the existing inaccuracies caused by > concurrent page splits during nbtree vacuuming (I describe the problem > right next to an old comment about that issue, in fact). I meant the inaccuracies that remain *once my patch is committed*. (Clearly the current behavior of setting pg_class.reltuples to zero during cleanup-only vacuuming is a bug.) -- Peter Geoghegan
On Mon, Nov 2, 2020 at 10:03 AM Peter Geoghegan <pg@bowt.ie> wrote: > Actually, it seems better to always count num_index_tuples the old way > during cleanup-only index VACUUMs, despite the inaccuracy that that > creates with posting list tuples. Pushed a fix for this just now. Thanks for the report! -- Peter Geoghegan
On Wed, 4 Nov 2020 18:44:03 -0800 Peter Geoghegan <pg@bowt.ie> wrote: > On Mon, Nov 2, 2020 at 10:03 AM Peter Geoghegan <pg@bowt.ie> wrote: > > Actually, it seems better to always count num_index_tuples the old way > > during cleanup-only index VACUUMs, despite the inaccuracy that that > > creates with posting list tuples. > > Pushed a fix for this just now. > > Thanks for the report! Sorry I couldn't give some more feedback on your thoughts on time... Thank you for your investigation and fix! Regards,