Re: 64-bit XIDs in deleted nbtree pages - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: 64-bit XIDs in deleted nbtree pages |
Date | |
Msg-id | CAD21AoAtUxOLcmrxW_oU1_7+5s56kypQu3Wz5NOeGn=yC7XqGg@mail.gmail.com Whole thread Raw |
In response to | Re: 64-bit XIDs in deleted nbtree pages (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: 64-bit XIDs in deleted nbtree pages
|
List | pgsql-hackers |
On Tue, Feb 16, 2021 at 3:52 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Mon, Feb 15, 2021 at 7:26 PM Peter Geoghegan <pg@bowt.ie> wrote: > > Actually, there is one more reason why I bring up idea 1 now: I want > > to hear your thoughts on the index AM API questions now, which idea 1 > > touches on. Ideally all of the details around the index AM VACUUM APIs > > (i.e. when and where the extra work happens -- btvacuumcleanup() vs > > btbulkdelete()) won't need to change much in the future. I worry about > > getting this index AM API stuff right, at least a little. > > Speaking of problems like this, I think I spotted an old one: we call > _bt_update_meta_cleanup_info() in either btbulkdelete() or > btvacuumcleanup(). I think that we should always call it in > btvacuumcleanup(), though -- even in cases where there is no call to > btvacuumscan() inside btvacuumcleanup() (because btvacuumscan() > happened earlier instead, during the btbulkdelete() call). > > This makes the value of IndexVacuumInfo.num_heap_tuples (which is what > we store in the metapage) much more accurate -- right now it's always > pg_class.reltuples from *before* the VACUUM started. And so the > btm_last_cleanup_num_heap_tuples value in a nbtree metapage is often > kind of inaccurate. > > This "estimate during ambulkdelete" issue is documented here (kind of): > > /* > * Struct for input arguments passed to ambulkdelete and amvacuumcleanup > * > * num_heap_tuples is accurate only when estimated_count is false; > * otherwise it's just an estimate (currently, the estimate is the > * prior value of the relation's pg_class.reltuples field, so it could > * even be -1). It will always just be an estimate during ambulkdelete. > */ > typedef struct IndexVacuumInfo > { > ... > } > > The name of the metapage field is already > btm_last_cleanup_num_heap_tuples, which already suggests the approach > that I propose now. So why don't we do it like that already? > > (Thinks some more...) > > I wonder: did this detail change at the last minute during the > development of the feature (just before commit 857f9c36) back in early > 2018? That change have made it easier to deal with > oldestBtpoXact/btm_oldest_btpo_xact, which IIRC was a late addition to > the patch -- so maybe it's truly an accident that the code doesn't > work the way that I suggest it should already. (It's annoying to make > state from btbulkdelete() appear in btvacuumcleanup(), unless it's > from IndexVacuumInfo or something -- I can imagine this changing at > the last minute, just for that reason.) > > Do you think that this needs to be treated as a bug in the > backbranches, Masahiko? I'm not sure... Ugh, yes, I think it's a bug. When developing this feature, in an old version patch, we used to set invalid values to both btm_oldest_btpo_xact and btm_last_cleanup_num_heap_tuples in btbulkdelete() to reset these values. But we decided to set valid values to both even in btbulkdelete(). I believe that decision was correct in terms of btm_oldest_btpo_xact because with the old version patch we will do an unnecessary index scan during btvacuumcleanup(). But it’s wrong in terms of btm_last_cleanup_num_heap_tuples, as you pointed out. This bug would make the check of vacuum_cleanup_index_scale_factor untrust. So I think it’s better to backpatch but I think we need to note that to fix this issue properly, in a case where a vacuum called btbulkdelete() earlier, probably we should update only btm_oldest_btpo_xact in btbulkdelete() and then update btm_last_cleanup_num_heap_tuples in btvacuumcleanup(). In this case, we don’t know the oldest btpo.xact among the deleted pages in btvacuumcleanup(). This means that we would need to update the meta page twice, leading to WAL logging twice. Since we already could update the meta page more than once when a vacuum calls btbulkdelete() multiple times I think it would not be a problem, though. > > In any case we should probably make this change as part of Postgres > 14. Don't you think? It's certainly easy to do it this way now, since > there will be no need to keep around a oldestBtpoXact value until > btvacuumcleanup() (in the common case where btbulkdelete() is where we > call btvacuumscan()). The new btm_last_cleanup_num_delpages field > (which replaces btm_oldest_btpo_xact) has a value that just comes from > the bulk stats, which is easy anyway. Agreed. As I mentioned above, we might need to consider how btbulkdelete() can tell btvacuumcleanup() btm_last_cleanup_num_delpages in a case where a vacuum called btbulkdelete earlier. During parallel vacuum, two different processes could do btbulkdelete() and btvacuumcleanup() respectively. Updating those values separately in those callbacks would be straightforward. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
pgsql-hackers by date: