Re: 64-bit XIDs in deleted nbtree pages - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: 64-bit XIDs in deleted nbtree pages |
Date | |
Msg-id | CAH2-Wzk_WkOkWQvsecVKx-1AGG8Zy7hzYb=2pvoienxnJzhAag@mail.gmail.com Whole thread Raw |
In response to | Re: 64-bit XIDs in deleted nbtree pages (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: 64-bit XIDs in deleted nbtree pages
|
List | pgsql-hackers |
On Tue, Feb 16, 2021 at 4:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Ugh, yes, I think it's a bug. I was actually thinking of a similar bug in nbtree deduplication when I spotted this one -- see commit 48e12913. The index AM API stuff is tricky. > When developing this feature, in an old version patch, we used to set > invalid values to both btm_oldest_btpo_xact and > btm_last_cleanup_num_heap_tuples in btbulkdelete() to reset these > values. But we decided to set valid values to both even in > btbulkdelete(). I believe that decision was correct in terms of > btm_oldest_btpo_xact because with the old version patch we will do an > unnecessary index scan during btvacuumcleanup(). But it’s wrong in > terms of btm_last_cleanup_num_heap_tuples, as you pointed out. Right. > This bug would make the check of vacuum_cleanup_index_scale_factor > untrust. So I think it’s better to backpatch but I think we need to > note that to fix this issue properly, in a case where a vacuum called > btbulkdelete() earlier, probably we should update only > btm_oldest_btpo_xact in btbulkdelete() and then update > btm_last_cleanup_num_heap_tuples in btvacuumcleanup(). In this case, > we don’t know the oldest btpo.xact among the deleted pages in > btvacuumcleanup(). This means that we would need to update the meta > page twice, leading to WAL logging twice. Since we already could > update the meta page more than once when a vacuum calls btbulkdelete() > multiple times I think it would not be a problem, though. I agree that that approach is fine. Realistically, we won't even have to update the metapage twice in most cases. Because most indexes never have even one page deletion anyway. > As I mentioned above, we might need to consider how btbulkdelete() can > tell btvacuumcleanup() btm_last_cleanup_num_delpages in a case where a > vacuum called btbulkdelete earlier. During parallel vacuum, two > different processes could do btbulkdelete() and btvacuumcleanup() > respectively. Updating those values separately in those callbacks > would be straightforward. I don't see why it should be a problem for my patch/Postgres 14, because we don't have the same btpo.xact/oldestBtpoXact issue that the original Postgres 11 commit dealt with. The patch determines a value for btm_last_cleanup_num_delpages (which I call pages_deleted_not_free) by subtracting fields from the bulk delete stats: we just use "stats->pages_deleted - stats->pages_free". Isn't btvacuumcleanup() (or any other amvacuumcleanup() routine) entitled to rely on the bulk delete stats being set in the way I've described? I assumed that that was okay in general, but I haven't tested parallel VACUUM specifically. Will parallel VACUUM really fail to ensure that values in bulk stats fields (like pages_deleted and pages_free) get set correctly for amvacuumcleanup() callbacks? -- Peter Geoghegan
pgsql-hackers by date: