Re: 64-bit XIDs in deleted nbtree pages - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: 64-bit XIDs in deleted nbtree pages
Date
Msg-id CAH2-Wzk_WkOkWQvsecVKx-1AGG8Zy7hzYb=2pvoienxnJzhAag@mail.gmail.com
Whole thread Raw
In response to Re: 64-bit XIDs in deleted nbtree pages  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: 64-bit XIDs in deleted nbtree pages
List pgsql-hackers
On Tue, Feb 16, 2021 at 4:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Ugh, yes, I think it's a bug.

I was actually thinking of a similar bug in nbtree deduplication when
I spotted this one -- see commit 48e12913. The index AM API stuff is
tricky.

> When developing this feature, in an old version patch, we used to set
> invalid values to both btm_oldest_btpo_xact and
> btm_last_cleanup_num_heap_tuples in btbulkdelete() to reset these
> values. But we decided to set valid values to both even in
> btbulkdelete(). I believe that decision was correct in terms of
> btm_oldest_btpo_xact because with the old version patch we will do an
> unnecessary index scan during btvacuumcleanup(). But it’s wrong in
> terms of btm_last_cleanup_num_heap_tuples, as you pointed out.

Right.

> This bug would make the check of vacuum_cleanup_index_scale_factor
> untrust. So I think it’s better to backpatch but I think we need to
> note that to fix this issue properly, in a case where a vacuum called
> btbulkdelete() earlier, probably we should update only
> btm_oldest_btpo_xact in btbulkdelete() and then update
> btm_last_cleanup_num_heap_tuples in btvacuumcleanup(). In this case,
> we don’t know the oldest btpo.xact among the deleted pages in
> btvacuumcleanup(). This means that we would need to update the meta
> page twice, leading to WAL logging twice. Since we already could
> update the meta page more than once when a vacuum calls btbulkdelete()
> multiple times I think it would not be a problem, though.

I agree that that approach is fine. Realistically, we won't even have
to update the metapage twice in most cases. Because most indexes never
have even one page deletion anyway.

> As I mentioned above, we might need to consider how btbulkdelete() can
> tell btvacuumcleanup() btm_last_cleanup_num_delpages in a case where a
> vacuum called btbulkdelete earlier. During parallel vacuum, two
> different processes could do btbulkdelete() and btvacuumcleanup()
> respectively. Updating those values separately in those callbacks
> would be straightforward.

I don't see why it should be a problem for my patch/Postgres 14,
because we don't have the same btpo.xact/oldestBtpoXact issue that the
original Postgres 11 commit dealt with. The patch determines a value
for btm_last_cleanup_num_delpages (which I call
pages_deleted_not_free) by subtracting fields from the bulk delete
stats: we just use "stats->pages_deleted - stats->pages_free".

Isn't btvacuumcleanup() (or any other amvacuumcleanup() routine)
entitled to rely on the bulk delete stats being set in the way I've
described? I assumed that that was okay in general, but I haven't
tested parallel VACUUM specifically. Will parallel VACUUM really fail
to ensure that values in bulk stats fields (like pages_deleted and
pages_free) get set correctly for amvacuumcleanup() callbacks?

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: proposal: possibility to read dumped table's name from file
Next
From: Peter Geoghegan
Date:
Subject: Re: 64-bit XIDs in deleted nbtree pages