Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: New IndexAM API controlling index vacuum strategies
Date
Msg-id CAH2-WzmdbENrn687=d5Xs9ab5CFnh+J7EdMzDGT8WtGkx-=0rQ@mail.gmail.com
Whole thread Raw
In response to Re: New IndexAM API controlling index vacuum strategies  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Mon, Mar 22, 2021 at 6:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've looked at this 0001 patch and here are some review comments:

> + * Since we might have to prune a second time here, the code is structured to
> + * use a local per-page copy of the counters that caller accumulates.  We add
> + * our per-page counters to the per-VACUUM totals from caller last of all, to
> + * avoid double counting.
>
> Those comments should be a part of 0002 patch?

Right -- will fix.

> pc.num_tuples is incremented twice. ps->hastup = true is also duplicated.

Must have been a mistake when splitting the patch up -- will fix.

> ---
> In step 7, with the patch, we save the freespace of the page and do
> lazy_vacuum_page(). But should it be done in reverse?

> How about renaming to vacuum_two_pass_strategy() or something to clear
> this function is used to vacuum?

Okay. I will rename it to lazy_vacuum_pruned_items().

> vacrelstats->dead_tuples->num_tuples)));
>
> It seems that the comment needs to be updated.

Will fix.

> I’ll review the other two patches tomorrow.

And I'll respond to your remarks on those (which are already posted
now) separately.

> I didn't look at the 0002 patch in-depth but the main difference
> between those two WAL records is that XLOG_HEAP2_PRUNE has the offset
> numbers of unused, redirected, and dead whereas XLOG_HEAP2_VACUUM has
> only the offset numbers of unused?

That's one difference. Another difference is that there is no
latestRemovedXid field. And there is a third difference: we no longer
need a super-exclusive lock for heap page vacuuming (not pruning) with
this design -- which also means that we cannot defragment the page
during heap vacuuming (that's unsafe with only an exclusive lock
because it physically relocates tuples with storage that somebody else
may have a C pointer to that they expect to stay sane). These
differences during original execution of heap page vacuum necessitate
inventing a new REDO routine that does things in exactly the same way.

To put it another way, heap vacuuming is now very similar to index
vacuuming (both are dissimilar to heap pruning). They're very simple,
and 100% a matter of freeing space in physical data structures.
Clearly that's always something that we can put off if it makes sense
to do so. That high level simplicity seems important to me. I always
disliked the way the WAL records for vacuumlazy.c worked. Especially
the XLOG_HEAP2_CLEANUP_INFO record -- that one is terrible.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: A reloption for partitioned tables - parallel_workers
Next
From: Fujii Masao
Date:
Subject: Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)