
From: John Naylor
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id: CANWCAZbSp1txmTopBmn9Y7spQ0yM-wzh13OFbMTFFMswCpQL0w@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <johncnaylorls@gmail.com>)
List: pgsql-hackers
On Wed, Jan 3, 2024 at 9:10 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Tue, Jan 2, 2024 at 8:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I agree that we expose RT_LOCK_* functions and have tidstore use
> > them, but am not sure about the "if (TidStoreIsShared(ts))
> > LWLockAcquire(..., ...)" calls part. I think that even if we expose
> > them, we will still need to do something like "if (TidStoreIsShared(ts))
> > shared_rt_lock_share(ts->tree.shared)", no?
>
> I'll come back to this topic separately.

To answer your question, sure, but that "if (TidStoreIsShared(ts))"
part would be pushed down into a function so that only one place has
to care about it.
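
Something like this -- just a sketch; shared_rt_lock_share() and
ts->tree.shared are from your snippet above, and the wrapper name is
made up:

static inline void
tid_store_lock_share(TidStore *ts)
{
	if (TidStoreIsShared(ts))
		shared_rt_lock_share(ts->tree.shared);

	/* local storage is single-process, so no locking needed */
}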

However, I'm starting to question whether we even need to lock the
tidstore separately at all. "Locking the tidstore" here means taking a
lock, _separate_ from the radix tree's internal lock, to control
access to two fields in a separate "control object":

+typedef struct TidStoreControl
+{
+ /* the number of tids in the store */
+ int64 num_tids;
+
+ /* the maximum bytes a TidStore can use */
+ size_t max_bytes;

I'm pretty sure max_bytes does not need to be in shared memory, and
certainly doesn't need to be under a lock. Thinking of a hypothetical
parallel-prune-phase scenario: one way would be for a leader process
to pass out ranges of blocks to workers, and when the limit is
exceeded, stop passing out blocks and wait for all the workers to
finish.
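
In rough pseudocode, leader side only -- give_range_to_worker(),
RANGE_SIZE, and wait_for_all_workers() are made up for illustration,
and I'm assuming a TidStoreMemoryUsage()-style accessor:

BlockNumber next_block = 0;

while (next_block < nblocks)
{
	BlockNumber range_end = Min(next_block + RANGE_SIZE, nblocks);

	give_range_to_worker(next_block, range_end);
	next_block = range_end;

	/* max_bytes can stay backend-local in the leader -- no lock */
	if (TidStoreMemoryUsage(dead_items) > max_bytes)
		break;			/* stop handing out blocks... */
}
wait_for_all_workers();		/* ...and let the workers drain */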

As for num_tids, vacuum previously kept a similar count in
ParallelVacuumState:

@@ -176,7 +179,8 @@ struct ParallelVacuumState
  PVIndStats *indstats;

  /* Shared dead items space among parallel vacuum workers */
- VacDeadItems *dead_items;
+ TidStore *dead_items;

VacDeadItems contained "num_items". What was the reason for building
new infrastructure for that count? And it doesn't seem like access to
it was controlled by a lock -- can you confirm? If we did get parallel
pruning, maybe the count would belong inside PVShared?
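
For illustration, in the serial case the count could simply live with
the rest of vacuum's state -- the LVRelState field below is
hypothetical, and I'm going from memory on the
TidStoreSetBlockOffsets() signature:

	/* in lazy_scan_prune(), after collecting a block's dead offsets */
	TidStoreSetBlockOffsets(vacrel->dead_items, blkno, deadoffsets,
							lpdead_items);
	vacrel->num_dead_items += lpdead_items;	/* hypothetical counter */

...so tidstore.c itself would never need num_tids, much less a lock
protecting it.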

The number of tids is not that tightly bound to the tidstore's job. I
believe tidbitmap.c (a possible future client) doesn't care about the
global number of tids -- not only that, but AND/OR operations can
change the number in a non-obvious way, so it would not be convenient
to keep an accurate number anyway. But the lock would still be
mandatory with this patch.

If we can make vacuum work a bit closer to how it does now, it'd be a
big step up in readability, I think. Namely, getting rid of all the
locking logic inside tidstore.c and letting the radix tree's locking
do the right thing. We'd need to make that work correctly when
receiving pointers to values upon lookup, and I already shared ideas
for that. But I want to see if there is any obstacle in the way of
removing the tidstore control object and its separate lock.


