Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoC+cSbacX6JAWj=JN8d5fe3XRE21WG1ROQ8XSuS-reyiQ@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Mon, Jan 8, 2024 at 8:35 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Wed, Jan 3, 2024 at 9:10 PM John Naylor <johncnaylorls@gmail.com> wrote:
> >
> > On Tue, Jan 2, 2024 at 8:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > I agree that we expose RT_LOCK_* functions and have tidstore use them,
> > > but am not sure the if (TidStoreIsShared(ts) LWLockAcquire(..., ...)"
> > > calls part. I think that even if we expose them, we will still need to
> > > do something like "if (TidStoreIsShared(ts))
> > > shared_rt_lock_share(ts->tree.shared)", no?
> >
> > I'll come back to this topic separately.
>
> To answer your question, sure, but that "if (TidStoreIsShared(ts))"
> part would be pushed down into a function so that only one place has
> to care about it.
>
> However, I'm starting to question whether we even need that. Meaning,
> lock the tidstore separately. To "lock the tidstore" means to take a
> lock, _separate_ from the radix tree's internal lock, to control
> access to two fields in a separate "control object":
>
> +typedef struct TidStoreControl
> +{
> + /* the number of tids in the store */
> + int64 num_tids;
> +
> + /* the maximum bytes a TidStore can use */
> + size_t max_bytes;
>
> I'm pretty sure max_bytes does not need to be in shared memory, and
> certainly not under a lock: Thinking of a hypothetical
> parallel-prune-phase scenario, one way would be for a leader process
> to pass out ranges of blocks to workers, and when the limit is
> exceeded, stop passing out blocks and wait for all the workers to
> finish.

True. I agreed that it doesn't need to be under a lock anyway, as it's
read-only.

>
> As for num_tids, vacuum previously put the similar count in
>
> @@ -176,7 +179,8 @@ struct ParallelVacuumState
>   PVIndStats *indstats;
>
>   /* Shared dead items space among parallel vacuum workers */
> - VacDeadItems *dead_items;
> + TidStore *dead_items;
>
> VacDeadItems contained "num_items". What was the reason to have new
> infrastructure for that count? And it doesn't seem like access to it
> was controlled by a lock -- can you confirm? If we did get parallel
> pruning, maybe the count would belong inside PVShared?

I thought that since the tidstore is a general-purpose data structure
the shared counter should be protected by a lock. One thing I'm
concerned about is that we might need to update both the radix tree
and the counter atomically in some cases. But that's true we don't
need it for lazy vacuum at least for now. Even given the parallel scan
phase, probably we won't need to have workers check the total number
of stored tuples during a parallel scan.

>
> The number of tids is not that tightly bound to the tidstore's job. I
> believe tidbitmap.c (a possible future client) doesn't care about the
> global number of tids -- not only that, but AND/OR operations can
> change the number in a non-obvious way, so it would not be convenient
> to keep an accurate number anyway. But the lock would still be
> mandatory with this patch.

Very good point.

>
> If we can make vacuum work a bit closer to how it does now, it'd be a
> big step up in readability, I think. Namely, getting rid of all the
> locking logic inside tidstore.c and let the radix tree's locking do
> the right thing. We'd need to make that work correctly when receiving
> pointers to values upon lookup, and I already shared ideas for that.
> But I want to see if there is any obstacle in the way of removing the
> tidstore control object and it's separate lock.

So I agree to remove both max_bytes and num_items from the control
object.Also, as you mentioned, we can remove the tidstore control
object itself. TidStoreGetHandle() returns a radix tree handle, and we
can pass it to TidStoreAttach().  I'll try it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Show WAL write and fsync stats in pg_stat_io
Next
From: Sutou Kouhei
Date:
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations