Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CANWCAZatsDg54wCjpF-GvL0TiOcvSQpDk9PD=-gsMEo8ariPEQ@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Wed, Mar 20, 2024 at 8:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I forgot to report the results. Yes, I did some tests where I inserted
> many TIDs to make the tidstore use several GB memory. I did two cases:
>
> 1. insert 100M blocks of TIDs with an offset of 100.
> 2. insert 10M blocks of TIDs with an offset of 2048.
>
> The tidstore used about 4.8GB and 5.2GB, respectively, and all lookup
> and iteration results were expected.

Thanks for confirming!

> While reviewing the codes again, the following two things caught my eyes:
>
> in check_set_block_offset() function, we don't take a lock on the
> tidstore while checking all possible TIDs. I'll add
> TidStoreLockShare() and TidStoreUnlock() as follows:
>
> +           TidStoreLockShare(tidstore);
>             if (TidStoreIsMember(tidstore, &tid))
>                 ItemPointerSet(&items.lookup_tids[num_lookup_tids++],
> blkno, offset);
> +           TidStoreUnlock(tidstore);

In one sense, all locking in the test module is useless since there is
only a single process. On the other hand, it seems good to at least
run what we have written to run it trivially, and serve as an example
of usage. We should probably be consistent, and document at the top
that the locks are pro-forma only.

It's both a blessing and a curse that vacuum only has a single writer.
It makes development less of a hassle, but also means that tidstore
locking is done for API-completeness reasons, not (yet) as a practical
necessity. Even tidbitmap.c's hash table currently has a single
writer, and while using tidstore for that is still an engineering
challenge for other reasons, it wouldn't exercise locking
meaningfully, either, at least at first.

> Regarding TidStoreMemoryUsage(), IIUC the caller doesn't need to take
> a lock on the shared tidstore since dsa_get_total_size() (called by
> RT_MEMORY_USAGE()) does appropriate locking. I think we can mention it
> in the comment as follows:
>
> -/* Return the memory usage of TidStore */
> +/*
> + * Return the memory usage of TidStore.
> + *
> + * In shared TidStore cases, since shared_ts_memory_usage() does appropriate
> + * locking, the caller doesn't need to take a lock.
> + */
>
> What do you think?

That duplicates the underlying comment on the radix tree function that
this calls, so I'm inclined to leave it out. At this level it's
probably best to document when a caller _does_ need to take an action.

One thing I forgot to ask about earlier:

+-- Add tids in out of order.

Are they (the blocks to be precise) really out of order? The VALUES
statement is ordered, but after inserting it does not output that way.
I wondered if this is platform independent, but CI and our dev
machines haven't failed this test, and I haven't looked into what
determines the order. It's easy enough to hide the blocks if we ever
need to, as we do elsewhere...



pgsql-hackers by date:

Previous
From: Jakub Wartak
Date:
Subject: Re: pg_upgrade --copy-file-range
Next
From: "Amonson, Paul D"
Date:
Subject: RE: Popcount optimization using AVX512