Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoCcSZSj7O0ObwYXwKBfcs3W+Nd+fOLq2EmRFoOZrkdvbw@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum
Re: [PoC] Improve dead tuple storage for lazy vacuum
List pgsql-hackers
On Sat, Oct 28, 2023 at 5:56 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> I wrote:
>
> > Seems fine at a glance, thanks. I will build on this to implement
> > variable-length values. I have already finished one prerequisite
> > which is: public APIs passing pointers to values.
>
> Since my publishing schedule has not kept up, I'm just going to share
> something similar to what I mentioned earlier, just to get things
> moving again.

Thanks for sharing the updates. I've returned to work today and will
resume working on this feature.

>
> 0001-0009 are from earlier versions, except for 0007 which makes a
> bunch of superficial naming updates, similar to those done in a recent
> other version. Somewhere along the way I fixed long-standing git
> whitespace warnings, but I don't remember if that's new here. In any
> case, let's try to preserve that.
>
> 0010 is some minor refactoring to reduce duplication
>
> 0011-0014 add public functions that give the caller more control over
> the input and responsibility for locking. They are not named well, but
> I plan these to be temporary: They are currently used for the tidstore
> only, since that has much simpler tests than the standard radix tree
> tests. One thing to note: since the tidstore has always done its own
> locking within a larger structure, these patches don't bother to do
> locking at the radix tree level. Locking twice seems...not great.
> These patches are the main prerequisite for variable-length values.
> Once that is working well, we can switch the standard tests to the new
> APIs.

Since variable-length value support is a significant change that
affects the API design, I'd like to discuss the API design first.
Currently, we have the following APIs:

---
RT_VALUE_TYPE *
RT_GET(RT_RADIX_TREE *tree, uint64 key, bool *found);
or for variable-length value support,
RT_GET(RT_RADIX_TREE *tree, uint64 key, size_t sz, bool *found);

If an entry already exists, return its pointer and set "found" to
true. Otherwise, insert an empty value with sz bytes, return its
pointer, and set "found" to false.

---
RT_VALUE_TYPE *
RT_FIND(RT_RADIX_TREE *tree, uint64 key);

If an entry exists, return the pointer to the value, otherwise return NULL.

(I omitted RT_SEARCH() as it's essentially the same as RT_FIND() and
will probably get removed.)
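
To make the RT_GET()/RT_FIND() semantics above concrete, here is a
minimal sketch for the fixed-length case only. It is not the radix
tree code: toy_tree, toy_value, toy_find() and toy_get() are
hypothetical stand-ins backed by a small linear array, and the
"empty" value is assumed to be zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real radix tree. */
typedef uint64_t toy_value;     /* plays the role of RT_VALUE_TYPE */

#define TOY_MAX_ENTRIES 16

typedef struct toy_tree
{
    int         nentries;
    uint64_t    keys[TOY_MAX_ENTRIES];
    toy_value   values[TOY_MAX_ENTRIES];
} toy_tree;

/* RT_FIND()-like: pointer to the stored value, or NULL if absent. */
toy_value *
toy_find(toy_tree *tree, uint64_t key)
{
    for (int i = 0; i < tree->nentries; i++)
        if (tree->keys[i] == key)
            return &tree->values[i];
    return NULL;
}

/*
 * RT_GET()-like (fixed-length): return the existing value's pointer and
 * set *found = true, or insert an empty (zero) value, return its
 * pointer, and set *found = false.
 */
toy_value *
toy_get(toy_tree *tree, uint64_t key, bool *found)
{
    toy_value  *v = toy_find(tree, key);

    *found = (v != NULL);
    if (v != NULL)
        return v;

    /* Toy limitation: the array has a fixed capacity. */
    assert(tree->nentries < TOY_MAX_ENTRIES);
    tree->keys[tree->nentries] = key;
    tree->values[tree->nentries] = 0;
    return &tree->values[tree->nentries++];
}
```

With this shape the caller writes through the returned pointer, which
works as long as every value has the same size.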

---
bool
RT_SET(RT_RADIX_TREE *tree, uint64 key, RT_VALUE_TYPE *value_p);
or for variable-length value support,
RT_SET(RT_RADIX_TREE *tree, uint64 key, RT_VALUE_TYPE *value_p, size_t sz);

If an entry already exists, update its value to 'value_p' and return
true. Otherwise set the value and return false.
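
The RT_SET() return convention can be sketched the same way, again
for fixed-length values only. kv_store, kv_value and kv_set() are
hypothetical names over a small array, not the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real radix tree. */
typedef uint64_t kv_value;      /* plays the role of RT_VALUE_TYPE */

#define KV_MAX_ENTRIES 16

typedef struct kv_store
{
    int         n;
    uint64_t    keys[KV_MAX_ENTRIES];
    kv_value    values[KV_MAX_ENTRIES];
} kv_store;

/*
 * RT_SET()-like: copy *value_p into the entry for key. Returns true if
 * an existing entry was updated, false if a new entry was inserted.
 */
bool
kv_set(kv_store *store, uint64_t key, const kv_value *value_p)
{
    for (int i = 0; i < store->n; i++)
    {
        if (store->keys[i] == key)
        {
            store->values[i] = *value_p;
            return true;
        }
    }

    /* Toy limitation: the array has a fixed capacity. */
    assert(store->n < KV_MAX_ENTRIES);
    store->keys[store->n] = key;
    store->values[store->n] = *value_p;
    store->n++;
    return false;
}
```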

Given variable-length value support, RT_GET() would have to do
repalloc() if the existing value size is not big enough for the new
value, but it cannot as the radix tree doesn't know the size of each
stored value. Another idea is that the radix tree returns the pointer
to the slot and the caller updates the value accordingly. But that
means the caller has to update the slot properly while considering how
the value is stored (embedded vs. single-value leaf), which does not
seem like a good idea.
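
The size problem can be illustrated with a small sketch. It assumes a
toy container (vl_tree and vl_get() are hypothetical names) whose
slots hold only a pointer, with calloc() standing in for the
allocator; no per-value size is recorded, which mirrors the
situation described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VL_MAX_ENTRIES 8

/* Hypothetical toy container: a slot is just a pointer, no size kept. */
typedef struct vl_tree
{
    int         n;
    uint64_t    keys[VL_MAX_ENTRIES];
    void       *slots[VL_MAX_ENTRIES];
} vl_tree;

/*
 * RT_GET()-like with a size argument. sz is honored only when the key
 * is new; for an existing key the old allocation is returned as-is,
 * because nothing here records how large that allocation was.
 */
void *
vl_get(vl_tree *tree, uint64_t key, size_t sz, bool *found)
{
    for (int i = 0; i < tree->n; i++)
    {
        if (tree->keys[i] == key)
        {
            *found = true;
            return tree->slots[i];  /* may be smaller than sz */
        }
    }

    /* Toy limitation: the array has a fixed capacity. */
    assert(tree->n < VL_MAX_ENTRIES);
    *found = false;
    tree->keys[tree->n] = key;
    tree->slots[tree->n] = calloc(1, sz);
    return tree->slots[tree->n++];
}
```

If a caller first asks for 4 bytes and later for 64 bytes under the
same key, the second call still hands back the 4-byte allocation, so
writing 64 bytes through it would overrun the buffer.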

To deal with this problem, I think we can change the RT_GET() API
somewhat, as follows:

RT_VALUE_TYPE *
RT_INSERT(RT_RADIX_TREE *tree, uint64 key, size_t sz, bool *found);

If the entry already exists, replace the value with a new empty value
with sz bytes, return its pointer, and set "found" to true. Otherwise,
insert an empty value with sz bytes, return its pointer, and set
"found" to false.

We will probably find a better name, but I'll use RT_INSERT() for this
discussion. RT_INSERT() returns an empty slot regardless of any
existing value. It can be used to insert a new value or to replace an
existing value with a larger one.
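
As a rough sketch of the proposed behavior, under the same toy
assumptions as before (ins_tree and ins_insert() are hypothetical
names, calloc()/free() stand in for the allocator, and slots hold
only a pointer):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define INS_MAX_ENTRIES 8

/* Hypothetical toy container: a slot is just a pointer, no size kept. */
typedef struct ins_tree
{
    int         n;
    uint64_t    keys[INS_MAX_ENTRIES];
    void       *slots[INS_MAX_ENTRIES];
} ins_tree;

/*
 * RT_INSERT()-like: always hand back a fresh, zeroed value of sz bytes.
 * An existing value is discarded rather than resized, so its old size
 * never needs to be known. *found reports whether the key existed.
 */
void *
ins_insert(ins_tree *tree, uint64_t key, size_t sz, bool *found)
{
    int         i;

    for (i = 0; i < tree->n; i++)
        if (tree->keys[i] == key)
            break;

    *found = (i < tree->n);
    if (*found)
        free(tree->slots[i]);       /* drop the old value */
    else
    {
        /* Toy limitation: the array has a fixed capacity. */
        assert(tree->n < INS_MAX_ENTRIES);
        tree->keys[i] = key;
        tree->n++;
    }

    tree->slots[i] = calloc(1, sz);
    return tree->slots[i];
}
```

In this sketch the old contents are gone once the call returns, so a
caller that wants to grow a value while keeping its data would have
to read the old value first (for example via RT_FIND()). Whether the
real API would work that way is an assumption, not something settled
here.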

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


