Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoCUSjRWN_AM6deg2QhuDbc_uuLxQpnYm0GF60Kbwm_8rQ@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <johncnaylorls@gmail.com>)
List pgsql-hackers
On Thu, Dec 7, 2023 at 12:27 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Mon, Nov 27, 2023 at 1:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sat, Oct 28, 2023 at 5:56 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> > bool
> > RT_SET(RT_RADIX_TREE *tree, uint64 key, RT_VALUE_TYPE *value_p);
> > or for variable-length value support,
> > RT_SET(RT_RADIX_TREE *tree, uint64 key, RT_VALUE_TYPE *value_p, size_t sz);
> >
> > If an entry already exists, update its value to 'value_p' and return
> > true. Otherwise set the value and return false.
>
> > RT_VALUE_TYPE
> > RT_INSERT(RT_RADIX_TREE *tree, uint64 key, size_t sz, bool *found);
> >
> > If the entry already exists, replace the value with a new empty value
> > with sz bytes and set "found" to true. Otherwise, insert an empty
> > value, return its pointer, and set "found" to false.
> >
> > We probably will find a better name but I use RT_INSERT() for
> > discussion. RT_INSERT() returns an empty slot regardless of existing
> > values. It can be used to insert a new value or to replace the value
> > with a larger value.
>
> Looking at TidStoreSetBlockOffsets again (in particular how it works
> with RT_GET), and thinking about issues we've discussed, I think
> RT_SET is sufficient for vacuum. Here's how it could work:
>
> TidStoreSetBlockOffsets could have a stack variable that's "almost
> always" large enough. When not, it can allocate in its own context. It
> sets the necessary bits there. Then, it passes the pointer to RT_SET
> with the number of bytes to copy. That seems very simple.

Right.

>
> At some future time, we can add a new function with the complex
> business about getting the current value to modify it, with the
> re-alloc'ing that it might require.
>
> In other words, from both an API perspective and a performance
> perspective, it makes sense for tid store to have a simple "set"
> interface for vacuum that can be optimized for its characteristics
> (insert only, ordered offsets). And also a more complex one for bitmap
> scan (setting/unsetting bits of existing values, in any order). They
> can share the same iteration interface, key types, and value types.
>
> What do you think, Masahiko?

Good point. RT_SET() would be faster than RT_GET() and updating the
value because RT_SET() would not need to take care of the existing
value (its size, embedded or not, realloc etc).

I think that we can separate the radix tree patch into two parts: the
main implementation with RT_SET(), and more complex APIs such as
RT_GET() etc. That way, it would probably make it easy to complete the
radix tree and tidstore first.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Next
From: Junwang Zhao
Date:
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations