Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoAyxDaNrvWwSVfPerQkP034FG0PjYb=7j51QVTiPAtqgQ@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum
List pgsql-hackers
On Thu, Jul 14, 2022 at 1:17 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> On Tue, Jul 12, 2022 at 8:16 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > > > I think that at this stage it's better to define the design first. For
> > > > example, key size and value size, and these sizes are fixed or can be
> > > > set the arbitary size?
> > >
> > > I don't think we need to start over. Andres' prototype had certain
> > > design decisions built in for the intended use case (although maybe
> > > not clearly documented as such). Subsequent patches in this thread
> > > substantially changed many design aspects. If there were any changes
> > > that made things wonderful for vacuum, it wasn't explained, but Andres
> > > did explain how some of these changes were not good for other uses.
> > > Going to fixed 64-bit keys and values should still allow many future
> > > applications, so let's do that if there's no reason not to.
> >
> > I thought Andres pointed out that given that we store BufferTag (or
> > part of that) into the key, the fixed 64-bit keys might not be enough
> > for buffer mapping use cases. If we want to use wider keys more than
> > 64-bit, we would need to consider it.
>
> It sounds like you've answered your own question, then. If so, I'm
> curious what your current thinking is.
>
> If we *did* want to have maximum flexibility, then "single-value
> leaves" method would be the way to go, since it seems to be the
> easiest way to have variable-length both keys and values. I do have a
> concern that the extra pointer traversal would be a drag on
> performance, and also require lots of small memory allocations.

Agreed.

> I also have some concerns about also simultaneously trying to design
> for the use for buffer mappings. I certainly want to make this good
> for as many future uses as possible, and I'd really like to preserve
> any optimizations already fought for. However, to make concrete
> progress on the thread subject, I also don't think it's the most
> productive use of time to get tied up about the fine details of
> something that will not likely happen for several years at the
> earliest.

I’d like to keep the first version simple. We can improve it and add
more optimizations later. Using radix tree for vacuum TID storage
would still be a big win comparing to using a flat array, even without
all these optimizations. In terms of single-value leaves method, I'm
also concerned about an extra pointer traversal and extra memory
allocation. It's most flexible but multi-value leaves method is also
flexible enough for many use cases. Using the single-value method
seems to be too much as the first step for me.

Overall, using 64-bit keys and 64-bit values would be a reasonable
choice for me as the first step . It can cover wider use cases
including vacuum TID use cases. And possibly it can cover use cases by
combining a hash table or using tree of tree, for example.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: PATCH: Add Table Access Method option to pgbench
Next
From: Andres Freund
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum