Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id CAD21AoCe1Ch+y1_nDzTXF5w6VT=nnP33QXM5V9z-PE3-x0f_xw@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Mon, Jan 9, 2023 at 5:59 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> > [working on templating]
>
> In the end, I decided to base my effort on v8, and not v12 (based on one of my less-well-thought-out ideas). The
> latter was a good experiment, but it did not lead to an increase in readability as I had hoped. The attached v17 is
> still rough, but it's in good enough shape to evaluate a mostly-complete templating implementation.

I really appreciate your work!

>
> v13-0007 had some changes to the regression tests, but I haven't included those. The tests from v13-0003 do pass,
> both locally and shared. I quickly hacked together changing shared/local tests by hand (need to recompile), but it would
> be good for maintainability if tests could run once each with local and shmem, but use the same "expected" test output.

Agreed.

> Also, I didn't look to see if there were any changes in v14/15 that didn't have to do with precise memory accounting.
>
> At this point, Masahiko, I'd appreciate your feedback on whether this is an improvement at all (or at least a good
> base for improvement), especially for integrating with the TID store. I think there are some advantages to the template
> approach. One possible disadvantage is needing separate functions for each local and shared memory.
>
> If we go this route, I do think the TID store should invoke the template as static functions. I'm not quite
> comfortable with a global function that may not fit well with future use cases.

It looks like there is no problem in terms of vacuum integration,
although I've not fully tested it yet. TID store uses the radix tree
as the main storage, and with the templated radix tree, the data types
for the shared and non-shared cases will be different. TID store can
have a union for the radix tree, and the structure would look like the
following:

/* Per-backend state for a TidStore */
struct TidStore
{
    /*
     * Control object. This is allocated in DSA area 'area' in the shared
     * case, otherwise in backend-local memory.
     */
    TidStoreControl *control;

    /* Storage for Tids */
    union
    {
        local_radix_tree    *local;
        shared_radix_tree   *shared;
    } tree;

    /* DSA area for TidStore if used */
    dsa_area    *area;
};

In the TID store functions, we need to call either the local or the
shared radix tree functions depending on whether the TID store is
shared or not. We need an if-branch for each key-value pair insertion,
but I think it would not be a big performance problem in TID store use
cases, since vacuum is an I/O-intensive operation in many cases.
Overall, I think there is no problem, and I'll investigate it in more
depth.
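
For example, each insertion call site in TID store would branch roughly
like the following (just a sketch; the radix tree function names and the
key/value encoding are hypothetical):

/*
 * Sketch of the dispatch in a TID store insertion function.  The radix
 * tree function names (local_radix_tree_set/shared_radix_tree_set) and
 * the key/value encoding are hypothetical; the point is only that each
 * call site needs an if-branch on whether the TID store is shared.
 */
static void
tidstore_insert_kv(TidStore *ts, uint64 key, uint64 val)
{
    if (ts->area != NULL)
        shared_radix_tree_set(ts->tree.shared, key, val);
    else
        local_radix_tree_set(ts->tree.local, key, val);
}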

Apart from that, I've been considering the lock support for the shared
radix tree. As we discussed before, the current use case (i.e., only
parallel index vacuum) doesn't require fine-grained locking at all, so
it would be enough to have a single lock for simplicity. If we want to
use the shared radix tree for other use cases such as parallel heap
vacuum or replacing the hash table for shared buffers, we would need
better lock support. For example, if we want to support Optimistic
Lock Coupling[1], we would need to change not only the node structure
but also the logic, which would probably widen the gap between the
code for the non-shared and shared radix trees. In that case, once we
have a better radix tree optimized for the shared case, perhaps we can
replace the templated shared radix tree with it. I'd like to hear your
opinion on this.
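
To be clear, the single lock I have in mind for the current use case is
something like the following (just a sketch; the control struct and lock
member names are tentative):

/*
 * Sketch of the single-lock approach for the shared (templated) case;
 * the struct and member names are tentative.  Every modifying operation
 * takes one LWLock stored in the shared control struct, so concurrent
 * workers are fully serialized but the node-level code stays identical
 * to the local case.
 */
void
shared_radix_tree_set(shared_radix_tree *tree, uint64 key, uint64 val)
{
    LWLockAcquire(&tree->ctl->lock, LW_EXCLUSIVE);

    /* ... same insertion logic as the local radix tree ... */

    LWLockRelease(&tree->ctl->lock);
}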

>
> One review point I'll mention: Somehow I didn't notice there is no use for the "chunk" field in the rt_node type --
> it's only set to zero and copied when growing. What is the purpose? Removing it would allow the smallest node to take up
> only 32 bytes with a fanout of 3, by eliminating padding.

Oh, I didn't notice that. The chunk field was originally used when
redirecting the child pointer in the parent node from the old node to
the new (grown) node. When redirecting the pointer, since the
corresponding chunk surely exists in the parent, we can skip the
existence check. Currently we use RT_NODE_UPDATE_INNER() for that (see
RT_REPLACE_NODE()), but having a dedicated function that updates an
existing chunk's child pointer might improve performance. Or reducing
the node size by getting rid of the chunk field might be better.
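
For example, such a dedicated helper could look like the following (the
node layout and names are only illustrative, not the actual ones in the
patch):

/*
 * Hypothetical helper for RT_REPLACE_NODE(): since the chunk is known
 * to exist in the parent, we can overwrite the child pointer directly
 * rather than going through RT_NODE_UPDATE_INNER()'s generic
 * search-and-update path.  The parallel chunks[]/children[] arrays are
 * only illustrative.
 */
static inline void
rt_node_replace_child(rt_node_inner_4 *parent, uint8 chunk, rt_node *new_child)
{
    for (int i = 0; i < parent->count; i++)
    {
        if (parent->chunks[i] == chunk)
        {
            parent->children[i] = new_child;
            return;
        }
    }

    /* the chunk must exist in the parent */
    Assert(false);
}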

> Also, v17-0005 has an optimization/simplification for growing into node125 (my version needs an assertion or
> fallback, but works well now), found by another reading of Andres' prototype. There is a lot of good engineering there,
> we should try to preserve it.

Agreed.

Regards,

[1] https://db.in.tum.de/~leis/papers/artsync.pdf

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


