Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoDD5VXYMyWdrSjtN01HdYq3Ja-yzTP=rD9315-5bdLzpA@mail.gmail.com
Whole thread Raw
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Wed, Oct 5, 2022 at 6:40 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
>
> On Wed, Oct 5, 2022 at 1:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Sep 28, 2022 at 12:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Sep 23, 2022 at 12:11 AM John Naylor
> > > <john.naylor@enterprisedb.com> wrote:
> > > Yeah, node31 and node256 are bloated.  We probably could use slab for
> > > node256 independently. It's worth trying a benchmark to see how it
> > > affects the performance and the tree size.
>
> This wasn't the focus of your current email, but while experimenting with v6 I had another thought about local
allocation:If we use the default slab block size of 8192 bytes, then only 3 chunks of size 2088 can fit, right? If so,
sinceaset and DSA also waste at least a few hundred bytes, we could store a useless 256-byte slot array within node256.
Thatway, node128 and node256 share the same start of pointers/values array, so there would be one less branch for
gettingthat address. In v6, rt_node_get_values and rt_node_get_children are not inlined (asde: gcc uses a jump table
for5 kinds but not for 4), but possibly should be, and the smaller the better. 

It would be good for performance but I'm a bit concerned that it's
highly optimized to the design of aset and DSA. Since size 2088 will
be currently classed as 2616 in DSA, DSA wastes 528 bytes. However, if
we introduce a new class of 2304 (=2048 + 256) bytes we cannot store a
useless 256-byte and the assumption will be broken.

>
> > Regarding DSA support, IIUC we need to use dsa_pointer in inner nodes
> > to point to its child nodes, instead of C pointers (ig, backend-local
> > address). I'm thinking of a straightforward approach as the first
> > step; inner nodes have a union of rt_node* and dsa_pointer and we
> > choose either one based on whether the radix tree is shared or not. We
> > allocate and free the shared memory for individual nodes by
> > dsa_allocate() and dsa_free(), respectively. Therefore we need to get
> > a C pointer from dsa_pointer by using dsa_get_address() while
> > descending the tree. I'm a bit concerned that calling
> > dsa_get_address() for every descent could be performance overhead but
> > I'm going to measure it anyway.
>
> Are dsa pointers aligned the same as pointers to locally allocated memory? Meaning, is the offset portion always a
multipleof 4 (or 8)? 

I think so.

> It seems that way from a glance, but I can't say for sure. If the lower 2 bits of a DSA pointer are never set, we can
tagthem the same way as a regular pointer. That same technique could help hide the latency of converting the pointer,
bythe same way it would hide the latency of loading parts of a node into CPU registers. 
>
> One concern is, handling both local and dsa cases in the same code requires more (predictable) branches and reduces
codedensity. That might be a reason in favor of templating to handle each case in its own translation unit. 

Right. We also need to support locking for shared radix tree, which
would require more branches.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: shadow variables - pg15 edition
Next
From: Bharath Rupireddy
Date:
Subject: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication