Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id CANWCAZbYw0d=6dO7WsVhMWoWUN+qyomJFmxBD23Ye2ZxLbhfeA@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, Mar 21, 2024 at 1:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> Or we can have a new function for dsa.c to set the initial and max
> segment size (or either one) to the existing DSA area so that
> TidStoreCreate() can specify them at creation.

I didn't like this very much, because it's splitting an operation
across an API boundary. The caller already has all the information it
needs when it creates the DSA. Straw man proposal: the caller could do
the same for local memory, then the two cases would be more similar. But if we made
local contexts the responsibility of the caller, that would cause
duplication between creating and resetting.

> In shared TidStore
> cases, since all memory required by shared radix tree is allocated in
> the passed-in DSA area and the memory usage is the total segment size
> allocated in the DSA area

...plus apparently some overhead, I just found out today, but that's
beside the point.

On Thu, Mar 21, 2024 at 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Yet another idea is that TidStore creates its own DSA area in
> TidStoreCreate(). That is, In TidStoreCreate() we create a DSA area
> (using dsa_create()) and pass it to RT_CREATE(). Also, we need a new
> API to get the DSA area. The caller (e.g. parallel vacuum) gets the
> dsa_handle of the DSA and stores it in the shared memory (e.g. in
> PVShared). TidStoreAttach() will take two arguments: dsa_handle for
> the DSA area and dsa_pointer for the shared radix tree. This idea
> still requires controlling min/max segment sizes since dsa_create()
> uses the 1MB as the initial segment size. But the TidStoreCreate()
> would be more user friendly.

This seems like an overall simplification, aside from future size
configuration, so +1 to continue looking into this. If we go this
route, I'd like to avoid a boolean parameter and cleanly separate
TidStoreCreateLocal() and TidStoreCreateShared(). Every operation
after that can introspect, but it's a bit awkward to force these cases
into the same creation function. It was always a little awkward, but this
change makes it more so.
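
To make the split concrete, here is a rough, standalone sketch of what I have in mind. It uses mock types only: MockTidStore, MockDsaArea, mock_dsa_handle and every function name below are hypothetical stand-ins for illustration, not the real tidstore.c or dsa.c API, and the 1MB figure is just the initial segment size mentioned upthread. The shared variant creates its own area and exposes a handle; later operations introspect rather than take a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real dsa.c types. */
typedef uint32_t mock_dsa_handle;
#define MOCK_DSA_HANDLE_INVALID ((mock_dsa_handle) 0)

typedef struct MockDsaArea
{
	mock_dsa_handle handle;
	size_t		init_segment_size;
} MockDsaArea;

typedef struct MockTidStore
{
	size_t		max_bytes;
	MockDsaArea *area;			/* NULL for a backend-local store */
} MockTidStore;

/* Common setup shared by both creation functions. */
static MockTidStore *
mock_tidstore_alloc(size_t max_bytes, MockDsaArea *area)
{
	MockTidStore *ts = malloc(sizeof(MockTidStore));

	assert(ts != NULL);
	ts->max_bytes = max_bytes;
	ts->area = area;
	return ts;
}

/* Local case: no shared area, no boolean parameter. */
MockTidStore *
MockTidStoreCreateLocal(size_t max_bytes)
{
	return mock_tidstore_alloc(max_bytes, NULL);
}

/* Shared case: the store creates and owns its own (mock) area. */
MockTidStore *
MockTidStoreCreateShared(size_t max_bytes)
{
	static mock_dsa_handle next_handle = 1;
	MockDsaArea *area = malloc(sizeof(MockDsaArea));

	assert(area != NULL);
	area->handle = next_handle++;
	area->init_segment_size = 1024 * 1024;	/* assumed 1MB default */
	return mock_tidstore_alloc(max_bytes, area);
}

/* Operations after creation introspect instead of taking a flag. */
bool
MockTidStoreIsShared(const MockTidStore *ts)
{
	return ts->area != NULL;
}

/* What a caller would stash in shared memory for workers to attach. */
mock_dsa_handle
MockTidStoreGetHandle(const MockTidStore *ts)
{
	return MockTidStoreIsShared(ts) ? ts->area->handle : MOCK_DSA_HANDLE_INVALID;
}

void
MockTidStoreDestroy(MockTidStore *ts)
{
	if (MockTidStoreIsShared(ts))
		free(ts->area);
	free(ts);
}
```

In this sketch the caller never sees the area directly; it only asks for the handle, which is what an attach function would need along with the pointer to the shared tree.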


