
From: John Naylor
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Msg-id: CAFBsxsHx9vwGuygw5eX6yryZ72FWeL-fL6GAnWmzYs8VP6HV_w@mail.gmail.com
In response to: Re: [PoC] Improve dead tuple storage for lazy vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: [PoC] Improve dead tuple storage for lazy vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers
On Mon, Mar 13, 2023 at 8:41 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sun, Mar 12, 2023 at 12:54 AM John Naylor
> <john.naylor@enterprisedb.com> wrote:
> >
> > On Fri, Mar 10, 2023 at 9:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> > > * Additional size classes. It's important for an alternative of path
> > > compression as well as supporting our decoupling approach. Middle
> > > priority.
> >
> > I'm going to push back a bit and claim this doesn't bring much gain, while it does have a complexity cost. The node1 from Andres's prototype is 32 bytes in size, same as our node3, so it's roughly equivalent as a way to ameliorate the lack of path compression.
>
> But does it mean that our node1 would help reduce the memory further
> since our base node type (i.e. RT_NODE) is smaller than the base
> node type of Andres's prototype? The result I shared before showed
> 1.2GB vs. 1.9GB.

That benefit shows up in a synthetic benchmark with random integers. I highly doubt anyone would insist that we keep binary-searching the 1GB array for another cycle just because we didn't add a size class here. I'll repeat myself and say that there are also maintenance costs.
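
(To make the sizes above concrete: the following is only a back-of-envelope sketch with made-up names, not the actual RT_NODE declarations. Assuming the 5-byte common header discussed further down, one key byte per child, and 8-byte child pointers, a 3-member node comes out to 32 bytes on a typical 64-bit build.)

    #include <stdint.h>

    /* hypothetical 3-member node, not the real radix tree code */
    typedef struct SketchNode3
    {
        char     header[5];     /* assumed 5-byte common node header */
        uint8_t  chunks[3];     /* one key byte per child */
        void    *children[3];   /* 3 x 8 bytes, 8-byte aligned */
    } SketchNode3;              /* sizeof() == 32 on 64-bit builds */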

In contrast, I'm fairly certain that our attempts thus far at memory accounting/limiting are not quite up to par, and that shortfall is enough to jeopardize the feature. We're already discussing that, so I'll say no more.

> > I say "roughly" because the loop in node3 is probably noticeably slower. A new size class will by definition still use that loop.
>
> I've evaluated the performance of node1 but the result seems to show
> the opposite.

As an aside, I meant the loop in our node3 might make your node1 slower than the prototype's node1, which was coded for 1 member only. 
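
To spell out what I mean by "the loop", reusing the hypothetical SketchNode3 above and adding a 1-member variant (again, made-up names, not the actual search code): a node1 that is merely a smaller size class of the same kind still goes through the generic scan with count == 1, whereas a dedicated node1 kind can get away with a single comparison.

    /* hypothetical 1-member node kind */
    typedef struct SketchNode1
    {
        char     header[5];
        uint8_t  chunk;         /* the single key byte */
        void    *child;
    } SketchNode1;

    /* generic search shared by all size classes of the 3-member kind */
    static void *
    sketch_node3_search(SketchNode3 *node, int count, uint8_t chunk)
    {
        for (int i = 0; i < count; i++)     /* linear scan, even if count == 1 */
        {
            if (node->chunks[i] == chunk)
                return node->children[i];
        }
        return NULL;
    }

    /* what a dedicated node1 kind can do instead */
    static void *
    sketch_node1_search(SketchNode1 *node, uint8_t chunk)
    {
        return (node->chunk == chunk) ? node->child : NULL;
    }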

> > > * Node shrinking support. Low priority.
> >
> > This is an architectural wart that's been neglected since the tid store doesn't perform deletion. We'll need it sometime. If we're not going to make this work, why ship a deletion API at all?
> >
> > I took a look at this a couple of weeks ago, and fixing it wouldn't be that hard. I even had an idea of how to detect when to shrink the size class within a node kind, while keeping the header at 5 bytes. I'd be willing to put effort into that, but to have a chance of succeeding, I'm unwilling to make it more difficult by adding more size classes at this point.
>
> I think that deletion (and locking support) doesn't have use cases
> in core (i.e. tidstore), but is implemented so that external
> extensions can use it.

I think these cases are a bit different: doing anything with a data structure stored in shared memory without a synchronization scheme is completely unthinkable and insane. I'm not yet sure whether deleting-without-shrinking is a showstopper, or whether having it in v16 is preferable to no deletion at all.
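
Just to illustrate the shape of the problem (made-up names and threshold, and not the detection scheme I alluded to above): deleting without shrinking is simply the first half of the following, and the part in question is the block at the end.

    /* hypothetical delete path, not the actual tree code */
    typedef struct SketchNode
    {
        int            count;       /* number of live entries */
        unsigned char  chunks[3];
        void          *children[3];
    } SketchNode;

    #define SKETCH_SHRINK_AT 1      /* made-up threshold */

    static void
    sketch_delete(SketchNode *node, int slot)
    {
        /* shift the remaining entries down over the deleted slot */
        for (int i = slot; i < node->count - 1; i++)
        {
            node->chunks[i] = node->chunks[i + 1];
            node->children[i] = node->children[i + 1];
        }
        node->count--;

        /*
         * Shrinking: once the fanout drops to what a smaller size class
         * (or kind) can hold, copy the entries into a freshly allocated
         * smaller node, repoint the parent, and free this one.  The
         * interesting part, not shown here, is detecting that point
         * without growing the 5-byte header.
         */
        if (node->count <= SKETCH_SHRINK_AT)
        {
            /* allocate smaller node, copy entries, update parent pointer */
        }
    }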

Anything we don't implement now is a limit on future use cases, and thus a cause for objection. On the other hand, anything we do implement also represents more stuff that will have to be rewritten for high-concurrency support.

> FYI, I ran a TPC-C workload over the weekend, and didn't get any
> failures of the assertion checking that tidstore and the current tid
> lookup return the same result.

Great!

--
John Naylor
EDB: http://www.enterprisedb.com
