Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [PoC] Improve dead tuple storage for lazy vacuum
Date
Msg-id CAD21AoCbnN6JiXm4sWqAh5kH656oiHhi=HjFwtzKTr=g53YmnA@mail.gmail.com
In response to Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Fri, Mar 10, 2023 at 3:42 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> On Thu, Mar 9, 2023 at 1:51 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I've attached the new version patches. I merged improvements and fixes
> > I did in the v29 patch.
>
> I haven't yet had a chance to look at those closely, since I've had to devote time to other commitments. I remember I
> wasn't particularly impressed that v29-0008 mixed my requested name-casing changes with a bunch of other random things.
> Separating those out would be an obvious way to make it easier for me to look at, whenever I can get back to this. I
> need to look at the iteration changes as well, in addition to testing memory measurement (thanks for the new results,
> they look encouraging).

Okay, I'll separate them again.

>
> > Apart from the memory measurement stuff, I've done another todo item
> > on my list; adding min max classes for node3 and node125. I've done
>
> This didn't help us move us closer to something committable the first time you coded this without making sure it was
> a good idea. It's still not helping and arguably makes it worse. To be fair, I did speak positively about _considering_
> additional size classes some months ago, but that has a very obvious maintenance cost, something we can least afford
> right now.
>
> I'm frankly baffled you thought this was important enough to work on again, yet thought it was a waste of time to try
> to prove to ourselves that autovacuum in a realistic, non-deterministic workload gave the same answer as the current tid
> lookup. Even if we had gone that far, it doesn't seem like a good idea to add non-essential code to critical paths right
> now.

I didn't think that proving tidstore and the current tid lookup return
the same result was a waste of time. I've shared a patch to do that in
tidstore before. I agreed not to add it to the tree, but we can still
test that using this patch. In fact, I ran such a test with a pgbench
workload for a few days.
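
As an illustration only (this is not the patch mentioned above), a
cross-check of this kind can be sketched as follows. The sorted-array
binary search stands in for the current tid lookup, and a per-block
bitmap dictionary is a hypothetical stand-in for tidstore; all function
names here are assumptions made for the example.

```python
import bisect
import random

# A TID is modeled as a (block number, offset number) tuple.

def build_sorted_array(tids):
    """Reference structure: a sorted, de-duplicated array of TIDs."""
    return sorted(set(tids))

def array_contains(sorted_tids, tid):
    """Binary search over the sorted array."""
    i = bisect.bisect_left(sorted_tids, tid)
    return i < len(sorted_tids) and sorted_tids[i] == tid

def build_block_bitmaps(tids):
    """Stand-in for the new store: block number -> bitmap of offsets."""
    bitmaps = {}
    for block, offset in tids:
        bitmaps[block] = bitmaps.get(block, 0) | (1 << offset)
    return bitmaps

def bitmap_contains(bitmaps, tid):
    block, offset = tid
    return ((bitmaps.get(block, 0) >> offset) & 1) == 1

def cross_check(tids, probes):
    """Return every probe on which the two lookups disagree."""
    arr = build_sorted_array(tids)
    bm = build_block_bitmaps(tids)
    return [p for p in probes
            if array_contains(arr, p) != bitmap_contains(bm, p)]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a failure can be reproduced
    dead = [(rng.randrange(1000), rng.randrange(1, 292)) for _ in range(5000)]
    probes = [(rng.randrange(1000), rng.randrange(1, 292)) for _ in range(20000)]
    assert cross_check(dead, probes) == []
```

Returning the list of mismatching probes, rather than a boolean, makes a
failure directly debuggable.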

IIUC it's still important to consider whether to have node1, since it
could be a good alternative to path compression. The prototype also
implemented it. Of course, we can leave it for a future improvement.
But considering this item together with the performance tests helps us
show that our decoupling approach is promising.

> We're rapidly running out of time, and we're at the point in the cycle where it's impossible to get meaningful review
> from anyone not already intimately familiar with the patch series. I only want to see progress on addressing possible
> (especially architectural) objections from the community, because if they don't notice them now, they surely will after
> commit.

Right, we've been making many design decisions. Some of them were
agreed on just between you and me, and some with other hackers. Given
the remaining time, some of these design decisions are effectively
irreversible.

>  I have my own list of possible objections as well as bikeshedding points, which I'll clean up and share next week.

Thanks.

>  I plan to invite Andres to look at that list and give his impressions, because it's a lot quicker than reading the
> patches. Based on that, I'll hopefully be able to decide whether we have enough time to address any feedback and do
> remaining polishing in time for feature freeze.
>
> I'd suggest sharing your todo list in the meanwhile, it'd be good to discuss what's worth doing and what is not.

Apart from more rounds of reviews and tests, my todo items that need
discussion and possibly implementation are:

* The memory measurement in radix trees and the memory limit in
tidstores. I've implemented these in v30-0007 through 0009, but they
still need review. This is the highest priority for me.

* Additional size classes. They're important as an alternative to path
compression as well as for supporting our decoupling approach. Middle
priority.

* Node shrinking support. Low priority.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


