Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Msg-id: CAH2-WznTDLUX2qpOHqcK5Qiv6S+xQeBtdoJuLNrNNfrs_ig5pQ@mail.gmail.com
In response to: Re: Deleting older versions in unique indexes to avoid page splits (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Mon, Jan 25, 2021 at 10:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I need to spend more time on benchmarking to study the behavior and I
> think without that it would be difficult to make a conclusion in this
> regard. So, let's not consider any action on this front till I spend
> more time to find the details.

It is true that I committed the patch without thorough review, which
was less than ideal. I welcome additional review from you now.

I will say one more thing about it for now: Start with a workload, not
with the code. Without bottom-up deletion (e.g. when using Postgres
13), even a simple though extreme workload that experiences version
churn in its indexes still takes quite a few minutes before the first
page splits (when the table is at least a few GB in size to begin
with). When I was testing the patch I noticed that it could take 10 or
15 minutes for the deletion mechanism to kick in for the first time --
the patch really didn't do anything at all until perhaps 15 minutes
into the benchmark, despite helping *enormously* by the 60-minute
mark. And this was with significant skew, so presumably the first page
that would have split (in the absence of the bottom-up deletion
feature) was approximately the page with the most skew -- most
individual pages might have taken 30 minutes or more to split without
the intervention of bottom-up deletion.
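To make that concrete, here is a minimal sketch of the kind of
workload I have in mind -- the table, index names, row count, and skew
parameter are all made up for illustration, not the exact benchmark I
ran:

    -- Illustrative schema: a large table with a primary key plus a
    -- second unique index whose keys are never logically modified.
    CREATE TABLE churn (
        id      bigint PRIMARY KEY,
        extra   bigint NOT NULL,
        counter bigint NOT NULL DEFAULT 0
    );
    CREATE UNIQUE INDEX churn_extra_idx ON churn (extra);
    -- Indexing the column that gets updated makes every UPDATE
    -- non-HOT, so each one inserts a duplicate version into churn_pkey
    -- and churn_extra_idx, even though id and extra never change.
    CREATE INDEX churn_counter_idx ON churn (counter);
    INSERT INTO churn (id, extra)
    SELECT i, i FROM generate_series(1, 20000000) i;

    -- pgbench script, churn.sql (run with something like
    -- "pgbench -n -f churn.sql -c 16 -T 3600"). Zipfian access
    -- concentrates the updates on a minority of rows:
    \set n random_zipfian(1, 20000000, 1.1)
    UPDATE churn SET counter = counter + 1 WHERE id = :n;

On Postgres 13 the two unique indexes slowly accumulate duplicate
versions until leaf pages start to split; with bottom-up deletion the
would-be splits are avoided by deleting the old versions just in time.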

Relatively rare events (in this case would-be page splits) can have
very significant long-term consequences for the sustainability of a
workload, so relatively simple targeted interventions can make all the
difference. The idea behind bottom-up deletion is to allow the
workload to figure out the best way of fixing its bloat problems
*naturally*. The heuristics must be simple precisely because workloads
are so varied and complicated. We must be willing to pay small fixed
costs for negative feedback -- it has to be okay for the mechanism to
occasionally fail in order to learn what works. I freely admit that I
don't understand all workloads. But I don't think anybody can. This
holistic/organic approach has a lot of advantages, especially given
the general uncertainty about workload characteristics. Your suspicion
of the simple nature of the heuristics actually makes a lot of sense
to me. I do get it.
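
If you want to watch this happen during a run like the sketch above,
sampling the index size and leaf density over time shows it clearly
(pgstatindex is from the pgstattuple contrib extension; the index name
matches my made-up example):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    -- On Postgres 13, index_bytes jumps each time leaf pages split and
    -- never comes back down; with bottom-up deletion it should stay
    -- close to its initial size for this kind of workload.
    SELECT now() AS sampled_at,
           pg_relation_size('churn_extra_idx') AS index_bytes,
           avg_leaf_density
    FROM pgstatindex('churn_extra_idx');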

-- 
Peter Geoghegan


