Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Deleting older versions in unique indexes to avoid page splits
Date
Msg-id CAH2-WzkfGuiU1oqOXSdbawYUtwCACOvmnwuog1nt71seN7YXEA@mail.gmail.com
In response to Re: Deleting older versions in unique indexes to avoid page splits  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Jan 20, 2021 at 10:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
> This patch is unusual in that you really need to think about emergent
> behaviors to understand it. That is certainly a difficult thing to do,
> and it's understandable that even an expert might not grok it without
> considering it carefully.

I happened to stumble upon a recent blog post that seems like a light,
approachable introduction to some of the key concepts here:

https://jessitron.com/2021/01/18/when-costs-are-nonlinear-keep-it-small/

Bottom-up index deletion enhances a complex system whose maintenance
costs are *dramatically* nonlinear, at least in many important cases.
If you apply linear thinking to such a system then you'll probably end
up with a bad design.

The system as a whole is made efficient by making sure that we're lazy
when that makes sense, while also making sure that we're eager when
that makes sense. So it almost *has* to be structured as a bottom-up,
reactive mechanism -- no other approach is able to ramp up or down in
exactly the right way. Talking about small cost differences (things
that can easily be empirically measured, perhaps with a
microbenchmark) is almost irrelevant to the big picture. It's even
irrelevant to the "medium picture".

What's more, it's basically a mistake to think of heap page accesses
that don't yield any deletable index tuples as wasted effort (even
though that's how I describe them myself!). Here's why: we have to
access the heap page to learn that it has nothing for us in the first
place! If we somehow knew ahead of time that some useless-to-us
heap block was useless, then the whole system wouldn't be bottom-up
(by definition). In other words, failing to get any index tuple
deletes from an entire heap page *is itself a form of feedback* at the
local level -- it guides the entire system's behavior over time. Why
should we expect to get that information at zero cost?
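
A toy model may make the feedback argument above more concrete. This is
only a sketch under stated assumptions, not the nbtree or heapam code:
`HeapBlock`, `bottom_up_pass`, `target`, and `max_visits` are all
hypothetical names invented for illustration. It assumes the only hint
available up front is how many index tuples on the leaf page point into
each heap block, and that the number of deletable tuples is learned only
by paying for the heap access.

```python
from dataclasses import dataclass


@dataclass
class HeapBlock:
    block_no: int
    candidates: int  # index tuples on the leaf page that point into this block
    deletable: int   # how many are actually dead (unknown until the block is visited)


def bottom_up_pass(blocks, target, max_visits=3):
    """Toy deletion pass: returns (tuples freed, block numbers visited)."""
    freed = 0
    visited = []
    # Most candidates first: the only ordering hint available before touching the heap.
    ordered = sorted(blocks, key=lambda b: b.candidates, reverse=True)
    for block in ordered[:max_visits]:
        visited.append(block.block_no)
        freed += block.deletable  # this number is only known after the access
        if freed >= target:
            break  # enough space freed, so the page split is avoided
        if block.deletable == 0:
            break  # an empty result is feedback: stop being eager on this page
    return freed, visited


blocks = [HeapBlock(1, 5, 4), HeapBlock(2, 3, 0), HeapBlock(3, 2, 2)]
print(bottom_up_pass(blocks, target=6))  # block 2 yields nothing, so the pass stops there
print(bottom_up_pass(blocks, target=4))  # block 1 alone is enough
```

In this model the visit to block 2 frees nothing, yet it is what tells the
pass to give up instead of going on to block 3. The access was the price
of that information rather than wasted effort.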

This is somehow both simple and complicated, which creates huge
potential for miscommunications. I tried to describe this in various
ways at various points. Perhaps I could have done better with that.

-- 
Peter Geoghegan


