Re: Teaching users how they can get the most out of HOT in Postgres 14 - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Teaching users how they can get the most out of HOT in Postgres 14
Date
Msg-id CAH2-WzkCYR0U7zXqXo0CgFaFwUDz1WbKq8ngjzKi4+AQ5f-mYQ@mail.gmail.com
In response to Re: Teaching users how they can get the most out of HOT in Postgres 14  (Andres Freund <andres@anarazel.de>)
Responses Re: Teaching users how they can get the most out of HOT in Postgres 14  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Mon, Apr 12, 2021 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
> Well, one argument is that you made a fairly significant behavioural
> change, with hard-coded logic for when the optimization kicks in. It's
> not at all clear that your constants are the right ones for every
> workload.

(Apparently nobody wants to talk about HOT and the documentation.)

The BYPASS_THRESHOLD_PAGES cutoff was chosen conservatively, so that
index vacuuming is only skipped in truly marginal cases -- and even
there it tends only to be delayed, not skipped altogether.

A table-level threshold is not the best way of constraining the
problem. In the future, the table threshold should be treated as only
one factor among several. Plus there will be more than a simple yes/no
question to consider. We should eventually be able to do index
vacuuming for some indexes but not others. Bottom-up index deletion
has totally changed things here because, roughly speaking, it makes
index bloat proportional to the number of logical changes to indexed
columns -- you could have one super-bloated index on the table, but
several others that perfectly retain their original size. You still
need to do heap vacuuming eventually, which necessitates vacuuming
indexes too, but the right strategy is probably to vacuum much more
frequently, vacuuming the bloated index each time. You only do a full
round of index vacuuming when the table starts to accumulate way too
many LP_DEAD items. You need a much more sophisticated model for this.
It might also need to hook into autovacuum's scheduling.
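
Purely to illustrate the direction I mean (nothing like this exists in
the tree today -- every name below is made up), a per-index policy
could look roughly like this:

    #include <stdbool.h>

    /* Hypothetical per-index bookkeeping -- not an actual PostgreSQL struct */
    typedef struct IndexVacState
    {
        const char *index_name;
        double      bloat_frac;     /* estimated fraction of bloated pages */
    } IndexVacState;

    /*
     * Vacuum an individual index when it has visibly bloated, or when the
     * heap has accumulated so many LP_DEAD items that a full round of index
     * vacuuming is required before heap vacuuming can go ahead.
     */
    static bool
    should_vacuum_index(const IndexVacState *idx, bool heap_needs_full_round)
    {
        if (heap_needs_full_round)
            return true;

        return idx->bloat_frac > 0.10;  /* arbitrary illustrative cutoff */
    }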

One of the dangers of a high BYPASS_THRESHOLD_PAGES setting is that
skipping index vacuuming will work well for some indexes but not for
others. To a dramatic degree, even.

That said, nbtree isn't the only index AM, and it is hard to be
completely sure that you've caught everything. So an off switch seems
like a good idea now.

> We'll likely only get to know whether they're right in > 1 year
> - not having a real out at that point imo is somewhat scary.
>
> That said, adding more and more reloptions has a significant cost, so I
> don't think it's clear cut that it's the right decision to add
> one. Perhaps vacuum_cleanup_index_scale_factor should just be reused for
> BYPASS_THRESHOLD_PAGES?

I think that the right way to do this is to generalize INDEX_CLEANUP
to support a mode of operation that prevents vacuumlazy.c from
applying this optimization, as well as any similar optimizations that
might be added in the future.
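
Concretely, I have in mind something like a three-way setting, where
only the default behavior permits the bypass. A rough sketch, with
made-up names:

    #include <stdbool.h>

    /* Hypothetical three-way setting -- names are only for illustration */
    typedef enum VacIndexCleanup
    {
        INDEX_CLEANUP_AUTO,     /* vacuumlazy.c may apply optimizations like this */
        INDEX_CLEANUP_ON,       /* always do a full round of index vacuuming */
        INDEX_CLEANUP_OFF       /* skip index vacuuming entirely */
    } VacIndexCleanup;

    static bool
    may_bypass_index_vacuuming(VacIndexCleanup index_cleanup,
                               long rel_pages, long lpdead_item_pages)
    {
        /* ON and OFF both take the decision away from vacuumlazy.c */
        if (index_cleanup != INDEX_CLEANUP_AUTO)
            return false;

        /* otherwise apply the same 2% page-count cutoff sketched upthread */
        return (double) lpdead_item_pages < (double) rel_pages * 0.02;
    }

Users would then have a real off switch for the optimization, without
the threshold itself ever becoming a tunable.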

Even if you don't buy my argument about directly parameterizing
BYPASS_THRESHOLD_PAGES undermining future work, allowing it to be set
much higher than 5% - 10% would be a pretty big footgun. It might
appear to help at first, but risks destabilizing things much later on.


--
Peter Geoghegan


