Re: Teaching users how they can get the most out of HOT in Postgres 14 - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Teaching users how they can get the most out of HOT in Postgres 14
Date
Msg-id CAH2-Wzn4b=osqDi4wa2ajuePzZLTWFL_kz2rz1--dNZe9+SdeQ@mail.gmail.com
Whole thread Raw
In response to Re: Teaching users how they can get the most out of HOT in Postgres 14  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Teaching users how they can get the most out of HOT in Postgres 14
List pgsql-hackers
On Thu, Jun 3, 2021 at 11:15 PM Michael Paquier <michael@paquier.xyz> wrote:
> I have read through the patch, and I am surprised to see that this
> only makes possible to control the optimization at relation level.
> The origin of the complaints is that this index cleanup optimization
> has been introduced as a new rule that gets enforced at *system*
> level, so I think that we should have an equivalent with a GUC to
> control the behavior for the whole system.

*Why* does it have to work at the system level? I don't understand
what you mean about the system level.

As Masahiko pointed out, adding a GUC isn't what we've done in other
similar cases -- that's how DISABLE_PAGE_SKIPPING works, which was a
defensive option that seems similar enough to what we want to add now.
To give another example, the TRUNCATE VACUUM option (or the related
reloption) can be used to disable relation truncation, a behavior that
sometimes causes *big* issues in production. The truncate behavior is
determined dynamically in most situations -- which is another
similarity to the optimization we've added here.

Why is this fundamentally different to those two things?

> With what you are
> presenting here, one could only disable the optimization for each
> relation, one-by-one.  If this optimization proves to be a problem,
> it's just going to be harder to users to go through all the relations
> and re-tune autovacuum.  Am I missing something?

Why would you expect autovacuum to run even when the optimization is
unavailable (e.g. with Postgres 13)? After all, the specifics of when
the bypass optimization kicks in make it very unlikely that ANALYZE
will ever be able to notice enough dead tuples to trigger an
autovacuum (barring antiwraparound and insert-driven autovacuums).
There will probably be very few LP_DEAD items remaining. Occasionally
there will be somewhat more LP_DEAD items, that happen to be
concentrated in less than 2% of the table's blocks -- but block-based
sampling by ANALYZE is likely to miss most of them and underestimate
the total number. The sampling approach taken by acquire_sample_rows()
ensures this with larger tables. With small tables the chances of the
optimization kicking in are very low, unless perhaps fillfactor has
been tuned very aggressively.

There has never been a guarantee that autovacuum will be triggered
(and do index vacuuming) in cases that have very few LP_DEAD items, no
matter how the system has been tuned. The main reason why making the
optimization behavior controllable is for the VACUUM command.
Principally for hackers. I can imagine myself using the VACUUM option
to disable the optimization when I was interested in testing VACUUM or
space utilization in some specific, narrow way.

Of course it's true that there is still some uncertainty about the
optimization harming production workloads -- that is to be expected
with an enhancement like this one. But there is still no actual
example or test case that shows the optimization doing the wrong
thing, or anything like it. Anything is possible, but I am not
expecting there to be even one user complaint about the feature.
Naturally I don't want to add something as heavyweight as a GUC, given
all that.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: pg_stat_progress_create_index vs. parallel index builds
Next
From: Jeff Davis
Date:
Subject: Re: Replication protocol doc fix