Re: Teaching users how they can get the most out of HOT in Postgres 14 - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Teaching users how they can get the most out of HOT in Postgres 14 |
Date | |
Msg-id | CAH2-WzkqPHK+YAa6bEEF+gfJ43M1rdBxmmsd_bf0DP8OhFeOqA@mail.gmail.com Whole thread Raw |
In response to | Re: Teaching users how they can get the most out of HOT in Postgres 14 (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Teaching users how they can get the most out of HOT in Postgres 14
|
List | pgsql-hackers |
On Tue, May 18, 2021 at 7:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > I prefer to have an on/off switch just in case. I remember I also > commented the same thing before. We’ve discussed a way to control > whether or not to enable the skipping optimization by adding a new > mode to INDEX_CLEANUP option, as Peter mentioned. For example, we can > use the new mode “auto” (or “smart”) mode by default, enabling all > skipping optimizations, and specifying “on” disables them. Or we can > add “force” mode to disable all skipping optimizations while leaving > the existing modes as they are. Anyway, I think it’s not a good idea > to add a new GUC parameter that we’re not sure how to tune. > > IIUC skipping index vacuum when less than 2% of relation pages with at > least one LP_DEAD is a table’s optimization rather than a btree > index’s optimization. Right. There *is* an excellent way to tune this behavior: by adjusting heap fillfactor to make HOT more effective. That was why I started this thread! If you leave heap fillfactor at the default of 100, and have lots of updates (that don't modify indexed columns) and no deletes, then you're almost certainly not going to have VACUUM skip indexes anyway -- in practice you're bound to exceed having 2% of pages with an LP_DEAD item before very long. Tuning heap fillfactor is practically essential to see a real benefit, regardless of the exact BYPASS_THRESHOLD_PAGES. (There may be some rare exceptions, but for the most part this mechanism helps with tables that get many updates that are expected to use HOT, and will use HOT barring a tiny number of cases where the new tuple won't' quite fit, etc.) The idea of tuning the behavior directly (e.g. with a reloption that lets the user specify a BYPASS_THRESHOLD_PAGES style threshold) is exactly backwards. The point for the user should not be to skip indexes during VACUUM. The point for the user is to get lots of non-HOT updates to *avoid heap fragmentation*, guided by the new autovacuum instrumentation. That also means that there will be much less index vacuuming. But that's a pretty minor side-benefit. Why should the user *expect* largely unnecessary index vacuuming to take place? To put it another way, the index bypass mechanism added to vacuumlazy.c was not intended to add a new good behavior. It was actually intended to subtract an old bad behavior. The patch is mostly useful because it allows the user to make VACUUM *more* aggressive with freezing and VM bit setting (not less aggressive with indexes). The BYPASS_THRESHOLD_PAGES threshold of 0.02 is a little arbitrary -- but only a little. > Since we’re not likely to set many pages > all-visible or collect many dead tuples in that case, we can skip > index vacuuming and table vacuuming. IIUC this case, fortunately, goes > well together btree indexes’ bottom-up deletion. It's true that bottom-up index deletion provides additional insurance against problems, but I don't think that that insurance is strictly necessary. It's nice to have insurance, though. > If this skipping > behavior badly affects other indexes AMs, this optimization should be > considered within btree indexes, although we will need a way for index > AMs to consider and tell the vacuum strategy. But I guess this works > well in most cases so having an on/off switch suffice. Right. I doubt that it will actually turn out to be necessary to have such a switch. But I try to be modest when it comes to predicting what will be important to some user workload -- it's way too complicated to have total confidence about something like that. It is a risk to be managed. -- Peter Geoghegan
pgsql-hackers by date: