Re: Teaching users how they can get the most out of HOT in Postgres 14 - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Teaching users how they can get the most out of HOT in Postgres 14
Date:
Msg-id: CAH2-WzkCYR0U7zXqXo0CgFaFwUDz1WbKq8ngjzKi4+AQ5f-mYQ@mail.gmail.com
In response to: Re: Teaching users how they can get the most out of HOT in Postgres 14 (Andres Freund <andres@anarazel.de>)
Responses: Re: Teaching users how they can get the most out of HOT in Postgres 14
List: pgsql-hackers
On Mon, Apr 12, 2021 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
> Well, one argument is that you made a fairly significant behavioural
> change, with hard-coded logic for when the optimization kicks in. It's
> not at all clear that your constants are the right ones for every
> workload.

(Apparently nobody wants to talk about HOT and the documentation.)

The BYPASS_THRESHOLD_PAGES cutoff was chosen conservatively, so that it
only avoids index vacuuming in truly marginal cases -- and it tends to
only delay it there.

A table-level threshold is not the best way of constraining the problem.
In the future, the table threshold should be treated as only one factor
among several. Plus there will be more than a simple yes/no question to
consider. We should eventually be able to do index vacuuming for some
indexes but not others. Bottom-up index deletion has totally changed
things here, because roughly speaking it makes index bloat proportionate
to the number of logical changes to indexed columns -- you could have one
super-bloated index on the table, but several others that perfectly
retain their original size. You still need to do heap vacuuming
eventually, which necessitates vacuuming indexes too, but the right
strategy is probably to vacuum much more frequently, vacuuming the
bloated index each time. You only do a full round of index vacuuming when
the table starts to accumulate way too many LP_DEAD items. You need a
much more sophisticated model for this. It might also need to hook into
autovacuum scheduling.

One of the dangers of a high BYPASS_THRESHOLD_PAGES setting is that it'll
work well for some indexes but not others. To a dramatic degree, even.

That said, nbtree isn't the only index AM, and it is hard to be
completely sure that you've caught everything. So an off switch seems
like a good idea now.

> We'll likely only get to know whether they're right in > 1 year
> - not having a real out at that point imo is somewhat scary.
>
> That said, adding more and more reloptions has a significant cost, so I
> don't think it's clear cut that it's the right decision to add
> one. Perhaps vacuum_cleanup_index_scale_factor should just be reused for
> BYPASS_THRESHOLD_PAGES?

I think that the right way to do this is to generalize INDEX_CLEANUP to
support a mode of operation that disallows vacuumlazy.c from applying
this optimization, as well as any similar optimizations added in the
future.

Even if you don't buy my argument about directly parameterizing
BYPASS_THRESHOLD_PAGES undermining future work, allowing it to be set
much higher than 5% - 10% would be a pretty big footgun. It might appear
to help at first, but risks destabilizing things much later on.

--
Peter Geoghegan
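
To make the shape of the decision being discussed concrete, here is a
minimal C sketch of the bypass check, assuming a ~2% cutoff and a boolean
knob standing in for the proposed generalized INDEX_CLEANUP off switch.
The names, the constant value, and the exact conditions are illustrative
assumptions, not the actual vacuumlazy.c code.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Conservative cutoff: skip index vacuuming when fewer than ~2% of
     * heap pages have LP_DEAD items (value assumed for illustration). */
    #define BYPASS_THRESHOLD_PAGES 0.02

    /* Decide whether this VACUUM may skip the index vacuuming pass.
     * index_cleanup_forced models a generalized INDEX_CLEANUP setting
     * acting as an off switch for the optimization. */
    static bool
    bypass_index_vacuum(uint64_t rel_pages, uint64_t lpdead_item_pages,
                        bool index_cleanup_forced)
    {
        if (index_cleanup_forced)
            return false;       /* user insists on index vacuuming */

        /* Bypass only in truly marginal cases */
        return lpdead_item_pages <
               (uint64_t) (rel_pages * BYPASS_THRESHOLD_PAGES);
    }

    int
    main(void)
    {
        /* 1,000,000 heap pages, 12,000 (1.2%) with LP_DEAD items */
        printf("bypass? %d\n", bypass_index_vacuum(1000000, 12000, false)); /* 1 */
        printf("bypass? %d\n", bypass_index_vacuum(1000000, 12000, true));  /* 0 */
        return 0;
    }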