Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: New IndexAM API controlling index vacuum strategies |
Date | |
Msg-id | CAD21AoA56GgKbe9xmyUR0KTeNmvvXAqxD+2iSqxXprjKGDHLXA@mail.gmail.com Whole thread Raw |
In response to | New IndexAM API controlling index vacuum strategies (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: New IndexAM API controlling index vacuum strategies
|
List | pgsql-hackers |
On Wed, Mar 17, 2021 at 7:21 AM Peter Geoghegan <pg@bowt.ie> wrote: > > On Tue, Mar 16, 2021 at 6:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > Note that I've merged multiple existing functions in vacuumlazy.c into > > > one: the patch merges lazy_vacuum_all_indexes() and lazy_vacuum_heap() > > > into a single function named vacuum_indexes_mark_unused() > > > I agree to create a function like vacuum_indexes_mark_unused() that > > makes a decision and does index and heap vacumming accordingly. But > > what is the point of removing both lazy_vacuum_all_indexes() and > > lazy_vacuum_heap()? I think we can simply have > > vacuum_indexes_mark_unused() call those functions. I'm concerned that > > if we add additional criteria to vacuum_indexes_mark_unused() in the > > future the function will become very large. > > I agree now. I became overly excited about advertising the fact that > these two functions are logically one thing. This is important, but it > isn't necessary to go as far as actually making everything into one > function. Adding some comments would also make that point clear, but > without making vacuumlazy.c even more spaghetti-like. I'll fix it. > > > > I wonder if we can add some kind of emergency anti-wraparound vacuum > > > logic to what I have here, for Postgres 14. > > > +1 > > > > I think we can set VACOPT_TERNARY_DISABLED to > > tab->at_params.index_cleanup in table_recheck_autovac() or increase > > the thresholds used to not skipping index vacuuming. > > I was worried about the "tupgone = true" special case causing problems > when we decide to do some index vacuuming and some heap > vacuuming/LP_UNUSED-marking but then later decide to end the VACUUM. > But I now think that getting rid of "tupgone = true" gives us total > freedom to > choose what to do, including the freedom to start with index vacuuming > and then give up on it later -- even after we do some amount of > LP_UNUSED-marking (during a VACUUM with multiple index passes, perhaps > due to a low maintenance_work_mem setting). That isn't actually > special at all, because everything will be 100% decoupled. > > Whether or not it's a good idea to either not start index vacuuming or > to end it early (e.g. due to XID wraparound pressure) is a complicated > question, and the right approach will be debatable in each case/when > thinking about each issue. However, deciding on the best behavior to > address these problems should have nothing to do with implementation > details and everything to do with the costs and benefits to users. > Which now seems possible. > > A sophisticated version of the "XID wraparound pressure" > implementation could increase reliability without ever being > conservative, just by evaluating the situation regularly and being > prepared to change course when necessary -- until it is truly a matter > of shutting down new XID allocations/the server. It should be possible > to decide to end VACUUM early and advance relfrozenxid (provided we > have reached the point of finishing all required pruning and freezing, > of course). Highly agile behavior seems quite possible, even if it > takes a while to agree on a good design. Since I was thinking that always skipping index vacuuming on anti-wraparound autovacuum is legitimate, skipping index vacuuming only when we're really close to the point of going into read-only mode seems a bit conservative, but maybe a good start. I've attached a PoC patch to disable index vacuuming if the table's relfrozenxid is too older than autovacuum_freeze_max_age (older than 1.5x of autovacuum_freeze_max_age). > > > Aside from whether it's good or bad, strictly speaking, it could > > change the index AM API contract. The documentation of > > amvacuumcleanup() says: > > > > --- > > stats is whatever the last ambulkdelete call returned, or NULL if > > ambulkdelete was not called because no tuples needed to be deleted. > > --- > > > > With this change, we could pass NULL to amvacuumcleanup even though > > the index might have tuples needed to be deleted. > > I think that we should add a "Note" box to the documentation that > notes the difference here. Though FWIW my interpretation of the words > "no tuples needed to be deleted" was always "no tuples needed to be > deleted because vacuumlazy.c didn't call ambulkdelete()". After all, > VACUUM can add to tups_vacuumed through pruning inside > heap_prune_chain(). It is already possible (though only barely) to not > call ambulkdelete() for indexes (to only call amvacuumcleanup() during > cleanup) despite the fact that heap vacuuming did "delete tuples". Agreed. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Attachment
pgsql-hackers by date: