Re: New IndexAM API controlling index vacuum strategies - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: New IndexAM API controlling index vacuum strategies
Msg-id: CAH2-Wz=+m+L3AYAL4DV=SMRr2SCjEBcXuebMd-n0Aevdemj0ZQ@mail.gmail.com
In response to: Re: New IndexAM API controlling index vacuum strategies (Andres Freund <andres@anarazel.de>)
Responses: Re: New IndexAM API controlling index vacuum strategies
List: pgsql-hackers
On Wed, Apr 14, 2021 at 8:38 PM Andres Freund <andres@anarazel.de> wrote:
> The reason I didn't do further reviews for things in this thread was
> that I was trying really hard to get the shared memory stats patch into
> a committable shape - there were just not enough hours in the day. I
> think it's to be expected that, during the final CF, there aren't a lot
> of resources for reviewing patches that are substantially new. Why
> should these new patches have gotten priority over a much older patch
> set that also addresses significant operational issues?

We're all doing our best.

> It's very common for larger / busier databases to *substantially*
> increase autovacuum_freeze_max_age, so there won't be 1.6 billion XIDs
> of headroom, but a few hundred million. The cost of doing unnecessary
> anti-wraparound vacuums is just too great. And databases on the busier &
> larger side of things are precisely the ones that are more likely to hit
> wraparound issues (otherwise you're just not that likely to burn through
> that many xids).

I think that this was once true, but is now much less common, mostly due to the freeze map stuff in 9.6. And due to a general recognition that the *risk* of increasing them is just too great (a risk that we can hope was diminished by the failsafe, incidentally).

As an example of this, Christophe Pettus had a Damascene conversion when it came to increasing autovacuum_freeze_max_age aggressively, which he explains here:

https://thebuild.com/blog/2019/02/08/do-not-change-autovacuum-age-settings/

In short, he went from regularly advising clients to increase autovacuum_freeze_max_age to specifically advising them never to touch it.
Even if we assume that I'm 100% wrong about autovacuum_freeze_max_age, it's still true that the vacuum_failsafe_age GUC is interpreted with reference to autovacuum_freeze_max_age -- it will always be interpreted as if it was set to at least 105% of whatever the current value of autovacuum_freeze_max_age happens to be (so it's symmetric with the freeze_table_age GUC and its 95% behavior). So it's never completely unreasonable in the sense that it directly clashes with an existing autovacuum_freeze_max_age setting from before the upgrade.

Of course this doesn't mean that there couldn't possibly be any problems with the new mechanism clashing with autovacuum_freeze_max_age in some unforeseen way. But the worst that can happen is that a user sophisticated enough to very aggressively increase autovacuum_freeze_max_age upgrades to Postgres 14, and then finds that index vacuuming is sometimes skipped -- which they'll see lots of annoying and scary messages about if they ever look in the logs. I think that that's an acceptable price to pay to protect the majority of less sophisticated users.

> And my concern isn't really that vacuum would have finished without a
> problem if cost limiting hadn't been disabled, but that having multiple
> autovacuum workers going all out will cause problems. Like the system
> slowing down so much that it's hard to fix the actual root cause of the
> wraparound - I've seen systems with a bunch of unthrottled autovacuum
> workers overwhelm the IO subsystem so much that simply opening a
> connection to fix the issue took 10+ minutes. Especially on systems
> with provisioned IO (i.e. just about all cloud storage) that's not too
> hard to hit.

I don't think that it's reasonable to expect an intervention like this to perfectly eliminate all risk, while at the same time never introducing any new theoretical risks. (Especially while also being simple and obviously correct.)
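The "105%" interpretation described above can be sketched roughly as follows. This is a minimal illustration of the described behavior, not the actual code from PostgreSQL's vacuum.c; the function name, variable names, and the integer-division rounding are my own assumptions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch: the effective failsafe cutoff is treated as if it
 * were at least 105% of the current autovacuum_freeze_max_age, so that the
 * failsafe never fires before a plain anti-wraparound vacuum would have
 * been launched (names here are hypothetical, not the real identifiers).
 */
static int64_t
effective_failsafe_age(int64_t vacuum_failsafe_age,
                       int64_t autovacuum_freeze_max_age)
{
    /* 105% of autovacuum_freeze_max_age, using integer math */
    int64_t floor_age =
        autovacuum_freeze_max_age + autovacuum_freeze_max_age / 20;

    return (vacuum_failsafe_age > floor_age) ? vacuum_failsafe_age
                                             : floor_age;
}
```

With the defaults (vacuum_failsafe_age = 1.6 billion, autovacuum_freeze_max_age = 200 million) the explicit setting wins; only when autovacuum_freeze_max_age is raised past ~1.52 billion does the 105% floor take over.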
> > If it's intrinsically impossible to advance relfrozenxid, then surely
> > all bets are off. But even in this scenario it's very unlikely that we
> > wouldn't at least do index vacuuming for those index tuples that are
> > dead and safe to delete according to the OldestXmin cutoff. You still
> > have 1.6 billion XIDs before the failsafe first kicks in, regardless
> > of the issue of the OldestXmin/FreezeLimit being excessively far in
> > the past.
>
> As I said above, I don't think the "1.6 billion XIDs" argument has
> merit, because it's so reasonable (and common) to set
> autovacuum_freeze_max_age to something much larger.

No merit? Really? Not even a teeny, tiny, microscopic little bit of merit? You're sure?

As I said, we handle the case where autovacuum_freeze_max_age is set to something larger than vacuum_failsafe_age in a straightforward and pretty sensible way.

I am curious, though: what autovacuum_freeze_max_age setting is "much higher" than 1.6 billion, but somehow also not extremely ill-advised and dangerous? What number is that, precisely? Apparently this is common, but I must confess that it's the first I've heard about it.

--
Peter Geoghegan