Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers
From: Laurenz Albe
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date:
Msg-id: a6b9a97b4b1dfcb61b6e3ccd66566d2da005eac3.camel@cybertec.at
In response to: Re: Berserk Autovacuum (let's save next Mandrill) (Andres Freund <andres@anarazel.de>)
Responses: Re: Berserk Autovacuum (let's save next Mandrill)
List: pgsql-hackers
On Thu, 2020-03-19 at 15:17 -0700, Andres Freund wrote:
> I am doubtful it should be committed with the current settings. See below.
>
> > From 3ba4b572d82969bbb2af787d1bccc72f417ad3a0 Mon Sep 17 00:00:00 2001
> > From: Laurenz Albe <laurenz.albe@cybertec.at>
> > Date: Thu, 19 Mar 2020 20:26:43 +0100
> > Subject: [PATCH] Autovacuum tables that have received only inserts
> >
> > This avoids the known problem that insert-only tables
> > are never autovacuumed until they need to have their
> > anti-wraparound autovacuum, which then can be massive
> > and disruptive.
>
> Shouldn't this also mention index only scans? IMO that's at least as big
> a problem as the "large vacuum" problem.

Yes, that would be good.

> I am *VERY* doubtful that the attempt of using a large threshold, and a
> tiny scale factor, is going to work out well. I'm not confident enough
> in my gut feeling to full throatedly object, but confident enough that
> I'd immediately change it on any important database I operated.
>
> Independent of how large a constant you set the threshold to, for
> databases with substantially bigger tables this will lead to [near]
> constant vacuuming. As soon as you hit 1 billion rows - which isn't
> actually that much - this is equivalent to setting
> autovacuum_{vacuum,analyze}_scale_factor to 0.01. There's cases where
> that can be a sensible setting, but I don't think anybody would suggest
> it as a default.

In that, you are assuming that the bigger a table is, the more data
modifications it will get, so that making the scale factor the dominant
element will work out better.

My experience is that it is more likely for the change rate (inserts, I am
less certain about updates and deletes) to be independent of the table size.
(Too) many large databases are so large not because the data influx grows
linearly over time, but because people don't want to get rid of old data
(or would very much like to do so, but never planned for it).

This second scenario would be much better served by a high threshold and
a low scale factor.

> After thinking about it for a while, I think it's fundamentally flawed
> to use large constant thresholds to avoid unnecessary vacuums. It's easy
> to see cases where it's bad for common databases of today, but it'll be
> much worse a few years down the line where common table sizes have grown
> by a magnitude or two. Nor do they address the difference between tables
> of a certain size with e.g. 2kb wide rows, and a same sized table with
> 28 byte wide rows. The point of constant thresholds imo can only be to
> avoid unnecessary work at the *small* (even tiny) end, not the opposite.
>
> I think there's too much "reinventing" autovacuum scheduling in a
> "local" insert-only manner happening in this thread. And as far as I can
> tell additionally only looking at a somewhat narrow slice of insert only
> workloads.

Perhaps. The traditional "high scale factor, low threshold" system is (in
my perception) mostly based on the objective of cleaning up dead tuples.
When autovacuum was introduced, index only scans were only a dream.

With the objective of getting rid of dead tuples, having the scale factor
be the dominant part makes sense: it is OK for bloat to be a certain
percentage of the table size.

Also, as you say, tables were much smaller then, and they will only become
bigger in the future. But I find that to be an argument *for* making the
threshold the dominant element: otherwise, you vacuum less and less often,
and the individual runs become larger and larger.
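To put rough numbers on that (a back-of-the-envelope sketch; the parameter
names and example values below are only illustrative, not necessarily the
exact ones in the current patch version):

    inserts until the next insert-triggered autovacuum
        = insert_threshold + insert_scale_factor * reltuples

    scale factor dominant (threshold 1000, scale factor 0.2):
        10 million row table   ->  roughly every 2 million inserts
        1 billion row table    ->  roughly every 200 million inserts

    threshold dominant (threshold 10 million, scale factor ~0):
        any table size         ->  roughly every 10 million inserts

With a roughly constant insert rate, the first setting vacuums a growing
table less and less often, and each run has more and more work to do; the
second keeps the work per run about the same, at the price of near-constant
vacuuming on tables with a very high insert rate.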
Now that vacuum skips pages where it knows it has nothing to do, doesn't
that take away much of the pain of vacuuming large tables where nothing
much has changed? (A rough way to gauge this is sketched in the P.S. below.)

> I, again, strongly suggest using much more conservative values here. And
> then try to address the shortcomings - like not freezing aggressively
> enough - in separate patches (and by now separate releases, in all
> likelihood).

There is much to say for that, I agree.

> This will have a huge impact on a lot of postgres
> installations. Autovacuum already is perceived as one of the biggest
> issues around postgres. If the ratio of cases where these changes
> improve things to the cases it regresses isn't huge, it'll be painful
> (silent improvements are obviously less noticed than breakages).

Yes, that makes it scary to mess with autovacuum.

One of the problems I see in the course of this discussion is that one can
always come up with examples that make any choice look bad. It is
impossible to do it right for everybody.

In the light of that, I won't object to a more conservative default value
for the parameters, even though my considerations above suggest the
opposite to me. But perhaps my conclusions are based on flawed premises.

Yours,
Laurenz Albe
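P.S.: A quick way to gauge how much of a large, mostly static table a
vacuum could skip is to compare the pages marked all-visible in the
visibility map with the total page count. A rough sketch (the table name
is made up, and relallvisible is only the estimate from the last
vacuum/analyze):

    SELECT relpages,
           relallvisible,
           round(100.0 * relallvisible / greatest(relpages, 1), 1)
               AS pct_all_visible
    FROM pg_class
    WHERE relname = 'big_insert_only_table';

A plain (non-aggressive) vacuum can skip pages that are all-visible, so
where that percentage stays high the individual runs should stay
comparatively cheap even if the table itself is huge.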