Re: What is "wraparound failure", really? - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: What is "wraparound failure", really? |
Date | |
Msg-id | CA+TgmoZsvdo42wsviLCiK2bUuA3ge0V7=SNNh8sx1WgZxNpEkw@mail.gmail.com |
In response to | Re: What is "wraparound failure", really? (Andrew Dunstan <andrew@dunslane.net>) |
Responses | Re: What is "wraparound failure", really? |
List | pgsql-hackers |
On Mon, Jun 28, 2021 at 8:52 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> But if you're really worried about people setting
> autovacuum_freeze_max_age too high, then maybe we should be talking
> about capping it at a lower level rather than adjusting the docs that
> most users don't read.

The problem is that the setting is measuring something that is a pretty poor proxy for the thing we actually care about. It's measuring the XID age at which we're going to start forcing vacuums on tables that don't otherwise need to be vacuumed, but the thing we care about is the XID age at which those vacuums are going to *finish*. Now maybe you think that's a minor difference, and if your tables are small, it is, but if they're really big, it's not.

If you have only tables that are, say, 1GB in size and your system is otherwise well-configured, you could probably crank autovacuum_freeze_max_age up all the way to the max without a problem. But if you have 1TB tables, you are going to need a lot more headroom. The exact amount of headroom you need depends especially on the size of your largest tables, but also on how well-distributed the relfrozenxid values are, on the total sizes of all your tables, on your I/O subsystem, on your XID consumption rate, on your vacuum delay settings, and on whether you want to make any allowance for the rare but possible scenario where vacuum dies with an ERROR. This means that in practice nobody knows whether a particular setting of autovacuum_freeze_max_age on a particular system is safe or not, except in the most obvious cases. Capping it at a lower level would prevent some people from doing things that are perfectly safe and still not prevent other people from doing things that are horribly dangerous.

I think what we really need here is some kind of deadline-based scheduler. As Peter says, the problem is that we might run out of XIDs.
The system should be constantly thinking about that and taking appropriate emergency actions to make sure it doesn't happen. Right now it's really pretty chill about the possibility of looming disaster.

Imagine that you hire a babysitter and tell them to get the kids out of the house if there's a fire. While you're out, a volcano erupts down the block. A giant cloud of ash forms and there's lava everywhere, even touching the house, which begins to smolder, but the babysitter just sits there and watches TV. As soon as the first flames appear, the babysitter stops watching TV, gets the kids, and tries to leave the premises. That's our autovacuum scheduler! It has no inclination or ability to see the future; it makes decisions entirely based on the present state of things. In a lot of cases that's OK, but sometimes it leads to a completely ridiculous outcome.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
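A rough sketch of the arithmetic behind the headroom argument, and of what a deadline-based scheduler might compute. This is a hypothetical Python model, not anything in PostgreSQL: the `Table` class, the function names, the single-worker assumption, and all the numbers (10,000 XIDs/s consumed, 100 MB/s effective vacuum throughput, a 500-million-XID safety limit) are illustrative assumptions. The core point is that a table's XID age when its vacuum *finishes* is its age when the vacuum starts plus the XIDs consumed while it runs, so a 1TB table ages far more during its own vacuum than a 1GB one.

```python
from dataclasses import dataclass

# Hypothetical back-of-the-envelope model, not PostgreSQL code. The class,
# the functions, and every number below are illustrative assumptions.


@dataclass
class Table:
    name: str
    xid_age: int     # current age(relfrozenxid), in XIDs
    size_bytes: int  # bytes an aggressive (anti-wraparound) vacuum must scan


def vacuum_seconds(t: Table, bytes_per_s: float) -> float:
    """Estimated vacuum duration; bytes_per_s folds in I/O speed and cost delay."""
    return t.size_bytes / bytes_per_s


def finish_age(t: Table, start_s: float, bytes_per_s: float, xids_per_s: float) -> float:
    """XID age the table reaches by the time a vacuum started at start_s finishes."""
    return t.xid_age + xids_per_s * (start_s + vacuum_seconds(t, bytes_per_s))


def deadline_s(t: Table, limit: int, xids_per_s: float) -> float:
    """Seconds from now until the table's age hits `limit`; vacuum must finish by then."""
    return (limit - t.xid_age) / xids_per_s


def schedule(tables, limit, bytes_per_s, xids_per_s):
    """Plan vacuums for ONE worker in earliest-deadline-first order.

    Returns a list of (name, start_s, finish_age) tuples and whether every
    table finishes at or below `limit`.
    """
    order = sorted(tables, key=lambda t: deadline_s(t, limit, xids_per_s))
    plan, clock, all_safe = [], 0.0, True
    for t in order:
        age = finish_age(t, clock, bytes_per_s, xids_per_s)
        plan.append((t.name, clock, age))
        all_safe = all_safe and age <= limit
        clock += vacuum_seconds(t, bytes_per_s)  # worker is busy until this vacuum ends
    return plan, all_safe


if __name__ == "__main__":
    rate, bw = 10_000, 100 * 10**6  # assumed: 10k XIDs/s, 100 MB/s vacuum throughput

    # Same starting age, very different finishing ages.
    print(finish_age(Table("small", 200_000_000, 10**9), 0, bw, rate))  # 200100000.0
    print(finish_age(Table("big", 200_000_000, 10**12), 0, bw, rate))   # 300000000.0

    # Deadline order: the nearly-expired small table goes first.
    plan, ok = schedule([Table("big", 300_000_000, 10**12),
                         Table("small", 499_000_000, 10**9)],
                        500_000_000, bw, rate)
    print(plan, ok)  # [('small', 0.0, 499100000.0), ('big', 10.0, 400100000.0)] True
```

With these made-up numbers, a 1TB vacuum started at age 200 million finishes around age 300 million, while a 1GB one barely moves. Ordering by deadline is the usual choice on a single worker because if any order meets every deadline, deadline order does; running the big table first here would push the small one to roughly 599 million.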