Re: Add 64-bit XIDs into PostgreSQL 15 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Add 64-bit XIDs into PostgreSQL 15 |
Date | |
Msg-id | CA+TgmoaFv3kco0+XqaO7csm_VpQLb5y_XSM090xhTxo4r8Zd-Q@mail.gmail.com |
In response to | Re: Add 64-bit XIDs into PostgreSQL 15 (Chris Travers <chris@orioledata.com>) |
Responses |
Re: Add 64-bit XIDs into PostgreSQL 15
Re: Add 64-bit XIDs into PostgreSQL 15 |
List | pgsql-hackers |
On Sat, Nov 26, 2022 at 4:08 AM Chris Travers <chris@orioledata.com> wrote:
> I didn't see any changes to pg_upgrade to make this change possible on upgrade. Is that also outside of the scope of your patch set? If so how is that continuity supposed to be ensured?

The scheme is documented in their 0006 patch, in a README.XID file. I'm not entirely confident that it's the best design and have argued against it in the past, but it's not crazy.

More generally, while I think there's plenty of stuff to be concerned about in this patch set and while I'm somewhat skeptical about the likelihood of its getting or staying committed, I can't really understand your concerns in particular. The thrust of your concern seems to be that if we allow people to get further behind, recovery will be more difficult. I'm not sure I see the problem. Suppose that we adopt this proposal and that it is bug-free. Now, consider a user who gets 8 billion XIDs behind. They probably have to vacuum pretty much every page in the database to do that, or at least every page in the tables that haven't been vacuumed recently. But that would likely also be true if they were 800 million XIDs behind, as is possible today. The effort to catch up doesn't increase linearly with how far behind you are, and is always bounded by the DB size.

It is true that if the table is progressively bloating, it is likely to be more bloated by the time you are 8 billion XIDs behind than it was when you were 800 million XIDs behind. I don't see that as a very good reason not to adopt this patch, because you can bloat the table by an arbitrarily large amount while consuming only a small number of XIDs, even just 1 XID. Protecting against bloat is good, but shutting down the database when the XID age reaches a certain value is not a particularly effective way of doing that, so saying that we'll be hurting people by not shutting down the database at the point where we do so today doesn't ring true to me.
I think that most people who get to the point of wraparound shutdown have workloads where bloat isn't a huge issue, because those who do start having problems with the bloat way before they run out of XIDs.

It would be entirely possible to add a parameter to the system that says "hey, you know we can keep running even if we're a shazillion XIDs behind, but instead shut down when we are behind by this number of XIDs." Then, if somebody wants to force an automatic shutdown at that point, they could, and I think that then the scenario you're worried about just can't happen any more. But isn't that a little bit silly? You could also just monitor how far behind you are and page the DBA when you get behind by more than a certain number of XIDs. Then, you wouldn't be risking a shutdown, and you'd still be able to stay on top of the XID ages of your tables.

Philosophically, I disagree with the idea of shutting down the database completely in any situation in which a reasonable alternative exists. Losing read and write availability is really bad, and I don't think it's what users want. I think that most users want the database to degrade gracefully when things do not go according to plan. Ideally, they'd like everything to Just Work, but reasonable users understand that sometimes there are going to be problems, and in my experience, what makes them happy is when the database acts to contain the scope of the problem so that it affects their workload as little as possible, rather than acting to magnify the problem so that it impacts their workload as much as possible. This patch, implementation and design concerns to one side, does that.

I don't believe there's a single right answer to the question of what to do about vacuum falling behind, and I think it's worth exploring multiple avenues to improve the situation.
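
As a rough illustration of the monitoring alternative mentioned above (watch how far behind you are and page the DBA past some number of XIDs), here is a minimal sketch. The query uses the standard `age()` function and `pg_database.datfrozenxid`; the threshold value, the function name `databases_to_page_about`, and the sample rows are all hypothetical and only there to show the shape of the check, not to recommend specific numbers.

```python
# SQL that reports how many XIDs old each database's frozen horizon is.
# age() and pg_database.datfrozenxid are standard PostgreSQL; run the query
# with whatever client you already use and pass the rows to the function below.
XID_AGE_QUERY = "SELECT datname, age(datfrozenxid) FROM pg_database"

# Hypothetical alert threshold, chosen purely for illustration.
ALERT_THRESHOLD = 500_000_000


def databases_to_page_about(rows, threshold=ALERT_THRESHOLD):
    """Return (datname, xid_age) pairs whose age exceeds the threshold,
    oldest first. `rows` is the result set of XID_AGE_QUERY."""
    behind = [(name, age) for name, age in rows if age > threshold]
    return sorted(behind, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample rows standing in for a real query result.
    sample = [
        ("app", 812_000_000),
        ("postgres", 1_200_000),
        ("reports", 530_000_000),
    ]
    for name, age in databases_to_page_about(sample):
        print(f"page DBA: {name} is {age} XIDs behind")
```

A check like this only reports; it never stops the database, which is the point of the alternative: the operator gets a warning long before any hard limit, and availability is not at risk.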
You can have vacuum never run on a table at all, say because all of the workers are busy elsewhere, or because the table is locked until the heat death of the universe. You can have vacuum run on a table but too slowly to do any good, because of the vacuum cost delay mechanism. You can have vacuum run and finish but do little good because of prepared transactions or replication slots or long-running queries. It's reasonable to think about what kinds of steps might help in those different scenarios, and especially to think about what kind of steps might help in multiple cases. We should do that.

But, I don't think any of that means that we can ignore the need for some kind of expansion of the XID space forever. Computers are getting faster. It's already possible to burn through the XID space in hours, and the number of hours is going to go down over time and maybe eventually the right unit will be minutes, or even seconds. Sometime before then, we need to do something to make the runway bigger, or else just give up on PostgreSQL being a relevant piece of software. Perhaps the thing we need to do is not exactly this, but if not, it's probably a sibling or cousin of this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com