Thread: The case against multixact GUCs
Hackers, In the 9.3.3 updates, we added three new GUCs to control multixact freezing. This was an unprecedented move in my memory -- I can't recall ever adding a GUC to a minor release which wasn't backwards compatibility for a security fix. This was a mistake. What makes these GUCs worse is that nobody knows how to set them; nobody on this list and nobody in the field. Heck, I doubt 1 in 1000 of our users (or 1 in 10 people on this list) know what a multixact *is*. Further, there's no clear justification why these cannot be set to be the same as our other freeze ages (which our users also don't understand), or a constant calculated portion of them, or just a constant. Since nobody anticipated someone adding a GUC in a minor release, there was no discussion of this topic that I can find; the new GUCs were added as a "side effect" of fixing the multixact vacuum issue. Certainly I would have raised a red flag if the discussion of the new GUCs hadn't been buried deep inside really long emails. Adding new GUCs which nobody has any idea how to set, or can even explain to new users, is not a service to our users. These should be removed. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Sigh ... Josh Berkus wrote: > Further, there's no clear justification why these cannot be set to be > the same as our other freeze ages (which our users also don't > understand), or a constant calculated portion of them, or just a > constant. Calculated portion was my first proposal. The objection that was raised was that there's no actual correlation between Xid consumption rate and multixact consumption rate. That observation is correct; in some use cases multixacts will be consumed faster, in others they will be consumed slower. So there's no way to have multixact cleanup not cause extra autovacuum load if we don't have the parameters. > Since nobody anticipated someone adding a GUC in a minor > release, there was no discussion of this topic that I can find; the new > GUCs were added as a "side effect" of fixing the multixact vacuum issue. > Certainly I would have raised a red flag if the discussion of the new > GUCs hadn't been buried deep inside really long emails. When problems are tough, explanations get long. There's no way around that. I cannot go highlighting text in red hoping that people will read those parts. > Adding new GUCs which nobody has any idea how to set, or can even > explain to new users, is not a service to our users. These should be > removed. I don't think we're going to remove the parameters. My interpretation of the paragraph above is "can we please have some documentation that explains how to set these parameters". To that, the answer is sure, we can. However, I don't have time to write it at this point. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Josh Berkus wrote > Hackers, > > In the 9.3.3 updates, we added three new GUCs to control multixact > freezing. This was an unprecedented move in my memory -- I can't recall > ever adding a GUC to a minor release which wasn't backwards > compatibility for a security fix. This was a mistake. It probably should have been handled better, but the decision to make these parameters visible doesn't in itself seem to be the wrong decision - especially when limited to a fairly recently released back-branch. > What makes these GUCs worse is that nobody knows how to set them; nobody > on this list and nobody in the field. Heck, I doubt 1 in 1000 of our > users (or 1 in 10 people on this list) know what a multixact *is*. That isn't a reason in itself to not have the capability if it is actually needed. > Further, there's no clear justification why these cannot be set to be > the same as our other freeze ages (which our users also don't > understand), or a constant calculated portion of them, or just a > constant. Since nobody anticipated someone adding a GUC in a minor > release, there was no discussion of this topic that I can find; the new > GUCs were added as a "side effect" of fixing the multixact vacuum issue. > Certainly I would have raised a red flag if the discussion of the new > GUCs hadn't been buried deep inside really long emails. The release documentation makes a pointed claim that the situation WAS that the two were identical, but that the different consumption rates dictated making the multixact configuration independently configurable. So in effect the GUC was always present - just not user-visible. Even if there are not any current "best practices" surrounding this topic, at least this way, as methods are developed, there is an actual place to put the derived value. As a starting point one can simply look at the defaults and, if they have changed the value of the non-multixact setting, apply the same factor to the corresponding multixact setting.
Now, obviously someone has to think to actually do that - and the release notes probably should have provided such guidance - but, as I state explicitly below, the issue is more about insufficient communication and education and less about providing the flexibility. > Adding new GUCs which nobody has any idea how to set, or can even > explain to new users, is not a service to our users. These should be > removed. Or we should insist that those few that do have an understanding create some kind of wiki document, or even a documentation section, to educate those that are not as knowledgeable in this area. For good reason, much of the recent focus in this area has been on actually getting the feature to work. Presuming that it is a desirable feature to have - which it hopefully is, given that it made it into the wild - such focus has obviously been necessary given the apparent complexity of this feature (as evidenced by the recent serious bug reports), but hopefully the feature itself is now mostly locked down and education can begin. David J.
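David's starting-point heuristic (take the factor by which an XID freeze setting was changed from its default, and apply the same factor to the matching multixact setting) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not advice from the thread. The function name and the pairing table are hypothetical, the default values are the ones the 9.3 documentation is understood to list (check them with SHOW on your own server), and the results are not clamped to each setting's allowed range.

```python
# Defaults as understood from the 9.3 documentation (assumption: verify locally).
XID_DEFAULTS = {
    "vacuum_freeze_min_age": 50_000_000,
    "vacuum_freeze_table_age": 150_000_000,
    "autovacuum_freeze_max_age": 200_000_000,
}
MXID_DEFAULTS = {
    "vacuum_multixact_freeze_min_age": 5_000_000,
    "vacuum_multixact_freeze_table_age": 150_000_000,
    "autovacuum_multixact_freeze_max_age": 400_000_000,
}
# Hypothetical pairing of each XID setting with its multixact counterpart.
PAIRS = {
    "vacuum_freeze_min_age": "vacuum_multixact_freeze_min_age",
    "vacuum_freeze_table_age": "vacuum_multixact_freeze_table_age",
    "autovacuum_freeze_max_age": "autovacuum_multixact_freeze_max_age",
}


def scale_multixact_settings(current_xid_settings):
    """Scale each multixact default by the same factor that the matching
    XID setting differs from its own default. Settings missing from
    current_xid_settings are treated as unchanged (factor 1)."""
    suggested = {}
    for xid_name, mxid_name in PAIRS.items():
        current = current_xid_settings.get(xid_name, XID_DEFAULTS[xid_name])
        # Integer arithmetic: mxid_default * (current / xid_default).
        suggested[mxid_name] = (
            MXID_DEFAULTS[mxid_name] * current // XID_DEFAULTS[xid_name]
        )
    return suggested


# Example: an installation that doubled vacuum_freeze_min_age to 100 million.
print(scale_multixact_settings({"vacuum_freeze_min_age": 100_000_000}))
```

Because, as Álvaro notes earlier in the thread, XID and multixact consumption rates are not correlated, a value derived this way is at best a first guess to be revised once the actual workload has been observed.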
Josh Berkus wrote: > What makes these GUCs worse is that nobody knows how to set them; nobody > on this list and nobody in the field. Heck, I doubt 1 in 1000 of our > users (or 1 in 10 people on this list) know what a multixact *is*. I won't contend your first statement, but multixacts are explained in the documentation: http://www.postgresql.org/docs/9.3/static/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND Yours, Laurenz Albe
On Tue, Mar 11, 2014 at 3:14 PM, Josh Berkus <josh@agliodbs.com> wrote: > In the 9.3.3 updates, we added three new GUCs to control multixact > freezing. This was an unprecedented move in my memory -- I can't recall > ever adding a GUC to a minor release which wasn't backwards > compatibility for a security fix. This was a mistake. I disagree. I think it was the right decision. I think it was a mistake not including all of that stuff in the first place, and I think it's good that we've now corrected that oversight. > What makes these GUCs worse is that nobody knows how to set them; nobody > on this list and nobody in the field. Heck, I doubt 1 in 1000 of our > users (or 1 in 10 people on this list) know what a multixact *is*. Yeah, and that's a problem. See, it turns out that we love periodic full-table scans to freeze xmin so much that, in 9.3, we committed to a design that requires us to make periodic full-table scans to freeze xmax, too. That may or may not have been a good decision, but at this point we're stuck with it. People are going to have to come to understand the requirements there just as they do for freezing xmin. Denying the user the ability to adjust the thresholds is not going to accelerate the process of figuring out how they should be set. > Further, there's no clear justification why these cannot be set to be > the same as our other freeze ages (which our users also don't > understand), or a constant calculated portion of them, or just a > constant. On most systems, mxid consumption will be much slower than xid consumption because most users won't use tuple locks all that heavily. If we made all the defaults the same, then a full-table scan for xid freezing would likely conclude that many or all of the mxids weren't old enough to be frozen yet.
To the greatest extent possible, we want full-table vacuums for either XID freezing or MXID freezing to advance both relfrozenxid and relminmxid so that we don't go through and freeze for one reason and then have to come back and freeze for the other reasons shortly thereafter. Nobody knows exactly how to set the settings to make that happen just yet, so we need settings at least until people can determine what values work well in practice - and probably permanently, because unfortunately I think the answer is likely workload-dependent. > Since nobody anticipated someone adding a GUC in a minor > release, there was no discussion of this topic that I can find; the new > GUCs were added as a "side effect" of fixing the multixact vacuum issue. > Certainly I would have raised a red flag if the discussion of the new > GUCs hadn't been buried deep inside really long emails. Alvaro did explicitly ask if anyone wanted to oppose back-patching. I don't think you can really blame him if you didn't see/read that email. > Adding new GUCs which nobody has any idea how to set, or can even > explain to new users, is not a service to our users. These should be > removed. The need for these GUCs is an outgrowth of the fkey locking stuff. Unless we rip that out again or rewrite it completely, the need for them doesn't seem likely to go away - so we're going to need to learn to live with it, not pretend like it isn't a problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
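Robert's argument about identical defaults can be made concrete with a little arithmetic. The sketch below is a deliberately simplified model, not PostgreSQL's vacuum logic: it treats ages as plain counters, ignores wraparound, and assumes a hypothetical workload that creates one multixact per ten XIDs. The function names are invented for illustration; the 150 million, 50 million and 5 million figures are the documented defaults as understood here.

```python
def mxid_age_at_xid_scan(xid_table_age: int, xids_per_mxid: int) -> int:
    """Multixact age a table has accumulated by the time its XID age
    reaches xid_table_age, if one multixact is created per
    xids_per_mxid XIDs consumed."""
    return xid_table_age // xids_per_mxid


def freezable_mxid_span(table_mxid_age: int, mxid_freeze_min_age: int) -> int:
    """Width of the multixact range old enough to freeze: everything older
    than mxid_freeze_min_age, or nothing if the table is younger than that."""
    return max(0, table_mxid_age - mxid_freeze_min_age)


XID_TABLE_AGE = 150_000_000  # XID age that triggers the full-table scan (assumed)
XIDS_PER_MXID = 10           # hypothetical workload ratio

mxid_age = mxid_age_at_xid_scan(XID_TABLE_AGE, XIDS_PER_MXID)

# Multixact minimum age set equal to the XID one (50 million):
# the scan finds no multixact old enough to freeze.
same_as_xid = freezable_mxid_span(mxid_age, 50_000_000)

# Separate, smaller multixact minimum age (5 million):
# the same scan can freeze most of the multixact range as well.
separate = freezable_mxid_span(mxid_age, 5_000_000)

print(mxid_age, same_as_xid, separate)
```

In this model the table's multixact age is only 15 million when the XID-driven scan happens, so a shared 50 million minimum age freezes nothing, while a 5 million minimum age lets the same scan cover 10 million multixact IDs. With a different workload ratio the picture changes, which is the workload dependence Robert describes.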
On 03/12/2014 06:26 PM, Robert Haas wrote: > On Tue, Mar 11, 2014 at 3:14 PM, Josh Berkus <josh@agliodbs.com> wrote: >> In the 9.3.3 updates, we added three new GUCs to control multixact >> freezing. This was an unprecedented move in my memory -- I can't recall >> ever adding a GUC to a minor release which wasn't backwards >> compatibility for a security fix. This was a mistake. > > I disagree. I think it was the right decision. I think it was a > mistake not including all of that stuff in the first place, and I > think it's good that we've now corrected that oversight. In hindsight, I think permanent multixids in their current form was a mistake. Before 9.3, the thing that made multixids special was that they could just be thrown away at a restart. They didn't need freezing. Now that they do, why not just use regular XIDs for them? We had to duplicate much of the wraparound and freezing logic for multixids that simply would not have been an issue if we had used regular XIDs instead. We could've perhaps kept the old multixids for their original purpose, as transient xids that can be forgotten about after all the old snapshots are gone. But for the permanent ones, it would've been simpler if we handled them more like subxids; make them part of the same XID space as regular XIDs. This is pretty hand-wavy of course, and it's too late now. - Heikki
On Wed, Mar 12, 2014 at 12:45 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 03/12/2014 06:26 PM, Robert Haas wrote: >> On Tue, Mar 11, 2014 at 3:14 PM, Josh Berkus <josh@agliodbs.com> wrote: >>> In the 9.3.3 updates, we added three new GUCs to control multixact >>> freezing. This was an unprecedented move in my memory -- I can't recall >>> ever adding a GUC to a minor release which wasn't backwards >>> compatibility for a security fix. This was a mistake. >> >> I disagree. I think it was the right decision. I think it was a >> mistake not including all of that stuff in the first place, and I >> think it's good that we've now corrected that oversight. > > In hindsight, I think permanent multixids in their current form was a > mistake. Before 9.3, the thing that made multixids special was that they > could just be thrown away at a restart. They didn't need freezing. Now that > they do, why not just use regular XIDs for them? Well, the numbering of MXIDs is closely bound up with their storage format. To do what you're proposing, we'd need to invent some new way of associating an XID-used-as-MXID with update XID, list of lockers, and lock modes. Which is certainly possible, but it's not obvious that it's a good idea. I *am* concerned that we didn't adequately weigh the costs of adding another thing that has to be frozen before we did it. Clearly, the feature has a lot of benefit, or will once we've flushed out most of the bugs. But it's hard to say at this point how much the cost is going to be, and I do think that's cause for concern. But I'm not convinced that unifying the XID and MXID spaces would have addressed that concern to any measurable degree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
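For readers who have not looked at multixacts before, the association Robert describes (an MXID standing for an update XID, a list of lockers, and their lock modes) can be pictured with a small conceptual model. This is not the on-disk format or any PostgreSQL API. All class and method names are hypothetical, and the six member statuses reflect the 9.3 design as understood here, where at most one member of a multixact is an updater.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class MemberStatus(Enum):
    # Lock-only statuses
    FOR_KEY_SHARE = "for key share"
    FOR_SHARE = "for share"
    FOR_NO_KEY_UPDATE = "for no key update"
    FOR_UPDATE = "for update"
    # Statuses meaning the member actually updated or deleted the row
    NO_KEY_UPDATE = "no key update"
    UPDATE = "update"


UPDATE_STATUSES = {MemberStatus.NO_KEY_UPDATE, MemberStatus.UPDATE}


@dataclass(frozen=True)
class Member:
    xid: int              # the member transaction's regular XID
    status: MemberStatus  # what that transaction did to the row


@dataclass(frozen=True)
class MultiXact:
    mxid: int                    # the ID stored in the tuple's xmax
    members: Tuple[Member, ...]  # the transactions it stands for

    def update_xid(self) -> Optional[int]:
        """XID of the updating member, or None if all members only lock."""
        for member in self.members:
            if member.status in UPDATE_STATUSES:
                return member.xid
        return None

    def lockers(self) -> Tuple[int, ...]:
        """XIDs of the members that only hold a lock on the row."""
        return tuple(
            m.xid for m in self.members if m.status not in UPDATE_STATUSES
        )


# One transaction holds a key-share lock (as a foreign-key check would)
# while another performs a non-key update on the same row.
mx = MultiXact(
    mxid=1,
    members=(
        Member(100, MemberStatus.FOR_KEY_SHARE),
        Member(101, MemberStatus.NO_KEY_UPDATE),
    ),
)
print(mx.update_xid(), mx.lockers())
```

Seen this way, Heikki's proposal would need a regular XID to play the role of `mxid` here, with some new mechanism to look up the member list from it, which is the extra invention Robert is pointing at.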
On 03/12/2014 09:45 AM, Heikki Linnakangas wrote: > In hindsight, I think permanent multixids in their current form was a > mistake. Before 9.3, the thing that made multixids special was that they > could just be thrown away at a restart. They didn't need freezing. Now > that they do, why not just use regular XIDs for them? We had to > duplicate much of the wraparound and freezing logic for multixids that > simply would not have been an issue if we had used regular XIDs instead. > > We could've perhaps kept the old multixids for their original purpose, > as transient xids that can be forgotten about after all the old > snapshots are gone. But for the permanent ones, it would've been simpler > if we handled them more like subxids; make them part of the same XID > space as regular XIDs. > > This is pretty hand-wavy of course, and it's too late now. So, if we ripped out all the multixact stuff for 9.4, what would that cost us? I'm serious. The multixact stuff has been broken since 9.3 was released, and it's *still* broken. We can't give users any guidance or tools on how to set multixact stuff, and autovacuum doesn't handle it properly. Seems like this was just a bad patch and we should rip it out. What features do we lose? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-04-16 11:10:52 -0700, Josh Berkus wrote: > On 03/12/2014 09:45 AM, Heikki Linnakangas wrote: > > In hindsight, I think permanent multixids in their current form was a > > mistake. Before 9.3, the thing that made multixids special was that they > > could just be thrown away at a restart. They didn't need freezing. Now > > that they do, why not just use regular XIDs for them? We had to > > duplicate much of the wraparound and freezing logic for multixids that > > simply would not have been an issue if we had used regular XIDs instead. > > > > We could've perhaps kept the old multixids for their original purpose, > > as transient xids that can be forgotten about after all the old > > snapshots are gone. But for the permanent ones, it would've been simpler > > if we handled them more like subxids; make them part of the same XID > > space as regular XIDs. > > > > This is pretty hand-wavy of course, and it's too late now. > > So, if we ripped out all the multixact stuff for 9.4, what would that > cost us? Ripping multixacts out in general? Err, right. We'd lose shared row level locks... I think ripping out stuff at this point would be the cause of many, many more bugs than it'd prevent. > I'm serious. The multixact stuff has been broken since 9.3 > was released, and it's *still* broken. We can't give users any guidance > or tools on how to set multixact stuff, and autovacuum doesn't handle it > properly. Sorry, but I think you're blowing some GUCs *WAY* out of proportion. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 04/16/2014 11:22 AM, Andres Freund wrote: >> I'm serious. The multixact stuff has been broken since 9.3 >> was released, and it's *still* broken. We can't give users any guidance >> or tools on how to set multixact stuff, and autovacuum doesn't handle it >> properly. > > Sorry, but I think you're blowing some GUCs *WAY* out of proportion. I'm not talking about the GUCs. I'm talking about the data corruption bugs. Including the new one this week. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-04-16 11:25:49 -0700, Josh Berkus wrote: > On 04/16/2014 11:22 AM, Andres Freund wrote: > >> I'm serious. The multixact stuff has been broken since 9.3 > >> was released, and it's *still* broken. We can't give users any guidance > >> or tools on how to set multixact stuff, and autovacuum doesn't handle it > >> properly. > > > > Sorry, but I think you're blowing some GUCs *WAY* out of proportion. > > I'm not talking about the GUCs. That was about: "We can't give users any guidance or tools on how to set multixact stuff, and autovacuum doesn't handle it properly." > I'm talking about the data corruption bugs. That was covered by "at this point ripping this out seems likely to cause many more bugs than it would solve". > Including the new one this week. Let's hold our horses a bit, we don't know what's happening there for now. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 04/16/2014 11:30 AM, Andres Freund wrote: > On 2014-04-16 11:25:49 -0700, Josh Berkus wrote: >> On 04/16/2014 11:22 AM, Andres Freund wrote: >>>> I'm serious. The multixact stuff has been broken since 9.3 >>>> was released, and it's *still* broken. We can't give users any guidance >>>> or tools on how to set multixact stuff, and autovacuum doesn't handle it >>>> properly. >>> >>> Sorry, but I think you're blowing some GUCs *WAY* out of proportion. >> >> I'm not talking about the GUCs. > > That was about: > "We can't give users any guidance or tools on how to set multixact > stuff, and autovacuum doesn't handle it properly." OK. I will point out that if multixact freeze was an *intentional* feature, we'd never have accepted it given the total lack of either documentation or monitorability. > >> I'm talking about the data corruption bugs. > > That was covered by "at this point ripping this out seems likely to > cause many more bugs than it would solve". That's certainly possible. I just don't think the option of reversing those patches should be off the table. Things have been bad enough that that might be the best option. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com