Thread: The case against multixact GUCs

The case against multixact GUCs

From
Josh Berkus
Date:
Hackers,

In the 9.3.3 updates, we added three new GUCs to control multixact
freezing.  This was an unprecedented move in my memory -- I can't recall
ever adding a GUC to a minor release which wasn't backwards
compatibility for a security fix.  This was a mistake.

What makes these GUCs worse is that nobody knows how to set them; nobody
on this list and nobody in the field.  Heck, I doubt 1 in 1000 of our
users (or 1 in 10 people on this list) know what a multixact *is*.

Further, there's no clear justification why these cannot be set to be
the same as our other freeze ages (which our users also don't
understand), or a constant calculated portion of them, or just a
constant.  Since nobody anticipated someone adding a GUC in a minor
release, there was no discussion of this topic that I can find; the new
GUCs were added as a "side effect" of fixing the multixact vacuum issue.
Certainly I would have raised a red flag if the discussion of the new
GUCs hadn't been buried deep inside really long emails.

Adding new GUCs which nobody has any idea how to set, or can even
explain to new users, is not a service to our users.  These should be
removed.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: The case against multixact GUCs

From
Alvaro Herrera
Date:
Sigh ...

Josh Berkus wrote:

> Further, there's no clear justification why these cannot be set to be
> the same as our other freeze ages (which our users also don't
> understand), or a constant calculated portion of them, or just a
> constant.

Calculated portion was my first proposal.  The objection that was raised
was that there's no actual correlation between Xid consumption rate and
multixact consumption rate.  That observation is correct; in some use
cases multixacts will be consumed faster, in others they will be
consumed slower.  So there's no way to have multixact cleanup not cause
extra autovacuum load if we don't have the parameters.

> Since nobody anticipated someone adding a GUC in a minor
> release, there was no discussion of this topic that I can find; the new
> GUCs were added as a "side effect" of fixing the multixact vacuum issue.
>  Certainly I would have raised a red flag if the discussion of the new
> GUCs hadn't been buried deep inside really long emails.

When problems are tough, explanations get long.  There's no way around
that.  I cannot go highlighting text in red hoping that people will read
those parts.

> Adding new GUCs which nobody has any idea how to set, or can even
> explain to new users, is not a service to our users.  These should be
> removed.

I don't think we're going to remove the parameters.  My interpretation
of the paragraph above is "can we please have some documentation that
explains how to set these parameters".  To that, the answer is sure, we
can.  However, I don't have time to write it at this point.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: The case against multixact GUCs

From
David Johnston
Date:
Josh Berkus wrote
> Hackers,
> 
> In the 9.3.3 updates, we added three new GUCs to control multixact
> freezing.  This was an unprecedented move in my memory -- I can't recall
> ever adding a GUC to a minor release which wasn't backwards
> compatibility for a security fix.  This was a mistake.

It probably should have been handled better but the decision to make these
parameters visible itself doesn't seem to be the wrong decision - especially
when limited to a fairly recently released back-branch.


> What makes these GUCs worse is that nobody knows how to set them; nobody
> on this list and nobody in the field.  Heck, I doubt 1 in 1000 of our
> users (or 1 in 10 people on this list) know what a multixact *is*.

That isn't a reason in itself to not have the capability if it is actually
needed.


> Further, there's no clear justification why these cannot be set to be
> the same as our other freeze ages (which our users also don't
> understand), or a constant calculated portion of them, or just a
> constant.  Since nobody anticipated someone adding a GUC in a minor
> release, there was no discussion of this topic that I can find; the new
> GUCs were added as a "side effect" of fixing the multixact vacuum issue.
>  Certainly I would have raised a red flag if the discussion of the new
> GUCs hadn't been buried deep inside really long emails.

The release documentation makes a pointed claim that the situation WAS that
the two were identical; but the different consumption rates dictated making
the multi-xact configuration independently configurable.  So in effect the
GUC was always present - just not user-visible.

Even if there are not any current "best practices" surrounding this topic, at
least this way, as methods are developed, there is an actual place to put the
derived value.  As a starting point one can simply look at the defaults and,
if they have changed the value for the non-multi setting, apply the same
factor to its multixact counterpart.

Now, obviously someone has to think to actually do that - and the release
notes probably should have provided such guidance - but as I state
explicitly below the issue is more about insufficient communication and
education and less about providing the flexibility.
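
As a rough sketch of that starting point (not an established tuning
method), the Python snippet below scales each multixact default by the
same factor that was applied to its XID counterpart.  The helper name
scale_multixact_settings and the pairing table are made up for
illustration; the defaults are believed to be the stock 9.3.3 values and
should be checked against your own postgresql.conf.

```python
# Stock defaults (assumed 9.3.3 values; verify against your installation).
XID_DEFAULTS = {
    "vacuum_freeze_min_age": 50_000_000,
    "vacuum_freeze_table_age": 150_000_000,
    "autovacuum_freeze_max_age": 200_000_000,
}
MULTIXACT_DEFAULTS = {
    "vacuum_multixact_freeze_min_age": 5_000_000,
    "vacuum_multixact_freeze_table_age": 150_000_000,
    "autovacuum_multixact_freeze_max_age": 400_000_000,
}
# Which multixact setting mirrors which XID setting.
COUNTERPART = {
    "vacuum_freeze_min_age": "vacuum_multixact_freeze_min_age",
    "vacuum_freeze_table_age": "vacuum_multixact_freeze_table_age",
    "autovacuum_freeze_max_age": "autovacuum_multixact_freeze_max_age",
}


def scale_multixact_settings(current_xid_settings):
    """Hypothetical helper: apply each XID setting's change factor
    (current value / default) to the matching multixact default."""
    result = {}
    for xid_name, mxid_name in COUNTERPART.items():
        current = current_xid_settings.get(xid_name, XID_DEFAULTS[xid_name])
        factor = current / XID_DEFAULTS[xid_name]
        result[mxid_name] = round(MULTIXACT_DEFAULTS[mxid_name] * factor)
    return result


# Example: a site that doubled vacuum_freeze_table_age and halved
# autovacuum_freeze_max_age, leaving vacuum_freeze_min_age alone.
suggested = scale_multixact_settings({
    "vacuum_freeze_table_age": 300_000_000,
    "autovacuum_freeze_max_age": 100_000_000,
})
for name, value in suggested.items():
    print(f"{name} = {value}")
```

This only carries over a ratio; it says nothing about whether that ratio
suits the multixact consumption rate of a given workload.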


> Adding new GUCs which nobody has any idea how to set, or can even
> explain to new users, is not a service to our users.  These should be
> removed.

Or we should insist that those few that do have an understanding create some
kind of wiki document, or even a documentation section, to educate those
that are not as knowledgeable in this area.

For good reason much of the recent focus in this area has been on actually
getting the feature to work.  Presuming that it is a desirable feature -
which it hopefully is, given that it made it into the wild - such focus
has obviously been necessary given the apparent complexity of this
feature (as evidenced by the recent serious bug reports), but hopefully the
feature itself is now mostly locked down and education can begin.

David J.







Re: The case against multixact GUCs

From
Albe Laurenz
Date:
Josh Berkus wrote:
> What makes these GUCs worse is that nobody knows how to set them; nobody
> on this list and nobody in the field.  Heck, I doubt 1 in 1000 of our
> users (or 1 in 10 people on this list) know what a multixact *is*.

I won't contest your first statement, but multixacts are explained
in the documentation:

http://www.postgresql.org/docs/9.3/static/routine-vacuuming.html#VACUUM-FOR-MULTIXACT-WRAPAROUND

Yours,
Laurenz Albe

Re: The case against multixact GUCs

From
Robert Haas
Date:
On Tue, Mar 11, 2014 at 3:14 PM, Josh Berkus <josh@agliodbs.com> wrote:
> In the 9.3.3 updates, we added three new GUCs to control multixact
> freezing.  This was an unprecedented move in my memory -- I can't recall
> ever adding a GUC to a minor release which wasn't backwards
> compatibility for a security fix.  This was a mistake.

I disagree.  I think it was the right decision.  I think it was a
mistake not including all of that stuff in the first place, and I
think it's good that we've now corrected that oversight.

> What makes these GUCs worse is that nobody knows how to set them; nobody
> on this list and nobody in the field.  Heck, I doubt 1 in 1000 of our
> users (or 1 in 10 people on this list) know what a multixact *is*.

Yeah, and that's a problem.   See, it turns out that we love periodic
full-table scans to freeze xmin so much that, in 9.3, we committed to
a design that requires us to make periodic full-table scans to freeze
xmax, too.  That may or may not have been a good decision, but at this
point we're stuck with it.  People are going to have to come to
understand the requirements there just as they do for freezing xmin.
Denying the user the ability to adjust the thresholds is not going to
accelerate the process of figuring out how they should be set.

> Further, there's no clear justification why these cannot be set to be
> the same as our other freeze ages (which our users also don't
> understand), or a constant calculated portion of them, or just a
> constant.

On most systems, mxid consumption will be much slower than xid
consumption because most users won't use tuple locks all that heavily.
If we made all the defaults the same, then a full-table scan for xid
freezing would likely conclude that many or all of the mxids
weren't old enough to be frozen yet.  To the greatest extent possible,
we want full-table vacuums for either XID freezing or MXID freezing to
advance both relfrozenxid and relminmxid so that we don't go through
and freeze for one reason and then have to come back and freeze for
the other reason shortly thereafter.  Nobody knows exactly how to set
the settings to make that happen just yet, so we need settings at
least until people can determine what values work well in practice -
and probably permanently, because unfortunately I think the answer is
likely workload-dependent.
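
To make the arithmetic behind that concrete, here is a deliberately
simplified Python sketch.  The consumption rates are invented, the
function name is hypothetical, and the model ignores everything except
"age = rate x elapsed time"; it is only meant to show why identical
defaults interact badly with unequal consumption rates.

```python
def mxids_old_enough_at_xid_scan(xid_table_age, xid_rate, mxid_rate,
                                 mxid_freeze_min_age):
    """Return True if, by the time XID age alone triggers a full-table
    scan, the oldest multixact is at least mxid_freeze_min_age old."""
    days_until_scan = xid_table_age / xid_rate     # when the XID scan fires
    oldest_mxid_age = days_until_scan * mxid_rate  # mxids consumed by then
    return oldest_mxid_age >= mxid_freeze_min_age


# Invented workload: 10M XIDs/day but only 500k multixacts/day.
XID_RATE = 10_000_000
MXID_RATE = 500_000
XID_TABLE_AGE = 150_000_000  # XID age that forces a full-table scan

# Multixact freeze_min_age set equal to the XID default (50M):
same_as_xid = mxids_old_enough_at_xid_scan(
    XID_TABLE_AGE, XID_RATE, MXID_RATE, 50_000_000)

# Multixact freeze_min_age set much lower (5M):
lower = mxids_old_enough_at_xid_scan(
    XID_TABLE_AGE, XID_RATE, MXID_RATE, 5_000_000)

print("identical setting, mxids old enough:", same_as_xid)
print("lower setting, mxids old enough:   ", lower)
```

With these made-up numbers the scan fires after 15 days, when the oldest
multixact is only 7.5M old: below a 50M cutoff but above a 5M one.  A
workload with heavy tuple locking would shift the picture the other way.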

> Since nobody anticipated someone adding a GUC in a minor
> release, there was no discussion of this topic that I can find; the new
> GUCs were added as a "side effect" of fixing the multixact vacuum issue.
>  Certainly I would have raised a red flag if the discussion of the new
> GUCs hadn't been buried deep inside really long emails.

Alvaro did explicitly ask if anyone wanted to oppose back-patching.  I
don't think you can really blame him if you didn't see/read that
email.

> Adding new GUCs which nobody has any idea how to set, or can even
> explain to new users, is not a service to our users.  These should be
> removed.

The need for these GUCs is an outgrowth of the fkey locking stuff.
Unless we rip that out again or rewrite it completely, the need for
them doesn't seem likely to go away - so we're going to need to learn
to live with it, not pretend like it isn't a problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The case against multixact GUCs

From
Heikki Linnakangas
Date:
On 03/12/2014 06:26 PM, Robert Haas wrote:
> On Tue, Mar 11, 2014 at 3:14 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> In the 9.3.3 updates, we added three new GUCs to control multixact
>> freezing.  This was an unprecedented move in my memory -- I can't recall
>> ever adding a GUC to a minor release which wasn't backwards
>> compatibility for a security fix.  This was a mistake.
>
> I disagree.  I think it was the right decision.  I think it was a
> mistake not including all of that stuff in the first place, and I
> think it's good that we've now corrected that oversight.

In hindsight, I think permanent multixids in their current form was a 
mistake. Before 9.3, the thing that made multixids special was that they 
could just be thrown away at a restart. They didn't need freezing. Now 
that they do, why not just use regular XIDs for them? We had to 
duplicate much of the wraparound and freezing logic for multixids that 
simply would not have been an issue if we had used regular XIDs instead.

We could've perhaps kept the old multixids for their original purpose, 
as transient xids that can be forgotten about after all the old 
snapshots are gone. But for the permanent ones, it would've been simpler 
if we handled them more like subxids; make them part of the same XID 
space as regular XIDs.

This is pretty hand-wavy of course, and it's too late now.

- Heikki



Re: The case against multixact GUCs

From
Robert Haas
Date:
On Wed, Mar 12, 2014 at 12:45 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 03/12/2014 06:26 PM, Robert Haas wrote:
>> On Tue, Mar 11, 2014 at 3:14 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> In the 9.3.3 updates, we added three new GUCs to control multixact
>>> freezing.  This was an unprecedented move in my memory -- I can't recall
>>> ever adding a GUC to a minor release which wasn't backwards
>>> compatibility for a security fix.  This was a mistake.
>>
>> I disagree.  I think it was the right decision.  I think it was a
>> mistake not including all of that stuff in the first place, and I
>> think it's good that we've now corrected that oversight.
>
> In hindsight, I think permanent multixids in their current form was a
> mistake. Before 9.3, the thing that made multixids special was that they
> could just be thrown away at a restart. They didn't need freezing. Now that
> they do, why not just use regular XIDs for them?

Well, the numbering of MXIDs is closely bound up with their storage
format.  To do what you're proposing, we'd need to invent some new way
of associating an XID-used-as-MXID with update XID, list of lockers,
and lock modes.  Which is certainly possible, but it's not obvious
that it's a good idea.

I *am* concerned that we didn't adequately weigh the costs of adding
another thing that has to be frozen before we did it.  Clearly, the
feature has a lot of benefit, or will once we've flushed out most of
the bugs.  But it's hard to say at this point how much the cost is
going to be, and I do think that's cause for concern.  But I'm not
convinced that unifying the XID and MXID spaces would have addressed
that concern to any measurable degree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The case against multixact GUCs

From
Josh Berkus
Date:
On 03/12/2014 09:45 AM, Heikki Linnakangas wrote:
> In hindsight, I think permanent multixids in their current form was a
> mistake. Before 9.3, the thing that made multixids special was that they
> could just be thrown away at a restart. They didn't need freezing. Now
> that they do, why not just use regular XIDs for them? We had to
> duplicate much of the wraparound and freezing logic for multixids that
> simply would not have been an issue if we had used regular XIDs instead.
> 
> We could've perhaps kept the old multixids for their original purpose,
> as transient xids that can be forgotten about after all the old
> snapshots are gone. But for the permanent ones, it would've been simpler
> if we handled them more like subxids; make them part of the same XID
> space as regular XIDs.
> 
> This is pretty hand-wavy of course, and it's too late now.

So, if we ripped out all the multixact stuff for 9.4, what would that
cost us?  I'm serious.  The multixact stuff has been broken since 9.3
was released, and it's *still* broken.  We can't give users any guidance
or tools on how to set multixact stuff, and autovacuum doesn't handle it
properly.

Seems like this was just a bad patch and we should rip it out.  What
features do we lose?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: The case against multixact GUCs

From
Andres Freund
Date:
On 2014-04-16 11:10:52 -0700, Josh Berkus wrote:
> On 03/12/2014 09:45 AM, Heikki Linnakangas wrote:
> > In hindsight, I think permanent multixids in their current form was a
> > mistake. Before 9.3, the thing that made multixids special was that they
> > could just be thrown away at a restart. They didn't need freezing. Now
> > that they do, why not just use regular XIDs for them? We had to
> > duplicate much of the wraparound and freezing logic for multixids that
> > simply would not have been an issue if we had used regular XIDs instead.
> > 
> > We could've perhaps kept the old multixids for their original purpose,
> > as transient xids that can be forgotten about after all the old
> > snapshots are gone. But for the permanent ones, it would've been simpler
> > if we handled them more like subxids; make them part of the same XID
> > space as regular XIDs.
> > 
> > This is pretty hand-wavy of course, and it's too late now.
> 
> So, if we ripped out all the multixact stuff for 9.4, what would that
> cost us?

Ripping multixacts out in general? Err, right. We'd lose shared row
level locks...

I think ripping out stuff at this point would be the cause of many, many
more bugs than it'd prevent.

> I'm serious.  The multixact stuff has been broken since 9.3
> was released, and it's *still* broken. We can't give users any guidance
> or tools on how to set multixact stuff, and autovacuum doesn't handle it
> properly.

Sorry, but I think you're blowing some GUCs *WAY* out of proportion.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: The case against multixact GUCs

From
Josh Berkus
Date:
On 04/16/2014 11:22 AM, Andres Freund wrote:
>> I'm serious.  The multixact stuff has been broken since 9.3
>> was released, and it's *still* broken. We can't give users any guidance
>> or tools on how to set multixact stuff, and autovacuum doesn't handle it
>> properly.
> 
> Sorry, but I think you're blowing some GUCs *WAY* out of proportion.

I'm not talking about the GUCs. I'm talking about the data corruption
bugs.  Including the new one this week.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: The case against multixact GUCs

From
Andres Freund
Date:
On 2014-04-16 11:25:49 -0700, Josh Berkus wrote:
> On 04/16/2014 11:22 AM, Andres Freund wrote:
> >> I'm serious.  The multixact stuff has been broken since 9.3
> >> was released, and it's *still* broken. We can't give users any guidance
> >> or tools on how to set multixact stuff, and autovacuum doesn't handle it
> >> properly.
> > 
> > Sorry, but I think you're blowing some GUCs *WAY* out of proportion.
> 
> I'm not talking about the GUCs.

That was about:
"We can't give users any guidance or tools on how to set multixact
stuff, and autovacuum doesn't handle it properly."

> I'm talking about the data corruption bugs.

That was covered by "at this point ripping this out seems likely to
cause many more bugs than it would solve".

> Including the new one this week.

Let's hold our horses a bit; we don't know what's happening there for
now.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: The case against multixact GUCs

From
Josh Berkus
Date:
On 04/16/2014 11:30 AM, Andres Freund wrote:
> On 2014-04-16 11:25:49 -0700, Josh Berkus wrote:
>> On 04/16/2014 11:22 AM, Andres Freund wrote:
>>>> I'm serious.  The multixact stuff has been broken since 9.3
>>>> was released, and it's *still* broken. We can't give users any guidance
>>>> or tools on how to set multixact stuff, and autovacuum doesn't handle it
>>>> properly.
>>>
>>> Sorry, but I think you're blowing some GUCs *WAY* out of proportion.
>>
>> I'm not talking about the GUCs.
> 
> That was about:
> "We can't give users any guidance or tools on how to set multixact
> stuff, and autovacuum doesn't handle it properly."

OK.  I will point out that if multixact freeze was an *intentional*
feature, we'd never have accepted it given the total lack of either
documentation or monitorability.

> 
>> I'm talking about the data corruption bugs.
> 
> That was covered by "at this point ripping this out seems likely to
> cause many more bugs than it would solve".

That's certainly possible.  I just don't think the option of reversing
those patches should be off the table.  Things have been bad enough that
that might be the best option.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com