Re: Heads-Up: multixact freezing bug - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: Heads-Up: multixact freezing bug
Msg-id: 20131128173507.GZ31748@awork2.anarazel.de
In response to: Re: Heads-Up: multixact freezing bug (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-hackers

On 2013-11-28 14:10:43 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > Instead of calculating the multixact cutoff xid by using the global
> > minimum of OldestMemberMXactId[] and OldestVisibleMXactId[] and then
> > subtracting vacuum_freeze_min_age compute it solely as the minimum of
> > OldestMemberMXactId[]. If we do that computation *after* doing the
> > GetOldestXmin() in vacuum_set_xid_limits() we can be sure no mxact above
> > the new mxact cutoff will contain a xid below the xid cutoff. This is so
> > since it would otherwise have been reported as running by
> > GetOldestXmin().
> > With that change we can leave heap_tuple_needs_freeze() and
> > heap_freeze_tuple() unchanged since using the mxact cutoff is
> > sufficient.

> 2. Freezing too much has the disadvantage that you lose info possibly
> useful for forensics.  And I believe that freezing just after a multi
> has gone below the immediate visibility horizon will make them live far
> too little.  Now the performance guys are always saying how they would
> like tuples to even start life frozen, let alone delay any number of
> transactions before them being frozen; but to help the case for those
> who investigate and fix corrupted databases, we need a higher freeze
> horizon.  Heck, maybe even 100k multis would be enough to keep enough
> evidence to track bugs down.  I propose we keep at least a million.
> This is an even more important argument currently, given how buggy the
> current multixact code has proven to be.

The above proposal wouldn't trigger a full table vacuum, or similar. So
it's not like we'd eagerly remove multi xmaxs - and it wouldn't
influence freezing of anything but the multis.
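
To make the proposal quoted at the top concrete, here is a rough,
standalone sketch of the cutoff computation. It is not the actual
vacuum_set_xid_limits() code: the names sketch_multixact_cutoff() and
mxid_precedes() and the next_mxid parameter are made up for illustration,
the array merely stands in for OldestMemberMXactId[], and the sketch
assumes 32-bit multixact ids compared modulo 2^32, with 0 meaning "not
set". The ordering requirement (do this only after GetOldestXmin() has
determined the xid cutoff) is the caller's job and isn't modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: multixact ids are 32-bit counters and 0 means "not set". */
typedef uint32_t MultiXactId;
#define InvalidMultiXactId ((MultiXactId) 0)

/* Wraparound-aware "a is older than b", comparing modulo 2^32. */
static bool
mxid_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical helper: return the oldest valid entry of a per-backend
 * array modelled on OldestMemberMXactId[].  If no backend has an entry
 * set, fall back to next_mxid (the next multixact id to be assigned).
 *
 * The caller is assumed to have computed the xid cutoff beforehand.
 */
static MultiXactId
sketch_multixact_cutoff(const MultiXactId *oldest_member,
                        size_t nbackends,
                        MultiXactId next_mxid)
{
    MultiXactId cutoff = next_mxid;

    assert(oldest_member != NULL || nbackends == 0);

    for (size_t i = 0; i < nbackends; i++)
    {
        MultiXactId m = oldest_member[i];

        /* Skip backends without an entry; keep the oldest one seen. */
        if (m != InvalidMultiXactId && mxid_precedes(m, cutoff))
            cutoff = m;
    }
    return cutoff;
}
```

With per-backend values {0, 120, 95, 0} and a next multixact id of 200,
this picks 95; if no backend has an entry set it falls back to next_mxid.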

> 3. I'm not sure I understand how the proposal above fixes things during
> recovery.  If we keep the multi values above the freeze horizon you
> propose above, are we certain no old Xid values will remain?

It would fix things because we could simply set the multi cutoff value
in the xl_heap_freeze record, which would make it trivial to re-perform
freezing on the standby.

The big - and possibly fatal - problem is the number of conflicts it
creates on standbys.

> 5. the new multixact stuff seems way too buggy.  Should we rip it all
> out and return to the old tuple locking scheme?  We spent a huge amount
> of time writing it and reviewing it and now maintaining, but I haven't
> seen a *single* performance report saying how awesome 9.3 is compared to
> older releases due to this change;

I think ripping it out at this point would create as many bugs as it
would fix :(. Especially as we'd still need to keep a large part of the
code in place to handle existing multis.

> the 9.3 request for testing, at the
> start of the beta period, didn't even mention to try it out *at all*.

That's pretty sad given it was a) one of the more awesome and b) one of
the more complicated features.

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


