Robert Haas wrote:
> My colleague Thomas Munro and I have been working with Alvaro, and
> also with Kevin and Amit, to fix bug #12990, a multixact-related data
> corruption bug.
Thanks for this great summary of the situation.
> 1. I believe that there is still a narrow race condition that can cause
> the multixact code to go crazy and delete all of its data when
> operating very near the threshold for member space exhaustion. See
> http://www.postgresql.org/message-id/CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com
> for the scenario and proposed fix.
I agree that there is a problem here.
> 2. We have some logic that causes autovacuum to run in spite of
> autovacuum=off when wraparound threatens. My commit
> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
> anti-wraparound protections for multixact members that exist for
> multixact IDs and for regular XIDs, but this remains an outstanding
> issue. I believe I know how to fix this, and will work up an
> appropriate patch based on some of Thomas's earlier work.
I believe autovacuum=off is fortunately uncommon, but certainly getting
this issue fixed is a good idea.
> 3. It seems to me that there is a danger that some users could see
> extremely frequent anti-mxid-member-wraparound vacuums as a result of
> this work.
I agree with the idea that the long term solution to this issue is to
make the freeze process cheaper. I don't have any good ideas on how to
make this less severe in the interim. You say the fix for #8470 is not
tested thoroughly enough to back-patch it just yet, and I can get behind
that; so let's wait until 9.5 has been tested a bit more.
Another avenue not mentioned and possibly worth exploring is making some
more use of the multixact cache, and reuse multixacts that were
previously issued and have the same effects as the one you're interested
in: for instance, if you want a multixact with locking members
(10,20,30) and you have one for (5,10,20,30) but transaction 5 has
finished, then essentially both have the same semantics (because locks
don't have any effect once the transaction has finished) so we can use it
instead of creating a new one. I have no idea how to implement this;
obviously, having to run TransactionIdIsCurrentTransactionId for each
member on each multixact in the cache each time you want to create a new
multixact is not very reasonable.
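
To make the reuse idea concrete, here is a toy model of the lookup, not
PostgreSQL code. It assumes the cache can be treated as a mapping from
multixact ID to a set of member XIDs, and it ignores per-member lock modes
entirely. The names find_reusable_multixact, cache and is_finished are
hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional


def find_reusable_multixact(
    wanted: Iterable[int],
    cache: Dict[int, FrozenSet[int]],
    is_finished: Callable[[int], bool],
) -> Optional[int]:
    """Return a cached multixact ID equivalent to `wanted`, or None.

    A cached entry is treated as equivalent when it contains every wanted
    member and all of its extra members belong to finished transactions.
    (Assumption: lock modes are ignored in this toy model.)
    """
    wanted_set = frozenset(wanted)
    for multi_id, members in cache.items():
        # The cached entry must cover every member we need.
        if not wanted_set <= members:
            continue
        # Extra members are harmless only if their transactions are done.
        extras = members - wanted_set
        if all(is_finished(xid) for xid in extras):
            return multi_id
    return None


# Hypothetical cache contents, mirroring the example in the text.
cache = {
    100: frozenset({5, 10, 20, 30}),
    101: frozenset({10, 20}),
}
finished = {5}

# Transaction 5 has finished, so multixact 100 can stand in for (10,20,30).
print(find_reusable_multixact([10, 20, 30], cache, finished.__contains__))
```

The sketch also shows where the cost comes from: in the worst case it calls
is_finished once for every extra member of every cached entry, on every
lookup, which is the overhead described above as not very reasonable.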
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services