Re: pgsql: Fix a couple of bugs in MultiXactId freezing - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pgsql: Fix a couple of bugs in MultiXactId freezing
Date
Msg-id 20131212012330.GA5209@awork2.anarazel.de
Whole thread Raw
In response to Re: pgsql: Fix a couple of bugs in MultiXactId freezing  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: pgsql: Fix a couple of bugs in MultiXactId freezing
Re: pgsql: Fix a couple of bugs in MultiXactId freezing
List pgsql-hackers
On 2013-12-11 22:08:41 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-12-11 14:00:05 -0300, Alvaro Herrera wrote:
> > > Andres Freund wrote:
> > > > On 2013-12-09 19:14:58 -0300, Alvaro Herrera wrote:
> > > > > !         if (ISUPDATE_from_mxstatus(members[i].status) &&
> > > > > !             !TransactionIdDidAbort(members[i].xid))#
> > > > 
> > > > It makes me wary to see a DidAbort() without a previous InProgress()
> > > > call.  Also, after we crashed, doesn't DidAbort() possibly return
> > > > false for transactions that were in progress before we crashed? At
> > > > least that's how I always understood it, and how tqual.c is written.
> > > 
> > > Yes, that's correct.  But note that here we're not doing a tuple
> > > liveliness test, which is what tqual.c is doing.  What we do with this
> > > info is to keep the Xid as part of the multi if it's still running or
> > > committed.  We also keep it if the xact crashed, which is fine because
> > > the Xid will be removed by some later step.  If we know for certain that
> > > the update Xid is aborted, then we can ignore it, but this is just an
> > > optimization and not needed for correctness.
> > 
> > But why deviate that way? It doesn't seem to save us much?
> 
> Well, it does save something -- there not being a live update means we
> are likely to be able to invalidate the Xmax completely if there are no
> lockers; and even in the case where there are lockers, we will be able
> to set LOCK_ONLY which means faster access in several places.

What I mean is that we could just query TransactionIdIsInProgress() like
usual. I most of the cases it will be very cheap because of the
RecentXmin() check at its beginning.

> > I am not really sure what you are talking about. That you cannot
> > properly decode records before they have been processed by XLogInsert()?
> > If so, yes, that's pretty clear and I am pretty sure it will break in
> > lots of places if you try?
> 
> Well, not sure about "lots of places".  The only misbahavior I have seen
> is in those desc routines.  Of course, the redo routines might also
> fail, but then there's no way for them to be running ...

Hm. I would guess that e.g. display xl_xact_commit fails majorly.

> > > Right now there is one case in this code that returns
> > > FRM_INVALIDATE_XMAX when it's not strictly necessary, i.e. it would also
> > > work to keep the Multi as is and return FRM_NOOP instead; and it also
> > > returns FRM_NOOP in one case when we could return FRM_INVALIDATE_XMAX
> > > instead.  Neither does any great damage, but there is a consideration
> > > that future examiners of the tuple would have to resolve the MultiXact
> > > by themselves (==> performance hit).  On the other hand, returning
> > > INVALIDATE causes the block to be dirtied, which is undesirable if not
> > > already dirty.
> > 
> > Otherwise it will be marked dirty the next time reads the page, so I
> > don't think this is problematic.
> 
> Not necessarily.  I mean, if somebody sees a multi, they might just
> resolve it to its members and otherwise leave the page alone.  Or in
> some cases not even resolve to members (if it's LOCK_ONLY and old enough
> to be behind the oldest visible multi).

But the work has to be done anyway, even if possibly slightly later?

> > > !                  * Tell caller to set HEAP_XMAX_COMMITTED hint while we have
> > > !                  * the Xid in cache.  Again, this is just an optimization, so
> > > !                  * it's not a problem if the transaction is still running and
> > > !                  * in the process of committing.
> > > !                  */
> > > !                 if (TransactionIdDidCommit(update_xid))
> > > !                     update_committed = true;
> > > ! 
> > > !                 update_xid = newmembers[i].xid;
> > > !             }
> > 
> > I don't think the conclusions here are correct - we might be setting
> > HEAP_XMAX_COMMITTED a smudge to early that way. If the updating
> > transaction is in progress, there's the situation that we have updated
> > the clog, but have not yet removed ourselves from the procarray. I.e. a
> > situation in which TransactionIdIsInProgress() and
> > TransactionIdDidCommit() both return true. Afaik it is only correct to
> > set HEAP_XMAX_COMMITTED once TransactionIdIsInProgress() returns false.
> 
> Hmm ... Is there an actual difference?  I mean, a transaction that
> marked itself as committed in pg_clog cannot return to any other state,
> regardless of what happens elsewhere.

But it could lead to other transactions seing the row as committed, even
though it isn't really yet.
tqual.c sayeth:* NOTE: must check TransactionIdIsInProgress (which looks in PGXACT array)* before
TransactionIdDidCommit/TransactionIdDidAbort(which look in* pg_clog).  Otherwise we have a race condition: we might
decidethat a* just-committed transaction crashed, because none of the tests succeed.* xact.c is careful to record
commit/abortin pg_clog before it unsets* MyPgXact->xid in PGXACT array.  That fixes that problem, but it also* means
thereis a window where TransactionIdIsInProgress and* TransactionIdDidCommit will both return true.  If we check only*
TransactionIdDidCommit,we could consider a tuple committed when a* later GetSnapshotData call will still think the
originatingtransaction* is in progress, which leads to application-level inconsistency.    The* upshot is that we gotta
checkTransactionIdIsInProgress first in all* code paths, except for a few cases where we are looking at*
subtransactionsof our own main transaction and so there can't be any* race condition.
 

I don't think there's any reason to deviate from this pattern here. For
old xids TransactionIdIsInProgress() should be really cheap.

> > I worry that this MultiXactIdSetOldestMember() will be problematic in
> > longrunning vacuums like a anti-wraparound vacuum of a huge
> > table. There's no real need to set MultiXactIdSetOldestMember() here,
> > since we will not become the member of a multi. So I think you should
> > either move the Assert() in MultiXactIdCreateFromMembers() to it's other
> > callers, or add a parameter to skip it.
> 
> I would like to have the Assert() work automatically, that is, check the
> PROC_IN_VACUUM flag in MyProc->vacuumflags ... but this probably won't
> work with CLUSTER.  That said, I think we *should* call SetOldestMember
> in CLUSTER.  So maybe both things should be conditional on
> PROC_IN_VACUUM.

Why should it be dependent on cluster? SetOldestMember() defines the
oldest multi we can be a member of. Even in cluster, the freezing will
not make us a member of a multi. If the transaction does something else
requiring SetOldestMember(), that will do it?

> > I worry that all these multixact accesses will create huge performance
> > problems due to the inefficiency of the multixactid cache. If you scan a
> > huge table there very well might be millions of different multis we
> > touch and afaics most of them will end up in the multixactid cache. That
> > can't end well.
> > I think we need to either regularly delete that cache when it goes past,
> > say, 100 entries, or just bypass it entirely.
> 
> Delete the whole cache, or just prune it of the least recently used
> entries?  Maybe the cache should be a dlist instead of the open-coded
> stuff that's there now; that would enable pruning of the oldest entries.
> I think a blanket deletion might be a cure worse than the disease.  I
> see your point anyhow.

I was thinking of just deleting the whole thing. Revamping the cache
mechanism to be more efficient, is an important goal, but it imo
shouldn't be lumped together with this. Now you could argue that purging
the cache shouldn't be either - but 9.3.2+ the worst case essentially is
O(n^2) in the number of rows in a table. Don't think that can be
acceptable.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: pgsql: Fix a couple of bugs in MultiXactId freezing
Next
From: Kyotaro HORIGUCHI
Date:
Subject: [BUG] Archive recovery failure on 9.3+.