Thread: MultiXact pessmization in 9.3

MultiXact pessmization in 9.3

From
Andres Freund
Date:
Hi,

While looking at the multixact truncation code (mail nearby), I've
noticed that there's a significant difference in the way multixact
members are accessed since fkey locks were introduced:

<9.3 when heap_lock_tuple finds a XMAX_IS_MULTI tuple it executes
MultiXactIdWait() which in turn uses GetMultiXactIdMembers() to get the
xids to wait for. But it skips the lookup if the mxid we lookup is older
than OldestVisibleMXactId.

9.3+ instead always looks up the members because GetMultiXactIdMembers()
is also used in cases where we need to access old xids (to check whether
members have commited in non-key updates). I've seen several profiles -
independent from this investigation - where GetMultiXactIdMembers() is
high up in the profiles. So this seems to be a significant
pessimization.
I think we need to re-introduce that optimization for the - hopefully
very common - case of LOCK_ONLY multixacts.

The file header's comment about OldestVisibleMXactId[] seems to be
out-of-date btw, since it's now perfectly normal to access mxids that
are older than what MultiXactIdSetOldestVisible() computed.

Maybe I am just missing something?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: MultiXact pessmization in 9.3

From
Alvaro Herrera
Date:
Andres Freund wrote:

> While looking at the multixact truncation code (mail nearby), I've
> noticed that there's a significant difference in the way multixact
> members are accessed since fkey locks were introduced:
> 
> <9.3 when heap_lock_tuple finds a XMAX_IS_MULTI tuple it executes
> MultiXactIdWait() which in turn uses GetMultiXactIdMembers() to get the
> xids to wait for. But it skips the lookup if the mxid we lookup is older
> than OldestVisibleMXactId.
> 
> 9.3+ instead always looks up the members because GetMultiXactIdMembers()
> is also used in cases where we need to access old xids (to check whether
> members have commited in non-key updates).

But HeapTupleSatisfiesUpdate(), called at the top of heap_lock_tuple,
has already called MultiXactIdIsRunning() (which calls GetMembers)
before that's even considered.  So the call heap_lock_tuple should have
members obtained from the multixact.c cache.

Am I misunderstanding which code path you mean?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: MultiXact pessmization in 9.3

From
Andres Freund
Date:
On 2013-11-22 12:04:31 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > While looking at the multixact truncation code (mail nearby), I've
> > noticed that there's a significant difference in the way multixact
> > members are accessed since fkey locks were introduced:
> > 
> > <9.3 when heap_lock_tuple finds a XMAX_IS_MULTI tuple it executes
> > MultiXactIdWait() which in turn uses GetMultiXactIdMembers() to get the
> > xids to wait for. But it skips the lookup if the mxid we lookup is older
> > than OldestVisibleMXactId.
> > 
> > 9.3+ instead always looks up the members because GetMultiXactIdMembers()
> > is also used in cases where we need to access old xids (to check whether
> > members have commited in non-key updates).
> 
> But HeapTupleSatisfiesUpdate(), called at the top of heap_lock_tuple,
> has already called MultiXactIdIsRunning() (which calls GetMembers)
> before that's even considered.  So the call heap_lock_tuple should have
> members obtained from the multixact.c cache.
> 
> Am I misunderstanding which code path you mean?

Yes, somewhat: <9.3 GetMultiXactIdMembers() didn't do any work if the
multixact was old enough:if (MultiXactIdPrecedes(multi, OldestVisibleMXactId[MyBackendId])){    debug_elog2(DEBUG2,
"GetMembers:it's too old");    *xids = NULL;    return -1;}
 
so, in OLTP style workloads, doing a heap_lock_tuple() often didn't have
to perform any IO when locking a tuple that previously had been locked
using a multixact. We knew that none of the members could still be
running and thus didn't have to ask the SLRU for them since a
not-running member cannot still have a row locked.

Now, fkey locks changed that because for !HEAP_XMAX_LOCK_ONLY mxacts, we
need to look up the members to get the update xid and check whether it
has committed or aborted, even if we know that it isn't currently
running anymore due do OldestVisibleMXactId. But there's absolutely no
need to do that for HEAP_XMAX_LOCK_ONLY mxacts, the old reasoning for
not looking up the members if old is just fine.

Additionally, we don't ever set hint bits indicating that a
HEAP_XMAX_IS_MULTI & !HEAP_XMAX_LOCK_ONLY mxact has commited, so we'll
do HeapTupleGetUpdateXid(), TransactionIdIsCurrentTransactionId(),
TransactionIdIsInProgress(), TransactionIdDidCommit() calls for every
HeapTupleSatisfiesMVCC().

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: MultiXact pessmization in 9.3

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2013-11-22 12:04:31 -0300, Alvaro Herrera wrote:

> Yes, somewhat: <9.3 GetMultiXactIdMembers() didn't do any work if the
> multixact was old enough:
>     if (MultiXactIdPrecedes(multi, OldestVisibleMXactId[MyBackendId]))
>     {
>         debug_elog2(DEBUG2, "GetMembers: it's too old");
>         *xids = NULL;
>         return -1;
>     }
> so, in OLTP style workloads, doing a heap_lock_tuple() often didn't have
> to perform any IO when locking a tuple that previously had been locked
> using a multixact. We knew that none of the members could still be
> running and thus didn't have to ask the SLRU for them since a
> not-running member cannot still have a row locked.

Right.

> Now, fkey locks changed that because for !HEAP_XMAX_LOCK_ONLY mxacts, we
> need to look up the members to get the update xid and check whether it
> has committed or aborted, even if we know that it isn't currently
> running anymore due do OldestVisibleMXactId. But there's absolutely no
> need to do that for HEAP_XMAX_LOCK_ONLY mxacts, the old reasoning for
> not looking up the members if old is just fine.

Correct.  The only difficulty here is that we would need to pass down
the fact that we know for certain that this is only a locking Multixact.
There are some callers that go to it indirectly via MultiXactIdWait or
MultiXactIdExpand, but now that I look I think it's fine for those to
pass false (i.e. assume there might be an update and disable the
optimization), since those aren't hot compared to the other cases.

This patch implements this idea, but I haven't tested it much beyond
compiling and ensuring it passes the existing tests.

Regarding the outdated comment, I had a rewritten version of that
somewhere which I evidently forgot to push :-(  I noticed it was
outdated a couple of weeks after I pushed the main patch.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: MultiXact pessmization in 9.3

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> Correct.  The only difficulty here is that we would need to pass down
> the fact that we know for certain that this is only a locking Multixact.
> There are some callers that go to it indirectly via MultiXactIdWait or
> MultiXactIdExpand, but now that I look I think it's fine for those to
> pass false (i.e. assume there might be an update and disable the
> optimization), since those aren't hot compared to the other cases.
>
> This patch implements this idea, but I haven't tested it much beyond
> compiling and ensuring it passes the existing tests.

.. and it turns out it doesn't work.  To be really effective, we need
MultiXactIdIsRunning to be passed down the flag too, so it can pass it
to GetMultiXactIdMembers.

One other thought is that MultiXactIdIsRunning and GetMultiXactIdMembers
are public functions, so this patch would represent an API change in
9.3.  I doubt any external modules would be relying on these functions,
but there's so many care and thought put into avoiding API changes on
released versions that I'm nervous about doing it here.  So I think we'd
need to provide a compatibility shim to avoid that.

(I generally dislike to keep compatibility stuff forever, so I would
provide this backward-compatible functions in 9.3 only.  Anyone using it
would have to fix the code and recompile for 9.4+.  This means a #ifdef
in code meant to work on top of both 9.3 and 9.4.  Anyone opines
otherwise?)

The other idea is to just not backpatch this.

Other than that, this patch implements the optimization suggested here.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: MultiXact pessmization in 9.3

From
Andres Freund
Date:
On 2013-11-27 19:24:35 -0300, Alvaro Herrera wrote:
> One other thought is that MultiXactIdIsRunning and GetMultiXactIdMembers
> are public functions, so this patch would represent an API change in
> 9.3.  I doubt any external modules would be relying on these functions,
> but there's so many care and thought put into avoiding API changes on
> released versions that I'm nervous about doing it here.  So I think we'd
> need to provide a compatibility shim to avoid that.

-0.5 for providing compatibility shims for this. There really doesn't
seem to be legitimate use for that api outside the guts of heapam.c
besides crude debugging hacks. So it seems like wasted effort.

> (I generally dislike to keep compatibility stuff forever, so I would
> provide this backward-compatible functions in 9.3 only.  Anyone using it
> would have to fix the code and recompile for 9.4+.  This means a #ifdef
> in code meant to work on top of both 9.3 and 9.4.  Anyone opines
> otherwise?)

If at all, we definitely should only do it for 9.3.

> The other idea is to just not backpatch this.

I think backpatching is a good idea, I have seen GetMultiXactIdMembers()
+ slru code take up 80% cpu in strange workloads. But it possibly might
be a good idea to wait till after the next point release to give people
at least a minimal chance of catching problems.

> Other than that, this patch implements the optimization suggested here.

Great!

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: MultiXact pessmization in 9.3

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-11-27 19:24:35 -0300, Alvaro Herrera wrote:
>> The other idea is to just not backpatch this.

> I think backpatching is a good idea, I have seen GetMultiXactIdMembers()
> + slru code take up 80% cpu in strange workloads. But it possibly might
> be a good idea to wait till after the next point release to give people
> at least a minimal chance of catching problems.

Agreed on both counts.  We're close enough now to Monday's wrap that we
should avoid anything that risks destabilizing 9.3.x, unless it's to fix
a serious bug.  AIUI this is just a performance issue, so let's wait till
after 9.3.2 is done to push in the fix.  But since it is a performance
regression from pre-9.3, never back-patching the fix at all isn't very
attractive.
        regards, tom lane