Bug in wait time when waiting on nested subtransaction - Mailing list pgsql-hackers

From Simon Riggs
Subject Bug in wait time when waiting on nested subtransaction
Date
Msg-id CANbhV-HYfP0ebZRERkpt84ZCDsNX-UYJGYsjfS88jtbYzY+KcQ@mail.gmail.com
Whole thread Raw
Responses Re: Bug in wait time when waiting on nested subtransaction
Re: Bug in wait time when waiting on nested subtransaction
List pgsql-hackers
Issue in current XactLockTableWait, starting with 3c27944fb2141,
discovered while reviewing https://commitfest.postgresql.org/40/3806/

Test demonstrating the problem is 001-isotest-tuplelock-subxact.v1.patch

A narrative description of the issue follows:
session1 - requests multiple nested subtransactions like this:
BEGIN; ...
SAVEPOINT subxid1; ...
SAVEPOINT subxid2; ...

If another session2 sees an xid from subxid2, it waits. If subxid2
aborts, then session2 sees the abort and can continue processing
normally.
However, if subxid2 subcommits, then the lock wait moves from subxid2
to the topxid. If subxid1 subsequently aborts, it will also abort
subxid2, but session2 waiting for subxid2 to complete doesn't see this
and waits for topxid instead. Which means that it waits longer than it
should, and later arriving lock waiters may actually get served first.

So it's a bug, but not too awful, since in many cases people don't use
nested subtransactions, and if they do use SAVEPOINTs, they don't
often close them using RELEASE. And in most cases the extra wait time
is not very long, hence why nobody ever reported this issue.

Thanks to Robert Haas and Julien Tachoires for asking the question
"are you sure the existing coding is correct?". You were both right;
it is not.

How to fix? Correct lock wait can be achieved while a subxid is
running if we do either
* a lock table entry for the subxid OR
* a subtrans entry that points to its immediate parent

So we have these options

1. Removing the XactLockTableDelete() call in CommitSubTransaction().
That releases lock waiters earlier than expected, which requires
pushups in XactLockTableWait() to cope with that (which are currently
broken). Why do we do it? To save shmem space in the lock table should
anyone want to run a transaction that contains thousands of
subtransactions, or more. So option (1) alone would eventually cause
us to run out of space in the lock table and a transaction would
receive ERRORs rather than be allowed to run for a long time.

2. In XactLockTableWait(), replace the call to SubTransGetParent(), so
we go up the levels one by one as we did before. However, (2) causes
huge subtrans contention and if we implemented that and backpatched it
the performance issues could be significant. So my feeling is that if
we do (2) then we should not backpatch it.

So both (1) and (2) have issues.

The main result from patch https://commitfest.postgresql.org/40/3806/
is that having subtrans point direct to topxid is very good for
performance in XidIsInMVCCSnapshot(), and presumably other places
also. This bug prevents the  simple application of a patch to improve
performance. So now we need a stronger mix of ideas to both resolve
the bug and fix the subtrans contention issue in HEAD.

My preferred solution would be a mix of the above, call it option (3)

3.
a) Include the lock table entry for the first 64 subtransactions only,
so that we limit shmem. For those first 64 entries, have the subtrans
point direct to top, since this makes a subtrans lookup into an O(1)
operation, which is important for performance of later actions.

b) For any subtransactions after first 64, delete the subxid lock on
subcommit, to save shmem, but make subtrans point to the immediate
parent (only), not the topxid. That fixes the bug, but causes
performance problems in XidInMVCCSnapshot() and others, so we also do
c) and d)

c) At top level commit, go back and alter subtrans again for subxids
so now it points to the topxid, so that we avoid O(N) behavior in
XidInMVCCSnapshot() and other callers. Additional cost for longer
transactions, but it saves considerable cost in later callers that
need to call  GetTopmostTransaction.

d) Optimize SubTransGetTopmostTransaction() so it retrieves entries
page-at-a-time. This will reduce the contention of repeatedly
re-visiting the same page(s) and ensure that a page is less often
paged out when we are still using it.

Thoughts?

--
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Remove a unused argument from qual_is_pushdown_safe
Next
From: Nikita Malakhov
Date:
Subject: [PATCH] Infinite loop while acquiring new TOAST Oid