Re: Bug in wait time when waiting on nested subtransaction - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Bug in wait time when waiting on nested subtransaction
Date
Msg-id CANbhV-E9adw5rmj4rYY==-kUNLkwOdWadPYubEcfjmgFteR54g@mail.gmail.com
Whole thread Raw
In response to Re: Bug in wait time when waiting on nested subtransaction  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Bug in wait time when waiting on nested subtransaction
List pgsql-hackers
On Mon, 28 Nov 2022 at 17:38, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Nov 28, 2022 at 10:28 AM Simon Riggs
> <simon.riggs@enterprisedb.com> wrote:
> > So we have these options
> >
> > 1. Removing the XactLockTableDelete() call in CommitSubTransaction().
> > That releases lock waiters earlier than expected, which requires
> > pushups in XactLockTableWait() to cope with that (which are currently
> > broken). Why do we do it? To save shmem space in the lock table should
> > anyone want to run a transaction that contains thousands of
> > subtransactions, or more. So option (1) alone would eventually cause
> > us to run out of space in the lock table and a transaction would
> > receive ERRORs rather than be allowed to run for a long time.
>
> This seems unprincipled to me. The point of having a lock on the
> subtransaction in the lock table is so that people can wait for the
> subtransaction to end. If we don't remove the lock table entry at
> subtransaction end, then that lock table entry doesn't serve that
> purpose any more.

An easy point to confuse:
"subtransaction to end": The subtransaction is "still running" to
other backends even AFTER it has been subcommitted, but its state now
transfers to the parent.

So the subtransaction doesn't cease running until it aborts, one of
its parent aborts or top level commit. The subxid lock should, on
principle, exist until one of those events occurs. It doesn't, but
that is an optimization, for the stated reason.

(All of the above is current behavior).

> > 2. In XactLockTableWait(), replace the call to SubTransGetParent(), so
> > we go up the levels one by one as we did before. However, (2) causes
> > huge subtrans contention and if we implemented that and backpatched it
> > the performance issues could be significant. So my feeling is that if
> > we do (2) then we should not backpatch it.
>
> What I find suspicious about the coding of this function is that it
> doesn't distinguish between commits and aborts at all. Like, if a
> subtransaction commits, the changes don't become globally visible
> until the parent transaction also commits. If a subtransaction aborts,
> though, what happens to the top-level XID doesn't seem to matter at
> all. The comment seems to agree:
>
>  * Note that this does the right thing for subtransactions: if we wait on a
>  * subtransaction, we will exit as soon as it aborts or its top parent commits.
>
> I feel like what I'd expect to see given this comment is code which
> (1) waits until the supplied XID is no longer running, (2) checks
> whether the XID aborted, and if so return at once, and (3) otherwise
> recurse to the parent XID. But the code doesn't do that. Perhaps
> that's not actually the right thing to do, since it seems like a big
> behavior change, but then I don't understand the comment.

As I mention above, the correct behavior is that the subxact doesn't
cease running until it aborts, one of its parent aborts or top level
commit.

Which is slightly different from the comment, which may explain why
the bug exists.

> Incidentally, one possible optimization here to try to release locking
> traffic would be to try waiting for the top-parent first using a
> conditional lock acquisition. If that works, cool. If not, go back
> around and work up the tree level by level. Since that path would only
> be taken in the unhappy case where we're probably going to have to
> wait anyway, the cost probably wouldn't be too bad.

That sounds like a potential bug fix (not just an optimization).

> > My preferred solution would be a mix of the above, call it option (3)
> >
> > 3.
> > a) Include the lock table entry for the first 64 subtransactions only,
> > so that we limit shmem. For those first 64 entries, have the subtrans
> > point direct to top, since this makes a subtrans lookup into an O(1)
> > operation, which is important for performance of later actions.
> >
> > b) For any subtransactions after first 64, delete the subxid lock on
> > subcommit, to save shmem, but make subtrans point to the immediate
> > parent (only), not the topxid. That fixes the bug, but causes
> > performance problems in XidInMVCCSnapshot() and others, so we also do
> > c) and d)
> >
> > c) At top level commit, go back and alter subtrans again for subxids
> > so now it points to the topxid, so that we avoid O(N) behavior in
> > XidInMVCCSnapshot() and other callers. Additional cost for longer
> > transactions, but it saves considerable cost in later callers that
> > need to call  GetTopmostTransaction.
> >
> > d) Optimize SubTransGetTopmostTransaction() so it retrieves entries
> > page-at-a-time. This will reduce the contention of repeatedly
> > re-visiting the same page(s) and ensure that a page is less often
> > paged out when we are still using it.
>
> I'm honestly not very sanguine about changing pg_subtrans to point
> straight to the topmost XID. It's only OK to do that if there's
> absolutely nothing that needs to know the full tree structure, and the
> present case looks like an instance where we would like to have the
> full tree structure. I would not be surprised if there are others.

There are no others. It's a short list and I've checked. I ask others
to do the same.

> That said, it seems a lot less likely that this would be an issue once
> the top-level transaction is no longer running. At that point, all the
> subtransactions are no longer running either: they either committed or
> they rolled back, and I can't see a reason why any code should care
> about anything other than which of those two things happened. So I
> think your idea in (c) might have some possibilities.
>
> You could also flip that idea around and have readers replace
> immediate parent pointers with top-parent pointers opportunistically,
> but I'm not sure that's actually any better. As you present it in (c)
> above, there's a risk of going back and updating CLOG state that no
> one will ever look at. But if you flipped it around and did it on the
> read side, then you'd have the risk of a bunch of backends trying to
> do it at the same time. I'm not sure whether either of those things is
> a big problem in practice, or whether both are, or neither.

(Subtrans state, not clog state)

But that puts the work on the reader rather than the writer,.

> I agree that it looks possible to optimize
> SubTransGetTopmostTransaction better for the case where we want to
> traverse multiple or all levels, so that fewer pages are read. I don't
> know to what degree that would affect user-observible performance, but
> I can believe that it could be a win.

Good.

Thanks for commenting.

--
Simon Riggs                http://www.EnterpriseDB.com/



pgsql-hackers by date:

Previous
From: walther@technowledgy.de
Date:
Subject: Re: fixing CREATEROLE
Next
From: Robert Haas
Date:
Subject: Re: Bug in wait time when waiting on nested subtransaction