Thread: pgsql: Fix race condition in snapshot caching when 2PC is used.
Fix race condition in snapshot caching when 2PC is used. When preparing a transaction xactCompletionCount needs to be incremented, even though the transaction has not committed yet. Otherwise the snapshot used within the transaction otherwise can get reused outside of the prepared transaction. As GetSnapshotData() does not include the current xid when building a snapshot, reuse would not be correct. Somewhat surprisingly the regression tests only rarely show incorrect results without the fix. The reason for that is that often the snapshot's xmax will be >= the backend xid, yielding a snapshot that is correct, despite the bug. I'm working on a reliable test for the bug, but it seems worth seeing whether this fixes all the BF failures while I do. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/E1k7tGP-0005V0-5k@gemulon.postgresql.org Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/07f32fcd23ac81898ed47f88beb569c631a2f223 Modified Files -------------- src/backend/storage/ipc/procarray.c | 9 +++++++++ 1 file changed, 9 insertions(+)
Andres Freund <andres@anarazel.de> writes: > Fix race condition in snapshot caching when 2PC is used. Uh, is it really okay to modify that variable with only shared ProcArrayLock? regards, tom lane
Hi, On 2020-08-18 19:43:24 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > Fix race condition in snapshot caching when 2PC is used. > > Uh, is it really okay to modify that variable with only shared > ProcArrayLock? Uh, you're right. I assume it'll fix the buildfarm visible issue regardless, but we surely don't want to rely on that. I'm inclined to just make ClearTransaction take an exclusive lock - the rest of the 2PC operations are so heavyweight that I can't imagine making a difference. When I tested the locking changes in ProcArrayAdd()/Remove() the more heavyweight locking wasn't at all visible. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2020-08-18 19:43:24 -0400, Tom Lane wrote: >> Uh, is it really okay to modify that variable with only shared >> ProcArrayLock? > Uh, you're right. I assume it'll fix the buildfarm visible issue > regardless, but we surely don't want to rely on that. Yeah, having a failure show on the buildfarm would take an order of magnitude (at least) tighter timing than what we're seeing now. But I think it could possibly be a problem on big production iron. > I'm inclined to just make ClearTransaction take an exclusive lock - the > rest of the 2PC operations are so heavyweight that I can't imagine > making a difference. When I tested the locking changes in > ProcArrayAdd()/Remove() the more heavyweight locking wasn't at all > visible. I was wondering if it'd be sensible to convert that counter into an atomic variable. That's not real clear, but worth thinking about. regards, tom lane
Hi, On 2020-08-18 19:55:50 -0400, Tom Lane wrote: > > I'm inclined to just make ClearTransaction take an exclusive lock - the > > rest of the 2PC operations are so heavyweight that I can't imagine > > making a difference. When I tested the locking changes in > > ProcArrayAdd()/Remove() the more heavyweight locking wasn't at all > > visible. > > I was wondering if it'd be sensible to convert that counter into an > atomic variable. That's not real clear, but worth thinking about. I had it that way originally, but because xactCompletionCount otherwise needs to be modified with ProcArrayLock exclusively held, I moved it to a plain variable. It'd be a bad deal to actually modify it atomically unless we got something out of that ([1]), because that'd increase the number of atomic operations for commits, something we surely don't want. We could make the locking rules be that modifications with a shared ProcArrayLock have to be done with an atomic op, and operations with ProcArrayLock held exclusively may do it without an atomic op. But that seems a bit weird. A slightly bigger hammer that we could use would be to remove the somewhat weird split of 2PC commit between ProcArrayClearTransaction() and ProcArrayAdd(). As far as I can tell we could have EndPrepare()/MarkAsPrepared() do both operations with one exclusive ProcArrayLock acquisition. On balance I'd change it to an exclusive lock with a comment mentioning that it'd not be too hard to downgrade to a shared lock, should it ever become a performance issue. Greetings, Andres Freund [1] e.g. not needing to hold ProcArrayLock to check whether a cached snapshot can be used. The complexity there is that I think it adds a "race condition" where the global xmin horizon can go backward. It'd be detected and "reverted", but that still could make the horizon appear to go backwar for a concurrent horizon determination.
On Wed, 19 Aug 2020 at 12:37, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-08-18 19:55:50 -0400, Tom Lane wrote: > > > I'm inclined to just make ClearTransaction take an exclusive lock - the > > > rest of the 2PC operations are so heavyweight that I can't imagine > > > making a difference. When I tested the locking changes in > > > ProcArrayAdd()/Remove() the more heavyweight locking wasn't at all > > > visible. > > > > I was wondering if it'd be sensible to convert that counter into an > > atomic variable. That's not real clear, but worth thinking about. > > I had it that way originally, but because xactCompletionCount otherwise > needs to be modified with ProcArrayLock exclusively held, I moved it to > a plain variable. It'd be a bad deal to actually modify it atomically > unless we got something out of that ([1]), because that'd increase the > number of atomic operations for commits, something we surely don't want. > > We could make the locking rules be that modifications with a shared > ProcArrayLock have to be done with an atomic op, and operations with > ProcArrayLock held exclusively may do it without an atomic op. But that > seems a bit weird. Couldn't it be done by creating two inline functions, one to call to atomically increment and the other to just increment? Can backup that the correct version of the function is being called with an Assert(LWLockHeldByMeInMode(ProcArrayLock, ...)); The weirdness of doing var++ on an atomic variable can be removed by mentioning why it's ok in a comment in the inline function. David
David Rowley <dgrowleyml@gmail.com> writes: >> On 2020-08-18 19:55:50 -0400, Tom Lane wrote: >>> On Wed, 19 Aug 2020 at 12:37, Andres Freund <andres@anarazel.de> wrote: >>>> I'm inclined to just make ClearTransaction take an exclusive lock - the >>>> rest of the 2PC operations are so heavyweight that I can't imagine >>>> making a difference. When I tested the locking changes in >>>> ProcArrayAdd()/Remove() the more heavyweight locking wasn't at all >>>> visible. >>> I was wondering if it'd be sensible to convert that counter into an >>> atomic variable. That's not real clear, but worth thinking about. > Couldn't it be done by creating two inline functions, one to call to > atomically increment and the other to just increment? Can backup that > the correct version of the function is being called with an > Assert(LWLockHeldByMeInMode(ProcArrayLock, ...)); On reflection I agree with Andres' thought that just taking the lock exclusively in ProcArrayClearTransaction is the right solution. It's silly to imagine that a 2PC commit (plus all the other stuff that needs to happen around that) is fast enough that that'll be a serious performance hit. Keeping things simple for the other code paths is a more useful goal. regards, tom lane