DropAllPredicateLocksFromTable, PredicateLockPageSplit, and CheckTableForSerializableConflictIn all assume that if PredXact->SxactGlobalXmin is invalid there are no active serializable transactions and that they can return early as there's no work to do. They don't acquire a LWLock on SerializableXactHashLock because they already have locks on the relevant buffer page or relation, which should protect them from conflicts with new concurrent transactions. However, this justification doesn't protect against concurrent updates of SxactGlobalXmin which SetNewSxactGlobalXmin can do.
SetNewSxactGlobalXmin sets SxactGlobalXmin to InvalidTransactionId then iterates over the active serializable transactions searching for a new xmin.
static void
SetNewSxactGlobalXmin(void)
{
...
PredXact->SxactGlobalXmin = InvalidTransactionId;
PredXact->SxactGlobalXminCount = 0;
dlist_foreach(iter, &PredXact->activeList)
{
// find new xmin
}
...
}
If SetNewSxactGlobalXmin is interrupted before a new xmin has been set, processes that read SxactGlobalXmin during this time will see InvalidTransactionId and might incorrectly conclude there are no concurrent serializable transactions. In the case of PredicateLockPageSplit, it will return early and fail to transfer over information about SIRead locks on the old page as a result.
I have a small repo that demonstrates serialization anomalies when PredicateLockPageSplit is incorrectly skipped (they will show up much more frequently with a small sleep after SetNewSxactGlobalXmin sets SxactGlobalXmin to invalid, but they still show up eventually on normal builds). [1]
I see a couple of options to fix this
1. Guard the reads with a lock. I've attached a patch to do this.
2. Update SetNewSxactGlobalXmin so concurrent lockless reads of SxactGlobalXmin will never see an invalid transaction ID unless there genuinely are no more active serializable transactions. To do this, delay assigning PredXact->SxactGlobalXmin until all active transactions have been examined.
static void
SetNewSxactGlobalXmin(void)
{
...
Assert(TransactionIdIsValid(PredXact->SxactGlobalXmin))
TransactionId xmin = InvalidTransactionId;
....
dlist_foreach(iter, &PredXact->activeList)
{
// iterate over transactions and find new xmin
}
PredXact->SxactGlobalXmin = xmin;
....
}
This is definitely a bit more complex. It requires that SetNewSxactGlobalXmin is never called when SxactGlobalXmin is invalid to prevent readers from seeing an invalid transaction ID when they should see a valid one -- I think this is the case now since before SetNewSxactGlobalXmin is called postgres checks that PredXact->SxactGlobalXminCount > 0. I assume this entails that SxactGlobalXmin is valid, but I have not checked every place the two variables are modified.
I imagine the vast majority of postgres instances aren't using serializable isolation, so I figured I'd throw this out as a possible optimization. I'm not sure how the project trades off between simplicity and (very minor?) performance optimizations, so I'll defer to the community.
Best,
Josh Curtis
[1]
https://github.com/joshcurtis/ssi_race_condition_reproduction