Robert Haas wrote:
> On Mon, Aug 17, 2009 at 6:50 AM, Robert Haas<robertmhaas@gmail.com> wrote:
>>> I think there's a race condition in the way LogCurrentRunningXacts() is
>>> called at the end of checkpoint. This can happen in the master:
>>>
>>> 1. Checkpoint starts
>>> 2. Transaction 123 begins, and does some updates
>>> 3. Checkpoint ends. LogCurrentRunningXacts() is called.
>>> 4. LogCurrentRunningXacts() gets the list of currently running
>>> transactions by calling GetCurrentTransactionData().
>>> 5. Transaction 123 ends, writing commit record to WAL
>>> 6. LogCurrentRunningXacts() writes the list of running XIDs to WAL. This
>>> includes XID 123, since that was still running at step 4.
>>>
>>> When that is replayed, ProcArrayUpdateTransactions() will zap the
>>> unobserved xids array with the list that includes XID 123, even though
>>> we already saw a commit record for it.
>> Sounds like we need some locking there, then. This exceeds my current
>> depth of understanding of the patch, but I'll see if I can figure it
>> out.
>
> I looked at this a little more. I'm wondering if we can fix this by
> making ProcArrayUpdateRecoveryTransactions() smarter. Can we just
> refuse to add an Xid to the UnobservedXids() array if in fact we've
> already observed it? (Not sure how to check that.)
There's also the opposite problem: If a transaction starts (and writes a
WAL record) between LogCurrentRunningXacts() and XLogInstrt(), it is not
present in the RunningXacts record. When the standby replays the
RunningXacts record, it removes the XID of that transaction from the
array, even though it's still running.
> Fixing this on the
> master would seem to require acquiring the WALInsertLock before
> calling GetRunningTransactionData() and holding it until we finish
> writing that data to WAL, which I suspect someone's going to complain
> about...
Yeah, it's hard to get that locking right without risking deadlock. As
the patch stands, we only write a RunningXacts record once per
checkpoint, so it's not performance critical, but we must avoid deadlocks.
If there's a way, I would prefer a solution where the RunningXacts
snapshot represents the situation the moment it appears in WAL, not some
moment before it. It makes the logic easier to understand.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com