Re: hot standby - merged up to CVS HEAD - Mailing list pgsql-hackers

From Robert Haas
Subject Re: hot standby - merged up to CVS HEAD
Date
Msg-id 603c8f070908192019q705dee12qfa4f187c2eaf7e40@mail.gmail.com
Whole thread Raw
In response to Re: hot standby - merged up to CVS HEAD  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: hot standby - merged up to CVS HEAD  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Mon, Aug 17, 2009 at 6:50 AM, Robert Haas<robertmhaas@gmail.com> wrote:
>> I think there's a race condition in the way LogCurrentRunningXacts() is
>> called at the end of checkpoint. This can happen in the master:
>>
>> 1. Checkpoint starts
>> 2. Transaction 123 begins, and does some updates
>> 3. Checkpoint ends. LogCurrentRunningXacts() is called.
>> 4. LogCurrentRunningXacts() gets the list of currently running
>> transactions by calling GetCurrentTransactionData().
>> 5. Transaction 123 ends, writing commit record to WAL
>> 6. LogCurrentRunningXacts() writes the list of running XIDs to WAL. This
>> includes XID 123, since that was still running at step 4.
>>
>> When that is replayed, ProcArrayUpdateTransactions() will zap the
>> unobserved xids array with the list that includes XID 123, even though
>> we already saw a commit record for it.
>
> Sounds like we need some locking there, then.  This exceeds my current
> depth of understanding of the patch, but I'll see if I can figure it
> out.

I looked at this a little more.  I'm wondering if we can fix this by
making ProcArrayUpdateRecoveryTransactions() smarter.  Can we just
refuse to add an Xid to the UnobservedXids() array if in fact we've
already observed it?  (Not sure how to check that.) Fixing this on the
master would seem to require acquiring the WALInsertLock before
calling GetRunningTransactionData() and holding it until we finish
writing that data to WAL, which I suspect someone's going to complain
about...

For the archives, I'm also attaching a copy of this patch with my
latest (very minor) cleanups, and Heikki's.  This should apply to CVS
HEAD, if anyone wants to test.  git repo is available at:

http://git.postgresql.org/gitweb?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/hs

This is useful: git log --no-merges master...hs

...Robert

Attachment

pgsql-hackers by date:

Previous
From: Itagaki Takahiro
Date:
Subject: Re: auto_explain log_verbose causes regression failure
Next
From: Heikki Linnakangas
Date:
Subject: Re: hot standby - merged up to CVS HEAD