On 02.12.2010 13:25, Simon Riggs wrote:
> On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:
>> On 02.12.2010 11:02, Simon Riggs wrote:
>>> The cause of the issue is that replay starts at one LSN and there is a
>>> delay until the RunningXacts WAL record occurs. If there was no delay,
>>> there would be no issue at all. In CreateCheckpoint() we start by
>>> grabbing the WAInsertLock and later recording that pointer as part of
>>> the checkpoint record. My proposal is to replace the "grab the lock"
>>> code with the insert of the RunningXacts WAL record (when wal_level
>>> set), so that recovery always starts with that record type.
>>
>> Oh, interesting idea. But AFAICS closing the gap between acquiring the
>> running-xacts snapshot and writing it to the log is sufficient, I don't
>> see what moving the running-xacts record buys us. Does it allow some
>> further simplifications somewhere?
>
> Your patch is quite long and you do a lot more than just alter the
> locking. I don't think we need those changes at all and especially would
> not wish to backpatch that.
Most of the changes to procarray.c were about removing code that's no
longer necessary when we close the gap between acquiring and writing the
running-xacts WAL record. You can leave it as it is as a historical
curiosity, but I'd prefer to simplify it, given that we now know that it
doesn't actually work correctly if the gap is not closed.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com