On 02.12.2010 12:31, Heikki Linnakangas wrote:
> On 02.12.2010 13:25, Simon Riggs wrote:
>> On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:
>>> On 02.12.2010 11:02, Simon Riggs wrote:
>>>> The cause of the issue is that replay starts at one LSN and there is a
>>>> delay until the RunningXacts WAL record occurs. If there was no delay,
>>>> there would be no issue at all. In CreateCheckpoint() we start by
>>>> grabbing the WALInsertLock and later recording that pointer as part of
>>>> the checkpoint record. My proposal is to replace the "grab the lock"
>>>> code with the insert of the RunningXacts WAL record (when wal_level
>>>> is set accordingly), so that recovery always starts with that record type.
>>>
>>> Oh, interesting idea. But AFAICS closing the gap between acquiring the
>>> running-xacts snapshot and writing it to the log is sufficient; I don't
>>> see what moving the running-xacts record buys us. Does it allow some
>>> further simplifications somewhere?
>>
>> Your patch is quite long and you do a lot more than just alter the
>> locking. I don't think we need those changes at all and especially would
>> not wish to backpatch that.
>
> Most of the changes to procarray.c were about removing code that's no
> longer necessary when we close the gap between acquiring and writing the
> running-xacts WAL record. You can leave it as it is as a historical
> curiosity, but I'd prefer to simplify it, given that we now know that it
> doesn't actually work correctly if the gap is not closed.

Ok, I've committed this patch now.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com