Re: B-tree parent pointer and checkpoints - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: B-tree parent pointer and checkpoints
Date
Msg-id 4E6623FF.1070500@enterprisedb.com
In response to Re: B-tree parent pointer and checkpoints  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: B-tree parent pointer and checkpoints
List pgsql-hackers
On 06.09.2011 16:40, Robert Haas wrote:
> On Tue, Sep 6, 2011 at 6:21 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> The way it would work is that on page split the right page is flagged with
>> MISSING_DOWNLINK flag. When the downlink is inserted into the parent, the
>> flag is cleared in the same critical section as the WAL record for the
>> insertion of the parent is written. Normally, a backend would never see the
>> flag set, because the locks on the split pages are not released until the
>> parent record is written and the flag cleared again. But if inserting the
>> downlink fails for any reason, the next inserter or vacuum that steps on the
>> page can finish the split by inserting the downlink.
>>
>> Unfortunately that means holding the locks on the split pages longer than we
>> do at the moment. Currently they are released as soon as the parent page is
>> locked; with this change they would need to be held until the WAL record of
>> the downlink insertion is done. B-tree is so heavily used that I'm a bit
>> hesitant to sacrifice any concurrency there, but I don't think it would be
>> noticeable in practice.
>
> Do you really need to hold the page locks for all that time, or could
> you cheat?  Like... release the locks on the split pages but then go
> back and reacquire them to clear the flag...

Hmm, there are two issues with that:

1. While you're not holding the locks on the child pages, someone can
step onto the page, see that the MISSING_DOWNLINK flag is set, and
try to finish the split for you while you are still in the middle of
inserting the downlink yourself.

2. If you don't hold the page locked while you clear the flag, someone 
can start and finish a checkpoint after you've inserted the downlink, 
and before you've cleared the flag. You end up in a scenario where the 
flag is set, but the page in fact *does* have a downlink in the parent.

So, nope, we can't cheat.
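
To make the second issue concrete, here is a toy model of the checkpoint race. It is only a sketch, not PostgreSQL code: the function name, the dictionary keys and the WAL record name are all hypothetical. It assumes, as in the proposal quoted above, that a single WAL record covers both the downlink insertion and the clearing of the flag, and that a checkpoint flushes the current buffer contents and moves the redo point past all earlier WAL.

```python
def recover_after_crash(checkpoint_at):
    """Toy model: return the page state seen after a crash and WAL replay.

    checkpoint_at is one of "before_insert", "between", "after_clear".
    "between" can only happen if the child page lock is released
    between inserting the downlink and clearing the flag.
    """
    # In-memory state right after the split: flag set, no downlink yet.
    buffers = {"child_flag": True, "parent_has_downlink": False}
    # Assume the split itself has already reached disk.
    disk = dict(buffers)
    # WAL records written since the last checkpoint's redo point.
    wal = []

    def checkpoint():
        # Flush all buffers; older WAL is no longer replayed after a crash.
        disk.update(buffers)
        wal.clear()

    if checkpoint_at == "before_insert":
        checkpoint()

    # Insert the downlink; the (hypothetical) WAL record also covers
    # clearing the flag on the child page.
    buffers["parent_has_downlink"] = True
    wal.append("insert_downlink_and_clear_flag")

    if checkpoint_at == "between":
        checkpoint()  # only possible when the child page is not locked

    buffers["child_flag"] = False

    if checkpoint_at == "after_clear":
        checkpoint()

    # Crash: buffers are lost; start from disk and replay remaining WAL.
    state = dict(disk)
    for record in wal:
        if record == "insert_downlink_and_clear_flag":
            state["parent_has_downlink"] = True
            state["child_flag"] = False
    return state


for when in ("before_insert", "between", "after_clear"):
    print(when, recover_after_crash(when))
```

In this model only the "between" case leaves the flag set on a page that already has a downlink in the parent, which is the scenario described in point 2. Holding the child page locked until the flag is cleared rules that interleaving out.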

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

