Re: B-tree parent pointer and checkpoints - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: B-tree parent pointer and checkpoints
Msg-id: CA+TgmoaH0Qy10CdRMDQH7-ELkD9FawnkYOOzE06oMYB4n3Ac9g@mail.gmail.com
In response to: Re: B-tree parent pointer and checkpoints (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses: Re: B-tree parent pointer and checkpoints
List: pgsql-hackers
On Tue, Sep 6, 2011 at 6:21 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Nope.
>
> On a closer look, this isn't only a problem for page deletion. Page
> splitting also barfs if it can't find the parent of a page. As the code
> stands, a missing downlink is not harmless, but causes all sorts of trouble.
>
> The window for this to happen with a checkpoint is extremely tight, but
> there's another situation where you can end up with a missing downlink: if
> you run out of disk space while splitting a parent page in order to insert
> a downlink into it.
>
> I think we should apply to B-tree a fix similar to the one I did for GiST,
> and put a flag on pages with missing downlinks. Then we can fix the missing
> downlinks in vacuum and insertion, and get rid of the code to fix incomplete
> splits after WAL replay.
>
> The way it would work is that on page split the right page is flagged with
> the MISSING_DOWNLINK flag. When the downlink is inserted into the parent, the
> flag is cleared in the same critical section in which the WAL record for the
> insertion into the parent is written. Normally, a backend would never see the
> flag set, because the locks on the split pages are not released until the
> parent record is written and the flag cleared again. But if inserting the
> downlink fails for any reason, the next inserter or vacuum that steps on the
> page can finish the split by inserting the downlink.
>
> Unfortunately that means holding the locks on the split pages longer than we
> do at the moment. Currently they are released as soon as the parent page is
> locked; with this change they would need to be held until the WAL record of
> the downlink insertion has been written. B-tree is so heavily used that I'm a
> bit hesitant to sacrifice any concurrency there, but I don't think it would be
> noticeable in practice.

Do you really need to hold the page locks for all that time, or could
you cheat?  Like... release the locks on the split pages but then go
back and reacquire them to clear the flag...
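For illustration only, here is a minimal single-threaded C sketch of the flag protocol quoted above. It is a toy model under stated assumptions: `Page`, `Parent`, `split_page`, `insert_downlink`, `finish_split_if_needed` and `has_downlink` are hypothetical names, not nbtree's real structures or functions; locking and WAL are not modeled; and the "same critical section" is represented simply by clearing the flag in the same function that adds the downlink. A parent that is "full" stands in for running out of disk space while splitting the parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical toy page: NOT PostgreSQL's nbtree page layout. */
typedef struct Page {
    int id;
    bool missing_downlink;   /* stands in for the proposed MISSING_DOWNLINK flag */
    struct Page *right;      /* right-sibling link */
} Page;

/* Hypothetical toy parent holding downlinks to its children. */
typedef struct Parent {
    Page *children[MAX_CHILDREN];
    int nchildren;
    int capacity;            /* when full, insertion fails (e.g. out of disk space) */
} Parent;

static bool has_downlink(const Parent *parent, const Page *child)
{
    for (int i = 0; i < parent->nchildren; i++)
        if (parent->children[i] == child)
            return true;
    return false;
}

/* Split: link the new right page in and flag it as lacking a downlink. */
static void split_page(Page *left, Page *right, int new_id)
{
    right->id = new_id;
    right->missing_downlink = true;
    right->right = left->right;
    left->right = right;
}

/*
 * Insert the downlink into the parent. In the proposal, clearing the flag
 * happens in the same critical section that writes the parent's WAL record;
 * here that is modeled by doing both steps together or not at all.
 */
static bool insert_downlink(Parent *parent, Page *right)
{
    if (parent->nchildren >= parent->capacity || parent->nchildren >= MAX_CHILDREN)
        return false;                 /* failure: flag stays set */
    parent->children[parent->nchildren++] = right;
    right->missing_downlink = false;
    return true;
}

/* What the next inserter or vacuum would do on stepping onto a page. */
static bool finish_split_if_needed(Parent *parent, Page *page)
{
    if (!page->missing_downlink)
        return true;                  /* normal case: split already complete */
    if (!insert_downlink(parent, page))
        return false;                 /* still cannot finish; try again later */
    assert(!page->missing_downlink);
    return true;
}
```

The design point the sketch tries to capture is that the incomplete-split state is recorded on the page itself, so any later visitor can detect and repair it, rather than relying on special handling after WAL replay.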

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company