Home > mailing lists

Re: B-tree parent pointer and checkpoints - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: B-tree parent pointer and checkpoints
Date	March 10, 2011 19:47:39
Msg-id	201103102047.p2AKlLa02940@momjian.us Whole thread Raw
In response to	Re: B-tree parent pointer and checkpoints (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses	Re: B-tree parent pointer and checkpoints
List	pgsql-hackers

Tree view

Has this been addressed?

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> On 13.11.2010 00:34, Greg Stark wrote:
> > On Fri, Nov 12, 2010 at 7:20 PM, Heikki Linnakangas
> > <heikki.linnakangas@enterprisedb.com>  wrote:
> >> I think we can work around that with a small modification to the page split
> >> algorithm. In a nutshell, when the child page is split, put a flag on the
> >> left half indicating that the rightlink must always be followed, regardless
> >> of the NSN. When the downlink is inserted to the parent, clear the flag.
> >> Setting and clearing of these flags need to be performed during WAL replay
> >> as well.
> >
> > Does this not cause duplicate results? Or does GIST already have to be
> > prepared to deal with duplicate results?
> 
> The GiST search algorithm avoids duplicate results by remembering the 
> LSN on the parent page when it follows a downlink. The split currently 
> happens like this:
> 
> 0. (the child page is locked)
> 1. The parent page is locked.
> 2. The child page is split. The original page becomes the left half, and 
> new buffers are allocated for the right halves.
> 3. The downlink is inserted on the parent page (and the original 
> downlink is updated to reflect only the keys that stayed on the left 
> page). While keeping the child pages locked, the NSN field on the 
> children are updated with the new LSN of the parent page.
> 
> To avoid duplicates, when a scan looks at the child page, it needs to 
> know if it saw the parent page before or after the downlink was 
> inserted. If it saw it before, the scan needs to follow the rightlink to 
> the right half, otherwise it will follow the downlink as usual (if it 
> matched). The scan checks that by comparing the LSN it saw on the parent 
> page with the NSN on the child page. If parent LSN < NSN, we saw the 
> parent before the downlink was inserted.
> 
> Now, the problem with crash recovery is that the above algorithm depends 
> on the split to keep the parent and child locked until the downlink is 
> inserted in the parent. If you crash between steps 2 and 3, the locks 
> are gone. If a later insert then updates the parent page, because of a 
> split on some unrelated child page, that will bump the LSN of the parent 
> above the NSN on the child. Scans will see that the parent LSN > child 
> NSN, and will no longer follow the rightlink.
> 
> And the fix for that is to set a flag on the child page indicating that 
> rightlink has to be always followed regardless of the LSN/NSN, because 
> the downlink hasn't been inserted yet. When the downlink is inserted, 
> the flag is cleared and we rely on the existing LSN/NSN mechanism to 
> avoid duplicate results.
> 
> -- 
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

pgsql-hackers by date:

From: Bruce Momjian
Date: 10 March 2011, 19:42:25
Subject: Re: configure gaps

From: Andrew Dunstan
Date: 10 March 2011, 19:48:16
Subject: Re: configure gaps

Re: B-tree parent pointer and checkpoints - Mailing list pgsql-hackers

Previous

Next