Home > mailing lists

Re: Bug in new buffering GiST build code - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Bug in new buffering GiST build code
Date	May 21, 2012 17:56:32
Msg-id	4FBAABE5.3000200@enterprisedb.com Whole thread
In response to	Re: Bug in new buffering GiST build code (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: Bug in new buffering GiST build code
List	pgsql-hackers

Tree view

On 18.05.2012 20:34, Alexander Korotkov wrote:
> On Fri, May 18, 2012 at 8:27 PM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>>
>> After fixing that, however, I'm now getting another error, much later in
>> the build process:
>>
>> ERROR:  failed to re-find parent for block 123002
>> STATEMENT:  create index i_gisttest on gisttest using gist (t collate "C")
>> WITH (fillfactor=10);
>>
>> I'll continue debugging that, but it seems to be another, unrelated, bug.
>
> Thanks for debugging and fixing that. I'm going to take a look on the
> remaining bug.

After staring at graphs built from gist trees for the whole day, I think 
I finally understand what's wrong:

There's a thinko in the way we maintain the parent paths during 
insertions. It boils down to the fact that in a GiST index, the 
left-to-right ordering as determined by the right-links on the upper 
level does not necessarily match the left-to-right ordering at a lower 
level. I'm afraid we've inadvertently made that assumption in the code.

This can happen:

1. Let's imagine that we have a tree that looks like this:
   root    |   ...    |    A   (internal node at upper level)    |    |    B    |    |    C   (internal node at a lower
level)   |   ...

2. While we descend down the tree to insert a tuple, we memorize the 
path A..B..C. This is stored in the node buffer associated with node C.

3. More tuples are inserted to another subtree below B (not shown), 
until node B needs to be split. This produces tree:
    A    |\    | \    B->B2       |       |       C

We still have the path A..B..C memorized in C's node buffer. The 
downlink for C is now actually in B2, but that's ok, because we have the 
code to follow the right links if we can't find the downlink for a node 
in the memorized parent.

4. More tuples are added to another subtree of A, until A has to be 
split. Picksplit decides to keep the downlink for B2 on the original 
page, and moves the downlink for B on the new page, A2:
    A->A2     \ /      X     / \    B->B2       |       |       C

Remember that we still have the path A..B..C memorized in C's node buffer.

5. More tuples are buffered, and we traverse down the tree along the 
path A2->B->... When we look up the node buffer for page B, we update 
the path stored there. It's now A2..B. This fragment of the path is 
shared by the path in C's node buffer.

6. At this point, the path memorized in C's node buffer is A2..B..C. 
This is where things go wrong. While it's true that A2 is the parent of 
B, and it's true that the parent of C can be found by following the 
rightlink from B, A2 is *not* a grandparent of C.

7. More tuples are added below C, and C has to be split. To insert the 
downlink for the new sibling, we re-find the parent for C. The memorized 
path is A2..B..C. We begin by searching for the downlink for C in page 
B. It's not there, so we move right, and find it in B2. The path we're 
working with is now A2..B2..C. When we insert the new downlink into B2, 
it also fills up and has to be split, so recurse up and have to refind 
the parent of B2. We begin looking in the memorized parent, A2. The 
downlink is not there, so we move right. But the downlink for B2 is to 
the left from A2, so we never find it. We walk right until we fall off 
the cliff, and you get the "failed to re-find parent" error.

I think the minimal fix is that after moving right, look up the node 
buffer of the page we moved onto, and use the path memorized for that if 
we have to recurse further up the tree. So in the above example, at step 
7 after we've moved right to node B2, we should look up the memorized 
path for B2 in B2's node buffer. That would give us the correct path, 
A..B2..C.

The management of the path stacks is a bit complicated, anyway. I'll 
think about this some more tomorrow, maybe we can make it simpler, 
knowing that we have to do those extra lookups.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

pgsql-hackers by date:

From: Josh Berkus
Date: 21 May 2012, 17:42:48
Subject: Re: Why is indexonlyscan so darned slow?

From: Heikki Linnakangas
Date: 21 May 2012, 17:56:47
Subject: Re: Bug in new buffering GiST build code

Re: Bug in new buffering GiST build code - Mailing list pgsql-hackers

Previous

Next