Re: WIP: Fast GiST index build - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: WIP: Fast GiST index build
Date
Msg-id CAPpHfdvhoYKUUCVaC-yFbDaE5j2QTa94cVgYuKnzX4xwWUjSTg@mail.gmail.com
Whole thread Raw
In response to Re: WIP: Fast GiST index build  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: WIP: Fast GiST index build
List pgsql-hackers
On Wed, Aug 10, 2011 at 11:45 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
unloadNodeBuffers() is now dead code.
processEmptyingStack calls it.

LEAF_PAGES_STATS_* are unused now.
Removed.
 
Should avoid calling smgrnblocks() on every tuple, the overhead of that could add up.
Now calling at each BUFFERING_MODE_SWITCH_CHECK_STEP(256) tuples.
 
I wonder, how hard would it be to merge gistBufferingBuildPlaceToPage() with the gistplacetopage() function used in the main codepath? There's very little difference between them, and it would be nice to maintain just one function. At the very least I think there should be a comment in both along the lines of "NOTE: if you change this function, make sure you update XXXX (the other function) as well!"
I doubt they can be effectively merged, but will try.
 
In gistbuild(), in the final emptying stage, there's this special-case handling for the root block before looping through the buffers in the buffersOnLevels lists:

               for (;;)
               {
                       nodeBuffer = getNodeBuffer(gfbb, &buildstate.giststate, GIST_ROOT_BLKNO,
                                                                          InvalidOffsetNumber, NULL, false);
                       if (!nodeBuffer || nodeBuffer->blocksCount <= 0)
                               break;
                       MemoryContextSwitchTo(gfbb->context);
                       gfbb->bufferEmptyingStack = lcons(nodeBuffer, gfbb->bufferEmptyingStack);
                       MemoryContextSwitchTo(buildstate.tmpCtx);
                       processEmptyingStack(&buildstate.giststate, &insertstate);
               }

What's the purpose of that? Wouldn't the loop through buffersOnLevels lists take care of the root node too?
I was worried about node buffer deletion from list while scanning that list gistbuild(). That's why I avoided deletion from lists. 
Now I've added additional check for root node in loop over list.
 
The calculations in initBuffering() desperately need comments. As does the rest of the code too, but the heuristics in that function are particularly hard to understand without some explanation.
Some comments were added. I'm working on more of them.
 
------
With best regards,
Alexander Korotkov.  
Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: "pgstat wait timeout" warnings
Next
From: Heikki Linnakangas
Date:
Subject: Re: WIP: Fast GiST index build