Home > mailing lists

Re: WIP: Fast GiST index build - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: WIP: Fast GiST index build
Date	August 16, 2011 19:11:02
Msg-id	4E4AC0B9.8070203@enterprisedb.com Whole thread Raw
In response to	Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: WIP: Fast GiST index build
List	pgsql-hackers

Tree view

On 16.08.2011 21:46, Alexander Korotkov wrote:
> On Tue, Aug 16, 2011 at 9:43 PM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>
>> Why is there ever a buffer on the root node? It seems like a waste of time
>> to load N tuples from the heap into the root buffer, only to empty the
>> buffer after it fills up. You might as well pull tuples directly from the
>> heap.
>>
> Yes, seems reasonable. Buffer on the root node was in the paper. But now I
> don't see the need of it too.

Here's an version of the patch with a bunch of minor changes:

* No more buffer on root node. Aside from the root buffer being 
pointless, this simplifies gistRelocateBuildBuffersOnSplit slightly as 
it doesn't need the special case for root block anymore.

* Moved the code to create new root item from gistplacetopage() to 
gistRelocateBuildBuffersOnSplit(). Seems better to keep the 
buffering-related code away from the normal codepath, for the sake of 
readability.

* Changed the levelStep calculation to use the more accurate upper bound 
on subtree size that we discussed.

* Changed the levelStep calculation so that it doesn't do just 
min(maintenance_work_mem, effective_cache_size) and calculate the 
levelStep from that. Maintenance_work_mem matters determines the max. 
number of page buffers that can be held in memory at a time, while 
effective_cache_size determines the max size of the subtree. Those are 
subtly different things.

* Renamed NodeBuffer to GISTNodeBuffer, to avoid cluttering the namespace

* Plus misc comment, whitespace, formatting and naming changes.

I think this patch is in pretty good shape now. Could you re-run the 
performance tests you have on the wiki page, please, to make sure the 
performance hasn't regressed? It would also be nice to get some testing 
on the levelStep and pagesPerBuffer estimates, and the point where we 
switch to the buffering method. I'm particularly interested to know if 
there's any corner-cases with very skewed data distributions or strange 
GUC settings, where the estimates fails badly.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

pgsql-hackers by date:

From: Alexander Korotkov
Date: 16 August 2011, 18:46:34
Subject: Re: WIP: Fast GiST index build

From: Heikki Linnakangas
Date: 16 August 2011, 19:15:48
Subject: Re: WIP: Fast GiST index build

Re: WIP: Fast GiST index build - Mailing list pgsql-hackers

Previous

Next