Re: WIP: Fast GiST index build - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: WIP: Fast GiST index build
Msg-id 4E1EABD9.5000308@enterprisedb.com
In response to Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
On 14.07.2011 11:33, Alexander Korotkov wrote:
> On Wed, Jul 13, 2011 at 5:59 PM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>
>> One thing that caught my eye is that when you empty a buffer, you load the
>> entire subtree below that buffer, down to the next buffered or leaf level,
>> into memory. Every page in that subtree is kept pinned. That is a problem;
>> in the general case, the buffer manager can only hold a modest number of
>> pages pinned at a time. Consider that the minimum value for shared_buffers
>> is just 16. That's unrealistically low for any real system, but the default
>> is only 32MB, which equals to just 4096 buffers. A subtree could easily be
>> larger than that.
>>
> With level step = 1 we need only 2 levels in the subtree. With minimum
> index tuple size (12 bytes) each page can hold at most 675 tuples. Thus I
> think default shared_buffers is enough for level step = 1.

Hundreds of buffer pins is still a lot. And with level_step = 2, the 
number of pins required explodes to 675^2 = 455625.
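The arithmetic behind those pin counts can be sketched quickly (assuming the maximum fanout of 675 index tuples per page from the numbers above):

```python
# Back-of-the-envelope count of pages pinned while emptying one buffer:
# the whole subtree down to the next buffered (or leaf) level stays pinned.
# Assumes a maximum fanout of 675 index tuples per page (12-byte tuples).
fanout = 675

def pinned_pages(level_step):
    # 1 page at the buffer's own level, plus fanout**i pages i levels below
    return sum(fanout ** i for i in range(level_step + 1))

print(pinned_pages(1))  # 676: hundreds of pins
print(pinned_pages(2))  # 456301, dominated by the 675^2 = 455625 leaf pages
```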

Pinning a buffer that's already in the shared buffer cache is cheap, so I 
doubt you're gaining much by keeping the private hash table in front of 
the buffer cache. Also, it's possible that not all of the subtree is 
actually needed during the emptying, so in the worst case pre-loading 
the pages is counter-productive.

> I believe it's enough
> to add check we have sufficient shared_buffers, isn't it?

Well, what do you do if you deem that shared_buffers is too small? Fall 
back to the old method? Also, shared_buffers is shared by all backends, 
so you can't assume that you get to use all of it for the index build. 
You'd need a wide safety margin.
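The kind of up-front check being discussed might look like the following sketch. The function name, page size constant, and safety fraction are all illustrative assumptions, not PostgreSQL code:

```python
# Hypothetical sketch: decide up front whether the buffered build could
# pin a whole subtree without monopolizing shared_buffers. The safety
# fraction reflects that other backends share the cache.
PAGE_SIZE = 8192          # default BLCKSZ
SAFETY_FRACTION = 0.25    # assumed margin; not a real PostgreSQL setting

def can_pin_subtree(shared_buffers_bytes, fanout, level_step):
    # Pages in the subtree between consecutive buffered levels
    subtree_pages = sum(fanout ** i for i in range(level_step + 1))
    usable = (shared_buffers_bytes // PAGE_SIZE) * SAFETY_FRACTION
    return subtree_pages <= usable

# Default 32 MB of shared_buffers = 4096 buffers:
print(can_pin_subtree(32 * 1024 * 1024, 675, 1))  # True: 676 <= 1024
print(can_pin_subtree(32 * 1024 * 1024, 675, 2))  # False: 456301 pages
```

Even this sketch shows the problem: the answer flips between level steps, and any such threshold leaves open what to do when the check fails.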

>> I don't think you're benefiting at all from the buffering that BufFile does
>> for you, since you're reading/writing a full block at a time anyway. You
>> might as well use the file API in fd.c directly, ie.
>> OpenTemporaryFile/FileRead/FileWrite.
>
> BufFile distributes temporary data across several files. AFAICS
> postgres avoids working with files larger than 1GB. The size of tree buffers
> can easily be greater. Without BufFile I would need to maintain the set of
> files manually.

Ah, I see. Makes sense.
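For reference, the segmenting BufFile provides boils down to a simple offset mapping. The 1 GB segment size mirrors MAX_PHYSICAL_FILESIZE in buffile.c; the helper name here is illustrative:

```python
# Sketch of BufFile-style segmenting: a logical offset into one large
# temporary file maps to (physical segment file number, offset within it).
SEGMENT_SIZE = 1 << 30  # 1 GB, as in PostgreSQL's buffile.c

def locate(logical_offset):
    # Which 1 GB segment file holds this byte, and where within that file?
    return divmod(logical_offset, SEGMENT_SIZE)

print(locate(3 * SEGMENT_SIZE + 42))  # (3, 42): fourth segment, byte 42
```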

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

