Re: WIP: Fast GiST index build - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: WIP: Fast GiST index build
Date	July 14, 2011 08:44:54
Msg-id	4E1EABD9.5000308@enterprisedb.com
In response to	Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: WIP: Fast GiST index build
List	pgsql-hackers

On 14.07.2011 11:33, Alexander Korotkov wrote:
> On Wed, Jul 13, 2011 at 5:59 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> One thing that caught my eye is that when you empty a buffer, you load the
>> entire subtree below that buffer, down to the next buffered or leaf level,
>> into memory. Every page in that subtree is kept pinned. That is a problem;
>> in the general case, the buffer manager can only hold a modest number of
>> pages pinned at a time. Consider that the minimum value for shared_buffers
>> is just 16. That's unrealistically low for any real system, but the default
>> is only 32MB, which equals just 4096 buffers. A subtree could easily be
>> larger than that.
>>
> With level step = 1 we need only 2 levels in the subtree. With the minimum
> index tuple size (12 bytes) each page can hold at most 675 tuples. Thus I
> think the default shared_buffers is enough for level step = 1.

Hundreds of buffer pins is still a lot. And with level step = 2, the 
number of pins required explodes to 675^2 = 455625.
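
The arithmetic above can be reproduced with a short sketch. It assumes the default 8kB page size and takes the fanout of 675 straight from the quoted estimate; the function names are made up for illustration and are not part of the patch.

```python
BLOCK_SIZE = 8192  # assumption: default PostgreSQL page size in bytes


def shared_buffer_count(shared_buffers_bytes, block_size=BLOCK_SIZE):
    """Number of buffer pages that fit in a given shared_buffers setting."""
    return shared_buffers_bytes // block_size


def worst_case_pins(fanout, level_step):
    """Pages at the lowest level of a subtree that is level_step levels deep,
    assuming every page holds `fanout` child pointers (hypothetical helper)."""
    return fanout ** level_step


if __name__ == "__main__":
    buffers = shared_buffer_count(32 * 1024 * 1024)  # default of 32MB
    for step in (1, 2):
        pins = worst_case_pins(675, step)
        print(f"level step {step}: {pins} pins vs {buffers} buffers")
```

With these inputs, level step = 1 stays below the 4096 buffers of a default configuration, while level step = 2 exceeds it by roughly two orders of magnitude.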

Pinning a buffer that's already in the shared buffer cache is cheap; I 
doubt you're gaining much by keeping the private hash table in front of 
the buffer cache. Also, it's possible that not all of the subtree is 
actually required during the emptying, so in the worst case pre-loading 
them is counter-productive.

> I believe it's enough
> to add a check that we have sufficient shared_buffers, isn't it?

Well, what do you do if you deem that shared_buffers is too small? Fall 
back to the old method? Also, shared_buffers is shared by all backends, 
so you can't assume that you get to use all of it for the index build. 
You'd need a wide safety margin.
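
To make the question concrete, here is a minimal sketch of what such a check might look like. Everything in it is hypothetical: the function name, the idea of returning 0 to mean "fall back to the old method", and the 25% usable fraction standing in for the safety margin are assumptions for illustration, not anything the patch does.

```python
def pick_level_step(fanout, total_buffers, usable_fraction=0.25, max_step=4):
    """Return the largest level step whose worst-case pin count fits in the
    share of shared buffers this backend dares to use, or 0 if none fits.

    usable_fraction is an assumed safety margin: shared_buffers is shared by
    all backends, so only part of it can be counted on.
    """
    budget = int(total_buffers * usable_fraction)
    best = 0  # 0 means: no buffered build is safe, use the old method
    for step in range(1, max_step + 1):
        if fanout ** step <= budget:
            best = step
        else:
            break  # pin counts only grow with the step, so stop here
    return best


if __name__ == "__main__":
    print(pick_level_step(675, 4096))                       # default 32MB
    print(pick_level_step(675, 4096, usable_fraction=0.1))  # wider margin
```

Even with the default 4096 buffers, a margin of 10% already rules out level step = 1 under these assumptions, which is why the fallback question matters.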

>> I don't think you're benefiting at all from the buffering that BufFile does
>> for you, since you're reading/writing a full block at a time anyway. You
>> might as well use the file API in fd.c directly, ie.
>> OpenTemporaryFile/FileRead/FileWrite.
>
> BufFile distributes temporary data across several files. AFAICS
> postgres avoids working with files larger than 1GB. The size of the tree
> buffers can easily be greater. Without BufFile I would need to maintain a
> set of files manually.

Ah, I see. Makes sense.
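
For illustration, the bookkeeping that would otherwise have to be done by hand looks roughly like this. The sketch assumes 8kB blocks and a 1GB limit per physical file, as mentioned in the quoted text; the function name is hypothetical and this is not BufFile's actual code.

```python
BLOCK_SIZE = 8192            # assumption: default page size in bytes
SEGMENT_SIZE = 1024 ** 3     # assumption: 1GB limit per physical file
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE


def locate_block(block_no):
    """Map a logical block number in the temporary data to a pair
    (segment file index, byte offset within that file)."""
    segment = block_no // BLOCKS_PER_SEGMENT
    offset = (block_no % BLOCKS_PER_SEGMENT) * BLOCK_SIZE
    return segment, offset


if __name__ == "__main__":
    for block_no in (0, BLOCKS_PER_SEGMENT - 1, BLOCKS_PER_SEGMENT):
        print(block_no, locate_block(block_no))
```

Each read or write of a full block would need this mapping plus opening and tracking the segment files, which is the work BufFile already takes care of.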

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
