On 14.07.2011 11:33, Alexander Korotkov wrote:
> On Wed, Jul 13, 2011 at 5:59 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>
>> One thing that caught my eye is that when you empty a buffer, you load the
>> entire subtree below that buffer, down to the next buffered or leaf level,
>> into memory. Every page in that subtree is kept pinned. That is a problem;
>> in the general case, the buffer manager can only hold a modest number of
>> pages pinned at a time. Consider that the minimum value for shared_buffers
>> is just 16. That's unrealistically low for any real system, but the default
>> is only 32MB, which equals just 4096 buffers. A subtree could easily be
>> larger than that.
>>
> With level step = 1 we need only 2 levels in the subtree. With the minimum index
> tuple size (12 bytes), each page can hold at most 675 tuples. Thus I think the
> default shared_buffers is enough for level step = 1.

Hundreds of buffer pins is still a lot. And with level step = 2, the
number of pins required explodes to 675^2 = 455625.

Pinning a buffer that's already in the shared buffer cache is cheap, so I
doubt you're gaining much by keeping the private hash table in front of
the buffer cache. Also, it's possible that not all of the subtree is
actually required during the emptying, so in the worst case pre-loading
the whole subtree is counter-productive.

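
To put numbers on that, here is a rough sketch of the arithmetic. The
function and macro names are made up for illustration, the fanout of 675
is the figure quoted above, and 8 kB pages are assumed. It counts only
the lowest-level pages of the subtree, so it is an estimate, not an exact
pin count.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from this thread: at most 675 minimum-size tuples per page. */
#define MAX_FANOUT      675
/* Assumption: default 8 kB page size. */
#define ASSUMED_BLCKSZ  8192

/*
 * Estimated worst-case number of pages pinned when a subtree spanning
 * level_step levels is loaded: fanout raised to the power level_step.
 */
static uint64_t
subtree_pins(unsigned fanout, unsigned level_step)
{
    uint64_t pins = 1;

    assert(fanout > 0);
    for (unsigned i = 0; i < level_step; i++)
        pins *= fanout;
    return pins;
}

/* Number of buffers that a shared_buffers setting in bytes provides. */
static uint64_t
buffers_for_bytes(uint64_t bytes)
{
    return bytes / ASSUMED_BLCKSZ;
}
```

With these assumptions, subtree_pins(MAX_FANOUT, 2) is 455625, while
buffers_for_bytes() for the 32MB default gives 4096.
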
> I believe it's enough
> to add check we have sufficient shared_buffers, isn't it?

Well, what do you do if you deem that shared_buffers is too small? Fall
back to the old method? Also, shared_buffers is shared by all backends,
so you can't assume that you get to use all of it for the index build.
You'd need a wide safety margin.

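
To illustrate why such a check is awkward, here is a hypothetical sketch.
subtree_fits and margin_divisor are invented names, not anything in the
patch or in PostgreSQL, and the choice of divisor is an arbitrary
assumption. The idea is only that the build may claim a fraction of
shared_buffers for pins, never all of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: allow the index build to pin at most
 * nbuffers / margin_divisor pages, leaving the rest to other backends.
 * Returns true if required_pins fits in that budget.
 */
static bool
subtree_fits(uint64_t required_pins, uint64_t nbuffers,
             unsigned margin_divisor)
{
    assert(margin_divisor > 0);
    return required_pins <= nbuffers / margin_divisor;
}
```

With the default 4096 buffers and a divisor of 8, even level step = 1
(675 pins) fails the check, so the caller would still need some fallback.
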
>> I don't think you're benefiting at all from the buffering that BufFile does
>> for you, since you're reading/writing a full block at a time anyway. You
>> might as well use the file API in fd.c directly, ie.
>> OpenTemporaryFile/FileRead/FileWrite.
>
> BufFile is distributing temporary data through several files. AFAICS
> postgres avoids working with files larger than 1GB. Size of tree buffers can
> easily be greater. Without BufFile I need to maintain set of files manually.
Ah, I see. Makes sense.
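
For reference, the bookkeeping that BufFile spares you looks roughly like
this. It is a sketch only, assuming 8 kB blocks and 1 GB segment files;
locate_block, SegPos and the macros are hypothetical names, not the
buffile.c API.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_BLCKSZ      8192u                       /* assumed block size */
#define SEG_BYTES       (1024u * 1024u * 1024u)     /* assumed 1 GB per file */
#define BLOCKS_PER_SEG  (SEG_BYTES / SEG_BLCKSZ)    /* 131072 blocks */

/* Position of a block inside a set of fixed-size segment files. */
typedef struct
{
    uint32_t segment;   /* which segment file holds the block */
    uint32_t offset;    /* byte offset of the block within that file */
} SegPos;

/* Map a logical block number to its segment file and byte offset. */
static SegPos
locate_block(uint64_t blkno)
{
    SegPos pos;

    assert(blkno / BLOCKS_PER_SEG <= UINT32_MAX);
    pos.segment = (uint32_t) (blkno / BLOCKS_PER_SEG);
    pos.offset = (uint32_t) ((blkno % BLOCKS_PER_SEG) * SEG_BLCKSZ);
    return pos;
}
```

Block 131072 is the first block of the second file under these
assumptions, which is the kind of mapping you would otherwise have to
maintain by hand.
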

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com