On Wed, Jul 13, 2011 at 5:59 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> One thing that caught my eye is that when you empty a buffer, you load the entire subtree below that buffer, down to the next buffered or leaf level, into memory. Every page in that subtree is kept pinned. That is a problem; in the general case, the buffer manager can only hold a modest number of pages pinned at a time. Consider that the minimum value for shared_buffers is just 16. That's unrealistically low for any real system, but the default is only 32MB, which equals just 4096 buffers. A subtree could easily be larger than that.
With level step = 1, the subtree has only 2 levels. With the minimum index tuple size (12 bytes), each page can hold at most 675 tuples. Thus I think the default shared_buffers is enough for level step = 1. I believe it would be enough to add a check that shared_buffers is sufficient. What do you think?
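To make the arithmetic concrete, here is a minimal sketch of the worst-case pin count. It assumes every page in the subtree has the same fan-out and that the whole subtree, from the buffered page down through `level_step` further levels, is pinned at once. The helper names `max_pinned_pages` and `subtree_fits_in_buffers` are hypothetical and not part of the patch.

```c
#include <stdbool.h>

/* Hypothetical helper (not in the patch): worst-case number of pages
 * pinned while emptying one buffer, assuming the subtree from the
 * buffered page down through the next level_step levels is loaded at
 * once and every page has `fanout` children. */
static long
max_pinned_pages(long fanout, int level_step)
{
    long total = 0;
    long level_pages = 1;       /* level 0: the page owning the buffer */

    for (int i = 0; i <= level_step; i++)
    {
        total += level_pages;   /* pages on this level of the subtree */
        level_pages *= fanout;  /* pages on the next level down */
    }
    return total;
}

/* Hypothetical form of the proposed check: does the worst-case subtree
 * fit within nbuffers shared buffers? */
static bool
subtree_fits_in_buffers(long fanout, int level_step, long nbuffers)
{
    return max_pinned_pages(fanout, level_step) <= nbuffers;
}
```

With a fan-out of 675, level step 1 gives 1 + 675 = 676 pinned pages, well under the 4096 buffers of the 32MB default, while level step 2 already gives 1 + 675 + 675^2 = 456301. A real check would presumably also leave some headroom, since other backends hold pins too.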
> I don't think you're benefiting at all from the buffering that BufFile does for you, since you're reading/writing a full block at a time anyway. You might as well use the file API in fd.c directly, i.e. OpenTemporaryFile/FileRead/FileWrite.
BufFile distributes temporary data across several files. AFAICS, PostgreSQL avoids working with files larger than 1GB, and the tree buffers can easily grow larger than that. Without BufFile, I would need to maintain a set of files manually.
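The bookkeeping BufFile saves is roughly the following: mapping a logical offset in one large temporary "file" onto a sequence of physical segment files, each capped at 1GB. This is only a sketch under that assumption; `SEGMENT_SIZE`, `SegmentPos` and `logical_to_segment` are illustrative names, not BufFile's actual internals.

```c
#include <stdint.h>

/* Assumed cap on each physical temp file: 1GB. The name is illustrative. */
#define SEGMENT_SIZE ((int64_t) 1 << 30)

typedef struct
{
    int     fileno;     /* index of the physical segment file */
    int64_t offset;     /* byte offset within that segment */
} SegmentPos;

/* Map a logical offset in the combined temp space to (segment, offset). */
static SegmentPos
logical_to_segment(int64_t logical_offset)
{
    SegmentPos pos;

    pos.fileno = (int) (logical_offset / SEGMENT_SIZE);
    pos.offset = logical_offset % SEGMENT_SIZE;
    return pos;
}
```

Since 1GB is a multiple of an 8KB block, a block-aligned read or write never straddles two segments. Using fd.c directly would mean keeping an array of open temporary files and doing this mapping by hand.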
------
With best regards,
Alexander Korotkov.