Thread: GSoC 2011: Fast GiST index build

GSoC 2011: Fast GiST index build

From
Alexander Korotkov
Date:
Hackers!

I was happy to learn that my proposal "Fast GiST index build" was accepted for GSoC 2011! Thank you very much for your support! Special thanks to Heikki Linnakangas for agreeing to be my mentor!

The first question that I would like to discuss is the node buffer storage. During the index build, each index page (except leaf pages) should have several pages of buffer. So my question is: where should the buffers be stored, and how should we operate on them? This is somewhat similar to the GIN fastupdate buffer, but there are differences. First, we have to deal with many buffers instead of only one. Second, I believe we don't need to worry about concurrency as much, because the algorithm performs relatively large operations in memory (relocating entries between several buffers), which requires locking all of the buffers currently being operated on. I'm going to store the buffers separately from the index itself, because we need to free all of them once the index is built.
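Roughly, I imagine attaching an in-memory buffer to each internal page along these lines (just an illustrative sketch; none of these names exist in any code yet):

    /* types come from postgres.h, storage/block.h and access/itup.h */
    typedef struct NodeBufferSketch
    {
        BlockNumber nodeBlkno;   /* internal page this buffer belongs to */
        int         nTuples;     /* index tuples currently in the buffer */
        int         maxTuples;   /* capacity; when exceeded, the tuples are
                                  * pushed down into the buffers of the
                                  * child pages */
        IndexTuple *tuples;      /* array of buffered index tuples */
    } NodeBufferSketch;

Emptying an overflowing buffer means relocating its tuples into the buffers of the child pages, which is why all of the involved buffers have to be at hand at once.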

I found a very simple solution for dealing with varlena keys. The greatest buffer size and the minimal level step are achieved when the key size is minimal; thereby the minimal key size is the worst case. Since the minimal varlena size is 4 bytes, we can use it in the initial calculations. I'm going to rely on this assumption in the first implementation.
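For example, the worst-case (largest) fan-out of a page could be estimated roughly like this; this is only a back-of-the-envelope sketch, and the constant and function names below are made up:

    /* needs postgres.h, access/gist.h, access/itup.h, storage/bufpage.h */

    /* smallest possible key: a bare 4-byte varlena header */
    #define MIN_VARLENA_KEY_SIZE 4

    /* worst-case number of index tuples that fit on one GiST page */
    static int
    worst_case_entries_per_page(void)
    {
        Size    itupsz = MAXALIGN(sizeof(IndexTupleData) + MIN_VARLENA_KEY_SIZE);
        Size    freesz = BLCKSZ - MAXALIGN(SizeOfPageHeaderData)
                                - MAXALIGN(sizeof(GISTPageOpaqueData));

        return (int) (freesz / (itupsz + sizeof(ItemIdData)));
    }

The buffer size and the level step would then be derived from this worst-case fan-out.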

----
With best regards,
Alexander Korotkov.

Re: GSoC 2011: Fast GiST index build

From
Heikki Linnakangas
Date:
Congrats on being selected, looking forward to mentoring you!

On 25.04.2011 23:09, Alexander Korotkov wrote:
> The first question that I would like to discuss is the node buffer storage.
> During the index build, each index page (except leaf pages) should have
> several pages of buffer. So my question is: where should the buffers be
> stored, and how should we operate on them? This is somewhat similar to the
> GIN fastupdate buffer, but there are differences. First, we have to deal
> with many buffers instead of only one. Second, I believe we don't need to
> worry about concurrency as much, because the algorithm performs relatively
> large operations in memory (relocating entries between several buffers),
> which requires locking all of the buffers currently being operated on. I'm
> going to store the buffers separately from the index itself, because we
> need to free all of them once the index is built.

Just palloc() the buffers in memory, at least in the first phase. 
That'll work fine for index creation. Dealing with concurrent searches 
and inserts makes it a lot more complicated; it's better to make it work 
for the index creation first, and investigate something like the GIN 
fastupdate buffers later if you have time left.
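Something along these lines would be enough for a first cut (an untested sketch; maxTuples here is whatever per-buffer capacity you settle on):

    MemoryContext buildCxt;
    MemoryContext oldCxt;
    IndexTuple   *buffer;

    /* one memory context to hold all of the build-time buffers */
    buildCxt = AllocSetContextCreate(CurrentMemoryContext,
                                     "GiST build buffers",
                                     ALLOCSET_DEFAULT_MINSIZE,
                                     ALLOCSET_DEFAULT_INITSIZE,
                                     ALLOCSET_DEFAULT_MAXSIZE);

    oldCxt = MemoryContextSwitchTo(buildCxt);
    buffer = (IndexTuple *) palloc(maxTuples * sizeof(IndexTuple));
    MemoryContextSwitchTo(oldCxt);

    /* ... build the index ... */

    /* frees every buffer in one go once the build is finished */
    MemoryContextDelete(buildCxt);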

> I found a very simple solution for dealing with varlena keys. The greatest
> buffer size and the minimal level step are achieved when the key size is
> minimal; thereby the minimal key size is the worst case. Since the minimal
> varlena size is 4 bytes, we can use it in the initial calculations. I'm
> going to rely on this assumption in the first implementation.

Ok, good.

The first priority should be to have something that works enough to be 
benchmarked. The paper you referred to in the GSoC application [1] 
contained empirical results on the number of I/O operations needed with 
the algorithm, but it didn't take operating system cache into account at 
all. That makes the empirical results next to worthless; keeping some 
stuff in in-memory buffers is obviously going to reduce I/O if you don't 
take OS cache into account.

So we're going to need benchmark results that show a benefit, or there's 
no point in doing this at all. The sooner we get to benchmarking, even 
with a very limited and buggy version of the patch, the better. If the 
algorithm described in that paper doesn't give much benefit, you might 
have to switch to some other algorithm half-way through the project. 
Fortunately there are plenty of R-tree bulk loading algorithms in the 
literature, it should be possible to adapt some of them to GiST.

[1] http://dx.doi.org/10.1007/s00453-001-0107-6

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: GSoC 2011: Fast GiST index build

From
Alexander Korotkov
Date:
On Tue, Apr 26, 2011 at 10:46 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
Just palloc() the buffers in memory, at least in the first phase. That'll work fine for index creation. Dealing with concurrent searches and inserts makes it a lot more complicated; it's better to make it work for the index creation first, and investigate something like the GIN fastupdate buffers later if you have time left.
Since the algorithm is focused on reducing I/O, we should expect the best speedup in the case when the index doesn't fit in memory. The size of the buffers is comparable to the size of the whole index, which means that if we can hold the buffers in memory, then we can mostly hold the whole index in memory too. That's why I think we should have simple on-disk buffer management for the first representative benchmark.
 
The first priority should be to have something that works enough to be benchmarked. The paper you referred to in the GSoC application [1] contained empirical results on the number of I/O operations needed with the algorithm, but it didn't take operating system cache into account at all. That makes the empirical results next to worthless; keeping some stuff in in-memory buffers is obviously going to reduce I/O if you don't take OS cache into account.

So we're going to need benchmark results that show a benefit, or there's no point in doing this at all. The sooner we get to benchmarking, even with a very limited and buggy version of the patch, the better. If the algorithm described in that paper doesn't give much benefit, you might have to switch to some other algorithm half-way through the project. Fortunately there are plenty of R-tree bulk loading algorithms in the literature, it should be possible to adapt some of them to GiST.

[1] http://dx.doi.org/10.1007/s00453-001-0107-6
Yes, these priorities seem very reasonable. We should get a first confirmation of effectiveness as soon as possible. I'll stick to this priority.

----
With best regards,
Alexander Korotkov.

Re: GSoC 2011: Fast GiST index build

From
Alexander Korotkov
Date:
On Tue, Apr 26, 2011 at 1:10 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Since the algorithm is focused on reducing I/O, we should expect the best speedup in the case when the index doesn't fit in memory. The size of the buffers is comparable to the size of the whole index, which means that if we can hold the buffers in memory, then we can mostly hold the whole index in memory too. That's why I think we should have simple on-disk buffer management for the first representative benchmark.
Since we need to free all the buffers after the index is built, I believe the buffers should be stored separately. Otherwise the index would be bloated immediately after creation. We probably need to create a temporary relation to store the buffers in. If that thought is right, is there any example of using a temporary relation?
 
----
With best regards,
Alexander Korotkov.

Re: GSoC 2011: Fast GiST index build

From
Heikki Linnakangas
Date:
On 27.04.2011 09:51, Alexander Korotkov wrote:
> On Tue, Apr 26, 2011 at 1:10 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
>> Since the algorithm is focused on reducing I/O, we should expect the best
>> speedup in the case when the index doesn't fit in memory. The size of the
>> buffers is comparable to the size of the whole index, which means that if
>> we can hold the buffers in memory, then we can mostly hold the whole index
>> in memory too. That's why I think we should have simple on-disk buffer
>> management for the first representative benchmark.
>>
> Since we need to free all the buffers after the index is built, I believe
> the buffers should be stored separately. Otherwise the index would be
> bloated immediately after creation. We probably need to create a temporary
> relation to store the buffers in. If that thought is right, is there any
> example of using a temporary relation?

A temporary relation is a bit heavy-weight for this; a temporary file 
should be enough. See src/backend/storage/file/buffile.c, the 
BufFileCreateTemp() function in particular. Or perhaps a tuplestore 
suits better, see src/backend/utils/sort/tuplestore.c; that's simpler to 
use if you're storing tuples. tuplestore only supports storing heap 
tuples at the moment, but it could easily be extended to store index 
tuples, like tuplesort.c does.
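Just to illustrate the BufFile calls I mean, something like this (an untested sketch; spill_and_reread is only a made-up example name, and error checking is omitted):

    #include "postgres.h"
    #include "access/itup.h"
    #include "storage/buffile.h"

    /* spill one index tuple to a temp file and read a copy back */
    static IndexTuple
    spill_and_reread(IndexTuple itup)
    {
        BufFile    *file;
        IndexTuple  copy;
        Size        sz;

        /* temporary file; cleaned up at BufFileClose or end of transaction */
        file = BufFileCreateTemp(false);

        /* write a length prefix followed by the tuple itself */
        sz = IndexTupleSize(itup);
        BufFileWrite(file, &sz, sizeof(sz));
        BufFileWrite(file, itup, sz);

        /* rewind and read it back */
        BufFileSeek(file, 0, 0L, SEEK_SET);
        BufFileRead(file, &sz, sizeof(sz));
        copy = (IndexTuple) palloc(sz);
        BufFileRead(file, copy, sz);

        BufFileClose(file);
        return copy;
    }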

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: GSoC 2011: Fast GiST index build

From
Alexander Korotkov
Date:
While studying the existing GiST code I ran into a question. The gistFindCorrectParent function has a branch with the following comment:
/*
 * awful!!, we need search tree to find parent ... , but before we
 * should release all old parent
 */
Can you give me an example of a case in which this branch is taken?

----
With best regards,
Alexander Korotkov.

Re: GSoC 2011: Fast GiST index build

From
Teodor Sigaev
Date:
> The gistFindCorrectParent function has a branch with the following comment:
> /*
>  * awful!!, we need search tree to find parent ... , but before we
>  * should release all old parent
>  */
> Can you give me an example of a case in which this branch is taken?

See several lines above:

            if (parent->blkno == InvalidBlockNumber)
                /*
                 * end of chain and still didn't found parent, It's very-very
                 * rare situation when root splited
                 */
                break;
 


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GSoC 2011: Fast GiST index build

From
Alexander Korotkov
Date:
2011/5/5 Teodor Sigaev <teodor@sigaev.ru>
See several lines above:
           if (parent->blkno == InvalidBlockNumber)

               /*
                * end of chain and still didn't found parent, It's very-very
                * rare situation when root splited
                */
               break;
 
As I understand it, that's because we can't move the root to another page.

----
With best regards,
Alexander Korotkov.

Re: GSoC 2011: Fast GiST index build

From
Teodor Sigaev
Date:
> As I understand it, that's because we can't move the root to another page.

Actually, no. The path to a node can change completely. For example, for a page 
on the second level the path is 0-234 (where 0 is the root page and 234 is a 
page on the second level). After a root split the path becomes 0-new_page-234.

If the algorithm were able to move the root, the new path could look like 
new_root-new_page-234, because the old root could be split into old_root_page 
and new_page.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GSoC 2011: Fast GiST index build

From
Alexander Korotkov
Date:
2011/5/6 Teodor Sigaev <teodor@sigaev.ru>
As I understand it, that's because we can't move the root to another page.

Actually, no. The path to a node can change completely. For example, for a page on the second level the path is 0-234 (where 0 is the root page and 234 is a page on the second level). After a root split the path becomes 0-new_page-234.

If the algorithm were able to move the root, the new path could look like new_root-new_page-234, because the old root could be split into old_root_page and new_page.

OK, thank you for the explanation.

----
With best regards,
Alexander Korotkov.