Re: WIP: Fast GiST index build - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: WIP: Fast GiST index build
Date
Msg-id CAPpHfdusd_b_dtvNY2uuJict6=VEUTGAS8ELXGck=o=kOdMnOw@mail.gmail.com
Whole thread Raw
In response to Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: WIP: Fast GiST index build
List pgsql-hackers
Hi!

There is last version of patch. There is the list of most significant changes in comparison with your version of patch:
1) Reference counting of path items was added. It should helps against possible accumulation of path items.
2) Neighbor relocation was added.
3) Subtree prefetching was added. 
4) Final emptying algorithm was reverted to the original one. My experiments shows that typical number of passes in final emptying in your version of patch is about 5. It may be significant itself. Also I haven't estimate of number of passes and haven't guarantees that it will not be high in some corner cases. I.e. I prefer more predictable single-pass algorithm in spite of it's a little more complex.
5) Unloading even last page of node buffer from main memory to the disk. Imagine that that with levelstep = 1 each inner node has buffer. It seems to me that keeping one page of each buffer in memory may be memory consuming. 

Open items I see at this moment:
1) I dislike my switching to buffering build method because it's based on very unproven assumptions. And I didn't more reliable assumptions in scientific papers while. I would like to replace it with something much simplier. For example, switching to buffering build when regular build actually starts to produce a lot of IO. For this approach implementation I need to somehow detect actual IO (not just buffer read but miss of OS cache).
2) I'm worrying about possible size of nodeBuffersTab and path items. If we imagine extremely large tree with levelstep = 1 size of this datastructures will be singnificant. And it's hard to predict that size without knowing of tree size.

------
With best regards,
Alexander Korotkov.  
Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs
Next
From: Josh Kupershmidt
Date:
Subject: Re: vacuumlo patch