Re: Sparse bit set data structure - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Sparse bit set data structure
Date
Msg-id dbf83e18-a3f8-145e-1d28-d7e07cb391a2@iki.fi
Whole thread Raw
In response to Re: Sparse bit set data structure  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers
On 14/03/2019 07:15, Andrey Borodin wrote:
>> 14 марта 2019 г., в 0:18, Heikki Linnakangas <hlinnaka@iki.fi> написал(а):
>>
<0001-Add-SparseBitset-to-hold-a-large-set-of-64-bit-ints-.patch><0002-Andrey-Borodin-s-test_blockset-tool-adapted-for-Spar.patch>
> 
> That is very interesting idea. Basically, B-tree and radix tree is a tradeoff between space and time.
> 
> In general, lookup into radix tree will touch less CPU cache lines.
> In this terms Bitmapset is on most performant and memory-wasteful side: lookup into Bitmapset is always 1 cache
line.
> Performance of radix tree can be good in case of skewed distribution, while B-tree will be OK on uniform. I think
thatdistribution of GiST inner pages is uniform, distribution of empty leafs is... I have no idea, let's consider
uniformtoo.
 

Yeah. In this implementation, the leaf nodes are packed into bitmaps 
when possible, so it should perform quite well on skewed distributions, too.

> I'd review this data structure ASAP. I just need to understand: do we aim to v12 or v13? (I did not solve concurrency
issuesin GiST VACUUM yet, but will fix them at weekend)
 

I'm aiming v12 with this. It's a fairly large patch, but it's very 
isolated. I think the most pressing issue is getting the rest of the 
GiST vacuum patch fixed. If you get that fixed over the weekend, I'll 
take another look at it on Monday.

> Also, maybe we should consider using RoaringBitmaps? [0]
> As a side not I would add that while balanced trees are widely used for operations on external memory, there are more
performantversions for main memory. Like AVL-tree and RB-tree.
 

Hmm. Yeah, this is quite similar to Roaring Bitmaps. Roaring bitmaps 
also have a top-level, at which you binary search, and "leaf" nodes 
which can be bitmaps or arrays. In a roaring bitmap, the key space is 
divided into fixed-size chunks, like in a radix tree, but different from 
a B-tree.

Even if we used AVL-trees or RB-trees or something else for the top 
layers of the tree, I think at the bottom level, we'd still need to use 
sorted arrays or bitmaps, to get the density we want. So I think the 
implementation at the leaf level would look pretty much the same, in any 
case. And the upper levels don't take very much space, regardless of the 
implementation. So I don't think it matters much.

- Heikki


pgsql-hackers by date:

Previous
From: "Imai, Yoshikazu"
Date:
Subject: RE: speeding up planning with partitions
Next
From: "Tsunakawa, Takayuki"
Date:
Subject: RE: Timeout parameters