Home > mailing lists

Block B-Tree concept - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Block B-Tree concept
Date	September 26, 2006 10:16:52
Msg-id	4518FE16.1020507@enterprisedb.com Whole thread Raw
Responses	Re: Block B-Tree concept (Tom Lane <tgl@sss.pgh.pa.us>) Re: Block B-Tree concept ("Jim C. Nasby" <jim@nasby.net>)
List	pgsql-hackers

Tree view

I've been experimenting with the idea of a so-called Block B-Tree. The 
basic idea is that instead of storing an index tuple for each heap 
tuple, we store an index tuple for each heap block. This dramatically 
reduces the size of an index, leading to savings on I/O. This idea was 
briefly discussed in January: 
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00565.php

To make it actually work, the semantics of the B-Tree has been modified 
so that every index tuple represents 1 or more heap tuples that fall 
within some range of values, and are on the same heap page. The range 
that an index tuple represents is from X, inclusive, to Y, exclusive, 
where X is the key of the index tuple and Y is the key of the *next* 
index tuple in the index. If the heap is in index order (as after 
CLUSTER), we get a very compact index this way, effectively eliminating 
the leaf level of the B-tree.

To locate the actual matching items on the heap page, we have to scan 
the heap page because we don't have the item ids, so this is a tradeoff 
between CPU and I/O. However, we could have a hybrid where we initially 
store heap tuple pointers like we do now, and when there's more than X 
consecutive pointers to the same page, we collapse them to just one 
pointer to the whole page. This would potentially give us the best of 
both worlds.

This design is more flexible and less invasive than true 
index-organized-tables, because it doesn't require perfect ordering of 
the heap or moving heap tuples around. You can also define than one 
Block B-Tree on a table, though you wouldn't get much benefit from using 
Block B-Tree instead of normal B-Tree if there's no correlation between 
the index order and the heap order.

It also fits in nicely with the bitmap scans, since there's already 
support for "lossy" bitmap pages. Doing normal ordered index scans 
requires some coding, but can be done.

Thoughts? I'm thinking of getting this in early in the 8.3 cycle. We'll 
have to take a look at the indexam API to support both bitmap indexes 
and block B-trees nicely.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

pgsql-hackers by date:

From: "Dave Page"
Date: 26 September 2006, 09:52:13
Subject: Re: Buildfarm alarms

From: Heikki Linnakangas
Date: 26 September 2006, 10:31:18
Subject: Re: Phantom Command ID

Block B-Tree concept - Mailing list pgsql-hackers

Previous

Next