Block B-Tree concept - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Block B-Tree concept
Date
Msg-id 4518FE16.1020507@enterprisedb.com
Whole thread Raw
Responses Re: Block B-Tree concept  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Block B-Tree concept  ("Jim C. Nasby" <jim@nasby.net>)
List pgsql-hackers
I've been experimenting with the idea of a so-called Block B-Tree. The 
basic idea is that instead of storing an index tuple for each heap 
tuple, we store an index tuple for each heap block. This dramatically 
reduces the size of an index, leading to savings on I/O. This idea was 
briefly discussed in January: 
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00565.php

To make it actually work, the semantics of the B-Tree has been modified 
so that every index tuple represents 1 or more heap tuples that fall 
within some range of values, and are on the same heap page. The range 
that an index tuple represents is from X, inclusive, to Y, exclusive, 
where X is the key of the index tuple and Y is the key of the *next* 
index tuple in the index. If the heap is in index order (as after 
CLUSTER), we get a very compact index this way, effectively eliminating 
the leaf level of the B-tree.

To locate the actual matching items on the heap page, we have to scan 
the heap page because we don't have the item ids, so this is a tradeoff 
between CPU and I/O. However, we could have a hybrid where we initially 
store heap tuple pointers like we do now, and when there's more than X 
consecutive pointers to the same page, we collapse them to just one 
pointer to the whole page. This would potentially give us the best of 
both worlds.

This design is more flexible and less invasive than true 
index-organized-tables, because it doesn't require perfect ordering of 
the heap or moving heap tuples around. You can also define than one 
Block B-Tree on a table, though you wouldn't get much benefit from using 
Block B-Tree instead of normal B-Tree if there's no correlation between 
the index order and the heap order.

It also fits in nicely with the bitmap scans, since there's already 
support for "lossy" bitmap pages. Doing normal ordered index scans 
requires some coding, but can be done.

Thoughts? I'm thinking of getting this in early in the 8.3 cycle. We'll 
have to take a look at the indexam API to support both bitmap indexes 
and block B-trees nicely.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: "Dave Page"
Date:
Subject: Re: Buildfarm alarms
Next
From: Heikki Linnakangas
Date:
Subject: Re: Phantom Command ID