Re: Time-correlated columns in large tables - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Time-correlated columns in large tables
Date
Msg-id 45EC7ADE.20206@enterprisedb.com
Whole thread Raw
In response to Time-correlated columns in large tables  ("Jeroen T. Vermeulen" <jtv@xs4all.nl>)
Responses Re: Time-correlated columns in large tables  ("Jeroen T. Vermeulen" <jtv@xs4all.nl>)
List pgsql-hackers
Jeroen T. Vermeulen wrote:
> [Q: Is there some other transparent optimization for values that correlate
> with insertion/update order?]
> 
> So I was wondering whether it would make sense to have a more compact kind
> of index.  One that partitions the value range of a given column into
> sub-ranges, and for each of those, merely keeps loose track of where those
> value ranges might be found.  "Dates from 2006-07-15 to 2006-08-04: try
> blocks 99-126 and 175."  Just a fixed maximum of two or three contiguous
> block ranges per value range would probably do the trick.  The risk of
> having too few, of course, is that one oddly positioned block could make
> the index look as if a particular value range was spread out throughout
> most of the table.

I think you've just described a range-encoded bitmap index. The idea is 
to divide the range of valid values into a some smallish number of 
subranges, and for each of these boundary values you store a bitmap 
where you set the bit representing every tuple with value < boundary value.

For example, imagine a simple table like this:

position | key
---------+-----       1 |    1       2 |    2       3 |    3
...      10 |   10

If you divide this into for example 4 subranges, you'd get bitmaps
 key   bitmap
-----+-----------
<= 3 | 1110000000
<= 6 | 1111110000
<= 8 | 1111111100
<= 10| 1111111111

Now to find all tuples <= 3, you'd fetch all tuples in the first bitmap. 
To find all tuples <= 2, you'd fetch all tuples in the first bitmap, and 
recheck the <= 2 condition. To find anything between say 4 and 8, you'd 
take the XOR of the 3rd and 1st bitmap, and recheck the > 4 condition on 
resulting tuples.

So basically this allows you to satisfy any range query by reading two 
bitmaps and XORing them together. The resulting tuples will generally 
need to be rechecked because the index is lossy.

I *really* hope the equality-encoded bitmap index gets into 8.3. I've 
been trying to keep range-encoded bitmaps in mind when looking at the 
proposed indexam API changes. Hopefully we'll see them in 8.4. Feel free 
to submit a patch ;-).

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Bug: Buffer cache is not scan resistant
Next
From: Tom Lane
Date:
Subject: Re: Bug: Buffer cache is not scan resistant