Home > mailing lists

Re: On-disk bitmap index patch - Mailing list pgsql-hackers

From	Mark Kirkwood
Subject	Re: On-disk bitmap index patch
Date	July 26, 2006 23:00:54
Msg-id	44C81E32.8090500@paradise.net.nz Whole thread Raw
In response to	Re: On-disk bitmap index patch (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: On-disk bitmap index patch
List	pgsql-hackers

Tree view

Tom Lane wrote:
>
> 
> I'm surprised no one caught me making this bogus computation.  I
> realized this morning it's wrong: if there are 10000 distinct values
> then on the average the 1-bits would be about 10000 bits apart, not 100.

Right - I didn't think 10000 was *that* bad, but was too sleepy to try 
working it out :-).

> 
>
> I don't believe the 100x numbers that have been
> bandied around in this discussion, but 10x is plenty enough to be
> interesting.
> 

Yep - I have not managed to get 100x in any of my tests. However, I do 
see some about half that for the TPCH scale 10 dataset:

tpch=# \i relsizes.sql            (BTREE)        relname         | relpages
------------------------+---------- customer               |    41019 customer_c_custkey     |     3288
customer_c_mktsegment |     5779 customer_c_nationkey   |     3288 lineitem               |  1535724
lineitem_l_linenumber |   131347 lineitem_l_orderkey    |   131347 orders                 |   307567 orders_o_custkey
   |    32847 orders_o_orderpriority |    65876 orders_o_orderstatus   |    41131

tpch=# \i relsizes.sql            (MAINLY BITMAP)        relname         | relpages
------------------------+---------- customer               |    41019 customer_c_custkey     |     3288
customer_c_mktsegment |      157 customer_c_nationkey   |      336 lineitem               |  1535724
lineitem_l_linenumber |     7571 lineitem_l_orderkey    |   131347 orders                 |   307567 orders_o_custkey
   |    32847 orders_o_orderpriority |     1427 orders_o_orderstatus   |      717

The orders_o_orderpriority and orders_o_orderstatus bitmap indexes are 
46 and 57 times smaller than their btree counterparts (hmm...might we 
see even better compression for larger scale factors?).

An obvious deduction is that the TPCH dataset is much more amenable to 
run compression than my synthetic Zipfian data was. The interesting 
question is how well "real" datasets are run compressable, I suspect 
"better than my Zipfian data" is a safe assumption!

Cheers

Mark

pgsql-hackers by date:

From: "Qingqing Zhou"
Date: 26 July 2006, 22:31:13
Subject: Re: default lower case of identifier

From: "Luke Lonergan"
Date: 27 July 2006, 00:55:59
Subject: Re: On-disk bitmap index patch

Re: On-disk bitmap index patch - Mailing list pgsql-hackers

Previous

Next