Re: On-disk bitmap index patch - Mailing list pgsql-hackers

From Tom Lane
Subject Re: On-disk bitmap index patch
Date
Msg-id 1593.1153923961@sss.pgh.pa.us
Whole thread Raw
In response to Re: On-disk bitmap index patch  (Mark Kirkwood <markir@paradise.net.nz>)
Responses Re: On-disk bitmap index patch  (Mark Kirkwood <markir@paradise.net.nz>)
Re: On-disk bitmap index patch  ("Luke Lonergan" <llonergan@greenplum.com>)
List pgsql-hackers
I wrote:
>> ...  A realistic assumption for the numbers you mention is
>> that the bitmaps have 1-bits about 100 bits apart, which means you
>> could get maybe 3-to-1 compression using the runlength scheme that's
>> in there ... leaving the bitmap a factor of 20 bigger than the btree.

I'm surprised no one caught me making this bogus computation.  I
realized this morning it's wrong: if there are 10000 distinct values
then on the average the 1-bits would be about 10000 bits apart, not 100.
At that rate there *is* substantial compression.  The representation
used in the bitmap patch isn't amazingly efficient for this (I wonder
whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs) but it
still manages to get down to something like 11 bytes per 1-bit.  Since
each 1-bit corresponds to a btree entry, this means the bitmap does come
out somewhat smaller than a btree (16 bytes per entry ignoring overhead)
--- at least if overhead such as wasted space on a page doesn't kill it.

Still, this isn't going to make the difference between fitting in RAM or
not.  For small numbers of distinct values, like less than 100, the
bitmap representation looks more plausible.  In that range you're down
to a byte or two per entry and so there is (very roughly) a 10x storage
savings over btree.  I don't believe the 100x numbers that have been
bandied around in this discussion, but 10x is plenty enough to be
interesting.
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Diogo Biazus"
Date:
Subject: xlogdump behaviour translating dropped relations
Next
From: "jkzhao"
Date:
Subject: default lower case of identifier