Thread: Bitmap indexes?

Bitmap indexes?

From
Greg Copeland
Date:
One of the reasons why I originally stated following the hackers list is
because I wanted to implement bitmap indexes.  I found in the archives,
the follow link, http://www.it.iitb.ernet.in/~rvijay/dbms/proj/, which
was extracted from this,

http://groups.google.com/groups?hl=en&threadm=01C0EF67.5105D2E0.mascarm%40mascari.com&rnum=1&prev=/groups%3Fq%3Dbitmap%2Bindex%2Bgroup:comp.databases.postgresql.hackers%26hl%3Den%26selm%3D01C0EF67.5105D2E0.mascarm%2540mascari.com%26rnum%3D1,
archivethread. 

At any rate, that was some number of months ago.  I've started looking
at the results posted from their bitmap GiST efforts and found that they
were being tested rather poorly to be of real value (I also found it
annoying that the project seems to quote other people's work without
giving credit).  Nonetheless, I thought I'd post to find out if anyone
feels there is still a need for this?  That is, I'm not really sure that
data warehousing or DDS systems are currently very common with Postgres.

If the group here still see value in adding various types of bitmap
support, can someone please point me to some documentation.  I had
several bookmarked but lost then when X crashed.  Anything that outlines
cache strategy, index support, am overview, and any other documentation
that would help excel my understanding of the code as well as the
various structure relationships would be wonderful?

Oh yes, one last question, is the required method for adding index
support via GiST?  I ask because it seems to me that inserts could be
exceptionally expensive, though as usual, I still have more to look at.

Thanks,Greg


P.S.  And yes, I have been reading lots of code...including
walk-throughs!  :P



Re: Bitmap indexes?

From
Bruce Momjian
Date:
Greg Copeland wrote:

Checking application/pgp-signature: FAILURE
-- Start of PGP signed section.
> One of the reasons why I originally stated following the hackers list is
> because I wanted to implement bitmap indexes.  I found in the archives,
> the follow link, http://www.it.iitb.ernet.in/~rvijay/dbms/proj/, which
> was extracted from this,
>
http://groups.google.com/groups?hl=en&threadm=01C0EF67.5105D2E0.mascarm%40mascari.com&rnum=1&prev=/groups%3Fq%3Dbitmap%2Bindex%2Bgroup:comp.databases.postgresql.hackers%26hl%3Den%26selm%3D01C0EF67.5105D2E0.mascarm%2540mascari.com%26rnum%3D1,
archivethread.
 
> 
> At any rate, that was some number of months ago.  I've started looking
> at the results posted from their bitmap GiST efforts and found that they
> were being tested rather poorly to be of real value (I also found it
> annoying that the project seems to quote other people's work without
> giving credit).  Nonetheless, I thought I'd post to find out if anyone
> feels there is still a need for this?  That is, I'm not really sure that
> data warehousing or DDS systems are currently very common with Postgres.
> 
> If the group here still see value in adding various types of bitmap
> support, can someone please point me to some documentation.  I had
> several bookmarked but lost then when X crashed.  Anything that outlines
> cache strategy, index support, am overview, and any other documentation
> that would help excel my understanding of the code as well as the
> various structure relationships would be wonderful?

The only thing I know is that there is discussion of bitmap indexes on
the TODO list linked to from the 'bitmap index' item.  I also remember
that the intarray code in /contrib sort of simulates bitmapped indexes,
or something like that.  :-)

> Oh yes, one last question, is the required method for adding index
> support via GiST?  I ask because it seems to me that inserts could be
> exceptionally expensive, though as usual, I still have more to look at.

I think we would recommend GIST because it is easier.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


GIST

From
"Christopher Kings-Lynne"
Date:
> > Oh yes, one last question, is the required method for adding index
> > support via GiST?  I ask because it seems to me that inserts could be
> > exceptionally expensive, though as usual, I still have more to look at.
>
> I think we would recommend GIST because it is easier.

Would someone be able to explain to me exactly what GIST is?  I thought it
was just a _type_ of index, but is it actually a generalised index-creating
framework?  Do other DMBSs use it?  Is it a cool thing I could talk about
when I give my talk at UWA tommorrow?

Chris



Re: GIST

From
Greg Copeland
Date:
Here's a good spring board for information.  They actually have some
pretty cool tools available to help develop.  Using the libgist stuff
makes it pretty easy to prototype and play with various index schemes of
your own creation and the debugger lets you rapidly view the storage
results.

http://gist.cs.berkeley.edu/



On Wed, 2002-03-13 at 20:48, Christopher Kings-Lynne wrote:
> > > Oh yes, one last question, is the required method for adding index
> > > support via GiST?  I ask because it seems to me that inserts could be
> > > exceptionally expensive, though as usual, I still have more to look at.
> >
> > I think we would recommend GIST because it is easier.
>
> Would someone be able to explain to me exactly what GIST is?  I thought it
> was just a _type_ of index, but is it actually a generalised index-creating
> framework?  Do other DMBSs use it?  Is it a cool thing I could talk about
> when I give my talk at UWA tommorrow?
>
> Chris
>


Re: GIST

From
Teodor Sigaev
Date:
Read a collect of articles at bottom of http://www.sai.msu.su/~megera/postgres/gist/

Christopher Kings-Lynne wrote:
>>>Oh yes, one last question, is the required method for adding index
>>>support via GiST?  I ask because it seems to me that inserts could be
>>>exceptionally expensive, though as usual, I still have more to look at.
>>>
>>I think we would recommend GIST because it is easier.
>>
> 
> Would someone be able to explain to me exactly what GIST is?  I thought it
> was just a _type_ of index, but is it actually a generalised index-creating
> framework?  Do other DMBSs use it?  Is it a cool thing I could talk about
> when I give my talk at UWA tommorrow?
> 
> Chris
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org
> 
> 


-- 
Teodor Sigaev
teodor@stack.net




Re: Bitmap indexes?

From
Oleg Bartunov
Date:
Greg,

if you're still in bitmap indices you may take a look on our
contrib/intarray module.
Regards,
    Oleg

On 13 Mar 2002, Greg Copeland wrote:

> One of the reasons why I originally stated following the hackers list is
> because I wanted to implement bitmap indexes.  I found in the archives,
> the follow link, http://www.it.iitb.ernet.in/~rvijay/dbms/proj/, which
> was extracted from this,
>
http://groups.google.com/groups?hl=en&threadm=01C0EF67.5105D2E0.mascarm%40mascari.com&rnum=1&prev=/groups%3Fq%3Dbitmap%2Bindex%2Bgroup:comp.databases.postgresql.hackers%26hl%3Den%26selm%3D01C0EF67.5105D2E0.mascarm%2540mascari.com%26rnum%3D1,
archivethread.
 
>
> At any rate, that was some number of months ago.  I've started looking
> at the results posted from their bitmap GiST efforts and found that they
> were being tested rather poorly to be of real value (I also found it
> annoying that the project seems to quote other people's work without
> giving credit).  Nonetheless, I thought I'd post to find out if anyone
> feels there is still a need for this?  That is, I'm not really sure that
> data warehousing or DDS systems are currently very common with Postgres.
>
> If the group here still see value in adding various types of bitmap
> support, can someone please point me to some documentation.  I had
> several bookmarked but lost then when X crashed.  Anything that outlines
> cache strategy, index support, am overview, and any other documentation
> that would help excel my understanding of the code as well as the
> various structure relationships would be wonderful?
>
> Oh yes, one last question, is the required method for adding index
> support via GiST?  I ask because it seems to me that inserts could be
> exceptionally expensive, though as usual, I still have more to look at.
>
> Thanks,
>     Greg
>
>
> P.S.  And yes, I have been reading lots of code...including
> walk-throughs!  :P
>
>
>



Re: GIST

From
Oleg Bartunov
Date:
Christopher,

I'm sorry it's too late, but I haven't receive any messages from
postgres mailing lists for a month (don't know why). Just found your
message in archives.

GiST is a great thing, it's generalised search tree invented by
Hellerstein in 1995. It allows you to define custom data type,
index access to them and custom queries.
I have a little intro about GiST (not finished yest)
http://www.sai.msu.su/~megera/postgres/gist/doc/intro.html
Also, at the bottom of http://www.sai.msu.su/~megera/postgres/gist/
there are several seminal papers for reading.
Regards,
    Oleg



On Thu, 14 Mar 2002, Christopher Kings-Lynne wrote:

> > > Oh yes, one last question, is the required method for adding index
> > > support via GiST?  I ask because it seems to me that inserts could be
> > > exceptionally expensive, though as usual, I still have more to look at.
> >
> > I think we would recommend GIST because it is easier.
>
> Would someone be able to explain to me exactly what GIST is?  I thought it
> was just a _type_ of index, but is it actually a generalised index-creating
> framework?  Do other DMBSs use it?  Is it a cool thing I could talk about
> when I give my talk at UWA tommorrow?
>
> Chris
>
>



Re: Bitmap indexes?

From
Matthew Kirkwood
Date:
On Tue, 19 Mar 2002, Oleg Bartunov wrote:

Sorry to reply over you, Oleg.

> On 13 Mar 2002, Greg Copeland wrote:
>
> > One of the reasons why I originally stated following the hackers list is
> > because I wanted to implement bitmap indexes.  I found in the archives,
> > the follow link, http://www.it.iitb.ernet.in/~rvijay/dbms/proj/, which
> > was extracted from this,
> >
http://groups.google.com/groups?hl=en&threadm=01C0EF67.5105D2E0.mascarm%40mascari.com&rnum=1&prev=/groups%3Fq%3Dbitmap%2Bindex%2Bgroup:comp.databases.postgresql.hackers%26hl%3Den%26selm%3D01C0EF67.5105D2E0.mascarm%2540mascari.com%26rnum%3D1,
archivethread.
 

For every case I have used a bitmap index on Oracle, a
partial index[0] made more sense (especialy since it
could usefully be compound).

Our troublesome case (on Oracle) is a table of "events"
where maybe fifty to a couple of hundred are "published"
(ie. web-visible) at any time.  The events are categorised
by sport (about a dozen) and by "event type" (about five).
We never really query events except by PK or by sport/type/
published.

We make a bitmap index on "published", and trust Oracle to
use it correctly, and hope that our other indexes are also
useful.

On Postgres[1] we would make a partial compound index:

create index ... on events(sport_id,event_type_id)
where published='Y';

Matthew.

[0] Is this a postgres-only feature; my tame Oracle and   Sybase DBAs had never heard of such a thing, but   were
ratherimpressed at the idea.
 
[1] Disclaimer.  Our system doesn't run on PG, though I   do have a nearly equivalent prototype system which   does.
I'dlove to hear any success (or otherwise)   stories about PG partial indexes.
 



Re: Bitmap indexes?

From
Greg Copeland
Date:
On Tue, 2002-03-19 at 15:30, Matthew Kirkwood wrote:
> On Tue, 19 Mar 2002, Oleg Bartunov wrote:
>
> Sorry to reply over you, Oleg.
>
> > On 13 Mar 2002, Greg Copeland wrote:
> >
> > > One of the reasons why I originally stated following the hackers list is
> > > because I wanted to implement bitmap indexes.  I found in the archives,
> > > the follow link, http://www.it.iitb.ernet.in/~rvijay/dbms/proj/, which
> > > was extracted from this,
> > >
http://groups.google.com/groups?hl=en&threadm=01C0EF67.5105D2E0.mascarm%40mascari.com&rnum=1&prev=/groups%3Fq%3Dbitmap%2Bindex%2Bgroup:comp.databases.postgresql.hackers%26hl%3Den%26selm%3D01C0EF67.5105D2E0.mascarm%2540mascari.com%26rnum%3D1,
archivethread. 
>
> For every case I have used a bitmap index on Oracle, a
> partial index[0] made more sense (especialy since it
> could usefully be compound).

That's very true, however, often bitmap indexes are used where partial
indexes may not work well.  It maybe you were trying to apply the cure
for the wrong disease.  ;)

>
> Our troublesome case (on Oracle) is a table of "events"
> where maybe fifty to a couple of hundred are "published"
> (ie. web-visible) at any time.  The events are categorised
> by sport (about a dozen) and by "event type" (about five).
> We never really query events except by PK or by sport/type/
> published.

The reason why bitmap indexes are primarily used for DSS and data
wherehousing applications is because they are best used on extremely
large to very large tables which have low cardinality (e.g, 10,000,000
rows having 200 distinct values).  On top of that, bitmap indexes also
tend to be much smaller than their "standard" cousins.  On large and
very tables tables, this can sometimes save gigs in index space alone
(serious space benefit).  Plus, their small index size tends to result
in much less I/O (serious speed benefit).  This, of course, can result
in several orders of magnitude speed improvements when index scans are
required.  As an added bonus, using AND, OR, XOR and NOT predicates are
exceptionally fast and if implemented properly, can even take advantage
of some 64-bit hardware for further speed improvements.  This, of
course, further speeds look ups.  The primary down side is that inserts
and updates to bitmap indexes are very costly (comparatively) which is,
yet again, why they excel in read-only environments (DSS & data
wherehousing).

It should also be noted that RDMS's, such as Oracle, often use multiple
types of bitmap indexes.  This further impedes insert/update
performance, however, the additional bitmap index types usually allow
for range predicates while still making use of the bitmap index.  If I'm
not mistaken, several other types of bitmaps are available as well as
many ways to encode and compress (rle, quad compression, etc) bitmap
indexes which further save on an already compact indexing scheme.

Given the proper problem domain, index bitmaps can be a big win.

>
> We make a bitmap index on "published", and trust Oracle to
> use it correctly, and hope that our other indexes are also
> useful.
>
> On Postgres[1] we would make a partial compound index:
>
> create index ... on events(sport_id,event_type_id)
> where published='Y';


Generally speaking, bitmap indexes will not serve you very will on
tables having a low row counts, high cardinality or where they are
attached to tables which are primarily used in an OLTP capacity.
Situations where you have a low row count and low cardinality or high
row count and high cardinality tend to be better addressed by partial
indexes; which seem to make much more sense.  In your example, it sounds
like you did "the right thing"(tm).  ;)


Greg