Re: CLUSTER and indisclustered - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: CLUSTER and indisclustered
Date
Msg-id 200208130425.g7D4PZm26976@candle.pha.pa.us
Whole thread Raw
In response to Re: CLUSTER and indisclustered  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: CLUSTER and indisclustered  (Hannu Krosing <hannu@tm.ee>)
List pgsql-hackers
I wanted to comment on this bitmapped index discussion because I am
hearing a lot about star joins, data warehousing, and bitmapped indexes
recently.

It seems we have several uses for bitmapped indexes:
Do index lookups in sequential heap orderAllow joining of bitmapped indexes to construct arbitrary indexes

There is a web page about "star joins" used a lot in data warehousing,
where you don't know what queries are going to be required and what
indexes to create:
http://www.dbdomain.com/a100397.htm

They show some sample queries, which is good.  Here is some
interesting text:
Star Transformation
If there are bitmap indexes on SALES_REP_ID, PRODUCT_ID, andDEPARTMENT_ID in the SALES table, then Oracle can resolve
thequeryusing merges of the bitmap indexes.Because Oracle can efficiently merge multiple bitmap indexes, you can create
asingle bitmap index on each of the foreign-key columns in thefact table rather than on every possible combination of
columns.Thislets you support all possible combinations of dimensions withoutcreating an unreasonable number of
indexes.

Added to TODO:* Use bitmaps to fetch heap pages in sequential order [performance]     * Use bitmaps to combine existing
indexes[performance]
 

and I will add some of these emails to TODO.detail/performance.

---------------------------------------------------------------------------

Tom Lane wrote:
> Curt Sampson <cjs@cynic.net> writes:
> > But after doing some benchmarking of various sorts of random reads
> > and writes, it occurred to me that there might be optimizations
> > that could help a lot with this sort of thing. What if, when we've
> > got an index block with a bunch of entries, instead of doing the
> > reads in the order of the entries, we do them in the order of the
> > blocks the entries point to?
> 
> I thought to myself "didn't I just post something about that?"
> and then realized it was on a different mailing list.  Here ya go
> (and no, this is not the first time around on this list either...)
> 
> 
> I am currently thinking that bitmap indexes per se are not all that
> interesting.  What does interest me is bitmapped index lookup, which
> came back into mind after hearing Ann Harrison describe how FireBird/
> InterBase does it.
> 
> The idea is that you don't scan the index and base table concurrently
> as we presently do it.  Instead, you scan the index and make a list
> of the TIDs of the table tuples you need to visit.  This list can
> be conveniently represented as a sparse bitmap.  After you've finished
> looking at the index, you visit all the required table tuples *in
> physical order* using the bitmap.  This eliminates multiple fetches
> of the same heap page, and can possibly let you get some win from
> sequential access.
> 
> Once you have built this mechanism, you can then move on to using
> multiple indexes in interesting ways: you can do several indexscans
> in one query and then AND or OR their bitmaps before doing the heap
> scan.  This would allow, for example, "WHERE a = foo and b = bar"
> to be handled by ANDing results from separate indexes on the a and b
> columns, rather than having to choose only one index to use as we do
> now.
> 
> Some thoughts about implementation: FireBird's implementation seems
> to depend on an assumption about a fixed number of tuple pointers
> per page.  We don't have that, but we could probably get away with
> just allocating BLCKSZ/sizeof(HeapTupleHeaderData) bits per page.
> Also, the main downside of this approach is that the bitmap could
> get large --- but you could have some logic that causes you to fall
> back to plain sequential scan if you get too many index hits.  (It's
> interesting to think of this as lossy compression of the bitmap...
> which leads to the idea of only being fuzzy in limited areas of the
> bitmap, rather than losing all the information you have.)
> 
> A possibly nasty issue is that lazy VACUUM has some assumptions in it
> about indexscans holding pins on index pages --- that's what prevents
> it from removing heap tuples that a concurrent indexscan is just about
> to visit.  It might be that there is no problem: even if lazy VACUUM
> removes a heap tuple and someone else then installs a new tuple in that
> same TID slot, you should be okay because the new tuple is too new to
> pass your visibility test.  But I'm not convinced this is safe.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [COMMITTERS] pgsql-server/src backend/tcop/postgres.cbacke
Next
From: Bruce Momjian
Date:
Subject: Re: [COMMITTERS] pgsql-server/src backend/tcop/postgres.cbacke