Re: Question about indexes - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Question about indexes
Date
Msg-id 003701c3e5ec$44306250$efb887d9@LaptopDellXP
Whole thread Raw
In response to Re: Question about indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Some potentially helpful background comments on the discussion so far...

>Tom Lane writes
>>Greg Stark writes
>> Note that the space saving of bitmap indexes is still a substantial 
>> factor.
>I think you are still confusing what I'm talking about with a bitmap
index, >ie, a persistent structure on-disk.  It's not that at all, but a
transient >structure built in-memory during an index scan.

Oracle allows the creation of bitmap indices as persistent data
structures. 

The "space saving" of bitmap indices is only a saving when compared with
btree indices. If you don't have them at all because they are built
dynamically when required, as Tom is suggesting, then you "save" even
more space. 

Maintaining the bitmap index is a costly operation. You tend to want to
build them on "characteristic" columns, of which there tends to be more
of in a database than "partial/full identity" columns on which you build
btrees (forgive the vagueness of that comment), so you end up with loads
of the damn things, so the space soon adds up. It can be hard to judge
which ones are the important ones, especially when each is used by a
different user/group. Building them dynamically is a good way of solving
the question "which ones are needed?". Ever seen 58 indices on a table?
Don't go there.

My vote would be implement the dynamic building capability, then return
to implement a persisted structure later if that seems like it would be
a further improvement. [The option would be nice]

If we do it dynamically, as Tom suggests, then we don't have to code the
index maintenance logic at all and the functionality will be with us all
the sooner. Go Tom!

>Tom Lane writes
> In any case, this discussion is predicated on the assumption that the
> operations involving the bitmap are a significant fraction of the
total
> time, which I think is quite uncertain.  Until we build it and profile
> it, we won't know that.

Dynamically building the bitmaps has been the strategy in use by
Teradata for nearly a decade on many large datawarehouses. I can
personally vouch for the effectiveness of this approach - I was
surprised when Oracle went for the persistent option. Certainly in that
case building the bitmaps adds much less time than is saved overall by
the better total query strategy.

>Greg Stark writes
> > To build the in-memory bitmap you effectively have to do a sort.

Not sure on this latter point: I think I agree with Greg on that point,
but want to believe Tom because requiring a sort will definitely add
time. 

To shed some light in this area, some other major implementations are:

In Teradata, tables are stored based upon a primary index, which is
effectively an index-organised table. The index pointers are stored in
sorted order lock step with the blocks of the associated table - No sort
required. (The ordering is based upon a hashed index, but that doesn't
change the technique).

Oracle's tables/indexes use heaps/btrees also, though they do provide an
index-organised table feature similar to Teradata. Maybe the lack of
heap/btree consistent ordering in Oracle and their subsequent design
choice of persistent bitmap indices is an indication for PostgreSQL too?

In Oracle, bitmap indices are an important precursor to the star join
technique. AFAICS it is still possible to have a star join plan without
having persistent bitmap indices. IMHO, the longer term goal of a good
star join plan is an important one - that may influence the design
selection for this discussion.

Hope some of that helps,

Best regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Write cache
Next
From: "Simon Riggs"
Date:
Subject: Re: lock related issues...