Re: Bitmap index status - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Bitmap index status
Date
Msg-id 4518F016.8080709@enterprisedb.com
Whole thread Raw
In response to Re: Bitmap index status  ("Jie Zhang" <jzhang@greenplum.com>)
Responses Re: Bitmap index status  (Gavin Sherry <swm@linuxworld.com.au>)
List pgsql-hackers
Looks a bit better now, though I think you need to think more about the 
encapsulation of the structs. More detailed comments below.

Jie Zhang wrote:
> Essentially, we want to have a stream bitmap object that has an iterator,
> which will be able to iterate through the bitmaps. The BitmapIndexscan,
> BitmapAnd, BitmapOr will be executed once and return a streamp bitmap or a
> hash bitmap. The BitmapHeapscan then calls tbm_iterate() to iterate 
> through
> the bitmaps.
>
> The StreamBitmap structure will look like below.
>
> struct StreamBitmap {
> NodeTag type; /* to make it a valid Node */
> PagetableEntry entry; /* a page of tids in this stream bitmap */
>
> /* the iterator function */
> void (*next)(StreamBitmap*);
> Node* state; /* store how this stream bitmap generated,
> and all necessary information to
> obtain the next stream bitmap. */
> };

I'd suggest making state just a (void *). It's private to the producer 
of the bitmap, and I don't see a reason to expose it. I assume that the 
next-function fills in the PageTableEntry with the next set of tids.

> Two new state objects will look like below. At the same time, we introduce
> three new node types: T_StreamBitmapAND, T_StreamBitmapOR,
> And T_StreamBitmapIndex, to define different states.
>
> /*
> * Stores the necessary information for iterating through the stream 
> bitmaps
> * generated by nodeBitmapAnd or nodeBitmapOr.
> */
> struct StreamBitmapOp {
> NodeTag type; /* handles T_StreamBitmapAND and T_StreamBitmapOR */
> List* bitmaps;
> };

AFAICS, this struct is private to tidbitmap.c, where the new 
stream-enabled tbm_intersect etc. functions are defined. Also, I don't 
see a reason why it needs to by a valid Node.

> /*
> * Stores some necessary information for iterating through the stream
> * bitmaps generated by nodeBitmapIndexscan.
> */
> struct StreamBitmapIndex {
> NodeTag type; /* handle T_StreamBitmapIndex */
> IndexScanDesc scan;
> BlockNumber nextBlockNo;/* next block no to be read */
> };

Where would this struct be defined? I think different index access 
methods might want to have different kind of states. The struct above 
assumes that the position of an index scan is always represented by the 
nextBlockNo. That seems to be the right thing for the bitmap indexam, so 
this struct is fine for StreamBitmaps returned by bmgetbitmap, but not 
necessary for others.

> Then we will have the iterator functions like the following:
>
> ...
>
> void StreamBitmapIndexNext(StreamBitmap* node) {
> StreamBitmapIndex* sbi = (StreamBitmapIndex*) node->state;
> amgetbitmap(sbi->scan, NULL, sbi->nextBlockNo);
> }

This means that the amgetbitmap function would still be called many 
times in each scan.  What would amgetbitmap return? A new StreamBitmap 
on each call?

I'd like to see just one call to amgetbitmap. It would a) fill in the 
hash bitmap passed to it, b) return a new hash bitmap, or c) return a 
new StreamBitmap, with a indexam specific next-function that returns the 
tids one page at a time. I think we'll also need some kind of a 
destructor in StreamBitmap that's called by the consumer of the bitmap 
after it's done with it.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Jeremy Drake
Date:
Subject: Re: [PATCHES] large object regression tests
Next
From: Joachim Wieland
Date:
Subject: Re: Buildfarm alarms