Thread: Bitmap index status
Hi, What's the status of the bitmap index patch? Have you worked on it since the last posted patch (http://archives.postgresql.org/pgsql-patches/2006-08/msg00003.php)? I've started to review it, to get it into CVS early in the 8.3 cycle. I just want to make sure that I'm working on the latest version. Beside the issues already discussed, I found two minor bugs: * pg_am says that bitmap am supports unique indexes, while it doesn't. Second, * race condition in _bitmap_inserttuple if two backends try to insert the same, new value. If they both find that there's no lov item for the key, and try to create one, one backend will get a duplicate key error on the lov index. Also, vacuum actually does a reindex, which seems awfully wasteful. That needs to be looked at. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes: > What's the status of the bitmap index patch? Have you worked on it since > the last posted patch > (http://archives.postgresql.org/pgsql-patches/2006-08/msg00003.php)? Gavin and Jie have made major changes since that version (or at least they'd better have something to show for the month since then ;-)). I wouldn't recommend reviewing the patch until they post something current ... > Also, vacuum actually does a reindex, which seems awfully wasteful. That > needs to be looked at. Yikes. I imagine they've not tried to do anything about that; if you want to help, maybe you could take that subproblem? regards, tom lane
Hi Heikki, Gavin and I are trying to merge our changes together this week. We will post a new patch by the end of this week. This patch will include some style fixes, bug fixes, and the stream bitmap implementation. I will look into the problems you have mentioned in this email. Yes, vacuum currently does a reindex now. Gavin and I just talked about this yesterday. We are looking into ways to improve this. One way is not to do reindex for each vacuum. We maintain a list of updated tids along with the bitmap index. Only when this list goes to a certain point, vacuum will re-build the index. Thanks, Jie On 9/12/06 2:43 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > Hi, > > What's the status of the bitmap index patch? Have you worked on it since > the last posted patch > (http://archives.postgresql.org/pgsql-patches/2006-08/msg00003.php)? > > I've started to review it, to get it into CVS early in the 8.3 cycle. I > just want to make sure that I'm working on the latest version. > > Beside the issues already discussed, I found two minor bugs: > * pg_am says that bitmap am supports unique indexes, while it doesn't. > Second, > * race condition in _bitmap_inserttuple if two backends try to insert > the same, new value. If they both find that there's no lov item for the > key, and try to create one, one backend will get a duplicate key error > on the lov index. > > Also, vacuum actually does a reindex, which seems awfully wasteful. That > needs to be looked at.
Hi Heikki and all, I just sent the latest bitmap index patch to the list. I am not sure if there is any size limit for this mailing list. If you have received my previous email, please let me know. Thanks, Jie On 9/12/06 2:43 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > Hi, > > What's the status of the bitmap index patch? Have you worked on it since > the last posted patch > (http://archives.postgresql.org/pgsql-patches/2006-08/msg00003.php)? > > I've started to review it, to get it into CVS early in the 8.3 cycle. I > just want to make sure that I'm working on the latest version. > > Beside the issues already discussed, I found two minor bugs: > * pg_am says that bitmap am supports unique indexes, while it doesn't. > Second, > * race condition in _bitmap_inserttuple if two backends try to insert > the same, new value. If they both find that there's no lov item for the > key, and try to create one, one backend will get a duplicate key error > on the lov index. > > Also, vacuum actually does a reindex, which seems awfully wasteful. That > needs to be looked at.
Jie Zhang wrote: > Hi Heikki and all, > > Please find the latest bitmap index patch in the attachment. This patch is > generated against the postgresql cvs head. > Thanks. The handling of stream and hash bitmaps looks pretty complicated to me. All the bitmap-related nodes have logic to handle both types slightly differently. It all seems to come down to that if a subnode (or amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the caller needs to call the subnode many times, until it returns a NULL. With a HashBitmap, the caller only calls the subnode once. I think amgetbitmap should be called just once per index scan, and it should return either a hash bitmap or a stream bitmap. The same applies to all the executor nodes that return bitmaps, they would only return a single HashBitmap or StreamBitmap, and the upper node would call tbm_iterate repeatedly on that. StreamBitmap would contain a callback (filled by the indexam) that tbm_iterate would call to fill the next TBMIterateResult. I would also move the code to AND and OR together different types of bitmaps to tidbitmap.c, so that BitmapAnd and BitmapOr nodes wouldn't need to care about different types either. tbm_intersect and tbm_union with two HashBitmaps would work like they do now. Calling tbm_intersect or tbm_union with two StreamBitmaps would return a new StreamBitmap object that would pull results from the two argument StreamBitmaps page at a time and AND or OR them together. Combining a HashBitmap and a StreamBitmap would wrap the HashBitmap with a new StreamBitmap that pulls the entries from the HashBitmap page at a time. What do you think? Would you like me to help implementing the above? > This patch includes code style changes, bug fixes, and the stream bitmap > implementation. I have also fixed the problems mentioned in Heikki's email. > It seems you fixed the race condition in inserttuple by holding a write lock on the metapage while you find/create the lov item. While correct, isn't that pretty bad for concurrency? I realize that the main use case for bitmap indexes is large data warehouses where you don't do concurrent updates, but still... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 19 Sep 2006, Heikki Linnakangas wrote: > Jie Zhang wrote: > > Hi Heikki and all, > > > > Please find the latest bitmap index patch in the attachment. This patch is > > generated against the postgresql cvs head. > > > > Thanks. > > The handling of stream and hash bitmaps looks pretty complicated to me. > All the bitmap-related nodes have logic to handle both types slightly > differently. It all seems to come down to that if a subnode (or > amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the > caller needs to call the subnode many times, until it returns a NULL. > With a HashBitmap, the caller only calls the subnode once. > > I think amgetbitmap should be called just once per index scan, and it > should return either a hash bitmap or a stream bitmap. The same applies > to all the executor nodes that return bitmaps, they would only return a > single HashBitmap or StreamBitmap, and the upper node would call > tbm_iterate repeatedly on that. > > StreamBitmap would contain a callback (filled by the indexam) that > tbm_iterate would call to fill the next TBMIterateResult. Right, this was the approach taken by an earlier version of the patch I had worked on. It was significantly uglified by the need to keep the index state around and to be careful about what amrescan might do behind out back. I will, however, introduce the idea again because it makes the code much cleaner and more logical, as you seem to suggest. Thanks, Gavin
On 9/19/06 5:15 AM, "Gavin Sherry" <swm@linuxworld.com.au> wrote: > On Tue, 19 Sep 2006, Heikki Linnakangas wrote: > >> Jie Zhang wrote: >>> Hi Heikki and all, >>> >>> Please find the latest bitmap index patch in the attachment. This patch is >>> generated against the postgresql cvs head. >>> >> >> Thanks. >> >> The handling of stream and hash bitmaps looks pretty complicated to me. >> All the bitmap-related nodes have logic to handle both types slightly >> differently. It all seems to come down to that if a subnode (or >> amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the >> caller needs to call the subnode many times, until it returns a NULL. >> With a HashBitmap, the caller only calls the subnode once. >> >> I think amgetbitmap should be called just once per index scan, and it >> should return either a hash bitmap or a stream bitmap. The same applies >> to all the executor nodes that return bitmaps, they would only return a >> single HashBitmap or StreamBitmap, and the upper node would call >> tbm_iterate repeatedly on that. >> >> StreamBitmap would contain a callback (filled by the indexam) that >> tbm_iterate would call to fill the next TBMIterateResult. > > Right, this was the approach taken by an earlier version of the patch I > had worked on. It was significantly uglified by the need to keep the index > state around and to be careful about what amrescan might do behind out > back. I will, however, introduce the idea again because it makes the code > much cleaner and more logical, as you seem to suggest. I will think about this approach. But I am not quite convinced that this approach will be simpler and cleaner than the above approach. :-) Thanks, Jie
Jie Zhang wrote: > Hi Heikki and all, > > I just sent the latest bitmap index patch to the list. I am not sure if > there is any size limit for this mailing list. If you have received my > previous email, please let me know. Hi Jie, I know I said I was going to get testing on this months ago but I've been juggling between 3 systems due to disk failures and other hardware configuration issues. Anyways, I've take a baseline run of only the power test using a 1GB database with the patch 09-17 patch against a snapshot of pgsql from 2006-09-17: http://dbt.osdl.org/dbt/dbt3testing/results/dev8-007/2/ Do you think the 1GB scale factor will be sufficient for testing as it will certainly be faster? Do you think testing with just a power test will be sufficient for now? I really don't have a good reason why I didn't run a throughput test other than to save time. :) I also wanted to get your opinion again on which indexes we will want to try first. Thanks, Mark
Gavin & Heikki, >> >> The handling of stream and hash bitmaps looks pretty complicated to me. >> All the bitmap-related nodes have logic to handle both types slightly >> differently. It all seems to come down to that if a subnode (or >> amgetbitmap in a bitmap index scan node) returns a StreamBitmap, the >> caller needs to call the subnode many times, until it returns a NULL. >> With a HashBitmap, the caller only calls the subnode once. >> >> I think amgetbitmap should be called just once per index scan, and it >> should return either a hash bitmap or a stream bitmap. The same applies >> to all the executor nodes that return bitmaps, they would only return a >> single HashBitmap or StreamBitmap, and the upper node would call >> tbm_iterate repeatedly on that. >> >> StreamBitmap would contain a callback (filled by the indexam) that >> tbm_iterate would call to fill the next TBMIterateResult. > > Right, this was the approach taken by an earlier version of the patch I > had worked on. It was significantly uglified by the need to keep the index > state around and to be careful about what amrescan might do behind out > back. I will, however, introduce the idea again because it makes the code > much cleaner and more logical, as you seem to suggest. > I have been thinking about this approach some more. I do agree that this is more attractive now. The following includes some more detailed design. Please let me know if you have any comments. (My apologies to Gavin. You talked to me about this approach before. But you introduced some on-disk bitmap specific code into the tidbitmap.c, which prevented me from looking more in this direction.) Essentially, we want to have a stream bitmap object that has an iterator, which will be able to iterate through the bitmaps. The BitmapIndexscan, BitmapAnd, BitmapOr will be executed once and return a streamp bitmap or a hash bitmap. The BitmapHeapscan then calls tbm_iterate() to iterate through the bitmaps. The StreamBitmap structure will look like below. struct StreamBitmap { NodeTag type; /* to make it a valid Node */ PagetableEntry entry; /* apage of tids in this stream bitmap */ /* the iterator function */ void (*next)(StreamBitmap*); Node* state; /* store how this streambitmap generated, and all necessary information to obtain the next stream bitmap. */ }; Two new state objects will look like below. At the same time, we introduce three new node types: T_StreamBitmapAND, T_StreamBitmapOR, And T_StreamBitmapIndex, to define different states. /** Stores the necessary information for iterating through the stream bitmaps* generated by nodeBitmapAnd or nodeBitmapOr.*/ struct StreamBitmapOp { NodeTag type; /* handles T_StreamBitmapAND and T_StreamBitmapOR */ List* bitmaps; }; /** Stores some necessary information for iterating through the stream* bitmaps generated by nodeBitmapIndexscan.*/ struct StreamBitmapIndex { NodeTag type; /* handle T_StreamBitmapIndex */ IndexScanDesc scan; BlockNumber nextBlockNo;/* next block no to be read */ }; Then we will have the iterator functions like the following: void StreamBitmapAndNext(StreamBitmap* node) { tbm_intersect_stream(((StreampBitmapOp*) node->state)->bitmaps, node); } void StreamBitmapOrNext(StreamBitmap* node) { tbm_union_stream(((StreampBitmapOp*) node->state)->bitmaps, node); } void StreamBitmapIndexNext(StreamBitmap* node) { StreamBitmapIndex* sbi = (StreamBitmapIndex*) node->state; amgetbitmap(sbi->scan,NULL, sbi->nextBlockNo); } What do you think? Thanks, Jie
Hi Mark, Thanks for doing the test. I checked out the link you provided below. I am a little confused about the goal of these tests. Do you plan to test the overall performance of postgreSQL on handling TPC-H queries? Thanks, Jie On 9/22/06 3:45 PM, "Mark Wong" <markw@osdl.org> wrote: > Jie Zhang wrote: >> Hi Heikki and all, >> >> I just sent the latest bitmap index patch to the list. I am not sure if >> there is any size limit for this mailing list. If you have received my >> previous email, please let me know. > > Hi Jie, > > I know I said I was going to get testing on this months ago but I've > been juggling between 3 systems due to disk failures and other hardware > configuration issues. Anyways, I've take a baseline run of only the > power test using a 1GB database with the patch 09-17 patch against a > snapshot of pgsql from 2006-09-17: > > http://dbt.osdl.org/dbt/dbt3testing/results/dev8-007/2/ > > Do you think the 1GB scale factor will be sufficient for testing as it > will certainly be faster? Do you think testing with just a power test > will be sufficient for now? I really don't have a good reason why I > didn't run a throughput test other than to save time. :) I also wanted > to get your opinion again on which indexes we will want to try first. > > Thanks, > Mark >
Hi Jie, Yeah, basically gather as many stats as I can to accurately profile the overall system performance. I thought it would be appropriate to use a TPC-H based workload as one measuring stick to use for bitmap indexes. Mark Jie Zhang wrote: > Hi Mark, > > Thanks for doing the test. I checked out the link you provided below. I am a > little confused about the goal of these tests. Do you plan to test the > overall performance of postgreSQL on handling TPC-H queries? > > Thanks, > Jie > > On 9/22/06 3:45 PM, "Mark Wong" <markw@osdl.org> wrote: > >> Jie Zhang wrote: >>> Hi Heikki and all, >>> >>> I just sent the latest bitmap index patch to the list. I am not sure if >>> there is any size limit for this mailing list. If you have received my >>> previous email, please let me know. >> Hi Jie, >> >> I know I said I was going to get testing on this months ago but I've >> been juggling between 3 systems due to disk failures and other hardware >> configuration issues. Anyways, I've take a baseline run of only the >> power test using a 1GB database with the patch 09-17 patch against a >> snapshot of pgsql from 2006-09-17: >> >> http://dbt.osdl.org/dbt/dbt3testing/results/dev8-007/2/ >> >> Do you think the 1GB scale factor will be sufficient for testing as it >> will certainly be faster? Do you think testing with just a power test >> will be sufficient for now? I really don't have a good reason why I >> didn't run a throughput test other than to save time. :) I also wanted >> to get your opinion again on which indexes we will want to try first. >> >> Thanks, >> Mark >>
Mark, On 9/25/06 11:32 AM, "Mark Wong" <markw@osdl.org> wrote: > Yeah, basically gather as many stats as I can to accurately profile the > overall system performance. I thought it would be appropriate to use a > TPC-H based workload as one measuring stick to use for bitmap indexes. Note that the TPC-H queries don't follow the typical good use case for bitmap indexes. You'd like to see queries that use multiple AND and OR clauses, otherwise there may be no benefit. Also, DBT-3/TPC-H on Postgres right now does not benefit from indices overall. The planner has limitations WRT selectivity estimates and other limitations that cause it to choose index access poorly for the query workload. We have two new features coming (for 8.3) that fix this, but for now we find that indexes are a net loss, in some queries a huge loss. If you look at the whitepaper that Ayush Parashar published, he uses the TPC-H data with some targeted queries that showcase the best use-cases for bitmap index. - Luke
Looks a bit better now, though I think you need to think more about the encapsulation of the structs. More detailed comments below. Jie Zhang wrote: > Essentially, we want to have a stream bitmap object that has an iterator, > which will be able to iterate through the bitmaps. The BitmapIndexscan, > BitmapAnd, BitmapOr will be executed once and return a streamp bitmap or a > hash bitmap. The BitmapHeapscan then calls tbm_iterate() to iterate > through > the bitmaps. > > The StreamBitmap structure will look like below. > > struct StreamBitmap { > NodeTag type; /* to make it a valid Node */ > PagetableEntry entry; /* a page of tids in this stream bitmap */ > > /* the iterator function */ > void (*next)(StreamBitmap*); > Node* state; /* store how this stream bitmap generated, > and all necessary information to > obtain the next stream bitmap. */ > }; I'd suggest making state just a (void *). It's private to the producer of the bitmap, and I don't see a reason to expose it. I assume that the next-function fills in the PageTableEntry with the next set of tids. > Two new state objects will look like below. At the same time, we introduce > three new node types: T_StreamBitmapAND, T_StreamBitmapOR, > And T_StreamBitmapIndex, to define different states. > > /* > * Stores the necessary information for iterating through the stream > bitmaps > * generated by nodeBitmapAnd or nodeBitmapOr. > */ > struct StreamBitmapOp { > NodeTag type; /* handles T_StreamBitmapAND and T_StreamBitmapOR */ > List* bitmaps; > }; AFAICS, this struct is private to tidbitmap.c, where the new stream-enabled tbm_intersect etc. functions are defined. Also, I don't see a reason why it needs to by a valid Node. > /* > * Stores some necessary information for iterating through the stream > * bitmaps generated by nodeBitmapIndexscan. > */ > struct StreamBitmapIndex { > NodeTag type; /* handle T_StreamBitmapIndex */ > IndexScanDesc scan; > BlockNumber nextBlockNo;/* next block no to be read */ > }; Where would this struct be defined? I think different index access methods might want to have different kind of states. The struct above assumes that the position of an index scan is always represented by the nextBlockNo. That seems to be the right thing for the bitmap indexam, so this struct is fine for StreamBitmaps returned by bmgetbitmap, but not necessary for others. > Then we will have the iterator functions like the following: > > ... > > void StreamBitmapIndexNext(StreamBitmap* node) { > StreamBitmapIndex* sbi = (StreamBitmapIndex*) node->state; > amgetbitmap(sbi->scan, NULL, sbi->nextBlockNo); > } This means that the amgetbitmap function would still be called many times in each scan. What would amgetbitmap return? A new StreamBitmap on each call? I'd like to see just one call to amgetbitmap. It would a) fill in the hash bitmap passed to it, b) return a new hash bitmap, or c) return a new StreamBitmap, with a indexam specific next-function that returns the tids one page at a time. I think we'll also need some kind of a destructor in StreamBitmap that's called by the consumer of the bitmap after it's done with it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 26 Sep 2006, Heikki Linnakangas wrote: > Looks a bit better now, though I think you need to think more about the > encapsulation of the structs. More detailed comments below. > > Jie Zhang wrote: > > Essentially, we want to have a stream bitmap object that has an iterator, > > which will be able to iterate through the bitmaps. The BitmapIndexscan, > > BitmapAnd, BitmapOr will be executed once and return a streamp bitmap or a > > hash bitmap. The BitmapHeapscan then calls tbm_iterate() to iterate > > through > > the bitmaps. > > > > The StreamBitmap structure will look like below. > > > > struct StreamBitmap { > > NodeTag type; /* to make it a valid Node */ > > PagetableEntry entry; /* a page of tids in this stream bitmap */ > > > > /* the iterator function */ > > void (*next)(StreamBitmap*); > > Node* state; /* store how this stream bitmap generated, > > and all necessary information to > > obtain the next stream bitmap. */ > > }; > > I'd suggest making state just a (void *). It's private to the producer > of the bitmap, and I don't see a reason to expose it. I assume that the > next-function fills in the PageTableEntry with the next set of tids. > > > Two new state objects will look like below. At the same time, we introduce > > three new node types: T_StreamBitmapAND, T_StreamBitmapOR, > > And T_StreamBitmapIndex, to define different states. > > > > /* > > * Stores the necessary information for iterating through the stream > > bitmaps > > * generated by nodeBitmapAnd or nodeBitmapOr. > > */ > > struct StreamBitmapOp { > > NodeTag type; /* handles T_StreamBitmapAND and T_StreamBitmapOR */ > > List* bitmaps; > > }; > > AFAICS, this struct is private to tidbitmap.c, where the new > stream-enabled tbm_intersect etc. functions are defined. Also, I don't > see a reason why it needs to by a valid Node. Well, making it a valid nodes makes it easy to identify (IsA) and gives us access to copy/equal frameworks. I do think that it is best to bury this in the bitmap code however. > > * Stores some necessary information for iterating through the stream > > * bitmaps generated by nodeBitmapIndexscan. > > */ > > struct StreamBitmapIndex { > > NodeTag type; /* handle T_StreamBitmapIndex */ > > IndexScanDesc scan; > > BlockNumber nextBlockNo;/* next block no to be read */ > > }; > > Where would this struct be defined? I think different index access > methods might want to have different kind of states. The struct above > assumes that the position of an index scan is always represented by the > nextBlockNo. That seems to be the right thing for the bitmap indexam, so > this struct is fine for StreamBitmaps returned by bmgetbitmap, but not > necessary for others. right. > > > Then we will have the iterator functions like the following: > > > > ... > > > > void StreamBitmapIndexNext(StreamBitmap* node) { > > StreamBitmapIndex* sbi = (StreamBitmapIndex*) node->state; > > amgetbitmap(sbi->scan, NULL, sbi->nextBlockNo); > > } > > This means that the amgetbitmap function would still be called many > times in each scan. What would amgetbitmap return? A new StreamBitmap > on each call? > > I'd like to see just one call to amgetbitmap. It would a) fill in the > hash bitmap passed to it, b) return a new hash bitmap, or c) return a > new StreamBitmap, with a indexam specific next-function that returns the > tids one page at a time. I think we'll also need some kind of a > destructor in StreamBitmap that's called by the consumer of the bitmap > after it's done with it. Right, I agree. I am working on this now. Thanks, gavin
Luke Lonergan wrote: > Mark, > > On 9/25/06 11:32 AM, "Mark Wong" <markw@osdl.org> wrote: > >> Yeah, basically gather as many stats as I can to accurately profile the >> overall system performance. I thought it would be appropriate to use a >> TPC-H based workload as one measuring stick to use for bitmap indexes. > > Note that the TPC-H queries don't follow the typical good use case for > bitmap indexes. You'd like to see queries that use multiple AND and OR > clauses, otherwise there may be no benefit. Oh right, people keep telling me that and it keeps going in one ear and out the other... > Also, DBT-3/TPC-H on Postgres right now does not benefit from indices > overall. The planner has limitations WRT selectivity estimates and other > limitations that cause it to choose index access poorly for the query > workload. We have two new features coming (for 8.3) that fix this, but for > now we find that indexes are a net loss, in some queries a huge loss. Great, I'll keep my eye for those. :) > If you look at the whitepaper that Ayush Parashar published, he uses the > TPC-H data with some targeted queries that showcase the best use-cases for > bitmap index. Mark