Thread: Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)
On Sat, 2007-07-21 at 12:20 +0100, Simon Riggs wrote: > I'd like to help where I can if nobody else is currently doing this. I > would focus initially on some analysis of the various use cases to give > a better view on what we would need B-tree, clustered indexes and bitmap > indexes to do for us. I've done some further analysis of bitmap indexes in preparation for a comparison with clustered indexes (GIT), to help understand the use cases for each. Overall, my conclusion is that BMI and GIT have separate use cases, almost opposite use cases or at least orthogonal ones. I would eventually like both. BMI optimises for high numbers of rows per value, whilst GIT optimises for clustering of values. BMI is not useful at all for PKs, whilst GIT is specifically designed to handle them. Both handle INSERTs well, though GIT handles growing numbers of values easily, BMI prefers to keep the distribution more constant. GIT needs HOT to continue to operate effectively for long periods, whereas BMI doesn't seem to handle UPDATEs well at all (but more testing required on that one). --- Neither the latest bitmap index nor the latest GIT patch applied cleanly. The bitmap patch was close, but GIT needs an update yet to integrate Alexey's recent work. My test case was a table with 10 million rows, with columns with varying numbers of unique values. So Ndistinct = 100 means 100,000 rows per value. BITMAP INDEXES Ndistinct Best time Size in blocks 1 10.6s 100 10 10.4s 102 100 11.7s 2002 1000 15.1s 6006 10000 19.8s 10046 100000 82.1s 100442 1000000 - >450000 Size exactly equivalent for both Integer and Text (same values). Build time was similar also. The test for 1 million distinct values didn't return after over 4 CPU minutes expended with the disk going crazy. After a number of minutes I decided to cancel the index build. Multiple cancels didn't stop the build, so after some more time I decided to kill it, which then crashed the server. Automatic restart crashed as well with a "could not find transaction id 0" error. Clearly some WAL-weirdness to investigate... Overall, I'd have to say that's quite enough for me to say bitmap is not quite ready yet without clear health warnings. I had hopes... B-TREE INDEXES (Integers) Rows/value Best time Size in blocks 10000000 49s 21899 1000000 49s 21899 100000 49s 21899 10000 47s 21899 1000 43s 21899 100 38s 21899 10 38s 21899 1 33s 21899 Build time for Integers shown. Build time for Text ~x5-6 times as long. Testing against equivalent b-tree builds, the fastest b-tree build I could get was 33s on a unique integer index. So BMI build time is certainly optimised for low numbers of distinct values, but doesn't have any optimisation for when the BMI is built on a poor candidate column. GIT does degrade down to a normal b-tree when clustering isn't sufficient to give reduction in index size. The cross-over point was between 10^4 and 10^5 distinct values for both size and build time; on that test around 100-1000 rows per value. So BMIs are probably still useful with varying number of rows per value, but overall high Ndistinct proves inefficient in both build time and space allocation. This isn't such a surprise since we know that b-tree build uses a sort-based plan whereas BMI uses a hash based plan; neither will win all of the time, we know that from the executor. GIT works well even with unique indexes, since each grouped tuple covers a range of values. I'll re-run the tests when I can to get timings. GIT can compress typically down to 1-5% with clustered data, not quite as good as bitmap's 0.5% best. GIT's design was to have an index that was tuned for clustered data, yet degrades cleanly to a standard b-tree when conditions are not right. This makes me think that a hybrid b-tree should be possible, even desirable. When the data is clustered, use the grouping technique to reduce he number of tuples stored and when the data is highly non-unique use the bitmap technique to reduce numbers of tuples. Using both techniques in the same index would offer even wider flexibility, since we'd be able to cater for real-world data more easily. Both GIT and BMI use bitmaps, just in different ways. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > ... BMI is not useful at all > for PKs, whilst GIT is specifically designed to handle them. This seems a strange statement, because GIT doesn't look particularly efficient for unique indexes AFAICS. In the worst case you'd have to look individually at each tuple on a heap page to check for uniqueness conflict (no binary search, because you couldn't assume they are ordered). > B-TREE INDEXES (Integers) > Rows/value Best time Size in blocks > 10000000 49s 21899 > 1000000 49s 21899 > 100000 49s 21899 > 10000 47s 21899 > 1000 43s 21899 > 100 38s 21899 > 10 38s 21899 > 1 33s 21899 Surely the GIT code failed to kick in at all here? That's just about exactly the index size I'd expect for 10 million integers with the existing btree code (at least when MAXALIGN=4). regards, tom lane
On Mon, 2007-07-23 at 17:19 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > ... BMI is not useful at all > > for PKs, whilst GIT is specifically designed to handle them. > > This seems a strange statement, because GIT doesn't look particularly > efficient for unique indexes AFAICS. In the worst case you'd have to > look individually at each tuple on a heap page to check for uniqueness > conflict (no binary search, because you couldn't assume they are > ordered). That is one of a few heuristics about the patch that need some active discussion, so I'm glad you asked. The main use case is nearly-unique, so for cases where we have a Master:Detail relationship, e.g. Order:OrderItem. The Order index is a PK, with the OrderItem index as a nearly unique key. The index is not brilliant for the Order index, but is good for the OrderItem index. Heikki designed the grouping so that there is a state change between non-grouped and non-grouped (normal) index entries. By default the patch uses a threshold of non-grouped -> grouped at N=2 index entries and then no limit on the number of rows/block. Currently you can tune N, but we might also envisage setting a limit on the width of the range of values to limit the number of tids stored in a grouped index entry. That could control the uniqueness overhead. On an I/O bound workload the space saving on the index outweighs the CPU loss from uniqueness checking. When I/O is not an issue then unfortunately there is a CPU overhead. For GIT it would appear that the summary is that it gives a slight loss on medium sized PK indexes and an increasing win as index size increases. We struggled to come up with ways of making it Just Work with as few parameters as possible. > > B-TREE INDEXES (Integers) > > > Rows/value Best time Size in blocks > > 10000000 49s 21899 > > 1000000 49s 21899 > > 100000 49s 21899 > > 10000 47s 21899 > > 1000 43s 21899 > > 100 38s 21899 > > 10 38s 21899 > > 1 33s 21899 > > Surely the GIT code failed to kick in at all here? That's just about > exactly the index size I'd expect for 10 million integers with the > existing btree code (at least when MAXALIGN=4). That was the b-tree test, i.e. the control. The GIT patch has bitrot, so not able to test just yet. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Mon, 2007-07-23 at 23:11 +0100, Simon Riggs wrote: > On Mon, 2007-07-23 at 17:19 -0400, Tom Lane wrote: > > "Simon Riggs" <simon@2ndquadrant.com> writes: > > > ... BMI is not useful at all > > > for PKs, whilst GIT is specifically designed to handle them. > > > > This seems a strange statement, because GIT doesn't look particularly > > efficient for unique indexes AFAICS. In the worst case you'd have to > > look individually at each tuple on a heap page to check for uniqueness > > conflict (no binary search, because you couldn't assume they are > > ordered). > > That is one of a few heuristics about the patch that need some active > discussion, so I'm glad you asked. > > The main use case is nearly-unique, so for cases where we have a > Master:Detail relationship, e.g. Order:OrderItem. The Order index is a > PK, with the OrderItem index as a nearly unique key. The index is not > brilliant for the Order index, but is good for the OrderItem index. > > Heikki designed the grouping so that there is a state change between > non-grouped and non-grouped (normal) index entries. By default the patch > uses a threshold of non-grouped -> grouped at N=2 index entries and then > no limit on the number of rows/block. Currently you can tune N, but we > might also envisage setting a limit on the width of the range of values > to limit the number of tids stored in a grouped index entry. That could > control the uniqueness overhead. Possibly Heikki might add more here, but it occurs to me that I didn't mention two other things about uniqueness checking. The state change occurs when the block fills, so up to that point all the index entries are separate, so no additional uniqueness checking cost. When the state change does occur the highest value is always left as a singleton index entry, again to speed uniqueness checking. This copes with INSERTs, since the dominant use case is to have a similar-to-the-last-high-value or increasing key (for PKs). Lastly, GIT is designed to work in conjunction with HOT. When doing HOT updates there are no index insertions, so far fewer uniqueness checks need to be performed anyway. So overall, GIT is reasonably well suited to unique indexes. But I think you can see that these behaviours influence the performance considerably, even though they are just small parts of the patch. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)
From
Heikki Linnakangas
Date:
Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: >> ... BMI is not useful at all >> for PKs, whilst GIT is specifically designed to handle them. > > This seems a strange statement, because GIT doesn't look particularly > efficient for unique indexes AFAICS. In the worst case you'd have to > look individually at each tuple on a heap page to check for uniqueness > conflict (no binary search, because you couldn't assume they are > ordered). It handles them in the sense that a clustered PK index is way smaller than a normal PK index. Unlike the bitmap index, which is not suitable for highly distinct columns. Inserting and performing a uniqueness check is more expensive on a clustered index, because as you said it needs to scan the heap page looking for conflicts. It's alleviated by the heuristics Simon mentioned; a page is "groupified" when only when it gets full, which means there'll usually be a mixture of normal and groupified tuples on a leaf page. In particular, if there's hot key values that are repeatedly inserted, the index tuples corresponding those key values are likely to stay as normal index tuples, and are therefore cheaper to check uniqueness against. Also IIRC, the patch tries to keep the last index tuple on a page as a normal index tuple, which catches the important special case of inserting monotonically increasing keys, like with a sequence-generated PK. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)
From
Bruce Momjian
Date:
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Simon Riggs wrote: > On Sat, 2007-07-21 at 12:20 +0100, Simon Riggs wrote: > > > I'd like to help where I can if nobody else is currently doing this. I > > would focus initially on some analysis of the various use cases to give > > a better view on what we would need B-tree, clustered indexes and bitmap > > indexes to do for us. > > I've done some further analysis of bitmap indexes in preparation for a > comparison with clustered indexes (GIT), to help understand the use > cases for each. > > Overall, my conclusion is that BMI and GIT have separate use cases, > almost opposite use cases or at least orthogonal ones. I would > eventually like both. BMI optimises for high numbers of rows per value, > whilst GIT optimises for clustering of values. BMI is not useful at all > for PKs, whilst GIT is specifically designed to handle them. Both handle > INSERTs well, though GIT handles growing numbers of values easily, BMI > prefers to keep the distribution more constant. GIT needs HOT to > continue to operate effectively for long periods, whereas BMI doesn't > seem to handle UPDATEs well at all (but more testing required on that > one). > > --- > > Neither the latest bitmap index nor the latest GIT patch applied > cleanly. The bitmap patch was close, but GIT needs an update yet to > integrate Alexey's recent work. > > My test case was a table with 10 million rows, with columns with varying > numbers of unique values. So Ndistinct = 100 means 100,000 rows per > value. > > BITMAP INDEXES > > Ndistinct Best time Size in blocks > 1 10.6s 100 > 10 10.4s 102 > 100 11.7s 2002 > 1000 15.1s 6006 > 10000 19.8s 10046 > 100000 82.1s 100442 > 1000000 - >450000 > > Size exactly equivalent for both Integer and Text (same values). Build > time was similar also. > > The test for 1 million distinct values didn't return after over 4 CPU > minutes expended with the disk going crazy. After a number of minutes I > decided to cancel the index build. Multiple cancels didn't stop the > build, so after some more time I decided to kill it, which then crashed > the server. Automatic restart crashed as well with a "could not find > transaction id 0" error. Clearly some WAL-weirdness to investigate... > > Overall, I'd have to say that's quite enough for me to say bitmap is not > quite ready yet without clear health warnings. I had hopes... > > B-TREE INDEXES (Integers) > > Rows/value Best time Size in blocks > 10000000 49s 21899 > 1000000 49s 21899 > 100000 49s 21899 > 10000 47s 21899 > 1000 43s 21899 > 100 38s 21899 > 10 38s 21899 > 1 33s 21899 > > Build time for Integers shown. Build time for Text ~x5-6 times as long. > > Testing against equivalent b-tree builds, the fastest b-tree build I > could get was 33s on a unique integer index. So BMI build time is > certainly optimised for low numbers of distinct values, but doesn't have > any optimisation for when the BMI is built on a poor candidate column. > GIT does degrade down to a normal b-tree when clustering isn't > sufficient to give reduction in index size. > > The cross-over point was between 10^4 and 10^5 distinct values for both > size and build time; on that test around 100-1000 rows per value. So > BMIs are probably still useful with varying number of rows per value, > but overall high Ndistinct proves inefficient in both build time and > space allocation. This isn't such a surprise since we know that b-tree > build uses a sort-based plan whereas BMI uses a hash based plan; neither > will win all of the time, we know that from the executor. > > GIT works well even with unique indexes, since each grouped tuple covers > a range of values. I'll re-run the tests when I can to get timings. GIT > can compress typically down to 1-5% with clustered data, not quite as > good as bitmap's 0.5% best. > > GIT's design was to have an index that was tuned for clustered data, yet > degrades cleanly to a standard b-tree when conditions are not right. > This makes me think that a hybrid b-tree should be possible, even > desirable. When the data is clustered, use the grouping technique to > reduce he number of tuples stored and when the data is highly non-unique > use the bitmap technique to reduce numbers of tuples. Using both > techniques in the same index would offer even wider flexibility, since > we'd be able to cater for real-world data more easily. Both GIT and BMI > use bitmaps, just in different ways. > > -- > Simon Riggs > EnterpriseDB http://www.enterprisedb.com > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +