On Mon, 2007-07-23 at 23:11 +0100, Simon Riggs wrote:
> On Mon, 2007-07-23 at 17:19 -0400, Tom Lane wrote:
> > "Simon Riggs" <simon@2ndquadrant.com> writes:
> > > ... BMI is not useful at all
> > > for PKs, whilst GIT is specifically designed to handle them.
> >
> > This seems a strange statement, because GIT doesn't look particularly
> > efficient for unique indexes AFAICS. In the worst case you'd have to
> > look individually at each tuple on a heap page to check for uniqueness
> > conflict (no binary search, because you couldn't assume they are
> > ordered).
>
> That is one of a few heuristics about the patch that need some active
> discussion, so I'm glad you asked.
>
> The main use case is nearly-unique, so for cases where we have a
> Master:Detail relationship, e.g. Order:OrderItem. The Order index is a
> PK, with the OrderItem index as a nearly unique key. The index is not
> brilliant for the Order index, but is good for the OrderItem index.
>
> Heikki designed the grouping so that there is a state change between
> non-grouped and non-grouped (normal) index entries. By default the patch
> uses a threshold of non-grouped -> grouped at N=2 index entries and then
> no limit on the number of rows/block. Currently you can tune N, but we
> might also envisage setting a limit on the width of the range of values
> to limit the number of tids stored in a grouped index entry. That could
> control the uniqueness overhead.
Possibly Heikki might add more here, but it occurs to me that I didn't
mention two other things about uniqueness checking.
The state change occurs when the block fills, so up to that point all
the index entries are separate, so no additional uniqueness checking
cost. When the state change does occur the highest value is always left
as a singleton index entry, again to speed uniqueness checking. This
copes with INSERTs, since the dominant use case is to have a
similar-to-the-last-high-value or increasing key (for PKs).
Lastly, GIT is designed to work in conjunction with HOT. When doing HOT
updates there are no index insertions, so far fewer uniqueness checks
need to be performed anyway.
So overall, GIT is reasonably well suited to unique indexes. But I think
you can see that these behaviours influence the performance
considerably, even though they are just small parts of the patch.
-- Simon Riggs EnterpriseDB http://www.enterprisedb.com