GiST: PickSplit and multi-attr indexes - Mailing list pgsql-hackers

From Neil Conway
Subject GiST: PickSplit and multi-attr indexes
Date
Msg-id 419723A7.5040505@samurai.com
Whole thread Raw
Responses Re: GiST: PickSplit and multi-attr indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Oleg & Teodor,

If I understand the code correctly, GiST will only pass the first 
attribute of each index tuple to the user-defined PickSplit method when 
it wants to split a node. (see circa line 1269 of gist.c)

Is this a wise design decision? Granted, in many situations the first 
attribute in the index is sufficient to make a reasonable decision about 
how to divide the node into two halves, but I don't think that is 
universally true. For example, consider a two column index whose first 
attribute has a small number of distinct values. It could well be that 
all the first attribute values in a node-to-be-split would be the same. 
Only passing the first attribute to PickSplit would result in an 
essentially random distribution of tuples among the split nodes, rather 
than allowing the GiST extension to make use of the second attribution 
to partition the nodes. That's an extreme example, but it is easy to 
construct more realistic scenarios (basically, any situation in which 
the cardinality of the first index attribute is low -- a reasonably 
common occurrence with a multi-column index, I believe).

I'm not sure whether this would be a problem in practice. Speculation: 
repeated invocations of PickSplit are one of the main factors in 
deriving the ultimate shape of the GiST tree. Poor distribution of keys 
resulting from PickSplit would eventually result in unnecessarily loose 
bounding predicates in internal nodes, which would hurt performance.

Comments welcome,

Neil


pgsql-hackers by date:

Previous
From: Alexander Antonakakis
Date:
Subject: Big Database
Next
From: Joachim Wieland
Date:
Subject: postmaster segfaults with HUGE table