Home > mailing lists

GiST: PickSplit and multi-attr indexes - Mailing list pgsql-hackers

From	Neil Conway
Subject	GiST: PickSplit and multi-attr indexes
Date	November 14, 2004 12:22:03
Msg-id	419723A7.5040505@samurai.com Whole thread Raw
Responses	Re: GiST: PickSplit and multi-attr indexes (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

Oleg & Teodor,

If I understand the code correctly, GiST will only pass the first 
attribute of each index tuple to the user-defined PickSplit method when 
it wants to split a node. (see circa line 1269 of gist.c)

Is this a wise design decision? Granted, in many situations the first 
attribute in the index is sufficient to make a reasonable decision about 
how to divide the node into two halves, but I don't think that is 
universally true. For example, consider a two column index whose first 
attribute has a small number of distinct values. It could well be that 
all the first attribute values in a node-to-be-split would be the same. 
Only passing the first attribute to PickSplit would result in an 
essentially random distribution of tuples among the split nodes, rather 
than allowing the GiST extension to make use of the second attribution 
to partition the nodes. That's an extreme example, but it is easy to 
construct more realistic scenarios (basically, any situation in which 
the cardinality of the first index attribute is low -- a reasonably 
common occurrence with a multi-column index, I believe).

I'm not sure whether this would be a problem in practice. Speculation: 
repeated invocations of PickSplit are one of the main factors in 
deriving the ultimate shape of the GiST tree. Poor distribution of keys 
resulting from PickSplit would eventually result in unnecessarily loose 
bounding predicates in internal nodes, which would hurt performance.

Comments welcome,

Neil

pgsql-hackers by date:

From: Alexander Antonakakis
Date: 14 November 2004, 11:02:42
Subject: Big Database

From: Joachim Wieland
Date: 14 November 2004, 12:33:30
Subject: postmaster segfaults with HUGE table

GiST: PickSplit and multi-attr indexes - Mailing list pgsql-hackers

Previous

Next