Re: Cube Index Size - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Cube Index Size
Msg-id BANLkTinRfzBz=ygsO+fckxN5sn62YVQ4qg@mail.gmail.com
In response to Re: Cube Index Size  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Wed, Jun 1, 2011 at 3:37 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> My guess is that the picksplit algorithm performs poorly with that data. Unfortunately, I have no idea how to improve that.

The current cube picksplit function has no storage utilization guarantees, while Guttman's original picksplit has them (if one group's size reaches a certain threshold, all remaining entries go to the other group). Also, the current picksplit is a mix of Guttman's linear and quadratic algorithms: it picks seeds quadratically, but distributes entries linearly.
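To make the distinction concrete, here is a minimal Python sketch of that kind of split. It is not the contrib/cube C code and does not use the GiST picksplit interface: cubes are modeled as (lower corner, upper corner) tuples, seeds are chosen with Guttman's quadratic PickSeeds, and the remaining entries are placed in one in-order pass by least volume enlargement. The names `volume`, `union`, `pick_seeds`, `split_one_pass` and the `min_fill` parameter are hypothetical; a positive `min_fill` adds Guttman's utilization rule, roughly what option 1 below proposes.

```python
from itertools import combinations
from math import prod
from typing import List, Tuple

# A cube as (lower corner, upper corner), one coordinate per dimension.
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]


def volume(b: Box) -> float:
    lo, hi = b
    return prod(h - l for l, h in zip(lo, hi))


def union(a: Box, b: Box) -> Box:
    # Smallest box enclosing both a and b.
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def pick_seeds(boxes: List[Box]) -> Tuple[int, int]:
    # Guttman's quadratic PickSeeds: try every pair and keep the one that
    # would waste the most volume if the two boxes shared a group.
    def waste(pair: Tuple[int, int]) -> float:
        a, b = boxes[pair[0]], boxes[pair[1]]
        return volume(union(a, b)) - volume(a) - volume(b)

    return max(combinations(range(len(boxes)), 2), key=waste)


def split_one_pass(boxes: List[Box], min_fill: int = 0) -> Tuple[List[int], List[int]]:
    # Quadratic seeds, then a single (linear) pass that puts each entry into
    # the group whose cover grows least. min_fill > 0 enables Guttman's
    # utilization rule; it is assumed to be at most len(boxes) // 2.
    s1, s2 = pick_seeds(boxes)
    groups = [[s1], [s2]]
    covers = [boxes[s1], boxes[s2]]
    rest = [i for i in range(len(boxes)) if i not in (s1, s2)]
    for k, i in enumerate(rest):
        remaining = len(rest) - k  # unplaced entries, including i
        if len(groups[0]) + remaining <= min_fill:
            g = 0  # group 0 needs every remaining entry to reach min_fill
        elif len(groups[1]) + remaining <= min_fill:
            g = 1
        else:
            growth = [volume(union(c, boxes[i])) - volume(c) for c in covers]
            g = 0 if growth[0] <= growth[1] else 1
        groups[g].append(i)
        covers[g] = union(covers[g], boxes[i])
    return groups[0], groups[1]
```

For example, with the 1-D intervals [0,1], [100,101], [2,3], [3,4], [4,5], [5,6], `split_one_pass(boxes)` returns `([0, 2, 3, 4, 5], [1])`, a 5/1 split, while `split_one_pass(boxes, min_fill=3)` returns `([0, 2, 3], [1, 4, 5])`: the last two entries are forced into the right group, which enlarges its cover (more overlap risk) but keeps both pages half full.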
I see the following ways of solving the picksplit problem for cube:
1) Add storage utilization guarantees to the current picksplit. This may increase overlaps, but should decrease index size.
2) Add storage utilization guarantees to the current picksplit and replace the entry distribution algorithm with the quadratic one. Picksplit will take more time, but it should give more stable and predictable results.
3) I have experimented with my own picksplit algorithm, which showed pretty good results in the tests I've run. But the current implementation is dirty and requires more testing.
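Option 2 can be sketched the same way. This self-contained Python sketch (hypothetical names again, not the cube module's code) keeps quadratic seeds but replaces the in-order pass with Guttman's quadratic PickNext: at each step it places the unassigned entry whose enlargement differs most between the two groups, and it applies the minimum-fill rule before every step. Ties follow Guttman's order: smaller enlargement, then smaller cover volume, then fewer entries. `min_fill` is assumed to be at most `len(boxes) // 2`.

```python
from itertools import combinations
from math import prod
from typing import List, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def volume(b: Box) -> float:
    lo, hi = b
    return prod(h - l for l, h in zip(lo, hi))


def union(a: Box, b: Box) -> Box:
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def pick_seeds(boxes: List[Box]) -> Tuple[int, int]:
    # Quadratic PickSeeds: the pair wasting the most volume when combined.
    def waste(p: Tuple[int, int]) -> float:
        a, b = boxes[p[0]], boxes[p[1]]
        return volume(union(a, b)) - volume(a) - volume(b)

    return max(combinations(range(len(boxes)), 2), key=waste)


def split_quadratic(boxes: List[Box], min_fill: int) -> Tuple[List[int], List[int]]:
    s1, s2 = pick_seeds(boxes)
    groups = [[s1], [s2]]
    covers = [boxes[s1], boxes[s2]]
    rest = [i for i in range(len(boxes)) if i not in (s1, s2)]

    def growth(i: int) -> List[float]:
        # Volume enlargement of each group's cover if entry i joined it.
        return [volume(union(c, boxes[i])) - volume(c) for c in covers]

    while rest:
        # Utilization guarantee: a group that needs all remaining entries gets them.
        for g in (0, 1):
            if len(groups[g]) + len(rest) <= min_fill:
                groups[g].extend(rest)
                return groups[0], groups[1]
        # PickNext: place the entry with the strongest preference first.
        i = max(rest, key=lambda j: abs(growth(j)[0] - growth(j)[1]))
        d = growth(i)
        # Smaller enlargement wins; ties go to the smaller cover, then the smaller group.
        g = min((0, 1), key=lambda h: (d[h], volume(covers[h]), len(groups[h])))
        groups[g].append(i)
        covers[g] = union(covers[g], boxes[i])
        rest.remove(i)
    return groups[0], groups[1]
```

For the 1-D intervals [0,1], [10,11], [5,6], [7,8] (in that order), an in-order pass that sends ties left ends with covers [0,8] and [10,11] (total length 9), whereas `split_quadratic(boxes, 0)` places [7,8] first and returns `([0], [1, 3, 2])`, with covers [0,1] and [5,11] (total length 7). The price is recomputing every remaining entry's enlargement at each step, so the distribution phase becomes O(n²) instead of O(n).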

 ------
With best regards,
Alexander Korotkov.
