Proposal to improve multicolumn GiST page split algoriithm. - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Proposal to improve multicolumn GiST page split algoriithm.
Date
Msg-id 4487040D.7040002@sigaev.ru
Whole thread Raw
List pgsql-hackers
I'd like to base on paper "Generalizing "Search" in Generalized Search Trees" by 
Paul Aoki, section 4.1 "Multiple Key Support" 
(http://www.sai.msu.su/~megera/postgres/gist/papers/csd-97-950.pdf)

Proposed algorithm (without details about nulls etc):
1) n = 1  - set column's number to first one
2) form vector of keys of n-th column
3) set left and right union keys to NULL (see below)
4) call user-defined pickSplit method for n-th column
5) if it was a last column then return - page is splitted
6) try to find keys on left page with zero penalty with right union key and   keys on right page with left union. Note,
wecheck only keys in n-th   column. Let M is a total number of keys with zero penalties with   opposite unions. So we
haveM keys/tuples which can be freely   distributed between left and right page. Penalty is calculated by call
user-definedPenalty() method for n-th column.
 
7) if M == 0 then return - page is splitted
8) n++
9) form vector of keys of n-th column, but use only tuples, determined in   step 6
10)set left and right union keys. Its forms from tuples which can't be   freely distributed between page. These tuples
aredetermined in step 6
 
11)go to step 4.


That algorithm requires to small change in interface of pickSplit() method.
It works with GIST_SPLITVEC structure. But step 6 requires that pickSplit()
should knows: are unions already formed? If not, pickSplit() should work
as now. If yes, pickSplit() must take in consideration formed left and right 
unions, and it can't to 'reduce' that unions.

I suggest add one boolean field to GIST_SPLITVEC: isSubSplit (suggest exact 
name, pls)
if isSubSplit == false then unions are not formed, pickSplit works as now.
if isSubSplit == true then unions are already formed, pickSplit should use
formed keys. So, after pickSplit() call isSubSplit should be always 'false', if 
it's not then GiST's core should say at least a warning about unsupported 
feature in opclass. BTW, in this case, GiST may compose old (with formed before) 
unions with suggested by pickSplit() by maximizing penalty between left and 
right keys.

Also, I plan to split GIST_SPLITVEC to 2 structures: one of its will be argument 
for pickSplit() and another will use internally in GiST core. Now GIST_SPLITVEC 
contains a lot of field that's needed only for core.

Thoughts, suggestions, objections?

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: How to avoid transaction ID wrap
Next
From: Teodor Sigaev
Date:
Subject: Snowball and ispell in tsearch2