Re: GiST for range types (was Re: Range Types - typo + NULL string constructor) - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date
Msg-id CAPpHfduv5bwwXgdErZvneX4bT0hDb6MLznk63+Ht9vGKBE2n1Q@mail.gmail.com
Whole thread Raw
In response to Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Hi!

New version of patch is attached.
On Thu, Dec 22, 2011 at 11:46 AM, Jeff Davis <pgsql@j-davis.com> wrote:
A few comments:

* In range_gist_picksplit, it would be nice to have a little bit more
intuitive description of what's going on with the nonEmptyCount and
nonInfCount numbers. For instance, it appears to depend on the fact that
a range must either be in nonEmptyCount or in nonInfCount. Also, can you
describe the reason you're multiplying by two and taking the absolute
value? It seems to work, but I am missing the intuition behind those
operations.
total_count - 2*nonEmptyCount = (total_count - nonEmptyCount) - nonEmptyCount = emptyCount - nonEmptyCount
So, it's really not evident. I've simplified it.

 
* The penalty function is fairly hard to read still. At a high level, I
think we're trying to accomplish a few things (in order from most to
least important):
 - Keep normal ranges separate.
 - Avoid broadening the class of the original predicate (e.g. turning
single-infinite into double-infinite).
 - Avoid broadening (as determined by subtype_diff) the original
predicate.
 - Favor adding ranges to narrower original predicates.

Do you agree? If so, perhaps we can attack those more directly and it
might be a little more readable.

Additionally, the arbitrary numbers might become a problem. Can we
choose better constants there? They would still be arbitrary when
compared with real numbers derived from subtype_diff, but maybe we can
still do better than what's there.
I've changes some comments and add constants for penalty values.
 
* Regarding the leftover "common" entries that can go to either side:
what is the "delta" measure trying to accomplish? When I work through
some examples, it seems to favor putting larger common ranges on the
left (small delta) and smaller common ranges on the right (smaller
delta). Why is that good? Or did I misread the code?

Intuitively, I would think that we'd want to push the ranges with lower
upper bounds to the left and higher lower bounds to the right -- in
other words, recurse. Obviously, we'd need to make sure it terminated at
some point, but splitting the common entries does seem like a smaller
version of the original problem. Thoughts?
That was a bug. Actually, no "abs" is needed. Indeed it doesn't affect result significantly.

-----
With best regards,
Alexander Korotkov.
Attachment

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: New replication mode: write
Next
From: Heikki Linnakangas
Date:
Subject: Re: Client Messages