Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Sat, Nov 26, 2011 at 11:11 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> There's been some significant change in rangetypes_gist.c, can you
>> please rebase this patch?
> OK, rebased with head.
I looked at this patch a bit. I agree with the aspect of it that says
"let's add a flag bit so we can tell whether an upper GiST item includes
any empty ranges"; I think we really need that in order to make
contained_by searches usable. However, I'm not so happy with the
proposed rewrite of the penalty/picksplit functions. I see two problems
there:
1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
values obtained from subtype_diff. This is not good, because you have
no idea what scale the subtype differences will be expressed on. The
hard-wired values could be greatly larger than range widths, or greatly
smaller, resulting in randomly different index behavior.
2. It's too large/complicated. You're proposing to add nearly a
thousand lines to rangetypes_gist.c, and I do not see any reason to
think that this is so much better than what's there now as to justify
that kind of increment in the code size. I saw your performance
results, but one set of results on an arbitrary (not-real-world) test
case doesn't prove a lot to me; and in particular it doesn't prove that
we couldn't do as well with a much smaller and simpler patch.
There are a lot of garden-variety coding problems, too, for instance here:
+ *penalty = Max(DatumGetFloat8(FunctionCall2(
+ subtype_diff, orig_lower.val, new_lower.val)), 0.0);
which is going to uselessly call the subtype_diff function twice most of
the time (Max() is only a macro), plus you left off the collation
argument. But I don't think it's worth worrying about those until the
big picture is correct, which I feel it isn't yet.
Earlier in the thread you wrote:
> Questions:
> 1) I'm not sure about whether we need support of <> in GiST, because it
> always produces full index scan (except search for non-empty ranges).
I was thinking the same thing; that opclass entry seems pretty darn
useless.
I propose to pull out and apply the changes related to the
RANGE_CONTAIN_EMPTY flag, and also remove the <> opclass entry,
because I think these are uncontroversial and in the nature of
"must fix quickly". The redesign of the penalty and picksplit
functions should be discussed separately.
regards, tom lane