Re: GiST for range types (was Re: Range Types - typo + NULL string constructor) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date
Msg-id 4870.1322419417@sss.pgh.pa.us
Whole thread Raw
In response to Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
List pgsql-hackers
Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Sat, Nov 26, 2011 at 11:11 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> There's been some significant change in rangetypes_gist.c, can you
>> please rebase this patch?

> OK, rebased with head.

I looked at this patch a bit.  I agree with the aspect of it that says
"let's add a flag bit so we can tell whether an upper GiST item includes
any empty ranges"; I think we really need that in order to make
contained_by searches usable.  However, I'm not so happy with the
proposed rewrite of the penalty/picksplit functions.  I see two problems
there:

1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
values obtained from subtype_diff.  This is not good, because you have
no idea what scale the subtype differences will be expressed on.  The
hard-wired values could be greatly larger than range widths, or greatly
smaller, resulting in randomly different index behavior.

2. It's too large/complicated.  You're proposing to add nearly a
thousand lines to rangetypes_gist.c, and I do not see any reason to
think that this is so much better than what's there now as to justify
that kind of increment in the code size.  I saw your performance
results, but one set of results on an arbitrary (not-real-world) test
case doesn't prove a lot to me; and in particular it doesn't prove that
we couldn't do as well with a much smaller and simpler patch.

There are a lot of garden-variety coding problems, too, for instance here:

+                 *penalty = Max(DatumGetFloat8(FunctionCall2(
+                     subtype_diff, orig_lower.val, new_lower.val)), 0.0);

which is going to uselessly call the subtype_diff function twice most of
the time (Max() is only a macro), plus you left off the collation
argument.  But I don't think it's worth worrying about those until the
big picture is correct, which I feel it isn't yet.

Earlier in the thread you wrote:

> Questions:
> 1) I'm not sure about whether we need support of <> in GiST, because it
> always produces full index scan (except search for non-empty ranges).

I was thinking the same thing; that opclass entry seems pretty darn
useless.

I propose to pull out and apply the changes related to the
RANGE_CONTAIN_EMPTY flag, and also remove the <> opclass entry,
because I think these are uncontroversial and in the nature of
"must fix quickly".  The redesign of the penalty and picksplit
functions should be discussed separately.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Alexander Soudakov
Date:
Subject: Re: Feature proposal: www_fdw
Next
From: Peter Eisentraut
Date:
Subject: information schema/aclexplode doesn't know about default privileges