Re: knngist - 0.8 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: knngist - 0.8
Date
Msg-id 27958.1290378240@sss.pgh.pa.us
Whole thread Raw
In response to Re: knngist - 0.8  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: knngist - 0.8
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> 2010/11/12 Teodor Sigaev <teodor@sigaev.ru>:
>> My variants informs GiST by SK_ORDER flags and consistentFn looks at
>> strategy number (strategy numbers are different for different purposes).

> Yeah.  At ten thousand feet, I think the open design question here is
> to what extent it's OK to rely on the fact that the ORDER BY clauses
> we wish to optimize happen to look a lot like the WHERE clauses we
> already know how to optimize: namely, they're both binary opclauses of
> the form <indexed-column> <op> <constant>.  Your patch manages to
> reuse a LOT of existing machinery by shoving ordering expressions
> through the same code paths that quals take.  Code reuse is generally
> a good thing, but here's we're forming RestrictInfo and ScanKey
> objects out of things that are neither restrictions nor keys, which
> might lead to maintainability problems down the road.  I'd like to get
> some input from Tom on how he feels about that, and any alternatives
> he sees.

I haven't spent any time on this patch yet (hope to start looking at it
next week).  As for your specific question above, I don't have a big
problem with reusing ScanKey this way, but I do agree that using
RestrictInfo for this would be a crock.  ISTM what we ought to have is
just the ability to match PathKeys with expressions of the form
"indexedcol op constant" to an index.

I'm undecided about the big-picture question of how much extra
generality ought to be put into the system along with this patch.
The argument not to is that with no candidate uses of additional
generality on the horizon, it's a waste of time to design something
more general, because we'll probably get it wrong anyway.  I'm not
a fan of designing APIs without use-cases in mind.  On the other hand,
there's an argument *for* doing something more general, which is
basically Polya's paradox: the more general problem may be easier to
solve.  To support that argument, we'd need a design that is clearly
cleaner than bolting KNNGIST on according to the current patch.
AIUI we don't have that at the moment, but I still think it's worth
spending a bit more time looking for one.

> It seems to me that our concept of ScanDirection is really woefully
> under-expressive.  For example, given:

> CREATE TABLE foo (
>     id integer NOT NULL,
>     name character varying NOT NULL,
>     PRIMARY KEY (id)
> );

> We use the index for the first of these but not the second:

> select * from foo order by id nulls last;
> select * from foo order by id nulls first;

> In an ideal world, we'd like to handle the second one by finding the
> first non-NULL entry in the index, scanning away from the NULLs, and
> then, when we run off the end, looping back around to spit out the
> NULL entries.

This example leaves me totally cold, not least because it assumes a
specific storage implementation for nulls in an index.  It is also,
I think, misunderstanding what ScanDirection is for.  That's only
intended to allow executor plans to be run forward and then backed up
in the same fashion that fetching backwards from a cursor would do;
which is not a btree-specific concept, indeed not even index-specific.
If there is sufficient interest in doing what you suggest, what we'd
want to do is pass the PathKey representation to the index and let the
index AM figure out what it has to do to produce that sort order.  But
that is way way down my priority list.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: Spread checkpoint sync
Next
From: Vaibhav Kaushal
Date:
Subject: Re: Fwd: What do these terms mean in the SOURCE CODE?