Re: knngist - 0.8 - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: knngist - 0.8 |
Date | |
Msg-id | 27958.1290378240@sss.pgh.pa.us Whole thread Raw |
In response to | Re: knngist - 0.8 (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: knngist - 0.8
|
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > 2010/11/12 Teodor Sigaev <teodor@sigaev.ru>: >> My variants informs GiST by SK_ORDER flags and consistentFn looks at >> strategy number (strategy numbers are different for different purposes). > Yeah. At ten thousand feet, I think the open design question here is > to what extent it's OK to rely on the fact that the ORDER BY clauses > we wish to optimize happen to look a lot like the WHERE clauses we > already know how to optimize: namely, they're both binary opclauses of > the form <indexed-column> <op> <constant>. Your patch manages to > reuse a LOT of existing machinery by shoving ordering expressions > through the same code paths that quals take. Code reuse is generally > a good thing, but here's we're forming RestrictInfo and ScanKey > objects out of things that are neither restrictions nor keys, which > might lead to maintainability problems down the road. I'd like to get > some input from Tom on how he feels about that, and any alternatives > he sees. I haven't spent any time on this patch yet (hope to start looking at it next week). As for your specific question above, I don't have a big problem with reusing ScanKey this way, but I do agree that using RestrictInfo for this would be a crock. ISTM what we ought to have is just the ability to match PathKeys with expressions of the form "indexedcol op constant" to an index. I'm undecided about the big-picture question of how much extra generality ought to be put into the system along with this patch. The argument not to is that with no candidate uses of additional generality on the horizon, it's a waste of time to design something more general, because we'll probably get it wrong anyway. I'm not a fan of designing APIs without use-cases in mind. On the other hand, there's an argument *for* doing something more general, which is basically Polya's paradox: the more general problem may be easier to solve. To support that argument, we'd need a design that is clearly cleaner than bolting KNNGIST on according to the current patch. AIUI we don't have that at the moment, but I still think it's worth spending a bit more time looking for one. > It seems to me that our concept of ScanDirection is really woefully > under-expressive. For example, given: > CREATE TABLE foo ( > id integer NOT NULL, > name character varying NOT NULL, > PRIMARY KEY (id) > ); > We use the index for the first of these but not the second: > select * from foo order by id nulls last; > select * from foo order by id nulls first; > In an ideal world, we'd like to handle the second one by finding the > first non-NULL entry in the index, scanning away from the NULLs, and > then, when we run off the end, looping back around to spit out the > NULL entries. This example leaves me totally cold, not least because it assumes a specific storage implementation for nulls in an index. It is also, I think, misunderstanding what ScanDirection is for. That's only intended to allow executor plans to be run forward and then backed up in the same fashion that fetching backwards from a cursor would do; which is not a btree-specific concept, indeed not even index-specific. If there is sufficient interest in doing what you suggest, what we'd want to do is pass the PathKey representation to the index and let the index AM figure out what it has to do to produce that sort order. But that is way way down my priority list. regards, tom lane
pgsql-hackers by date: