Re: KNN-GiST with recheck - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: KNN-GiST with recheck
Date
Msg-id CAPpHfdtpam+Qah0U6R4_7wf+GFa+UDJ3B2603DcdXS6FYVe7+A@mail.gmail.com
Whole thread Raw
In response to Re: KNN-GiST with recheck  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Tue, Jan 28, 2014 at 9:32 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 01/28/2014 04:12 PM, Alexander Korotkov wrote:
On Tue, Jan 28, 2014 at 5:54 PM, Heikki Linnakangas <hlinnakangas@vmware.com
wrote:

4. (as you mentioned in the other thread: ) It's a modularity violation
that you peek into the heap tuple from gist. I think the proper way to do
this would be to extend the IndexScan executor node to perform the
re-shuffling of tuples that come from the index in wrong order, or perhaps
add a new node type for it.

Of course that's exactly what your partial sort patch does :-). I haven't
looked at that in detail, but I don't think the approach the partial sort
patch takes will work here as is. In the KNN-GiST case, the index is
returning tuples roughly in the right order, but a tuple that it returns
might in reality belong somewhere later in the ordering. In the partial
sort patch, the "input stream" of tuples is divided into non-overlapping
groups, so that the tuples within the group are not sorted, but the groups
are. I think the partial sort case is a special case of the KNN-GiST case,
if you consider the lower bound of each tuple to be the leading keys that
you don't need to sort.

Yes. But, for instance btree accesses heap for unique checking. Is really
it so crimilal? :-)

Well, it is generally considered an ugly hack in b-tree too. I'm not 100% opposed to doing such a hack in GiST, but would very much prefer not to.


This is not only question of a new node or extending existing node. We need
to teach planner/executor access method can return value of some expression
which is lower bound of another expression. AFICS now access method can
return only original indexed datums and TIDs. So, I afraid that enormous
infrastructure changes are required. And I can hardly imagine what they
should look like.

Yeah, I'm not sure either. Maybe a new field in IndexScanDesc, along with xs_itup. Or as an attribute of xs_itup itself.

This shouldn't look like a hack too. Otherwise I see no point of it: it's better to have some isolated hack in access method than hack in planner/executor. So I see following changes to be needed to implement this right way:
1) Implement new relation between operators: operator1 is lower bound of operator2.
2) Extend am interface to let it return values of operators.
3) Implement new node for knn-sorting.
However, it requires a lot of changes in PostgreSQL infrastructure and can appear to be not enough general too (we don't know until we have another application).

------
With best regards,
Alexander Korotkov.

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Regression tests failing if not launched on db "regression"
Next
From: Alexander Korotkov
Date:
Subject: Re: GIN improvements part2: fast scan