Re: KNN-GiST with recheck - Mailing list pgsql-hackers

From Emre Hasegeli
Subject Re: KNN-GiST with recheck
Date
Msg-id 20140803134809.GA58181@hasegeli-2.local
Whole thread Raw
In response to Re: KNN-GiST with recheck  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: KNN-GiST with recheck
Re: KNN-GiST with recheck
List pgsql-hackers
> > 1. This patch introduces a new "polygon <-> point" operator. That seems
> > useful on its own, with or without this patch.
>
> Yeah, but exact-knn cant come with no one implementation. But it would
> better come in a separate patch.

I tried to split them.  Separated patches are attached.  I changed
the order of the arguments as point <-> polygon, because point was
the first one on all the others.  Its commutator was required for
the index, so I added it on the second patch.  I also added tests
for the operator.  I think it is ready for committer as a separate
patch.  We can add it to the open CommitFest.

I have made some cosmetic changes on the patches.  I hope they are
useful.

I added support to point <-> circle operator with the same GiST
distance function you added for polygon.  I can change it, if it is not
the right way.

> > 2. I wonder how useful it really is to allow mixing exact and non-exact
> > return values from the distance function. The distance function included in
> > the patch always returns recheck=true. I have a feeling that all other
> > distance function will also always return either true or false.
>
> For geometrical datatypes recheck variations in consistent methods are also
> very rare (I can't remember any). But imagine opclass for arrays where keys
> have different representation depending on array length. For such opclass
> and knn on similarity recheck flag could be useful.

I also wonder how useful it is.  Your example is convincing, but maybe
setting it index-wide will make the decisions on the framework easier.
For example, how hard would it be to decide if further sorting is
required or not on the planner?

> > 4. (as you mentioned in the other thread: ) It's a modularity violation
> > that you peek into the heap tuple from gist. I think the proper way to do
> > this would be to extend the IndexScan executor node to perform the
> > re-shuffling of tuples that come from the index in wrong order, or perhaps
> > add a new node type for it.
> >
> > Of course that's exactly what your partial sort patch does :-). I haven't
> > looked at that in detail, but I don't think the approach the partial sort
> > patch takes will work here as is. In the KNN-GiST case, the index is
> > returning tuples roughly in the right order, but a tuple that it returns
> > might in reality belong somewhere later in the ordering. In the partial
> > sort patch, the "input stream" of tuples is divided into non-overlapping
> > groups, so that the tuples within the group are not sorted, but the groups
> > are. I think the partial sort case is a special case of the KNN-GiST case,
> > if you consider the lower bound of each tuple to be the leading keys that
> > you don't need to sort.
>
> Yes. But, for instance btree accesses heap for unique checking. Is really
> it so crimilal? :-)
> This is not only question of a new node or extending existing node. We need
> to teach planner/executor access method can return value of some expression
> which is lower bound of another expression. AFICS now access method can
> return only original indexed datums and TIDs. So, I afraid that enormous
> infrastructure changes are required. And I can hardly imagine what they
> should look like.

Unfortunately, I am not experienced enough to judge your implementation.
As far as I understand the problem is partially sorting rows on
the index scan node.  It can lead the planner to choose non-optimal
plans, because of not taking into account the cost of sorting.

Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Proposed changing the definition of decade for date_trunc and extract
Next
From: Kevin Grittner
Date:
Subject: Re: Usability improvements for pg_stop_backup()