Re: [HACKERS] [PATCH] kNN for SP-GiST - Mailing list pgsql-hackers

From Nikita Glukhov
Subject Re: [HACKERS] [PATCH] kNN for SP-GiST
Date
Msg-id c6957e2c-3052-4fce-55ac-2cf7bcfd0dd9@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] [PATCH] kNN for SP-GiST  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: [HACKERS] [PATCH] kNN for SP-GiST  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers

Attached 5th version of the patches, where minor refactoring of distance
handling was done (see below).

On 02.07.2018 19:06, Alexander Korotkov wrote:

Hi!

On Fri, Jun 29, 2018 at 5:37 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
On 06.03.2018 17:30, David Steele wrote:

I agree with Andres.  Pushing this patch to the next CF.
Attached 4th version of the patches rebased onto the current master.
Nothing interesting has changed from the previous version.
I took a look to this patchset.  In general, it looks good for me, but
I've following notes about it.

* We're going to replace scan stack with pairing heap not only for
KNN-search, but for regular search as well.  Did you verify that it
doesn't cause regression for regular search case, because insertion
into pairing heap might be slower than insertion into stack?  One may
say that it didn't caused regression in GiST, and that's why it
shouldn't cause regression in SP-GiST.  However, SP-GiST trees might
be much higher and these performance aspects might be different.
I decided to bring back the List-based scan stack in the previous version 
of the patches, and here is how spgAddSearchItemToQueue() looks like now:

static void
spgAddSearchItemToQueue(SpGistScanOpaque so, SpGistSearchItem *item)
{   if (so->queue)       pairingheap_add(so->queue, &item->phNode);   else       so->scanStack = lcons(item, so->scanStack);
}

so->queue is initialized in spgrescan() only for ordered searches.

* I think null handling requires better explanation. Nulls are
specially handled in pairingheap_SpGistSearchItem_cmp().  In the same
time you explicitly set infinity distances for leaf nulls.  You
probably have reasons to implement it this way, but I think this
should be explained.  Also isnull property of SpGistSearchItem doesn't
have comment.
Distances for NULL items are expected to be NULL (it would not be true for
non-strict the distance operators, so we do not seem to support them here),
and so NULL items are considered to be greater than any non-NULL items in
pairingheap_SpGistSearchItem_cmp().  Distances are copied into SpGistSearchItem 
only if it is non-NULL item.  Also in the last version of the patch I have
introduced spgAllocSearchItem() which conditionally allocates distance-array in
SpGistSearchItem.  Now inifinity distances are used only in one place, but
if we require that innerConsistent() should always return distances, then
we can completely get rid of so->infDistances field.   
* I think KNN support should be briefly described in
src/backed/access/spgist/README.
A minimal description written by the original author Vlad Sterzhanov
already present in README.

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Non-reserved replication slots and slot advancing
Next
From: Srinivas Karthik V
Date:
Subject: Re: Bulk Insert into PostgreSQL