Thread: [HACKERS] [PATCH] kNN for SP-GiST

[HACKERS] [PATCH] kNN for SP-GiST

From
Nikita Glukhov
Date:
Hello hackers,

I'd like to present a series of patches which is a continuation of
Vlad Sterzhanov's work on kNN for SP-GiST that was done for GSOC'14.

Original thread: "KNN searches support for SP-GiST [GSOC'14]"
https://www.postgresql.org/message-id/CAK2aJC0-Jhrb3UjOZL+arBCHoTVwbeJVpRN-JOfEnN0vA6+MZQ@mail.gmail.com


I've done the following:
   * rebased onto HEAD
   * fixed several bugs
   * fixed indentation and unnecessary whitespace changes
   * implemented logic for order-by in spgvalidate() and spgproperty()
   * used pairing_heap instead of RBTree for the priority queue
     (as it was done earlier in GiST)
   * used existing traversalValue* fields instead of the newly added suppValue* fields
   * added caching of support functions infos
   * added recheck support for distances
   * refactored code
      - simplified some places
      - some functions were moved from spgproc.c to spgscan.c
        (now spgproc.c contains only one public function spg_point_distance()
         that can be moved somewhere and this file can be removed then)
      - extracted functions spgTestLeafTuple(), spgMakeInnerItem()
   * added new opclasses for circles and polygons with order-by-distance support
    (it was originally intended for kNN recheck testing)
   * improved tests for point_ops


Below is a very brief description of the patches:
1. Extracted two subroutines from GiST code (patch is the same as in "kNN for btree").
2. Exported two box functions that are used in patch 5.
3. Extracted subroutine from GiST code that is used in patch 4.
4. Main patch: added kNN support to SP-GiST.
5. Added ordering operators to kd_point_ops and quad_point_ops.
6. Added new SP-GiST support function spg_compress (gist_compress analogue)
     for compressed (lossy) representation of types in SP-GiST, used in patch 7.
7. Added new SP-GiST quad-tree opclasses for circles and polygons
     (reused existing quad-tree box_ops).

If you want to see the step-by-step sequence of changes applied to the original patch,
you can see it on my github: https://github.com/glukhovn/postgres/commits/knn_spgist

Please note that the tests for circle_ops and poly_ops will not work until
the а bug found in SP-GiST box_ops will not be fixed
(see https://www.postgresql.org/message-id/9ea5b157-478c-8874-bc9b-050076b7d042%40postgrespro.ru).

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Alexander Korotkov
Date:
Hi, Nikita!

I take a look to this patchset.  My first notes are following.

* 0003-Extract-index_store_orderby_distances-v02.patch

Function index_store_orderby_distances doesn't look general enough for its name.  GiST supports ordering only by float4 and float8 distances.  SP-GiST also goes that way.  But in the future, access methods could support distances of other types.
Thus, I suggest to rename function to  index_store_float8_orderby_distances().

* 0004-Add-kNN-support-to-SP-GiST-v02.patch

Patch didn't applied cleanly. 

patching file doc/src/sgml/spgist.sgml
patching file src/backend/access/spgist/Makefile
patching file src/backend/access/spgist/README
patching file src/backend/access/spgist/spgscan.c
Hunk #5 succeeded at 242 with fuzz 2.
Hunk #11 FAILED at 861.
1 out of 11 hunks FAILED -- saving rejects to file src/backend/access/spgist/spgscan.c.rej
patching file src/backend/access/spgist/spgtextproc.c
patching file src/backend/access/spgist/spgutils.c
Hunk #3 succeeded at 65 (offset 1 line).
Hunk #4 succeeded at 916 (offset 1 line).
patching file src/backend/access/spgist/spgvalidate.c
patching file src/include/access/spgist.h
patching file src/include/access/spgist_private.h
Hunk #4 FAILED at 191.
Hunk #5 succeeded at 441 (offset -230 lines).
1 out of 5 hunks FAILED -- saving rejects to file src/include/access/spgist_private.h.rej

I had to apply failed chunks manually.  While trying to compile that I faced compile error.

spgtextproc.c:89:7: error: no member named 'suppLen' in 'struct spgConfigOut'
        cfg->suppLen = 0; /* we don't need any supplimentary data */
        ~~~  ^

I noticed that this line was removed in the next patch.  After removing it, I faced following error.

make[4]: *** No rule to make target `spgproc.o', needed by `objfiles.txt'.  Stop.

After removing spgproc.o from src/backend/access/spgist/Makefile, I finally succeed to compile it.


*************** AUTHORS
*** 371,373 ****
--- 376,379 ----
  
      Teodor Sigaev <teodor@sigaev.ru>
      Oleg Bartunov <oleg@sai.msu.su>
+     Vlad Sterzhanov <gliderok@gmail.com>

I don't think it worth mentioning here GSoC student who made just dirty prototype and abandon this work just after final evaluation.

More generally, you switched SP-GiST from scan stack to pairing heap.  We should check if it introduces overhead to non-KNN scans.  Also some benchmarks comparing KNN SP-GIST vs KNN GiST would be now.

* 0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v02.patch

I faced following warning.

spgproc.c:32:10: warning: implicit declaration of function 'get_float8_nan' is invalid in C99 [-Wimplicit-function-declaration]
                return get_float8_nan();
                       ^
1 warning generated.

* 0006-Add-spg_compress-method-to-SP-GiST-v02.patch
* 0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v02.patch

I think this is out of scope for KNN SP-GiST.  This is very valuable, but completely separate feature.  And it should be posted and discussed separately.
  
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
David Steele
Date:
Hi Nikita,

On 3/9/17 8:52 AM, Alexander Korotkov wrote:
>
> I take a look to this patchset.  My first notes are following.

This thread has been idle for quite a while.  Please respond and/or post 
a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be 
marked "Returned with Feedback".

Thanks,
-- 
-David
david@pgmasters.net



Re: [PATCH] kNN for SP-GiST

From
David Steele
Date:
On 3/21/17 11:45 AM, David Steele wrote:
> Hi Nikita,
>
> On 3/9/17 8:52 AM, Alexander Korotkov wrote:
>>
>> I take a look to this patchset.  My first notes are following.
>
> This thread has been idle for quite a while.  Please respond and/or post
> a new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
> marked "Returned with Feedback".

This submission has been marked "Returned with Feedback".  Please feel 
free to resubmit to a future commitfest.

Regards,
-- 
-David
david@pgmasters.net



Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Nikita Glukhov
Date:

Attached 3rd version of kNN for SP-GiST.


On 09.03.2017 16:52, Alexander Korotkov wrote:

Hi, Nikita!

I take a look to this patchset.  My first notes are following.

* 0003-Extract-index_store_orderby_distances-v02.patch

Function index_store_orderby_distances doesn't look general enough for its name.  GiST supports ordering only by float4 and float8 distances.  SP-GiST also goes that way.  But in the future, access methods could support distances of other types.
Thus, I suggest to rename function to  index_store_float8_orderby_distances().
This function was renamed.

* 0004-Add-kNN-support-to-SP-GiST-v02.patch

Patch didn't applied cleanly.
Patches were rebased onto current master.


*************** AUTHORS
*** 371,373 ****
--- 376,379 ----
  
      Teodor Sigaev <teodor@sigaev.ru>
      Oleg Bartunov <oleg@sai.msu.su>
+     Vlad Sterzhanov <gliderok@gmail.com>

I don't think it worth mentioning here GSoC student who made just dirty prototype and abandon this work just after final evaluation.
Student's mention was removed.

More generally, you switched SP-GiST from scan stack to pairing heap.  We should check if it introduces overhead to non-KNN scans. 
I decided to bring back scan stack for unordered scans.

Also some benchmarks comparing KNN SP-GIST vs KNN GiST would be now.

* 0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v02.patch

I faced following warning.

spgproc.c:32:10: warning: implicit declaration of function 'get_float8_nan' is invalid in C99 [-Wimplicit-function-declaration]
                return get_float8_nan();
                       ^
1 warning generated.
Fixed.

* 0006-Add-spg_compress-method-to-SP-GiST-v02.patch
* 0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v02.patch

I think this is out of scope for KNN SP-GiST.  This is very valuable, but completely separate feature.  And it should be posted and discussed separately. 
Compress method for SP-GiST with poly_ops already have been committed, so I left only ordering operators for polygons in the last 6th patch.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Andres Freund
Date:
Hi,

On 2018-03-01 00:58:42 +0300, Nikita Glukhov wrote:
> Attached 3rd version of kNN for SP-GiST.

Given that this was submitted to the last v11 CF, after not being
developed for a year, I think it's unfortunately too late for v11. As we
should be concentrating on getting things into v11, I think it'd be
appropriate to move this to the next CF.

Greetings,

Andres Freund


Re: Re: [HACKERS] [PATCH] kNN for SP-GiST

From
David Steele
Date:
Hi Nikita,

On 3/2/18 1:35 AM, Andres Freund wrote:
> 
> On 2018-03-01 00:58:42 +0300, Nikita Glukhov wrote:
>> Attached 3rd version of kNN for SP-GiST.
> 
> Given that this was submitted to the last v11 CF, after not being
> developed for a year, I think it's unfortunately too late for v11. As we
> should be concentrating on getting things into v11, I think it'd be
> appropriate to move this to the next CF.

I agree with Andres.  Pushing this patch to the next CF.

Regards,
-- 
-David
david@pgmasters.net


Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Nikita Glukhov
Date:
On 06.03.2018 17:30, David Steele wrote:

> I agree with Andres.  Pushing this patch to the next CF.

Attached 4th version of the patches rebased onto the current master.
Nothing interesting has changed from the previous version.

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Alexander Korotkov
Date:
Hi!

On Fri, Jun 29, 2018 at 5:37 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> On 06.03.2018 17:30, David Steele wrote:
>
> > I agree with Andres.  Pushing this patch to the next CF.
>
> Attached 4th version of the patches rebased onto the current master.
> Nothing interesting has changed from the previous version.

I took a look to this patchset.  In general, it looks good for me, but
I've following notes about it.

* We're going to replace scan stack with pairing heap not only for
KNN-search, but for regular search as well.  Did you verify that it
doesn't cause regression for regular search case, because insertion
into pairing heap might be slower than insertion into stack?  One may
say that it didn't caused regression in GiST, and that's why it
shouldn't cause regression in SP-GiST.  However, SP-GiST trees might
be much higher and these performance aspects might be different.

* I think null handling requires better explanation. Nulls are
specially handled in pairingheap_SpGistSearchItem_cmp().  In the same
time you explicitly set infinity distances for leaf nulls.  You
probably have reasons to implement it this way, but I think this
should be explained.  Also isnull property of SpGistSearchItem doesn't
have comment.

* I think KNN support should be briefly described in
src/backed/access/spgist/README.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Nikita Glukhov
Date:

Attached 5th version of the patches, where minor refactoring of distance
handling was done (see below).

On 02.07.2018 19:06, Alexander Korotkov wrote:

Hi!

On Fri, Jun 29, 2018 at 5:37 PM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
On 06.03.2018 17:30, David Steele wrote:

I agree with Andres.  Pushing this patch to the next CF.
Attached 4th version of the patches rebased onto the current master.
Nothing interesting has changed from the previous version.
I took a look to this patchset.  In general, it looks good for me, but
I've following notes about it.

* We're going to replace scan stack with pairing heap not only for
KNN-search, but for regular search as well.  Did you verify that it
doesn't cause regression for regular search case, because insertion
into pairing heap might be slower than insertion into stack?  One may
say that it didn't caused regression in GiST, and that's why it
shouldn't cause regression in SP-GiST.  However, SP-GiST trees might
be much higher and these performance aspects might be different.
I decided to bring back the List-based scan stack in the previous version 
of the patches, and here is how spgAddSearchItemToQueue() looks like now:

static void
spgAddSearchItemToQueue(SpGistScanOpaque so, SpGistSearchItem *item)
{   if (so->queue)       pairingheap_add(so->queue, &item->phNode);   else       so->scanStack = lcons(item, so->scanStack);
}

so->queue is initialized in spgrescan() only for ordered searches.

* I think null handling requires better explanation. Nulls are
specially handled in pairingheap_SpGistSearchItem_cmp().  In the same
time you explicitly set infinity distances for leaf nulls.  You
probably have reasons to implement it this way, but I think this
should be explained.  Also isnull property of SpGistSearchItem doesn't
have comment.
Distances for NULL items are expected to be NULL (it would not be true for
non-strict the distance operators, so we do not seem to support them here),
and so NULL items are considered to be greater than any non-NULL items in
pairingheap_SpGistSearchItem_cmp().  Distances are copied into SpGistSearchItem 
only if it is non-NULL item.  Also in the last version of the patch I have
introduced spgAllocSearchItem() which conditionally allocates distance-array in
SpGistSearchItem.  Now inifinity distances are used only in one place, but
if we require that innerConsistent() should always return distances, then
we can completely get rid of so->infDistances field.   
* I think KNN support should be briefly described in
src/backed/access/spgist/README.
A minimal description written by the original author Vlad Sterzhanov
already present in README.

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Andrey Borodin
Date:
Hi!


> 4 июля 2018 г., в 3:21, Nikita Glukhov <n.gluhov@postgrespro.ru> написал(а):
> Attached 5th version of the patches, where minor refactoring of distance
> handling was done (see below).
>

I'm reviewing this patch. Currently I'm trying to understand sp-gist scan deeeper, but as for now have some small
notices.

Following directives can be omitted:
#include "lib/rbtree.h"
#include "storage/relfilenode.h"

This message is not noly GiST related, is it?
elog(ERROR, "GiST operator family's FOR ORDER BY operator must return float8 or float4 if the distance function is
lossy");

Some small typos:
usefull -> useful
nearest-neighbour -> nearest-neighbor (or do we use e.g. "colour"?)

Best regards, Andrey Borodin.

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Andrey Borodin
Date:
Hi!

> I'm reviewing this patch. Currently I'm trying to understand sp-gist scan deeeper, but as for now have some small
notices.

I've passed through the code one more time. Here are few more questions:
1. Logic behind division of the patch into steps is described last time 2017-01-30, but ISTM actual steps have changed
sincethan? May I ask you to write a bit about steps of the patchset? 
2. The patch leaves contribs intact. Do extensions with sp-gist opclasses need to update it's behavior somehow to be
usedas-is? Or to support new functionality? 
3. There is a patch about predicate locking in SP-GiST [0] Is this KNN patch theoretically compatible with predicate
locking?Seems like it is, I just want to note that this functionality may exist. 
4. Scan state now have scanStack and queue. May be it's better to name scanStack and scanQueue or stack and queue?

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/14/1215/

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Nikita Glukhov
Date:
Attached  6th version of the patches.

On 09.07.2018 20:47, Andrey Borodin wrote:

>> 4 июля 2018 г., в 3:21, Nikita Glukhov <n.gluhov@postgrespro.ru> написал(а):
>> Attached 5th version of the patches, where minor refactoring of distance
>> handling was done (see below).
> I'm reviewing this patch. Currently I'm trying to understand sp-gist scan
> deeeper, but as for now have some small notices.

Thank your for your review.

>
> Following directives can be omitted:
> #include "lib/rbtree.h"
> #include "storage/relfilenode.h"
>
> This message is not noly GiST related, is it?
> elog(ERROR, "GiST operator family's FOR ORDER BY operator must return float8 or float4 if the distance function is
lossy");
>
> Some small typos:
> usefull -> useful
> nearest-neighbour -> nearest-neighbor (or do we use e.g. "colour"?)

Fixed.

On 10.07.2018 20:09, Andrey Borodin wrote:

> I've passed through the code one more time. Here are few more questions:
> 1. Logic behind division of the patch into steps is described last time
> 2017-01-30, but ISTM actual steps have changed since than? May I ask you
> to write a bit about steps of the patchset?

A brief description of the current patch set:
1. Exported two box functions that are used in patch 5.
2. Extracted subroutine index_store_float8_orderby_distances() from GiST code
    that is common for GiST, SP-GiST, and possibly other kNN implementations
    (used in patch 4).
3. Extracted two subroutines from GiST code common for gistproperty() and
    spgistproperty() (used in path 4).
4. The main patch: added kNN support to SP-GiST.
    This patch in theory can be split into two patches: refactoring and kNN
    support, but it will require some additional effort.
    Refactoring basically consists in the extraction of spgInnerTest(),
    spgTestLeafTuple(), spgMakeInnerItem() subroutines.
    A more detailed commit history can be found in my github [1].
5. Added ordering operator point <-> point to kd_point_ops and quad_point_ops.
6. Added ordering operator polygon <-> point to poly_ops.

> 2. The patch leaves contribs intact. Do extensions with sp-gist opclasses
> need to update it's behavior somehow to be used as-is? Or to support new
> functionality?

There are no SP-GiST opclasses in contrib/, so there is nothing to change in
contrib/.  Moreover, existing extensions need to be updated only for support of
new distance-ordered searches -- they need to implement distances[][] array
calculation in their spgInnerConsistent() and spgLeafConsistent() support
functions.

> 3. There is a patch about predicate locking in SP-GiST [2] Is this KNN patch
> theoretically compatible with predicate locking? Seems like it is, I just
> want to note that this functionality may exist.

I think kNN is compatible with predicate locking.  Patch [2] does not apply
cleanly on kNN but the conflict resolution is trivial.

> 4. Scan state now have scanStack and queue. May be it's better to name
> scanStack and scanQueue or stack and queue?

"queue" was renamed to "scanQueue".

[1] https://github.com/glukhovn/postgres/commits/knn_spgist
[2] https://commitfest.postgresql.org/14/1215/

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Andrey Borodin
Date:
Hi!

> 14 июля 2018 г., в 1:31, Nikita Glukhov <n.gluhov@postgrespro.ru> написал(а):
>
> Attached  6th version of the patches.
> ...
>> 2. The patch leaves contribs intact. Do extensions with sp-gist opclasses
>> need to update it's behavior somehow to be used as-is? Or to support new
>> functionality?
>
> There are no SP-GiST opclasses in contrib/, so there is nothing to change in
> contrib/.  Moreover, existing extensions need to be updated only for support of
> new distance-ordered searches -- they need to implement distances[][] array
> calculation in their spgInnerConsistent() and spgLeafConsistent() support
> functions.
So, if extension will not update anything - it will keep all preexisting functionality, right?

I have some more nitpicks about naming
+        recheckQual = out.recheck;
+        recheckDist = out.recheckDistances;
Can we call this things somehow in one fashion?

Also, this my be a stupid question, but do we really need to differ this two recheck cases? It is certain that we
cannotmerge them? 

This seems strange if-formatting
+    /* If allTheSame, they should all or none of 'em match */
+    if (innerTuple->allTheSame)
+        if (out.nNodes != 0 && out.nNodes != nNodes)
+            elog(ERROR, "inconsistent inner_consistent results for allTheSame inner tuple");
+
+    if (out.nNodes) // few lines before you compare with 0

Best regards, Andrey Borodin.

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Nikita Glukhov
Date:
Attached 7th version of the patches:
  * renamed recheck fields and variables
  * fixed formatting of the one if-statement

On 15.07.2018 14:54, Andrey Borodin wrote:

>> 14.07.2018, 1:31, Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
>>
>> Attached  6th version of the patches.
>> ...
>>> 2. The patch leaves contribs intact. Do extensions with sp-gist opclasses
>>> need to update it's behavior somehow to be used as-is? Or to support new
>>> functionality?
>> There are no SP-GiST opclasses in contrib/, so there is nothing to change in
>> contrib/.  Moreover, existing extensions need to be updated only for support of
>> new distance-ordered searches -- they need to implement distances[][] array
>> calculation in their spgInnerConsistent() and spgLeafConsistent() support
>> functions.
> So, if extension will not update anything - it will keep all preexisting functionality, right?

Yes, of course.

> I have some more nitpicks about naming
> +        recheckQual = out.recheck;
> +        recheckDist = out.recheckDistances;
> Can we call this things somehow in one fashion?

I would prefer to rename "spgLeafConsitentOut.recheck" to "recheckQual" but it
is not good to rename user-visible fields, so I decided to rename all fields
and variables to "recheck"/"recheckDistances".

> Also, this my be a stupid question, but do we really need to differ this two
> recheck cases? It is certain that we cannot merge them?

This recheck cases are absolutely different.

In the first case, opclass support functions can not accurately determine
whether the leaf value satisfies the boolean search operator (compressed
values can be stored in leaves).

In the second case, opclass returns only a approximate value (the lower bound)
of the leaf distances.

In the example below operator 'polygon >> polygon' does not need recheck because
bounding box (they are stored in the index instead of polygons) test gives exact
results, but the ordering operator 'polygon <-> point' needs recheck:

SELECT * FROM polygons
WHERE poly >> polygon(box '((0,0),(1,1))')
ORDER BY poly <-> point '(0,0)';

> This seems strange if-formatting
> +    /* If allTheSame, they should all or none of 'em match */
> +    if (innerTuple->allTheSame)
> +        if (out.nNodes != 0 && out.nNodes != nNodes)
> +            elog(ERROR, "inconsistent inner_consistent results for allTheSame inner tuple");
> +
> +    if (out.nNodes) // few lines before you compare with 0

Fixed.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Andrey Borodin
Date:
17 июля 2018 г., в 16:42, Nikita Glukhov <n.gluhov@postgrespro.ru> написал(а):
>
> Fixed.

Patch works as advertised, adds documentation and tests. I didn't succeeded in attempts to break it's functionality. I
haveno more notices about the code. 
Maybe except this const looks unusual, but is correct
+extern BOX *box_copy(const BOX *box);

I'm not sure in architectural point of view: supporting two ways (list and heap) to store result seems may be a bit
heavy,but OK. At least, it has meaningful benefits. 

I think the patch is ready for committer.

Best regards, Andrey Borodin.

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Alexander Korotkov
Date:
On Thu, Jul 26, 2018 at 8:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I'm not sure in architectural point of view: supporting two ways (list and heap) to store result seems may be a bit
heavy,but OK. At least, it has meaningful benefits.
 

It seems that Nikita have reworked it that way after my suspect that
switching regular scans to pairing heap *might* cause a regression.
However, I didn't insist that we need to support two ways.  For
instance, if we can prove that there is no regression then it's fine
to use just heap...

Links
1. https://www.postgresql.org/message-id/CAPpHfdu6Wm4DSAp8Pvwq0uo7fCSzsbrNy7x2v5EKK_g4Nkjx1Q%40mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Alexander Korotkov
Date:
Hi!

On Tue, Aug 28, 2018 at 12:50 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Thu, Jul 26, 2018 at 8:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > I'm not sure in architectural point of view: supporting two ways (list and heap) to store result seems may be a bit
heavy,but OK. At least, it has meaningful benefits.
 
>
> It seems that Nikita have reworked it that way after my suspect that
> switching regular scans to pairing heap *might* cause a regression.
> However, I didn't insist that we need to support two ways.  For
> instance, if we can prove that there is no regression then it's fine
> to use just heap...

So, I decided to check does it really matter to support both list and
pairing heap?  Or can we just always use pairing heap?

I wrote following simple test.

Points are uniformly distributed in box (0,0)-(1,1).
# create table spgist_test as (select point(random(), random()) p from
generate_series(1,1000000) i);
# create index spgist_test_idx on spgist_test using spgist(p);

spgist_bench() walks over this box with given step and queries it each
time with boxes of given size.
CREATE FUNCTION spgist_bench(step float8, size float8) RETURNS void AS $$
DECLARE
    x float8;
    y float8;
BEGIN
    y := 0.0;
    WHILE y <= 1.0 LOOP
        x := 0.0;
        WHILE x <= 1.0 LOOP
            PERFORM * FROM spgist_test WHERE p <@ box(point(x,y),
point(x+size, y+size));
            x := x + step;
        END LOOP;
        y := y + step;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

It fist I've compared current patch with modified patch, which I made
to to always queue scan.

# Current patch (use list)
x     run 1 run 2 run 3
0.1    1206  1230  1231
0.01   2862  2813  2802
0.003 13915 13911 13900

# Always use queue
x     run 1 run 2 run 3
0.1    1238  1189  1170
0.01   2740  2852  2769
0.003 13627 13751 13757

And it appears that difference is less than statistical error.  But
then I compared that with master.  And it appears that master is much
faster.

# Master
x     run 1 run 2 run 3
0.1     725   691   700
0.01   1488  1480  1495
0.003  6696  6210  6295

It's probably because we're anyway allocating separate queue memory
context.  I'll explore this more and come up with details.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Alexander Korotkov
Date:
On Thu, Aug 30, 2018 at 12:17 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> # Current patch (use list)
> x     run 1 run 2 run 3
> 0.1    1206  1230  1231
> 0.01   2862  2813  2802
> 0.003 13915 13911 13900

Sorry, I didn't explain what these tables means.  There are times in
milliseconds for 3 runs of spgist_bench($x, $x)

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Alexander Korotkov
Date:
On Thu, Aug 30, 2018 at 12:30 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Thu, Aug 30, 2018 at 12:17 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > # Current patch (use list)
> > x     run 1 run 2 run 3
> > 0.1    1206  1230  1231
> > 0.01   2862  2813  2802
> > 0.003 13915 13911 13900
>
> Sorry, I didn't explain what these tables means.  There are times in
> milliseconds for 3 runs of spgist_bench($x, $x)

Right, performance regression appears to be caused by queue memory
context allocation.  I've tried to apply the same approach that we've
in GiST: allocate separate memory context for queue only at second
rescan call.  And it appears to work, there is no measurable
regression in comparison to master.

Patch v8
x     run 1 run 2 run 3
0.1     680   660   662
0.01   1403  1395  1508
0.003  6561  6475  6223

Revised version of patch is attached.  I've squashed patchset into one
patch.  Later I'll split it into fewer parts before committing.  I'm
going to also make some benchmarking of KNN itself: GiST vs SP-GiST.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Alexander Korotkov
Date:
On Thu, Aug 30, 2018 at 12:41 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Right, performance regression appears to be caused by queue memory
> context allocation.  I've tried to apply the same approach that we've
> in GiST: allocate separate memory context for queue only at second
> rescan call.  And it appears to work, there is no measurable
> regression in comparison to master.
>
> Patch v8
> x     run 1 run 2 run 3
> 0.1     680   660   662
> 0.01   1403  1395  1508
> 0.003  6561  6475  6223
>
> Revised version of patch is attached.  I've squashed patchset into one
> patch.  Later I'll split it into fewer parts before committing.  I'm
> going to also make some benchmarking of KNN itself: GiST vs SP-GiST.

I've made KNN GiST vs SP-GiST benchmarking similar to what I've done
previously for plain search.

create table knn_test as (select point(random(), random()) p from
generate_series(1,1000000) i);
create index knn_test_idx on knn_test using spgist(p);
create index knn_test_idx on knn_test using gist(p);

CREATE FUNCTION knn_bench(step float8, lim int) RETURNS void AS $$
DECLARE
    x float8;
    y float8;
BEGIN
    y := 0.0;
    WHILE y <= 1.0 LOOP
        x := 0.0;
        WHILE x <= 1.0 LOOP
            PERFORM * FROM knn_test ORDER BY p <-> point(x,y) LIMIT lim;
            x := x + step;
        END LOOP;
        y := y + step;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

And the results are following.

# GiST
step limit run 1 run 2 run 3 buffers
 0.1  1000   156   161   158  123191
0.03   100   298   305   301  122902
0.01    10  1216  1214  1212  138591

# SP-GiST
step limit run 1 run 2 run 3 buffers
 0.1  1000   147   152   153  126446
0.03   100   227   223   226  131241
0.01    10   734   733   730  175908

For me it looks expectable.  SP-GiST takes less CPU, but uses more
buffers.  It's pretty same to what we have for plain scans.  So, I see
no anomaly here.

Generally patch looks close to committable shape for me.  I'm going to
revise code and documentation again, split it up, and then propose to
commit.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Alexander Korotkov
Date:
On Thu, Aug 30, 2018 at 3:57 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
Generally patch looks close to committable shape for me.  I'm going to
revise code and documentation again, split it up, and then propose to
commit.

I've revised this patch again.  This revision includes a lot of code beautification, adjustments to comments and documentation.  Also, after reviewing bug fix [1] I found that we don't need separate memory context for queue.  traversalCxt looks perfectly fine for that purpose, because there it's single scan lifetime.  Also, it appears to me that it's OK to be a single patch: besides KNN SP-GiST and opclasses it contains only extraction of few common function between GiST and SP-GiST.

So, I'm going to commit this if no objections.

Links.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Andrey Borodin
Date:
Hi!

> 17 сент. 2018 г., в 2:03, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
>
> Also, it appears to me that it's OK to be a single patch

+1, ISTM that these 6 patches represent atomic unit of work.

Best regards, Andrey Borodin.

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Alexander Korotkov
Date:
On Mon, Sep 17, 2018 at 12:42 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > 17 сент. 2018 г., в 2:03, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
> >
> > Also, it appears to me that it's OK to be a single patch
>
> +1, ISTM that these 6 patches represent atomic unit of work.

Thank you, pushed.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Mark Dilger
Date:

> On Sep 18, 2018, at 3:58 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
>
> On Mon, Sep 17, 2018 at 12:42 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>>> 17 сент. 2018 г., в 2:03, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
>>>
>>> Also, it appears to me that it's OK to be a single patch
>>
>> +1, ISTM that these 6 patches represent atomic unit of work.
>
> Thank you, pushed.

One note on this commit,

in my fork, I have converted BoxPGetDatum from a macro to a static inline function,
and it shows a thinko in your commit, namely that in->leafDatum is of type (BOX *),
but is actually (as the name implies) already a Datum:

spgquadtreeproc.c:469:27: error: incompatible integer to pointer conversion passing 'Datum' (aka 'unsigned long') to
parameterof type 'BOX *' [-Werror,-Wint-conversion] 

BoxPGetDatum(in->leafDatum),true, 

I'm not claiming this creates any active bug, just that it would make more sense to
handle the typing cleanly.  Perhaps the following:

diff --git a/src/backend/access/spgist/spgquadtreeproc.c b/src/backend/access/spgist/spgquadtreeproc.c
index dee438a307..90cc776899 100644
--- a/src/backend/access/spgist/spgquadtreeproc.c
+++ b/src/backend/access/spgist/spgquadtreeproc.c
@@ -465,8 +465,7 @@ spg_quad_leaf_consistent(PG_FUNCTION_ARGS)

        if (res && in->norderbys > 0)
                /* ok, it passes -> let's compute the distances */
-               out->distances = spg_key_orderbys_distances(
-
BoxPGetDatum(in->leafDatum),true, 
+               out->distances = spg_key_orderbys_distances(in->leafDatum, true,
                                                                                                        in->orderbys,
in->norderbys);

mark

Re: [HACKERS] [PATCH] kNN for SP-GiST

From
Alexander Korotkov
Date:
On Thu, Sep 27, 2018 at 8:28 PM Mark Dilger <hornschnorter@gmail.com> wrote:
> On Sep 18, 2018, at 3:58 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
>
> On Mon, Sep 17, 2018 at 12:42 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>>> 17 сент. 2018 г., в 2:03, Alexander Korotkov <a.korotkov@postgrespro.ru> написал(а):
>>>
>>> Also, it appears to me that it's OK to be a single patch
>>
>> +1, ISTM that these 6 patches represent atomic unit of work.
>
> Thank you, pushed.

One note on this commit,

in my fork, I have converted BoxPGetDatum from a macro to a static inline function,
and it shows a thinko in your commit, namely that in->leafDatum is of type (BOX *),
but is actually (as the name implies) already a Datum:

spgquadtreeproc.c:469:27: error: incompatible integer to pointer conversion passing 'Datum' (aka 'unsigned long') to parameter of type 'BOX *' [-Werror,-Wint-conversion]
                                                                                                        BoxPGetDatum(in->leafDatum), true,

I'm not claiming this creates any active bug, just that it would make more sense to
handle the typing cleanly.  Perhaps the following:

diff --git a/src/backend/access/spgist/spgquadtreeproc.c b/src/backend/access/spgist/spgquadtreeproc.c
index dee438a307..90cc776899 100644
--- a/src/backend/access/spgist/spgquadtreeproc.c
+++ b/src/backend/access/spgist/spgquadtreeproc.c
@@ -465,8 +465,7 @@ spg_quad_leaf_consistent(PG_FUNCTION_ARGS)

        if (res && in->norderbys > 0)
                /* ok, it passes -> let's compute the distances */
-               out->distances = spg_key_orderbys_distances(
-                                                                                                       BoxPGetDatum(in->leafDatum), true,
+               out->distances = spg_key_orderbys_distances(in->leafDatum, true,
                                                                                                        in->orderbys, in->norderbys);

True.  Pushed, thanks!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company