Re: [PATCH] kNN for btree - Mailing list pgsql-hackers

From Nikita Glukhov
Subject Re: [PATCH] kNN for btree
Date
Msg-id 146f4c47-cc52-a06d-343f-d152a20524e3@postgrespro.ru
Whole thread Raw
In response to Re: [PATCH] kNN for btree  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: [PATCH] kNN for btree  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers

Attached 5th version of the patches.

On 11.01.2019 2:21, Alexander Korotkov wrote:

Hi!

I've couple more notes regarding this patch.
1) There are two loops over scan key determining scan strategy:
existing loop in _bt_first(), and in new function
_bt_select_knn_search_strategy().  It's kind of redundant that we've
to process scan keys twice for knn searches.  I think scan keys
processing should be merged into one loop.
This redundant loop was removed and code from _bt_select_knn_search_strategy()
was moved into the new function _bt_emit_scan_key() extracted from
_bt_preprocess_keys().
2) We're not supporting knn ordering only using the first key.
Supporting multiple knn keys would require significant reword of
B-tree traversal algorithm making it closer to GiST and SP-GiST.  So,
I think supporting only one knn key is reasonable restriction, at
least for now.  But we could support single-key knn ordering in more
cases.  For instance, knn search in "SELECT * FROM tbl WHERE a =
const1 ORDER BY b <-> const2" could benefit from (a, b) B-tree index.
So, we can support knn search on n-th key if there are equality scan
keys for [1, n-1] index keys.
I will try to implement this in the next version of the patch.

I also note that you've removed implementation of distance functions
from btree_gist.  But during pg_upgrade extensions are moved "as is".
Not just CREATE EXTENSION command is dumped, but the whole extension
content.  pg_upgrade'd instances would have old version of extension
metadata with new .so until ALTER EXTENSION UPDATE. So, user would get
errors about missed function in .so until updates the extension.

We're typically evade this by inclusion of old functions into new .so.
Then user can work normally before extension update.  In this
particular case, we can leave the distance functions in the .so, but
make them just wrappers over core functions.

Wrappers over core functions were left in btree_gist.

I've run regression tests with patch applied and opr_sanity showed some errors:

1) date_dist_timestamptz(), timestamp_dist_timestamptz(),
timestamptz_dist_date(), timestamptz_dist_timestamp() should be
stable, not immutable.  These functions use timezone during
conversion.

Fixed.
2) date_dist_timestamp(), date_dist_timestamptz(),
timestamp_dist_date(), timestamp_dist_timestamptz(),
timestamptz_dist_date(), timestamptz_dist_timestamp() should be not
leafproof.  These functions perform conversion, which might fail in
corner case.  So, this error should be considered as a leak.
All new distance functions except oiddist() are not leakproof,
so I had to relax condition in opr_sanity.sql test:

- pp.proleakproof != po.proleakproof
+ (NOT pp.proleakproof AND po.proleakproof))


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: speeding up planning with partitions
Next
From: Peter Eisentraut
Date:
Subject: Re: port of INSTALL file generation to XSLT