On 02/09/2019 07:54, Alexander Korotkov wrote:
> Andrey Borodin noticed me that results of some KNN-GIST tests are
> obviously wrong and don't match results of sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
> f1
> -------------------
> (10,10)
> (NaN,NaN)
> (0,0)
> (1e-300,-1e-300)
> (-3,4)
> (-10,0)
> (-5,-12)
> (5.1,34.5)
>
> (1e+300,Infinity)
> (10 rows)
So we've memorized incorrect results in the expected output. Ouch!
> It appears to be related to implementation of comparison function in
> pairing heap used as priority queue for KNN. It used plain float
> comparison, which doesn't handle Inf and Nan values well. Attached
> patch replaced it with float8_cmp_internal().
>
> Also, note that with patch KNN results still don't fully match results
> of sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
> f1
> -------------------
> (0,0)
> (1e-300,-1e-300)
> (-3,4)
> (-10,0)
> (10,10)
> (-5,-12)
> (5.1,34.5)
> (1e+300,Infinity)
>
> (NaN,NaN)
> (10 rows)
>
> NULL and '(NaN,NaN)' are swapped. It happens so, because we assume
> distance to NULL to be Inf, while float8_cmp_internal() assumes NaN to
> be greater than NULL. If even we would assume distance to NULL to be
> Nan, it doesn't guarantee that NULLs would be last. It looks like we
> can handle this only by introduction array of "distance is null" flags
> to GISTSearchItem. But does it worth it?
I don't think we have much choice, returning wrong results is not an
option. At first I thought we could set the "recheck" flag if we see
NULLs or NaNs, but that won't work because rechecking is not supported
with Index-Only Scans.
Looking at the corresponding SP-GiST code,
pairingheap_SpGistSearchItem_cmp() gets this right. I'd suggest
copy-pasting the implementation from there, so that they would be as
similar as possible. (SP-GiST gets away with just one isNull flag,
because it doesn't support multi-column indexes. In GiST, you need an
array, as you said.)
- Heikki