Re: Bug in GiST paring heap comparator - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Bug in GiST paring heap comparator
Date
Msg-id 7362fcac-3664-d960-f0fa-77c24e62a7b3@iki.fi
Whole thread Raw
In response to Bug in GiST paring heap comparator  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: Bug in GiST paring heap comparator
List pgsql-hackers
On 02/09/2019 07:54, Alexander Korotkov wrote:
> Andrey Borodin noticed me that results of some KNN-GIST tests are
> obviously wrong and don't match results of sort node.
> 
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
>          f1
> -------------------
>   (10,10)
>   (NaN,NaN)
>   (0,0)
>   (1e-300,-1e-300)
>   (-3,4)
>   (-10,0)
>   (-5,-12)
>   (5.1,34.5)
> 
>   (1e+300,Infinity)
> (10 rows)

So we've memorized incorrect results in the expected output. Ouch!

> It appears to be related to implementation of comparison function in
> pairing heap used as priority queue for KNN.  It used plain float
> comparison, which doesn't handle Inf and Nan values well.  Attached
> patch replaced it with float8_cmp_internal().
> 
> Also, note that with patch KNN results still don't fully match results
> of sort node.
> 
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
>          f1
> -------------------
>   (0,0)
>   (1e-300,-1e-300)
>   (-3,4)
>   (-10,0)
>   (10,10)
>   (-5,-12)
>   (5.1,34.5)
>   (1e+300,Infinity)
> 
>   (NaN,NaN)
> (10 rows)
> 
> NULL and '(NaN,NaN)' are swapped.  It happens so, because we assume
> distance to NULL to be Inf, while float8_cmp_internal() assumes NaN to
> be greater than NULL.  If even we would assume distance to NULL to be
> Nan, it doesn't guarantee that NULLs would be last.  It looks like we
> can handle this only by introduction array of "distance is null" flags
> to GISTSearchItem.  But does it worth it?

I don't think we have much choice, returning wrong results is not an 
option. At first I thought we could set the "recheck" flag if we see 
NULLs or NaNs, but that won't work because rechecking is not supported 
with Index-Only Scans.

Looking at the corresponding SP-GiST code, 
pairingheap_SpGistSearchItem_cmp() gets this right. I'd suggest 
copy-pasting the implementation from there, so that they would be as 
similar as possible. (SP-GiST gets away with just one isNull flag, 
because it doesn't support multi-column indexes. In GiST, you need an 
array, as you said.)

- Heikki



pgsql-hackers by date:

Previous
From: Andrey Borodin
Date:
Subject: Re: Bug in GiST paring heap comparator
Next
From: Peter Eisentraut
Date:
Subject: pg_dump --exclude-* options documentation