Hi hackers,
While working with Chris Hajas on merging Postgres 12 with Greenplum
Database we stumbled upon the following strange behavior in the geometry
type polygon:
------ >8 --------
CREATE TEMP TABLE foo (p point);
CREATE INDEX ON foo USING gist(p);
INSERT INTO foo VALUES ('0,0'), ('1,1'), ('NaN,NaN');
SELECT $q$
SELECT * FROM foo WHERE p <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
$q$ AS qry \gset
BEGIN;
SAVEPOINT yolo;
SET LOCAL enable_seqscan TO off;
:qry;
ROLLBACK TO SAVEPOINT yolo;
SET LOCAL enable_indexscan TO off;
SET LOCAL enable_bitmapscan TO off;
:qry;
------ 8< --------
If you run the above repro SQL in HEAD (and 12, and likely all older
versions), you get the following output:
CREATE TABLE
CREATE INDEX
INSERT 0 3
BEGIN
SAVEPOINT
SET
p
-------
(0,0)
(1,1)
(2 rows)
ROLLBACK
SET
SET
p
-----------
(0,0)
(1,1)
(NaN,NaN)
(3 rows)
At first glance, you'd think this is the gist AM's bad, but on a second
thought, something else is strange here. The following query returns
true:
SELECT point '(NaN, NaN)' <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
The above behavior of the "contained in" operator is surprising, and
it's probably not what the GiST AM is expecting. I took a look at
point_inside() in geo_ops.c, and it doesn't seem well equipped to handle
NaN. Similary ill-equipped is dist_ppoly_internal() which underlies the
distnace operator for polygon. It gives the following interesting
output:
SELECT *, c <-> polygon '(0,0),(0,100),(100,100),(100,0)' as distance
FROM (
SELECT circle(point(100 * i, 'NaN'), 50) AS c
FROM generate_series(-2, 4) i
) t(c)
ORDER BY 2;
c | distance
-----------------+----------
<(-200,NaN),50> | 0
<(-100,NaN),50> | 0
<(0,NaN),50> | 0
<(100,NaN),50> | 0
<(200,NaN),50> | NaN
<(300,NaN),50> | NaN
<(400,NaN),50> | NaN
(7 rows)
Should they all be NaN? Am I alone in thinking the index is right but
the operators are wrong? Or should we call the indexes wrong here?
Cheers,
Jesse and Chris