Thread: Fix picksplit with nan values

Fix picksplit with nan values

From
Alexander Korotkov
Date:
Hackers,

PostGIS spotted that picksplit algorithm freezes in infinite loop when dealing with nan values. I discovered same bug is present in core opclasses. Attached patch fixes this issue interpreting nan as value greater than infinity like btree comparison function does. 
This patch contain copy of float8_cmp_internal rather than exposing it from float.c, because it let compiler inline this function.

------
With best regards,
Alexander Korotkov.
Attachment

Re: Fix picksplit with nan values

From
Tom Lane
Date:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> PostGIS spotted that picksplit algorithm freezes in infinite loop when
> dealing with nan values. I discovered same bug is present in core
> opclasses. Attached patch fixes this issue interpreting nan as value
> greater than infinity like btree comparison function does.

Hm.  Good point, but it seems like some of these hunks are only taking
care of a subset of the possible combinations of input NaNs.  If you're
certain the other combinations are impossible, there should be code
comments explaining why.

BTW, as a stylistic matter, I think it sucks to write     !float8_cmp_internal(x,y)
when what you mean is     float8_cmp_internal(x,y) == 0
The "!" syntax should pretty much only be used for boolean tests IMO.

I do recognize that there's a tradition of writing "!ptr" rather than
"ptr == NULL", which I think is all right in most contexts, mainly
because returning a null pointer has an element of boolean yes-or-no-ness
to it.  When you're doing arithmetic comparisons, though, it's just
confusing.

I wrote another rant about this years ago in the context of complaining
about "!strcmp" tests; there was probably more detail in that, if you
care to look in the archives.
        regards, tom lane



Re: Fix picksplit with nan values

From
Alexander Korotkov
Date:
On Sat, Sep 7, 2013 at 1:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> PostGIS spotted that picksplit algorithm freezes in infinite loop when
> dealing with nan values. I discovered same bug is present in core
> opclasses. Attached patch fixes this issue interpreting nan as value
> greater than infinity like btree comparison function does.

Hm.  Good point, but it seems like some of these hunks are only taking
care of a subset of the possible combinations of input NaNs.  If you're
certain the other combinations are impossible, there should be code
comments explaining why.

I had some more insight into NaNs and geometrical datatypes. I didn't find functions dealing with geometrical datatypes to take care about NaN. Behaviour seems pretty weird. For example, box constructor takes care about order of coordinates but no in case of NaN:

test=# select box('(0.0,0.0)'::point,'(1.0,1.0)'::point);
     box
-------------
 (1,1),(0,0)
(1 row)

test=# select box('(1.0,1.0)'::point,'(0.0,0.0)'::point);
     box
-------------
 (1,1),(0,0)
(1 row)

test=# select box('(nan,nan)'::point,'(0.0,0.0)'::point);
       box
-----------------
 (0,0),(nan,nan)
(1 row)

test=# select box('(0.0,0.0)'::point,'(nan,nan)'::point);
       box
-----------------
 (nan,nan),(0,0)
(1 row)

Since comparison with NaN always returns false, bool returning operators dealing with NaN-containing values mostly return false. Exceptions are operators dealing with only some of coordinates like << and >>. 

I thinks proper way to fix it is to prohibit NaNs in geometrical datatypes at all. However, it could be a hard decision like it is about fuzzy float comparison. I would like to fix GiST index by now.

I wrote attached patch by following principles:
1) NaN coordinates shouldn't crash or hang GiST.
2) NaN coordinates should be processed in GiST index scan like in sequential scan.
3) NaN coordinates shouldn't lead to significant slowdown.

That could be illustrated on following test-case:

create table test1 as (select point(random(), random()) as p from generate_series(1,1000000));
create index test1_idx on test1 using gist(p);
create table temp as (select * from test1);
insert into temp (select point('nan'::float8, 'nan'::float8) from generate_series(1,1000));
insert into temp (select point('nan'::float8, random()) from generate_series(1,1000));
insert into temp (select point(random(), 'nan'::float8) from generate_series(1,1000));
create table test2 as (select * from temp order by random());
create index test2_idx on test2 using gist(p);
drop table temp;

You can't observe performance degradation of index:

test=# explain (analyze, buffers) select * from test1 where p <@ box(point(0.25,0.6),point(0.30,0.68));
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test1  (cost=48.03..2593.37 rows=1000 width=16) (actual time=3.137..8.315 rows=4057 loops=1)
   Recheck Cond: (p <@ '(0.3,0.68),(0.25,0.6)'::box)
   Buffers: shared hit=2901
   ->  Bitmap Index Scan on test1_idx  (cost=0.00..47.78 rows=1000 width=0) (actual time=2.281..2.281 rows=4057 loops=1)
         Index Cond: (p <@ '(0.3,0.68),(0.25,0.6)'::box)
         Buffers: shared hit=59
 Total runtime: 9.648 ms
(7 rows)

test=# explain (analyze, buffers) select * from test2 where p <@ box(point(0.25,0.6),point(0.30,0.68));
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=48.06..2601.55 rows=1003 width=16) (actual time=3.121..17.717 rows=4057 loops=1)
   Recheck Cond: (p <@ '(0.3,0.68),(0.25,0.6)'::box)
   Buffers: shared hit=70 read=2830
   ->  Bitmap Index Scan on test2_idx  (cost=0.00..47.81 rows=1003 width=0) (actual time=2.246..2.246 rows=4057 loops=1)
         Index Cond: (p <@ '(0.3,0.68),(0.25,0.6)'::box)
         Buffers: shared hit=56
 Total runtime: 18.233 ms
(7 rows)

NaN results of queries seems to be same with sequential scan:

test=# explain (analyze, buffers) select * from test2 where p <^ '(nan,0.5)'::point and p::text like '%nan%';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=4389.54..11817.54 rows=4012 width=16) (actual time=138.879..1096.506 rows=492 loops=1)
   Recheck Cond: (p <^ '(nan,0.5)'::point)
   Filter: ((p)::text ~~ '%nan%'::text)
   Rows Removed by Filter: 499961
   Buffers: shared hit=6287 read=3703 written=60
   ->  Bitmap Index Scan on test2_idx  (cost=0.00..4388.53 rows=100300 width=0) (actual time=136.003..136.003 rows=500453 loops=1)
         Index Cond: (p <^ '(nan,0.5)'::point)
         Buffers: shared hit=4568
 Total runtime: 1096.720 ms
(9 rows)

test=# explain (analyze, buffers) select * from test2 where p <^ '(nan,0.5)'::point and p::text like '%nan%';
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on test2  (cost=0.00..25482.02 rows=4012 width=16) (actual time=1.156..1118.252 rows=492 loops=1)
   Filter: ((p <^ '(nan,0.5)'::point) AND ((p)::text ~~ '%nan%'::text))
   Rows Removed by Filter: 1002508
   Buffers: shared hit=5422
 Total runtime: 1118.404 ms
(5 rows)

test=# explain (analyze, buffers) select * from test2 where p >^ '(nan,0.5)'::point and p::text like '%nan%';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=4389.54..11817.54 rows=4012 width=16) (actual time=138.041..1084.822 rows=508 loops=1)
   Recheck Cond: (p >^ '(nan,0.5)'::point)
   Filter: ((p)::text ~~ '%nan%'::text)
   Rows Removed by Filter: 500036
   Buffers: shared hit=8602 read=1442
   ->  Bitmap Index Scan on test2_idx  (cost=0.00..4388.53 rows=100300 width=0) (actual time=135.415..135.415 rows=500544 loops=1)
         Index Cond: (p >^ '(nan,0.5)'::point)
         Buffers: shared hit=4560 read=62
 Total runtime: 1085.028 ms
(9 rows)


test=# explain (analyze, buffers) select * from test2 where p >^ '(nan,0.5)'::point and p::text like '%nan%';
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on test2  (cost=0.00..25482.02 rows=4012 width=16) (actual time=2.786..1121.685 rows=508 loops=1)
   Filter: ((p >^ '(nan,0.5)'::point) AND ((p)::text ~~ '%nan%'::text))
   Rows Removed by Filter: 1002492
   Buffers: shared hit=5422
 Total runtime: 1121.815 ms
(5 rows)

test=# explain (analyze, buffers) select * from test2 where p << '(0.5,nan)'::point and p::text like '%nan%';
explain (analyze, buffers) select * from test2 where p >> '(0.5,nan)'::point and p::text like '%nan%';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=4389.54..11817.54 rows=4012 width=16) (actual time=135.344..1072.619 rows=502 loops=1)
   Recheck Cond: (p << '(0.5,nan)'::point)
   Filter: ((p)::text ~~ '%nan%'::text)
   Rows Removed by Filter: 500000
   Buffers: shared hit=9992 read=31
   ->  Bitmap Index Scan on test2_idx  (cost=0.00..4388.53 rows=100300 width=0) (actual time=132.975..132.975 rows=500502 loops=1)
         Index Cond: (p << '(0.5,nan)'::point)
         Buffers: shared hit=4570 read=31
 Total runtime: 1072.820 ms
(9 rows)

test=# explain (analyze, buffers) select * from test2 where p << '(0.5,nan)'::point and p::text like '%nan%';
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on test2  (cost=0.00..25482.02 rows=4012 width=16) (actual time=2.154..1110.458 rows=502 loops=1)
   Filter: ((p << '(0.5,nan)'::point) AND ((p)::text ~~ '%nan%'::text))
   Rows Removed by Filter: 1002498
   Buffers: shared hit=5422
 Total runtime: 1110.587 ms
(5 rows)

test=# explain (analyze, buffers) select * from test2 where p >> '(0.5,nan)'::point and p::text like '%nan%';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test2  (cost=4389.54..11817.54 rows=4012 width=16) (actual time=135.071..1076.240 rows=498 loops=1)
   Recheck Cond: (p >> '(0.5,nan)'::point)
   Filter: ((p)::text ~~ '%nan%'::text)
   Rows Removed by Filter: 499997
   Buffers: shared hit=9989 read=25
   ->  Bitmap Index Scan on test2_idx  (cost=0.00..4388.53 rows=100300 width=0) (actual time=132.679..132.679 rows=500495 loops=1)
         Index Cond: (p >> '(0.5,nan)'::point)
         Buffers: shared hit=4567 read=25
 Total runtime: 1076.434 ms
(9 rows)

test=# explain (analyze, buffers) select * from test2 where p >> '(0.5,nan)'::point and p::text like '%nan%';
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on test2  (cost=0.00..25482.02 rows=4012 width=16) (actual time=2.473..1121.940 rows=498 loops=1)
   Filter: ((p >> '(0.5,nan)'::point) AND ((p)::text ~~ '%nan%'::text))
   Rows Removed by Filter: 1002502
   Buffers: shared hit=5422
 Total runtime: 1122.073 ms
(5 rows)

------
With best regards,
Alexander Korotkov.
Attachment

Re: Fix picksplit with nan values

From
Andrew Gierth
Date:
>>>>> "Alexander" == Alexander Korotkov <aekorotkov@gmail.com> writes:
Alexander> 2) NaN coordinates should be processed in GiST index scanAlexander> like in sequential scan.

postgres=# select * from pts order by a <-> '(0,0)' limit 10;   a     
----------(1,1)(7,nan)(9,nan)(11,nan)(4,nan)(nan,6)(2,1)(1,2)(2,2)(3,1)
(10 rows)

postgres=# set enable_indexscan=false;
SET

postgres=# select * from pts order by a <-> '(0,0)' limit 10;  a   
-------(1,1)(2,1)(1,2)(2,2)(3,1)(1,3)(3,2)(2,3)(4,1)(1,4)
(10 rows)

this data set was created by:
insert into pts select point(i,j)   from (select generate_series(1,100)::float8 union all select 'nan') s1(i),
(selectgenerate_series(1,100)::float8 union all select 'nan') s2(j)  order by random();
 

-- 
Andrew (irc:RhodiumToad)



Re: Fix picksplit with nan values

From
Alexander Korotkov
Date:
On Mon, Sep 16, 2013 at 4:13 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Alexander" == Alexander Korotkov <aekorotkov@gmail.com> writes:

 Alexander> 2) NaN coordinates should be processed in GiST index scan
 Alexander> like in sequential scan.

postgres=# select * from pts order by a <-> '(0,0)' limit 10;
    a
----------
 (1,1)
 (7,nan)
 (9,nan)
 (11,nan)
 (4,nan)
 (nan,6)
 (2,1)
 (1,2)
 (2,2)
 (3,1)
(10 rows)

postgres=# set enable_indexscan=false;
SET

postgres=# select * from pts order by a <-> '(0,0)' limit 10;
   a
-------
 (1,1)
 (2,1)
 (1,2)
 (2,2)
 (3,1)
 (1,3)
 (3,2)
 (2,3)
 (4,1)
 (1,4)
(10 rows)

this data set was created by:
insert into pts
  select point(i,j)
    from (select generate_series(1,100)::float8 union all select 'nan') s1(i),
         (select generate_series(1,100)::float8 union all select 'nan') s2(j)
   order by random();

Thanks, Andrew! Good spot.
I didn't examine order by operators for work with NaNs.
I think this time problem is in GiST itself rather than in opclass. I'm going to fix it in a separate patch.

------
With best regards,
Alexander Korotkov.

Re: Fix picksplit with nan values

From
Alexander Korotkov
Date:
On Tue, Sep 17, 2013 at 5:04 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Sep 16, 2013 at 4:13 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Alexander" == Alexander Korotkov <aekorotkov@gmail.com> writes:

 Alexander> 2) NaN coordinates should be processed in GiST index scan
 Alexander> like in sequential scan.

postgres=# select * from pts order by a <-> '(0,0)' limit 10;
    a
----------
 (1,1)
 (7,nan)
 (9,nan)
 (11,nan)
 (4,nan)
 (nan,6)
 (2,1)
 (1,2)
 (2,2)
 (3,1)
(10 rows)

postgres=# set enable_indexscan=false;
SET

postgres=# select * from pts order by a <-> '(0,0)' limit 10;
   a
-------
 (1,1)
 (2,1)
 (1,2)
 (2,2)
 (3,1)
 (1,3)
 (3,2)
 (2,3)
 (4,1)
 (1,4)
(10 rows)

this data set was created by:
insert into pts
  select point(i,j)
    from (select generate_series(1,100)::float8 union all select 'nan') s1(i),
         (select generate_series(1,100)::float8 union all select 'nan') s2(j)
   order by random();

Thanks, Andrew! Good spot.
I didn't examine order by operators for work with NaNs.
I think this time problem is in GiST itself rather than in opclass. I'm going to fix it in a separate patch.

Attached patch fixes knn GiST behaviour with NaN. It makes RB-tree comparison function in GiST work like float8 btree opclass comparison function.

------
With best regards,
Alexander Korotkov.

Attachment

Re: Fix picksplit with nan values

From
Tom Lane
Date:
Alexander Korotkov <aekorotkov@gmail.com> writes:
>> Thanks, Andrew! Good spot.
>> I didn't examine order by operators for work with NaNs.
>> I think this time problem is in GiST itself rather than in opclass. I'm
>> going to fix it in a separate patch.

> Attached patch fixes knn GiST behaviour with NaN. It makes RB-tree
> comparison function in GiST work like float8 btree opclass comparison
> function.

Hmm ... does that really work, or even do anything?  I'd have thought
that if either input is a NAN, the initial test
    if (sa->distances[i] != sb->distances[i])

would return false so we'd not enter the rest of it.
        regards, tom lane



Re: Fix picksplit with nan values

From
Tom Lane
Date:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> I wrote attached patch by following principles:
> 1) NaN coordinates shouldn't crash or hang GiST.
> 2) NaN coordinates should be processed in GiST index scan like in
> sequential scan.
> 3) NaN coordinates shouldn't lead to significant slowdown.

I looked at this patch for awhile.  It seems like there's still an awful
lot of places in gistproc.c that are not worrying about NANs, and it's not
clear to me that they don't need to.  For instance, despite the changes in
adjustBox(), it'll still be possible to have boxes with NAN boundaries if
all the contained values are all-NAN boxes.  It doesn't seem like
gist_box_penalty will behave very sanely for that; it'll return a NAN
penalty which seems unhelpful.  The static box_penalty function doesn't
work sanely for NANs either, and if it can return NAN then you also have
to worry about NAN deltas in common_entry_cmp.  And isn't it still
possible for the Assert in gist_box_picksplit to fire?

> That could be illustrated on following test-case:

> create table test1 as (select point(random(), random()) as p from
> generate_series(1,1000000));
> create index test1_idx on test1 using gist(p);
> create table temp as (select * from test1);
> insert into temp (select point('nan'::float8, 'nan'::float8) from
> generate_series(1,1000));
> insert into temp (select point('nan'::float8, random()) from
> generate_series(1,1000));
> insert into temp (select point(random(), 'nan'::float8) from
> generate_series(1,1000));
> create table test2 as (select * from temp order by random());
> create index test2_idx on test2 using gist(p);
> drop table temp;

I think this test case is unlikely to generate any all-NAN index entry
boxes, because almost certainly the initial entries will be non-NANs, and
you've got it set up to keep incoming NANs from adjusting the boundaries
of an existing box.  You'd get better code coverage if you started by
inserting some NANs into an empty index.

Also, as a stylistic matter, I thought your previous solution of
copying float8_cmp_internal was a better idea than manually inlining it
(sans comments!) in multiple places as this version does.
        regards, tom lane



Re: Fix picksplit with nan values

From
Bruce Momjian
Date:
Where are we on this?

---------------------------------------------------------------------------

On Fri, Nov  8, 2013 at 01:38:28PM -0500, Tom Lane wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > I wrote attached patch by following principles:
> > 1) NaN coordinates shouldn't crash or hang GiST.
> > 2) NaN coordinates should be processed in GiST index scan like in
> > sequential scan.
> > 3) NaN coordinates shouldn't lead to significant slowdown.
> 
> I looked at this patch for awhile.  It seems like there's still an awful
> lot of places in gistproc.c that are not worrying about NANs, and it's not
> clear to me that they don't need to.  For instance, despite the changes in
> adjustBox(), it'll still be possible to have boxes with NAN boundaries if
> all the contained values are all-NAN boxes.  It doesn't seem like
> gist_box_penalty will behave very sanely for that; it'll return a NAN
> penalty which seems unhelpful.  The static box_penalty function doesn't
> work sanely for NANs either, and if it can return NAN then you also have
> to worry about NAN deltas in common_entry_cmp.  And isn't it still
> possible for the Assert in gist_box_picksplit to fire?
> 
> > That could be illustrated on following test-case:
> 
> > create table test1 as (select point(random(), random()) as p from
> > generate_series(1,1000000));
> > create index test1_idx on test1 using gist(p);
> > create table temp as (select * from test1);
> > insert into temp (select point('nan'::float8, 'nan'::float8) from
> > generate_series(1,1000));
> > insert into temp (select point('nan'::float8, random()) from
> > generate_series(1,1000));
> > insert into temp (select point(random(), 'nan'::float8) from
> > generate_series(1,1000));
> > create table test2 as (select * from temp order by random());
> > create index test2_idx on test2 using gist(p);
> > drop table temp;
> 
> I think this test case is unlikely to generate any all-NAN index entry
> boxes, because almost certainly the initial entries will be non-NANs, and
> you've got it set up to keep incoming NANs from adjusting the boundaries
> of an existing box.  You'd get better code coverage if you started by
> inserting some NANs into an empty index.
> 
> Also, as a stylistic matter, I thought your previous solution of
> copying float8_cmp_internal was a better idea than manually inlining it
> (sans comments!) in multiple places as this version does.
> 
>             regards, tom lane
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Fix picksplit with nan values

From
Alexander Korotkov
Date:
On Sat, Feb 1, 2014 at 7:50 AM, Bruce Momjian <bruce@momjian.us> wrote:

Where are we on this?

I found myself to have empty draft letter from November with new version of patch attached. I'll return here when we have some solution in gin fast scan challenge.

------
With best regards,
Alexander Korotkov.