Re: Fix picksplit with nan values - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Fix picksplit with nan values
Date
Msg-id 22995.1383935908@sss.pgh.pa.us
Whole thread Raw
In response to Re: Fix picksplit with nan values  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: Fix picksplit with nan values  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Alexander Korotkov <aekorotkov@gmail.com> writes:
> I wrote attached patch by following principles:
> 1) NaN coordinates shouldn't crash or hang GiST.
> 2) NaN coordinates should be processed in GiST index scan like in
> sequential scan.
> 3) NaN coordinates shouldn't lead to significant slowdown.

I looked at this patch for awhile.  It seems like there's still an awful
lot of places in gistproc.c that are not worrying about NANs, and it's not
clear to me that they don't need to.  For instance, despite the changes in
adjustBox(), it'll still be possible to have boxes with NAN boundaries if
all the contained values are all-NAN boxes.  It doesn't seem like
gist_box_penalty will behave very sanely for that; it'll return a NAN
penalty which seems unhelpful.  The static box_penalty function doesn't
work sanely for NANs either, and if it can return NAN then you also have
to worry about NAN deltas in common_entry_cmp.  And isn't it still
possible for the Assert in gist_box_picksplit to fire?

> That could be illustrated on following test-case:

> create table test1 as (select point(random(), random()) as p from
> generate_series(1,1000000));
> create index test1_idx on test1 using gist(p);
> create table temp as (select * from test1);
> insert into temp (select point('nan'::float8, 'nan'::float8) from
> generate_series(1,1000));
> insert into temp (select point('nan'::float8, random()) from
> generate_series(1,1000));
> insert into temp (select point(random(), 'nan'::float8) from
> generate_series(1,1000));
> create table test2 as (select * from temp order by random());
> create index test2_idx on test2 using gist(p);
> drop table temp;

I think this test case is unlikely to generate any all-NAN index entry
boxes, because almost certainly the initial entries will be non-NANs, and
you've got it set up to keep incoming NANs from adjusting the boundaries
of an existing box.  You'd get better code coverage if you started by
inserting some NANs into an empty index.

Also, as a stylistic matter, I thought your previous solution of
copying float8_cmp_internal was a better idea than manually inlining it
(sans comments!) in multiple places as this version does.
        regards, tom lane



pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: backup.sgml-cmd-v003.patch
Next
From: Tom Lane
Date:
Subject: Re: Protocol forced to V2 in low-memory conditions?