Re: Fix picksplit with nan values - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Fix picksplit with nan values
Date
Msg-id 20140201035022.GH31141@momjian.us
Whole thread Raw
In response to Re: Fix picksplit with nan values  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Fix picksplit with nan values  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Where are we on this?

---------------------------------------------------------------------------

On Fri, Nov  8, 2013 at 01:38:28PM -0500, Tom Lane wrote:
> Alexander Korotkov <aekorotkov@gmail.com> writes:
> > I wrote attached patch by following principles:
> > 1) NaN coordinates shouldn't crash or hang GiST.
> > 2) NaN coordinates should be processed in GiST index scan like in
> > sequential scan.
> > 3) NaN coordinates shouldn't lead to significant slowdown.
> 
> I looked at this patch for awhile.  It seems like there's still an awful
> lot of places in gistproc.c that are not worrying about NANs, and it's not
> clear to me that they don't need to.  For instance, despite the changes in
> adjustBox(), it'll still be possible to have boxes with NAN boundaries if
> all the contained values are all-NAN boxes.  It doesn't seem like
> gist_box_penalty will behave very sanely for that; it'll return a NAN
> penalty which seems unhelpful.  The static box_penalty function doesn't
> work sanely for NANs either, and if it can return NAN then you also have
> to worry about NAN deltas in common_entry_cmp.  And isn't it still
> possible for the Assert in gist_box_picksplit to fire?
> 
> > That could be illustrated on following test-case:
> 
> > create table test1 as (select point(random(), random()) as p from
> > generate_series(1,1000000));
> > create index test1_idx on test1 using gist(p);
> > create table temp as (select * from test1);
> > insert into temp (select point('nan'::float8, 'nan'::float8) from
> > generate_series(1,1000));
> > insert into temp (select point('nan'::float8, random()) from
> > generate_series(1,1000));
> > insert into temp (select point(random(), 'nan'::float8) from
> > generate_series(1,1000));
> > create table test2 as (select * from temp order by random());
> > create index test2_idx on test2 using gist(p);
> > drop table temp;
> 
> I think this test case is unlikely to generate any all-NAN index entry
> boxes, because almost certainly the initial entries will be non-NANs, and
> you've got it set up to keep incoming NANs from adjusting the boundaries
> of an existing box.  You'd get better code coverage if you started by
> inserting some NANs into an empty index.
> 
> Also, as a stylistic matter, I thought your previous solution of
> copying float8_cmp_internal was a better idea than manually inlining it
> (sans comments!) in multiple places as this version does.
> 
>             regards, tom lane
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: pg_restore multiple --function options
Next
From: Andrew Dunstan
Date:
Subject: Re: jsonb and nested hstore