Re: comment regarding double timestamps; and, infinite timestampsand NaN - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: comment regarding double timestamps; and, infinite timestampsand NaN
Date
Msg-id 20191230151829.GP12890@telsasoft.com
Whole thread Raw
In response to Re: comment regarding double timestamps; and, infinite timestamps and NaN  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: comment regarding double timestamps; and, infinite timestamps and NaN
List pgsql-hackers
On Mon, Dec 30, 2019 at 09:05:24AM -0500, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > That seems to be only used for ineq_histogram_selectivity() interpolation of
> > histogram bins.  It looks to me that at least isn't working for "special
> > values", and needs to use something other than isnan().
> 
> Uh, what?  This seems completely wrong to me.  We could possibly
> promote DT_NOBEGIN and DT_NOEND to +/- infinity (not NaN), but
> I don't really see the point.  They'll compare to other timestamp
> values correctly without that, cf timestamp_cmp_internal().
> The example you give seems to me to be working sanely, or at least
> as sanely as it can given the number of histogram points available,
> with the existing code.  In any case, shoving NaNs into the
> computation is not going to make anything better.

As I see it, the problem is that the existing code tests for isnan(), but
infinite timestamps are PG_INT64_MIN/MAX (here, stored in a double), so there's
absurdly large values being used as if they were isnormal().

src/include/datatype/timestamp.h:#define DT_NOBEGIN             PG_INT64_MIN
src/include/datatype/timestamp.h-#define DT_NOEND               PG_INT64_MAX

On v12, my test gives:
|DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
|INSERT INTO t VALUES('-infinity');
|ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
|explain analyze SELECT * FROM t WHERE t>='2010-12-29';
| Seq Scan on t  (cost=0.00..5.62 rows=3 width=8) (actual time=0.012..0.042 rows=289 loops=1)

vs patched master:
|DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
|INSERT INTO t VALUES('-infinity');
|ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
|explain analyze SELECT * FROM t WHERE t>='2010-12-29';
| Seq Scan on t  (cost=0.00..5.62 rows=146 width=8) (actual time=0.048..0.444 rows=289 loops=1)

IMO 146 rows is a reasonable estimate given a single histogram bucket of
infinite width, and 3 rows is a less reasonable result of returning INT64_MAX
in one place and then handling it as a normal value.  The comments in
ineq_histogram seem to indicate that this case is intended to get binfrac=0.5:

|  Watch out for the possibility that we got a NaN or Infinity from the
|  division.  This can happen despite the previous checks, if for example "low" is
|  -Infinity.

I changed to use INFINITY, -INFINITY and !isnormal() rather than nan() and
isnan() (although binfrac is actually NAN at that point so the existing test is
ok).

Justin



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: TAP testing for psql's tab completion code
Next
From: Anastasia Lubennikova
Date:
Subject: Re: Building infrastructure for B-Tree deduplication that recognizeswhen opclass equality is also equivalence