infinite histogram bounds and nan (Re: comment regarding doubletimestamps; and, infinite timestamps and NaN) - Mailing list pgsql-hackers

From Justin Pryzby
Subject infinite histogram bounds and nan (Re: comment regarding doubletimestamps; and, infinite timestamps and NaN)
Date
Msg-id 20200102135538.GB12890@telsasoft.com
Whole thread Raw
In response to Re: comment regarding double timestamps; and, infinite timestamps and NaN  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: infinite histogram bounds and nan (Re: comment regarding double timestamps; and, infinite timestamps and NaN)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Dec 30, 2019 at 02:18:17PM -0500, Tom Lane wrote:
> > On v12, my test gives:
> > |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
> > |INSERT INTO t VALUES('-infinity');
> > |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
> > |explain analyze SELECT * FROM t WHERE t>='2010-12-29';
> > | Seq Scan on t  (cost=0.00..5.62 rows=3 width=8) (actual time=0.012..0.042 rows=289 loops=1)
> 
> This is what it should do.  There's only one histogram bucket, and
> it extends down to -infinity, so the conclusion is going to be that
> the WHERE clause excludes all but a small part of the bucket.  This
> is the correct answer based on the available stats; the problem is
> not with the calculation, but with the miserable granularity of the
> available stats.
> 
> > vs patched master:
> > |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
> > |INSERT INTO t VALUES('-infinity');
> > |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
> > |explain analyze SELECT * FROM t WHERE t>='2010-12-29';
> > | Seq Scan on t  (cost=0.00..5.62 rows=146 width=8) (actual time=0.048..0.444 rows=289 loops=1)
> 
> This answer is simply broken.  You've caused it to estimate half
> of the bucket, which is an insane estimate for the given bucket
> boundaries and WHERE constraint.
> 
> > IMO 146 rows is a reasonable estimate given a single histogram bucket of
> > infinite width,
> 
> No, it isn't.

When using floats, v12 also returns half the histogram:

 DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(0, 99, 1)::float;
 INSERT INTO t VALUES('-Infinity');
 ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
 explain analyze SELECT * FROM t WHERE t>='50';
 Seq Scan on t  (cost=0.00..2.26 rows=51 width=8) (actual time=0.014..0.020 rows=50 loops=1)

I'm fine if the isnan() logic changes, but the comment indicates it's intended
to be hit for an infinite histogram bound, but that doesn't work for timestamps
(convert_to_scalar() should return (double)INFINITY and not
(double)INT64_MIN/MAX).

On Mon, Dec 30, 2019 at 02:18:17PM -0500, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > On Mon, Dec 30, 2019 at 09:05:24AM -0500, Tom Lane wrote:
> >> Uh, what?  This seems completely wrong to me.  We could possibly
> >> promote DT_NOBEGIN and DT_NOEND to +/- infinity (not NaN), but
> >> I don't really see the point.  They'll compare to other timestamp
> >> values correctly without that, cf timestamp_cmp_internal().
> >> The example you give seems to me to be working sanely, or at least
> >> as sanely as it can given the number of histogram points available,
> >> with the existing code.  In any case, shoving NaNs into the
> >> computation is not going to make anything better.
> 
> > As I see it, the problem is that the existing code tests for isnan(), but
> > infinite timestamps are PG_INT64_MIN/MAX (here, stored in a double), so there's
> > absurdly large values being used as if they were isnormal().
> 
> I still say that (1) you're confusing NaN with Infinity, and (2)
> you haven't actually shown that there's a problem to fix.
> These endpoint values are *not* NaNs.

I probably did confuse it while trying to make the behavior match the comment
for timestamps.
The Subject says NAN since isnan(binfrac) is what's supposed to be hit for that
case.

The NAN is intended to come from:

|binfrac = (val - low) / (high - low);

which is some variation of -inf / inf.

Justin



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Decade indication
Next
From: Michael Paquier
Date:
Subject: Re: Removal of support for OpenSSL 0.9.8 and 1.0.0