infinite histogram bounds and nan (Re: comment regarding doubletimestamps; and, infinite timestamps and NaN) - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | infinite histogram bounds and nan (Re: comment regarding doubletimestamps; and, infinite timestamps and NaN) |
Date | |
Msg-id | 20200102135538.GB12890@telsasoft.com Whole thread Raw |
In response to | Re: comment regarding double timestamps; and, infinite timestamps and NaN (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: infinite histogram bounds and nan (Re: comment regarding double timestamps; and, infinite timestamps and NaN)
|
List | pgsql-hackers |
On Mon, Dec 30, 2019 at 02:18:17PM -0500, Tom Lane wrote: > > On v12, my test gives: > > |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes'); > > |INSERT INTO t VALUES('-infinity'); > > |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t; > > |explain analyze SELECT * FROM t WHERE t>='2010-12-29'; > > | Seq Scan on t (cost=0.00..5.62 rows=3 width=8) (actual time=0.012..0.042 rows=289 loops=1) > > This is what it should do. There's only one histogram bucket, and > it extends down to -infinity, so the conclusion is going to be that > the WHERE clause excludes all but a small part of the bucket. This > is the correct answer based on the available stats; the problem is > not with the calculation, but with the miserable granularity of the > available stats. > > > vs patched master: > > |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes'); > > |INSERT INTO t VALUES('-infinity'); > > |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t; > > |explain analyze SELECT * FROM t WHERE t>='2010-12-29'; > > | Seq Scan on t (cost=0.00..5.62 rows=146 width=8) (actual time=0.048..0.444 rows=289 loops=1) > > This answer is simply broken. You've caused it to estimate half > of the bucket, which is an insane estimate for the given bucket > boundaries and WHERE constraint. > > > IMO 146 rows is a reasonable estimate given a single histogram bucket of > > infinite width, > > No, it isn't. When using floats, v12 also returns half the histogram: DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(0, 99, 1)::float; INSERT INTO t VALUES('-Infinity'); ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t; explain analyze SELECT * FROM t WHERE t>='50'; Seq Scan on t (cost=0.00..2.26 rows=51 width=8) (actual time=0.014..0.020 rows=50 loops=1) I'm fine if the isnan() logic changes, but the comment indicates it's intended to be hit for an infinite histogram bound, but that doesn't work for timestamps (convert_to_scalar() should return (double)INFINITY and not (double)INT64_MIN/MAX). On Mon, Dec 30, 2019 at 02:18:17PM -0500, Tom Lane wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: > > On Mon, Dec 30, 2019 at 09:05:24AM -0500, Tom Lane wrote: > >> Uh, what? This seems completely wrong to me. We could possibly > >> promote DT_NOBEGIN and DT_NOEND to +/- infinity (not NaN), but > >> I don't really see the point. They'll compare to other timestamp > >> values correctly without that, cf timestamp_cmp_internal(). > >> The example you give seems to me to be working sanely, or at least > >> as sanely as it can given the number of histogram points available, > >> with the existing code. In any case, shoving NaNs into the > >> computation is not going to make anything better. > > > As I see it, the problem is that the existing code tests for isnan(), but > > infinite timestamps are PG_INT64_MIN/MAX (here, stored in a double), so there's > > absurdly large values being used as if they were isnormal(). > > I still say that (1) you're confusing NaN with Infinity, and (2) > you haven't actually shown that there's a problem to fix. > These endpoint values are *not* NaNs. I probably did confuse it while trying to make the behavior match the comment for timestamps. The Subject says NAN since isnan(binfrac) is what's supposed to be hit for that case. The NAN is intended to come from: |binfrac = (val - low) / (high - low); which is some variation of -inf / inf. Justin
pgsql-hackers by date: