Re: comment regarding double timestamps; and, infinite timestamps and NaN - Mailing list pgsql-hackers

From Tom Lane
Subject Re: comment regarding double timestamps; and, infinite timestamps and NaN
Date
Msg-id 7411.1577733497@sss.pgh.pa.us
In response to Re: comment regarding double timestamps; and, infinite timestamps and NaN  (Justin Pryzby <pryzby@telsasoft.com>)
Responses infinite histogram bounds and nan (Re: comment regarding double timestamps; and, infinite timestamps and NaN)  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Mon, Dec 30, 2019 at 09:05:24AM -0500, Tom Lane wrote:
>> Uh, what?  This seems completely wrong to me.  We could possibly
>> promote DT_NOBEGIN and DT_NOEND to +/- infinity (not NaN), but
>> I don't really see the point.  They'll compare to other timestamp
>> values correctly without that, cf timestamp_cmp_internal().
>> The example you give seems to me to be working sanely, or at least
>> as sanely as it can given the number of histogram points available,
>> with the existing code.  In any case, shoving NaNs into the
>> computation is not going to make anything better.

> As I see it, the problem is that the existing code tests for isnan(), but
> infinite timestamps are PG_INT64_MIN/MAX (here, stored in a double), so there
> are absurdly large values being used as if they were isnormal().

I still say that (1) you're confusing NaN with Infinity, and (2)
you haven't actually shown that there's a problem to fix.
These endpoint values are *not* NaNs.
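[Editor's note: to make the distinction concrete, here is a small Python sketch (the variable names and the use of Python are mine, not the server's; the sentinel values come from src/include/datatype/timestamp.h). The point is that DT_NOBEGIN/DT_NOEND, coerced to double, are huge but perfectly ordinary finite values, so an isnan() test never fires on them.]

```python
import math

# DT_NOBEGIN / DT_NOEND are the int64 sentinels PostgreSQL uses for
# -infinity / +infinity timestamps (PG_INT64_MIN / PG_INT64_MAX).
DT_NOBEGIN = -2**63
DT_NOEND = 2**63 - 1

# Coerced to double, as the selectivity machinery effectively does,
# these are enormous but finite values -- not NaN, not IEEE infinity.
lo = float(DT_NOBEGIN)
hi = float(DT_NOEND)

print(math.isnan(lo), math.isinf(lo))   # False False
print(lo)                               # -9.223372036854776e+18
```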

> On v12, my test gives:
> |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
> |INSERT INTO t VALUES('-infinity');
> |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
> |explain analyze SELECT * FROM t WHERE t>='2010-12-29';
> | Seq Scan on t  (cost=0.00..5.62 rows=3 width=8) (actual time=0.012..0.042 rows=289 loops=1)

This is what it should do.  There's only one histogram bucket, and
it extends down to -infinity, so the conclusion is going to be that
the WHERE clause excludes all but a small part of the bucket.  This
is the correct answer based on the available stats; the problem is
not with the calculation, but with the miserable granularity of the
available stats.
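[Editor's note: an illustrative sketch of why the v12 estimate follows from the bucket boundaries. The numbers are mine (rough microseconds-since-2000 values), and this is a simplification of the real interpolation in ineq_histogram_selectivity(), not the planner's actual code. With the bucket's lower bound at DT_NOBEGIN-as-double, linear interpolation puts almost the whole bucket below any sane constant:]

```python
# One histogram bucket [lo, hi]; estimate selectivity of "t >= const"
# by linear interpolation, as the planner does within a bucket.
lo = float(-2**63)        # -infinity sentinel coerced to double
hi = 6.3e14               # ~ now() (circa 2019), usec since 2000-01-01
const = 3.47e14           # ~ '2010-12-29' in the same units

frac_below = (const - lo) / (hi - lo)
sel = 1.0 - frac_below    # fraction of the bucket with t >= const

# sel comes out to a few parts in 1e5: the constant excludes "all but
# a small part of the bucket", which is where the small rows estimate
# comes from -- not from any NaN sneaking into the arithmetic.
print(sel)
```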

> vs patched master:
> |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
> |INSERT INTO t VALUES('-infinity');
> |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
> |explain analyze SELECT * FROM t WHERE t>='2010-12-29';
> | Seq Scan on t  (cost=0.00..5.62 rows=146 width=8) (actual time=0.048..0.444 rows=289 loops=1)

This answer is simply broken.  You've caused it to estimate half
of the bucket, which is an insane estimate for the given bucket
boundaries and WHERE constraint.

> IMO 146 rows is a reasonable estimate given a single histogram bucket of
> infinite width,

No, it isn't.

            regards, tom lane


