Re: Can't figure out how to use now() in default for tsrange column (PG 9.2) - Mailing list pgsql-general

From Daniele Varrazzo
Subject Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
Date
Msg-id CA+mi_8YZ_O2OukZ+TaKimak=-Wn-xC1JNm7Dvamg4PYMmZ_WGA@mail.gmail.com
Whole thread Raw
In response to Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)  (Alban Hertroys <haramrae@gmail.com>)
Responses Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
List pgsql-general
On Mon, Jul 16, 2012 at 10:30 PM, Chris Bartlett
<c.bartlett@paradise.net.nz> wrote:
> At 8:35 PM +0100 16/7/12, Daniele Varrazzo wrote:
>>
>> Even without this quirk, the problem of mapping timestamps to other
>> languages data types could be an even stronger design factor. I've
>> personally settled for 9999-12-31 which is python's datetime.max, maps
>> ok to doubles and won't create problems for almost 8000 years.

> Stephane Faroult ('The Art of SQL') says that dates so far in the future can
> throw off query planners. He explains it nicely here:
> http://www.youtube.com/watch?v=gu0WJJXgEFM

Nice "article". It made me worry about what I've put into my db... So
I've made a test: I've tried creating a table with about 100K records
uniformly spread across one year:

    => create table datetest (ts timestamp);
    => insert into datetest (ts) select
generate_series('2012-1-1'::timestamp, '2012-12-31'::timestamp, '5
minutes'::interval) ;
    INSERT 0 105121
    => analyze datetest ;
    => select * from pg_stats where tablename = 'datetest' and attname = 'ts';

you get a nice uniform histogram. I've tried screwing it up inserting
10, 100, 1000, 10K, 100K "infinity" and/or "9999-12-31": the histogram
doesn't move of an inch (the values get recorded in most_common_vals
but not in histogram_bounds).

This already confirms that postgres doesn't suffer of the problem
highlighted in the article. I've made another test, adding first 365,
then about 10K distinct values in the year 3000-3001 range:

    => insert into datetest (ts) select
generate_series('3000-1-1'::timestamp, '3000-12-31'::timestamp, '1
day'::interval) ;
    INSERT 0 365
    => analyze datetest ;
    => select * from pg_stats where tablename = 'datetest' and attname = 'ts';

    => insert into datetest (ts) select
generate_series('3000-1-1'::timestamp, '3000-12-31'::timestamp, '1
hour'::interval) ;
    INSERT 0 8737
    => analyze datetest ;
    => select * from pg_stats where tablename = 'datetest' and attname = 'ts';

and it moves... of an inch. In the first case only one bin (of the 100
PG 9.1 defaults) contains dates around the 3000, in the second case
only about 8. And the values around the Y3K are in fact the 8% of the
table. No bin is wasted to record data between the 2013 and the 3000.
This suggests me that the PG histogram doesn't count the occurrence
into bins of the same size over the domain, but instead records the
boundaries on the domain of bins containing the same amount of data.

Pretty solid!

-- Daniele

pgsql-general by date:

Previous
From: Michael Nolan
Date:
Subject: Re: Replication/cloning: rsync vs modification dates?
Next
From: Jeff Davis
Date:
Subject: Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)