Thread: Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)

Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)

From
Alban Hertroys
Date:
>> BTW, that second value looks a whole lot like a poorly thought out
>> substitute for 'infinity' ...
>>                         regards, tom lane
>
> That's certainly an interesting comment and I'm open to suggestions! The
> original db has two columns (from_timestamp, to_timestamp). I don't go for
> NULL in the to_timestamp column. Alternatively, a timestamp very, very far
> in the future can throw off query planners.

Tom is telling you that there is a special "timestamp" 'infinity':

alter table the_table alter column the_column set default
tsrange(now()::timestamp without time zone, 'infinity'::timestamp
without time zone);

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

On 07/16/2012 07:41 PM, Alban Hertroys wrote:
>>> BTW, that second value looks a whole lot like a poorly thought out
>>> substitute for 'infinity' ...
>>>                          regards, tom lane
>> That's certainly an interesting comment and I'm open to suggestions! The
>> original db has two columns (from_timestamp, to_timestamp). I don't go for
>> NULL in the to_timestamp column. Alternatively, a timestamp very, very far
>> in the future can throw off query planners.
> Tom is telling you that there is a special "timestamp" 'infinity':
>
> alter table the_table alter column the_column set default
> tsrange(now()::timestamp without time zone, 'infinity'::timestamp
> without time zone);
>

Yup. The 'infinity' value doesn't play well with all database access
APIs and languages, though. Many languages can't represent infinite
dates, so the DB access APIs have to use dirty hacks with placeholder
values. It can be safer not to use infinite dates. Java and Python are
two languages that I know don't have infinite date representations (even
JodaTime doesn't add one for Java, grr!).

eg:

#!/usr/bin/env python
import psycopg2
conn = psycopg2.connect("dbname=postgres")
curs = conn.cursor()
curs.execute("SELECT DATE 'infinity';")
curs.fetchall()
[(datetime.date(9999, 12, 31),)]

The trap here is if you fetch some data, modify it, and push it back to
the DB, your 'infinite' dates might not be anymore. You have to trust
the database access layer to translate their placeholders back to
'infinity' and many won't.

That's where the other part of Tom's comment comes in: *poorly thought
out* substitute for infinity. Remember y2k? If you're going to use a
future date for 'infinity', try one that's nice and far away like
2999-01-01 . Consider adding a CHECK constraint that excludes dates
several decades prior, and all dates after, making it really obviously a
special value, eg:

CHECK date_in_range (  some_col < DATE '2300-01-01' OR some_col = DATE
'2999-01-01' )

... and DOCUMENT THIS CLEARLY in your app's limitations.

--
Craig Ringer

Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)

From
Daniele Varrazzo
Date:
On Mon, Jul 16, 2012 at 3:56 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 07/16/2012 07:41 PM, Alban Hertroys wrote:
>>>>
>>>> BTW, that second value looks a whole lot like a poorly thought out

> Yup. The 'infinity' value doesn't play well with all database access APIs
> and languages, though.

It doesn't even play well with PostgreSQL's extract(). I reported it
some times ago but as of 9.1.4 it has not been fixed.

=# select extract(epoch from 'infinity'::timestamp);
 date_part
-----------
         0

This makes 'infinity' a problematic choice in any application
requiring a mapping between dates and reals, such as when using
intervals in gist indexes.

-- Daniele

Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)

From
Alban Hertroys
Date:
On 16 Jul 2012, at 17:57, Daniele Varrazzo wrote:

> On Mon, Jul 16, 2012 at 3:56 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>> On 07/16/2012 07:41 PM, Alban Hertroys wrote:
>>>>>
>>>>> BTW, that second value looks a whole lot like a poorly thought out
>
>> Yup. The 'infinity' value doesn't play well with all database access APIs
>> and languages, though.
>
> It doesn't even play well with PostgreSQL's extract(). I reported it
> some times ago but as of 9.1.4 it has not been fixed.
>
> =# select extract(epoch from 'infinity'::timestamp);
> date_part
> -----------
>         0
>
> This makes 'infinity' a problematic choice in any application
> requiring a mapping between dates and reals, such as when using
> intervals in gist indexes.


Well yeah, obviously. I don't think many mathematicians have wrapped their brain around questions like what day of the
weekinfinity is and whether it's a working day or not. Perhaps Douglas Adams did, in which case it was probably a
Tuesday.
I'm just saying, most of the date-parts that extract can retrieve from a timestamp are meaningless with infinity. But,
theyare also be meaningless with a placeholder date like 31-12-2999. 

That said, if it were up to me to decide what the proper epoch value would be for infinity, I'd say NULL - it is
unknownas computers simply can't count far enough. It's probably only a matter of time until someone thinks of a
solutionfor that though. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)

From
"David Johnston"
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Alban Hertroys
> Sent: Monday, July 16, 2012 1:44 PM
> To: Daniele Varrazzo
> Cc: Craig Ringer; Chris Bartlett; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Can't figure out how to use now() in default for
> tsrange column (PG 9.2)
> That said, if it were up to me to decide what the proper epoch value would
> be for infinity, I'd say NULL - it is unknown as computers simply can't
count
> far enough. It's probably only a matter of time until someone thinks of a
> solution for that though.
>

Given:
A) extract() returns a "double precision"
B) "... In addition to ordinary numeric values, the floating-point types
have several special values: ... 'Infinity ..."

I would vote that the epoch result should be "Infinity".

Specific, but unknown (e.g., day of week, month, year, etc...) results could
return "NaN" though "NULL" is also, probably more, reasonable given the
context.

The goal would be to use "Infinity" in case where "<>" comparisons are
common and use "NULL" where "=" comparisons are common.

Personally I'd prefer to generate an error in places where "NULL" would be
the result in order to minimize bugs.

David J.









Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)

From
Daniele Varrazzo
Date:
On Mon, Jul 16, 2012 at 7:08 PM, David Johnston <polobo@yahoo.com> wrote:

> Given:
> A) extract() returns a "double precision"
> B) "... In addition to ordinary numeric values, the floating-point types
> have several special values: ... 'Infinity ..."
>
> I would vote that the epoch result should be "Infinity".

That's what I think the only reasonable value if extract(epoch) was to
be used to map timestamps to doubles: the order relation is preserved,
hence indexes work as they should. Then, of course, I may be asking
too much from that function (although in that case the need for a
purposely designed mapping function would be felt. At least, I've felt
it).


> Specific, but unknown (e.g., day of week, month, year, etc...) results could
> return "NaN" though "NULL" is also, probably more, reasonable given the
> context.
>
> The goal would be to use "Infinity" in case where "<>" comparisons are
> common and use "NULL" where "=" comparisons are common.
>
> Personally I'd prefer to generate an error in places where "NULL" would be
> the result in order to minimize bugs.

The use case of extracting anything else than epoch from infinity is
not so compelling. And of course the result is undetermined. Having to
choose one, I'd go for the same result of sin(inf), which is NaN.

Even without this quirk, the problem of mapping timestamps to other
languages data types could be an even stronger design factor. I've
personally settled for 9999-12-31 which is python's datetime.max, maps
ok to doubles and won't create problems for almost 8000 years.


-- Daniele

Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)

From
Chris Bartlett
Date:
At 8:35 PM +0100 16/7/12, Daniele Varrazzo wrote:
>Even without this quirk, the problem of mapping timestamps to other
>languages data types could be an even stronger design factor. I've
>personally settled for 9999-12-31 which is python's datetime.max, maps
>ok to doubles and won't create problems for almost 8000 years.
>
>
>-- Daniele


Stephane Faroult ('The Art of SQL') says that dates so far in the
future can throw off query planners. He explains it nicely here:
http://www.youtube.com/watch?v=gu0WJJXgEFM

Interesting discussion, thanks.

Chris


Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)

From
Daniele Varrazzo
Date:
On Mon, Jul 16, 2012 at 10:30 PM, Chris Bartlett
<c.bartlett@paradise.net.nz> wrote:
> At 8:35 PM +0100 16/7/12, Daniele Varrazzo wrote:
>>
>> Even without this quirk, the problem of mapping timestamps to other
>> languages data types could be an even stronger design factor. I've
>> personally settled for 9999-12-31 which is python's datetime.max, maps
>> ok to doubles and won't create problems for almost 8000 years.

> Stephane Faroult ('The Art of SQL') says that dates so far in the future can
> throw off query planners. He explains it nicely here:
> http://www.youtube.com/watch?v=gu0WJJXgEFM

Nice "article". It made me worry about what I've put into my db... So
I've made a test: I've tried creating a table with about 100K records
uniformly spread across one year:

    => create table datetest (ts timestamp);
    => insert into datetest (ts) select
generate_series('2012-1-1'::timestamp, '2012-12-31'::timestamp, '5
minutes'::interval) ;
    INSERT 0 105121
    => analyze datetest ;
    => select * from pg_stats where tablename = 'datetest' and attname = 'ts';

you get a nice uniform histogram. I've tried screwing it up inserting
10, 100, 1000, 10K, 100K "infinity" and/or "9999-12-31": the histogram
doesn't move of an inch (the values get recorded in most_common_vals
but not in histogram_bounds).

This already confirms that postgres doesn't suffer of the problem
highlighted in the article. I've made another test, adding first 365,
then about 10K distinct values in the year 3000-3001 range:

    => insert into datetest (ts) select
generate_series('3000-1-1'::timestamp, '3000-12-31'::timestamp, '1
day'::interval) ;
    INSERT 0 365
    => analyze datetest ;
    => select * from pg_stats where tablename = 'datetest' and attname = 'ts';

    => insert into datetest (ts) select
generate_series('3000-1-1'::timestamp, '3000-12-31'::timestamp, '1
hour'::interval) ;
    INSERT 0 8737
    => analyze datetest ;
    => select * from pg_stats where tablename = 'datetest' and attname = 'ts';

and it moves... of an inch. In the first case only one bin (of the 100
PG 9.1 defaults) contains dates around the 3000, in the second case
only about 8. And the values around the Y3K are in fact the 8% of the
table. No bin is wasted to record data between the 2013 and the 3000.
This suggests me that the PG histogram doesn't count the occurrence
into bins of the same size over the domain, but instead records the
boundaries on the domain of bins containing the same amount of data.

Pretty solid!

-- Daniele

On Mon, 2012-07-16 at 13:41 +0200, Alban Hertroys wrote:
> >> BTW, that second value looks a whole lot like a poorly thought out
> >> substitute for 'infinity' ...
> >>                         regards, tom lane
> >
> > That's certainly an interesting comment and I'm open to suggestions! The
> > original db has two columns (from_timestamp, to_timestamp). I don't go for
> > NULL in the to_timestamp column. Alternatively, a timestamp very, very far
> > in the future can throw off query planners.
>
> Tom is telling you that there is a special "timestamp" 'infinity':

Or, perhaps specify NULL for the upper bound, indicating that there is
no upper bound and the range will be infinite.

Note that this does not mean that the upper bound is NULL in the
"unknown" sense, it means that there is no upper bound.

Ranges have their own internal concept of unbounded ranges, so they work
for other data types that don't have a concept of infinity (like
"integer").

Regards,
    Jeff Davis


Daniele Varrazzo <daniele.varrazzo@gmail.com> writes:
> This suggests me that the PG histogram doesn't count the occurrence
> into bins of the same size over the domain, but instead records the
> boundaries on the domain of bins containing the same amount of data.

Right.  I don't believe the PG planner has any issues with infinite
dates (and if you do find one, we'll do our best to fix it).  But the
point that client-side code might not cope with infinite dates is
very valid, and that's not something that we can undertake to fix
if you run into it.

            regards, tom lane

Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)

From
Rafal Pietrak
Date:
On Mon, 2012-07-16 at 14:08 -0400, David Johnston wrote:
[------------]
>
> Specific, but unknown (e.g., day of week, month, year, etc...) results could
> return "NaN" though "NULL" is also, probably more, reasonable given the
> context.
>
> The goal would be to use "Infinity" in case where "<>" comparisons are
> common and use "NULL" where "=" comparisons are common.

Is that even possible to implement? (e.g.: "SELECT * FROM log WHERE
start_date <> 'XXXX-YY-ZZ' and end_date = 'ZZZZ-AA-BB'" - when both
start_date and end_date possibly have 'infinity')

Anyway, "NaN" looks quite appealing, particulary since currently:

SELECT date_part('year','infinity'::timestamp )  ;
 date_part
-----------
         0
(1 row)

... can lead to applications misbehaving in strange ways.

I feal that date_part() on infinity, should behave "similarly to"
division by zero - an exception. But seeing a lot of code obfuscated
with checks for division by zero before doing an opperation, I'd opt for
silently returning a NaN in most cases, with fields like 'year',
'century', 'epoch', etc. returning 'Infinity'.

-R


Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)

From
David Johnston
Date:
On Jul 17, 2012, at 2:32, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:

> On Mon, 2012-07-16 at 14:08 -0400, David Johnston wrote:
> [------------]
>>
>> Specific, but unknown (e.g., day of week, month, year, etc...) results could
>> return "NaN" though "NULL" is also, probably more, reasonable given the
>> context.
>>
>> The goal would be to use "Infinity" in case where "<>" comparisons are
>> common and use "NULL" where "=" comparisons are common.
>
> Is that even possible to implement? (e.g.: "SELECT * FROM log WHERE
> start_date <> 'XXXX-YY-ZZ' and end_date = 'ZZZZ-AA-BB'" - when both
> start_date and end_date possibly have 'infinity')

I was unclear.  I intended "<>" to mean "greater than and less than comparisons" as opposed to not equals comparisons.
Equalityand inequality are two sides to the same coin. 


>
> Anyway, "NaN" looks quite appealing, particulary since currently:
>
> SELECT date_part('year','infinity'::timestamp )  ;
> date_part
> -----------
>         0
> (1 row)
>
> ... can lead to applications misbehaving in strange ways.
>
> I feal that date_part() on infinity, should behave "similarly to"
> division by zero - an exception. But seeing a lot of code obfuscated
> with checks for division by zero before doing an opperation, I'd opt for
> silently returning a NaN in most cases, with fields like 'year',
> 'century', 'epoch', etc. returning 'Infinity'.
>
> -R
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

On 2012-07-17, David Johnston <polobo@yahoo.com> wrote:
> On Jul 17, 2012, at 2:32, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote:
>>
>> Is that even possible to implement? (e.g.: "SELECT * FROM log WHERE
>> start_date <> 'XXXX-YY-ZZ' and end_date = 'ZZZZ-AA-BB'" - when both
>> start_date and end_date possibly have 'infinity')
>
> I was unclear.  I intended "<>" to mean "greater than and less than comparisons" as opposed to not equals
comparisons. Equality and inequality are two sides to the same coin. 

confusingly in mathematics greater-than and less-than are classed as inequalities


--
⚂⚃ 100% natural