Thread: Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
From
Alban Hertroys
Date:
>> BTW, that second value looks a whole lot like a poorly thought out >> substitute for 'infinity' ... >> regards, tom lane > > That's certainly an interesting comment and I'm open to suggestions! The > original db has two columns (from_timestamp, to_timestamp). I don't go for > NULL in the to_timestamp column. Alternatively, a timestamp very, very far > in the future can throw off query planners. Tom is telling you that there is a special "timestamp" 'infinity': alter table the_table alter column the_column set default tsrange(now()::timestamp without time zone, 'infinity'::timestamp without time zone); -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
From
Craig Ringer
Date:
On 07/16/2012 07:41 PM, Alban Hertroys wrote: >>> BTW, that second value looks a whole lot like a poorly thought out >>> substitute for 'infinity' ... >>> regards, tom lane >> That's certainly an interesting comment and I'm open to suggestions! The >> original db has two columns (from_timestamp, to_timestamp). I don't go for >> NULL in the to_timestamp column. Alternatively, a timestamp very, very far >> in the future can throw off query planners. > Tom is telling you that there is a special "timestamp" 'infinity': > > alter table the_table alter column the_column set default > tsrange(now()::timestamp without time zone, 'infinity'::timestamp > without time zone); > Yup. The 'infinity' value doesn't play well with all database access APIs and languages, though. Many languages can't represent infinite dates, so the DB access APIs have to use dirty hacks with placeholder values. It can be safer not to use infinite dates. Java and Python are two languages that I know don't have infinite date representations (even JodaTime doesn't add one for Java, grr!). eg: #!/usr/bin/env python import psycopg2 conn = psycopg2.connect("dbname=postgres") curs = conn.cursor() curs.execute("SELECT DATE 'infinity';") curs.fetchall() [(datetime.date(9999, 12, 31),)] The trap here is if you fetch some data, modify it, and push it back to the DB, your 'infinite' dates might not be anymore. You have to trust the database access layer to translate their placeholders back to 'infinity' and many won't. That's where the other part of Tom's comment comes in: *poorly thought out* substitute for infinity. Remember y2k? If you're going to use a future date for 'infinity', try one that's nice and far away like 2999-01-01 . Consider adding a CHECK constraint that excludes dates several decades prior, and all dates after, making it really obviously a special value, eg: CHECK date_in_range ( some_col < DATE '2300-01-01' OR some_col = DATE '2999-01-01' ) ... and DOCUMENT THIS CLEARLY in your app's limitations. -- Craig Ringer
Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
From
Daniele Varrazzo
Date:
On Mon, Jul 16, 2012 at 3:56 PM, Craig Ringer <ringerc@ringerc.id.au> wrote: > On 07/16/2012 07:41 PM, Alban Hertroys wrote: >>>> >>>> BTW, that second value looks a whole lot like a poorly thought out > Yup. The 'infinity' value doesn't play well with all database access APIs > and languages, though. It doesn't even play well with PostgreSQL's extract(). I reported it some times ago but as of 9.1.4 it has not been fixed. =# select extract(epoch from 'infinity'::timestamp); date_part ----------- 0 This makes 'infinity' a problematic choice in any application requiring a mapping between dates and reals, such as when using intervals in gist indexes. -- Daniele
Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
From
Alban Hertroys
Date:
On 16 Jul 2012, at 17:57, Daniele Varrazzo wrote: > On Mon, Jul 16, 2012 at 3:56 PM, Craig Ringer <ringerc@ringerc.id.au> wrote: >> On 07/16/2012 07:41 PM, Alban Hertroys wrote: >>>>> >>>>> BTW, that second value looks a whole lot like a poorly thought out > >> Yup. The 'infinity' value doesn't play well with all database access APIs >> and languages, though. > > It doesn't even play well with PostgreSQL's extract(). I reported it > some times ago but as of 9.1.4 it has not been fixed. > > =# select extract(epoch from 'infinity'::timestamp); > date_part > ----------- > 0 > > This makes 'infinity' a problematic choice in any application > requiring a mapping between dates and reals, such as when using > intervals in gist indexes. Well yeah, obviously. I don't think many mathematicians have wrapped their brain around questions like what day of the weekinfinity is and whether it's a working day or not. Perhaps Douglas Adams did, in which case it was probably a Tuesday. I'm just saying, most of the date-parts that extract can retrieve from a timestamp are meaningless with infinity. But, theyare also be meaningless with a placeholder date like 31-12-2999. That said, if it were up to me to decide what the proper epoch value would be for infinity, I'd say NULL - it is unknownas computers simply can't count far enough. It's probably only a matter of time until someone thinks of a solutionfor that though. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
From
"David Johnston"
Date:
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Alban Hertroys > Sent: Monday, July 16, 2012 1:44 PM > To: Daniele Varrazzo > Cc: Craig Ringer; Chris Bartlett; pgsql-general@postgresql.org > Subject: Re: [GENERAL] Can't figure out how to use now() in default for > tsrange column (PG 9.2) > That said, if it were up to me to decide what the proper epoch value would > be for infinity, I'd say NULL - it is unknown as computers simply can't count > far enough. It's probably only a matter of time until someone thinks of a > solution for that though. > Given: A) extract() returns a "double precision" B) "... In addition to ordinary numeric values, the floating-point types have several special values: ... 'Infinity ..." I would vote that the epoch result should be "Infinity". Specific, but unknown (e.g., day of week, month, year, etc...) results could return "NaN" though "NULL" is also, probably more, reasonable given the context. The goal would be to use "Infinity" in case where "<>" comparisons are common and use "NULL" where "=" comparisons are common. Personally I'd prefer to generate an error in places where "NULL" would be the result in order to minimize bugs. David J.
Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
From
Daniele Varrazzo
Date:
On Mon, Jul 16, 2012 at 7:08 PM, David Johnston <polobo@yahoo.com> wrote: > Given: > A) extract() returns a "double precision" > B) "... In addition to ordinary numeric values, the floating-point types > have several special values: ... 'Infinity ..." > > I would vote that the epoch result should be "Infinity". That's what I think the only reasonable value if extract(epoch) was to be used to map timestamps to doubles: the order relation is preserved, hence indexes work as they should. Then, of course, I may be asking too much from that function (although in that case the need for a purposely designed mapping function would be felt. At least, I've felt it). > Specific, but unknown (e.g., day of week, month, year, etc...) results could > return "NaN" though "NULL" is also, probably more, reasonable given the > context. > > The goal would be to use "Infinity" in case where "<>" comparisons are > common and use "NULL" where "=" comparisons are common. > > Personally I'd prefer to generate an error in places where "NULL" would be > the result in order to minimize bugs. The use case of extracting anything else than epoch from infinity is not so compelling. And of course the result is undetermined. Having to choose one, I'd go for the same result of sin(inf), which is NaN. Even without this quirk, the problem of mapping timestamps to other languages data types could be an even stronger design factor. I've personally settled for 9999-12-31 which is python's datetime.max, maps ok to doubles and won't create problems for almost 8000 years. -- Daniele
Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
From
Chris Bartlett
Date:
At 8:35 PM +0100 16/7/12, Daniele Varrazzo wrote: >Even without this quirk, the problem of mapping timestamps to other >languages data types could be an even stronger design factor. I've >personally settled for 9999-12-31 which is python's datetime.max, maps >ok to doubles and won't create problems for almost 8000 years. > > >-- Daniele Stephane Faroult ('The Art of SQL') says that dates so far in the future can throw off query planners. He explains it nicely here: http://www.youtube.com/watch?v=gu0WJJXgEFM Interesting discussion, thanks. Chris
Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
From
Daniele Varrazzo
Date:
On Mon, Jul 16, 2012 at 10:30 PM, Chris Bartlett <c.bartlett@paradise.net.nz> wrote: > At 8:35 PM +0100 16/7/12, Daniele Varrazzo wrote: >> >> Even without this quirk, the problem of mapping timestamps to other >> languages data types could be an even stronger design factor. I've >> personally settled for 9999-12-31 which is python's datetime.max, maps >> ok to doubles and won't create problems for almost 8000 years. > Stephane Faroult ('The Art of SQL') says that dates so far in the future can > throw off query planners. He explains it nicely here: > http://www.youtube.com/watch?v=gu0WJJXgEFM Nice "article". It made me worry about what I've put into my db... So I've made a test: I've tried creating a table with about 100K records uniformly spread across one year: => create table datetest (ts timestamp); => insert into datetest (ts) select generate_series('2012-1-1'::timestamp, '2012-12-31'::timestamp, '5 minutes'::interval) ; INSERT 0 105121 => analyze datetest ; => select * from pg_stats where tablename = 'datetest' and attname = 'ts'; you get a nice uniform histogram. I've tried screwing it up inserting 10, 100, 1000, 10K, 100K "infinity" and/or "9999-12-31": the histogram doesn't move of an inch (the values get recorded in most_common_vals but not in histogram_bounds). This already confirms that postgres doesn't suffer of the problem highlighted in the article. I've made another test, adding first 365, then about 10K distinct values in the year 3000-3001 range: => insert into datetest (ts) select generate_series('3000-1-1'::timestamp, '3000-12-31'::timestamp, '1 day'::interval) ; INSERT 0 365 => analyze datetest ; => select * from pg_stats where tablename = 'datetest' and attname = 'ts'; => insert into datetest (ts) select generate_series('3000-1-1'::timestamp, '3000-12-31'::timestamp, '1 hour'::interval) ; INSERT 0 8737 => analyze datetest ; => select * from pg_stats where tablename = 'datetest' and attname = 'ts'; and it moves... of an inch. In the first case only one bin (of the 100 PG 9.1 defaults) contains dates around the 3000, in the second case only about 8. And the values around the Y3K are in fact the 8% of the table. No bin is wasted to record data between the 2013 and the 3000. This suggests me that the PG histogram doesn't count the occurrence into bins of the same size over the domain, but instead records the boundaries on the domain of bins containing the same amount of data. Pretty solid! -- Daniele
On Mon, 2012-07-16 at 13:41 +0200, Alban Hertroys wrote: > >> BTW, that second value looks a whole lot like a poorly thought out > >> substitute for 'infinity' ... > >> regards, tom lane > > > > That's certainly an interesting comment and I'm open to suggestions! The > > original db has two columns (from_timestamp, to_timestamp). I don't go for > > NULL in the to_timestamp column. Alternatively, a timestamp very, very far > > in the future can throw off query planners. > > Tom is telling you that there is a special "timestamp" 'infinity': Or, perhaps specify NULL for the upper bound, indicating that there is no upper bound and the range will be infinite. Note that this does not mean that the upper bound is NULL in the "unknown" sense, it means that there is no upper bound. Ranges have their own internal concept of unbounded ranges, so they work for other data types that don't have a concept of infinity (like "integer"). Regards, Jeff Davis
Daniele Varrazzo <daniele.varrazzo@gmail.com> writes: > This suggests me that the PG histogram doesn't count the occurrence > into bins of the same size over the domain, but instead records the > boundaries on the domain of bins containing the same amount of data. Right. I don't believe the PG planner has any issues with infinite dates (and if you do find one, we'll do our best to fix it). But the point that client-side code might not cope with infinite dates is very valid, and that's not something that we can undertake to fix if you run into it. regards, tom lane
Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
From
Rafal Pietrak
Date:
On Mon, 2012-07-16 at 14:08 -0400, David Johnston wrote: [------------] > > Specific, but unknown (e.g., day of week, month, year, etc...) results could > return "NaN" though "NULL" is also, probably more, reasonable given the > context. > > The goal would be to use "Infinity" in case where "<>" comparisons are > common and use "NULL" where "=" comparisons are common. Is that even possible to implement? (e.g.: "SELECT * FROM log WHERE start_date <> 'XXXX-YY-ZZ' and end_date = 'ZZZZ-AA-BB'" - when both start_date and end_date possibly have 'infinity') Anyway, "NaN" looks quite appealing, particulary since currently: SELECT date_part('year','infinity'::timestamp ) ; date_part ----------- 0 (1 row) ... can lead to applications misbehaving in strange ways. I feal that date_part() on infinity, should behave "similarly to" division by zero - an exception. But seeing a lot of code obfuscated with checks for division by zero before doing an opperation, I'd opt for silently returning a NaN in most cases, with fields like 'year', 'century', 'epoch', etc. returning 'Infinity'. -R
Re: Can't figure out how to use now() in default for tsrange column (PG 9.2)
From
David Johnston
Date:
On Jul 17, 2012, at 2:32, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote: > On Mon, 2012-07-16 at 14:08 -0400, David Johnston wrote: > [------------] >> >> Specific, but unknown (e.g., day of week, month, year, etc...) results could >> return "NaN" though "NULL" is also, probably more, reasonable given the >> context. >> >> The goal would be to use "Infinity" in case where "<>" comparisons are >> common and use "NULL" where "=" comparisons are common. > > Is that even possible to implement? (e.g.: "SELECT * FROM log WHERE > start_date <> 'XXXX-YY-ZZ' and end_date = 'ZZZZ-AA-BB'" - when both > start_date and end_date possibly have 'infinity') I was unclear. I intended "<>" to mean "greater than and less than comparisons" as opposed to not equals comparisons. Equalityand inequality are two sides to the same coin. > > Anyway, "NaN" looks quite appealing, particulary since currently: > > SELECT date_part('year','infinity'::timestamp ) ; > date_part > ----------- > 0 > (1 row) > > ... can lead to applications misbehaving in strange ways. > > I feal that date_part() on infinity, should behave "similarly to" > division by zero - an exception. But seeing a lot of code obfuscated > with checks for division by zero before doing an opperation, I'd opt for > silently returning a NaN in most cases, with fields like 'year', > 'century', 'epoch', etc. returning 'Infinity'. > > -R > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
On 2012-07-17, David Johnston <polobo@yahoo.com> wrote: > On Jul 17, 2012, at 2:32, Rafal Pietrak <rafal@zorro.isa-geek.com> wrote: >> >> Is that even possible to implement? (e.g.: "SELECT * FROM log WHERE >> start_date <> 'XXXX-YY-ZZ' and end_date = 'ZZZZ-AA-BB'" - when both >> start_date and end_date possibly have 'infinity') > > I was unclear. I intended "<>" to mean "greater than and less than comparisons" as opposed to not equals comparisons. Equality and inequality are two sides to the same coin. confusingly in mathematics greater-than and less-than are classed as inequalities -- ⚂⚃ 100% natural