Thread: Data type confusion
Tom, Stephan, I'm writing up the date/time FAQ, and I came across some operator behavior that confuses me: If INTERVAL / INTEGER = INTERVAL then why does INTERVAL / INTERVAL = ERROR? Shouldn't INTERVAL / INTERVAL = INTEGER? I'd like to answer this before I finish the FAQ, as it seems inconsistent behavior. -Josh ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Attachment
Josh Berkus wrote: > This is a multi-part MIME message > > --_===97089====davinci.ethosmedia.com===_ > Content-Type: text/plain; charset="ISO-8859-1" > Content-Transfer-Encoding: 8bit > > Tom, Stephan, > > I'm writing up the date/time FAQ, and I came across some operator > behavior that confuses me: > > If > INTERVAL / INTEGER = INTERVAL > > then why does > INTERVAL / INTERVAL = ERROR? > > Shouldn't > INTERVAL / INTERVAL = INTEGER? > > I'd like to answer this before I finish the FAQ, as it seems > inconsistent behavior. > > -Josh > > Josh, I'm not Tom or Stephan (sorry) but in your scenario what would be the result of, say, dividing '3 months ago' with '6 seconds'in the future? I don't think it makes conceptual sense to divide intervals.... Allan.
"Josh Berkus" <josh@agliodbs.com> writes: > If > INTERVAL / INTEGER = INTERVAL Actually the operator appears to be INTERVAL / FLOAT8. > then why does > INTERVAL / INTERVAL = ERROR? Because no one got around to creating an INTERVAL / INTERVAL operator. There are plenty of such gaps in our operator set... > Shouldn't > INTERVAL / INTERVAL = INTEGER? I'd think the output should be FLOAT8, myself, since the result could be fractional. Anyway, the generic response to such questions is "feel free to code it up and submit a patch". regards, tom lane
Tom, > > then why does > > INTERVAL / INTERVAL = ERROR? > > Because no one got around to creating an INTERVAL / INTERVAL > operator. > There are plenty of such gaps in our operator set... Bummer. If I could "C" then maybe I'd do something about it. > > > Shouldn't > > INTERVAL / INTERVAL = INTEGER? > > I'd think the output should be FLOAT8, myself, since the result > could be fractional. > > Anyway, the generic response to such questions is "feel free to > code it up and submit a patch". Sorry. :( I'm stricly a "high-level user". I can, however, document it so that others won't pester you for questions about why it doesn't work. -Josh ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Attachment
Allan Engelhardt <allane@cybaea.com> writes: > I don't think it makes conceptual sense to divide intervals.... It is kinda bogus, given the underlying semantics of intervals (integer months plus float seconds). The problem already arises for the existing interval * float8 and interval / float8 operators, though, so it'd be easy enough to make an interval / interval operator that is consistent with them. What those operators do is to convert any fractional-month result into seconds at an arbitrary conversion factor of 30 days to the month. For example, consider regression=# select '5 months 9 days'::interval; ?column? ---------------5 mons 9 days (1 row) regression=# select '5 months 9 days'::interval * 0.5; ?column? ----------------------2 mons 19 days 12:00 (1 row) The initial product is effectively 2.5 months plus 4.5 days, and then we translate the .5 months into 15 days. This is pretty grotty, and AFAIK not documented anywhere --- I found it out by looking at the C code for these operators. But I'm not sure how to do better. regards, tom lane
Tom Lane writes: > It is kinda bogus, given the underlying semantics of intervals > (integer months plus float seconds). > This is pretty grotty, and AFAIK not documented anywhere --- I found it > out by looking at the C code for these operators. But I'm not sure > how to do better. One day we will have to accept the fact that months and seconds must not be mixed, period. You can have year/month intervals or day/hour/minute/second intervals, not a combination. An interval of '5 years 3 minutes' has no meaning with the natural calendar rules. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Peter Eisentraut <peter_e@gmx.net> writes: > One day we will have to accept the fact that months and seconds must not > be mixed, period. You can have year/month intervals or > day/hour/minute/second intervals, not a combination. An interval of '5 > years 3 minutes' has no meaning with the natural calendar rules. I don't agree --- five years and three minutes is perfectly meaningful. There are only certain things you can validly do with it, however, and scaling by a floating-point number isn't one of them, because fractional months aren't well-defined. But you can, for example, add it to or subtract it from a timestamp to produce a well-defined result timestamp. The real bogosity in the interval type is that months and seconds are not sufficient: it should be months, days, and seconds. As we get reminded twice a year by the regression tests, "1 day" and "24 hours" are not the same thing. regards, tom lane
Tom Lane writes: > I don't agree --- five years and three minutes is perfectly meaningful. > There are only certain things you can validly do with it, however, and > scaling by a floating-point number isn't one of them, because fractional > months aren't well-defined. But you can, for example, add it to or > subtract it from a timestamp to produce a well-defined result timestamp. Maybe. Or maybe not. Take 1 year and 3 seconds. E.g., '2001-08-06 03:03:03' - '1 year 3 seconds' = '2000-08-06 03:03:00' '2000-08-06 03:03:03' - '1 year 3 seconds' = '1999-08-06 03:03:00' but '2001-08-06 03:03:03' - '2000-08-06 03:03:00' = '365 days 3 seconds' '2000-08-06 03:03:03' - '1999-08-06 03:03:00' = '366 days 3 seconds' This means either a) A value such as '1 year 3 seconds' varies depending on context, which is not how our system is intended to work, or b) The normal rules of arithmetic do not hold. I doubt the following is was good idea: select timestamp '2000-08-06 03:03:03' - ( timestamp '2000-08-06 03:03:03' - interval '1 year 3 seconds' ); ?column? -------------------366 days 00:00:03 select timestamp '2000-08-06 03:03:03' - timestamp '2000-08-06 03:03:03' + interval '1 year 3 seconds' ; ?column? -----------------1 year 00:00:03 On the other hand, in certain applications even fractional months may be useful. Banks sometimes organize a year as 360 days and months as 30 days, so talking about 0.5 months might make sense. However, in this case again, years/months and days/seconds must not be mixed. Another interesting tidbit here: select interval '1 year 00:00:03' = interval '360 days 00:00:03' ;?column? ----------t > The real bogosity in the interval type is that months and seconds are > not sufficient: it should be months, days, and seconds. As we get > reminded twice a year by the regression tests, "1 day" and "24 hours" > are not the same thing. Agreed. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Peter Eisentraut <peter_e@gmx.net> writes: > but > '2001-08-06 03:03:03' - '2000-08-06 03:03:00' = '365 days 3 seconds' > '2000-08-06 03:03:03' - '1999-08-06 03:03:00' = '366 days 3 seconds' What I said was that timestamp plus or minus interval is well-defined (when "interval" is a multi-part symbolic interval). It's quite obvious that timestamp minus timestamp yielding interval is not uniquely defined: in the above examples one could express the result either as you show or as '1 year 3 seconds', which I would argue is preferable. For a 3-part (month/day/second) interval, I think the preferable rule for timestamp subtraction is to use the largest symbolic component possible, ie, use the largest number of months/years you can, then use the largest number of days fitting in the remainder, then express what's left as seconds. This is an arbitrary choice among the many possible 3-part representations of a given interval, but it seems like the most natural one for many applications. > a) A value such as '1 year 3 seconds' varies depending on context, which > is not how our system is intended to work, or Isn't it? The relationship between years, days, and seconds is *inherently* context dependent in the common calendar. It might not be too sensible, but sensibleness has never held sway in calendars, at least not since the Romans. I think that the actually useful operations for symbolic intervals have to do with adding them to (or subtracting them from) timestamps. For example, I know exactly what I think should happen when I write now() + '1 day'::interval, and that two days out of the year this should yield a different result from now() + '24 hours'::interval. Whatever else we do with intervals has to mesh with that as best we can make it happen. I'm not sure your notion of fractional months really holds water, at least not for this particular operation. When is 25 Feb 2000 plus 0.95 month? Is the 0.95 measured with respect to the length of February, or of March? Does it matter that 2000 is a leap year? There may be some other operations that have sensible interpretations for such a datatype, however. regards, tom lane
Folks, Wow. Talk about asking dangerous questions ... > For a 3-part (month/day/second) interval, I think the preferable rule > for timestamp subtraction is to use the largest symbolic component > possible, ie, use the largest number of months/years you can, then > use the largest number of days fitting in the remainder, then express > what's left as seconds. This is an arbitrary choice among the many > possible 3-part representations of a given interval, but it seems > like > the most natural one for many applications. Sure, that makes sense. In the meantime, I'll add a note to the FAQ which says "Adding and subtracting wildly disparate time values (e.g. '1 year'::INTERVAL - '3 seconds'::INTERVAL) may cause the database to make unusual interval value choices which could impair accuracy. Please test extensively before relying on operations of this sort." > Isn't it? The relationship between years, days, and seconds is > *inherently* context dependent in the common calendar. It might not > be > too sensible, but sensibleness has never held sway in calendars, at > least not since the Romans. Peter is absolutely correct here. '1 year'::INTERVAL - '1 day'::INTERVAL is '364 days'::INTERVAL most of the time. However, on leap years it is '365 days'. > I'm not sure your notion of fractional months really holds water, > at least not for this particular operation. When is 25 Feb 2000 > plus 0.95 month? Is the 0.95 measured with respect to the length > of February, or of March? Does it matter that 2000 is a leap year? > There may be some other operations that have sensible interpretations > for such a datatype, however. One way to simplify this would be not to allow any division operations on INTERVALS that result in a modulo of a smaller increment than the INTERVAL value expressed. Thus, one could "'3 months'::INTERVAL / 3" but would not be allowed to "'2 months::INTERVAL / 3". However, this seems kind of unfair to hour, minute, and second values whose fractions are well-defined and easily manipulated. Or, to put it another way, 95% of the time users just want to do simple things. Like we want to know how many weeks an employee has been with us for: '2 years 3 months'::INTERVAL / '1 week'::INTERVAL (and we don't care about the fractional week left over). Thus we don't want to hold up simple and obvious date multiplication and division just to deal with the wierdo cases. -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Attachment
"Josh Berkus" <josh@agliodbs.com> writes: > Peter is absolutely correct here. '1 year'::INTERVAL - '1 > day'::INTERVAL is '364 days'::INTERVAL most of the time. However, on > leap years it is '365 days'. Au contraire, it is always '1 year - 1 day'::INTERVAL. That is a two-part interval value and is not reduced further. When you add it to a date or timestamp, *then* you find out how many days are meant. > One way to simplify this would be not to allow any division operations > on INTERVALS that result in a modulo of a smaller increment than the > INTERVAL value expressed. Thus, one could "'3 months'::INTERVAL / 3" > but would not be allowed to "'2 months::INTERVAL / 3". However, this > seems kind of unfair to hour, minute, and second values whose fractions > are well-defined and easily manipulated. I was toying with the notion of allowing scalings whose results didn't introduce any fractional part to the "months" field. For example '2 months + 1 day' / 2.0 = '1 month + 12hrs' '3 months + 1 day' / 2.0 = error (can't have a half month) '61 days' / 2.0 = '30 days 12hrs' However, I fear that this would make no sense to anyone who hadn't thought about the issues as carefully as we have in this thread. > Or, to put it another way, 95% of the time users just want to do simple > things. Like we want to know how many weeks an employee has been with > us for: '2 years 3 months'::INTERVAL / '1 week'::INTERVAL (and we > don't care about the fractional week left over). Good point. Ugly as the "30 day" convention is, it is probably close enough for that sort of thing. regards, tom lane
Josh Berkus wrote: > Or, to put it another way, 95% of the time users just want to do simple > things. Like we want to know how many weeks an employee has been with > us for: '2 years 3 months'::INTERVAL / '1 week'::INTERVAL (and we > don't care about the fractional week left over). > Thus we don't want to hold up simple and obvious date multiplication and > division just to deal with the wierdo cases. I see now what you are trying to do. It sort of makes sense, but I'm still really reluctant to give (semantic or otherwise)meaning to "yesterday divided by tomorrow" ..... Would it be a better solution if there was a conversion function a la convert(text, interval) RETURNS double precision where text in ('day','week','hour', ....). The function would convert the interval to the specified unit, with some considerabledegree of fuzziness as discussed in this thread. Then your query would be simply convert('week', '2 years 3 months') / 1 and would return something in the vicinity of 117.0 :-) It seems to me that such a function would be more generally useful than the division of intervals. What you really wantto do is not to divide intervals, but to express them in different time units. Or am I missing something (again)? Comments? Allan.
Allan, > I see now what you are trying to do. It sort of makes sense, but I'm > still really reluctant to give (semantic or otherwise) meaning to > "yesterday divided by tomorrow" ..... I don't agree. Consider, for example, this statement: '30 weeks ago'::INTERVAL / '2 weeks'::INTERVAL = -15 Just as -30 / 2 = -15 To phrase the equation above: "How many two week periods is thirty weeks ago? Minus fifteen, or fifteen ago." This makes perfect sense to me. > It seems to me that such a function would be more generally useful > than the division of intervals. What you really want to do is not to > divide intervals, but to express them in different time units. Or am > I missing something (again)? From my pespective? Yes, you are. (For one thing, the CONVERT function in Postgres converts between unicode character sets, not data-types). Look, if I'm designing a payroll application for a company with bi-weekly payroll, I will want a report that shows how many payroll periods for which an employee has been employed. Thus I will want to: periods_employed := (current_timestamp - date_hired) / '2 weeks'::INTERVAL I don't want to go through a bunch of non-ANSI SQL-compliant conversion functions to do it. Especially not as this is just what the ANSI SQL data type and operator specs are designed to support. -Josh ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
"Josh Berkus" <josh@agliodbs.com> writes: > I don't want to go through a bunch of non-ANSI SQL-compliant conversion > functions to do it. Especially not as this is just what the ANSI SQL > data type and operator specs are designed to support. Curiously enough, ANSI doesn't define an INTERVAL-divided-by-INTERVAL function either. Also, it rather looks like ANSI adopted the position Peter E. expressed: Year-month intervals are mutually comparable only with other year- month intervals. [...] Day-timeintervals are mutually comparable only with other day- time intervals. [...] Operations involving itemsof type datetime require that the date- time items be mutually comparable. Operations involving items of type interval require that the interval items be mutually compara- ble. regards, tom lane
Tom, > Curiously enough, ANSI doesn't define an INTERVAL-divided-by-INTERVAL > function either. Also, it rather looks like ANSI adopted the > position > Peter E. expressed: > > Year-month intervals are mutually comparable only with other > year- > month intervals. [...] > Day-time intervals are mutually comparable only with other > day- > time intervals. [...] > Operations involving items of type datetime require that the > date- > time items be mutually comparable. Operations involving > items of > type interval require that the interval items be mutually > compara- > ble. Hmmm ... does this mean that I couldn't divide '1 year' by '1 week'? I can certaily see not allowing division of '1 year' by '28 seconds' as it spares us a whole bunch of calendar-generated fuzziness. It seems to me that: years,months,weeks,days / years,months,weeks,days is OK, and days,hours,minutes,seconds / days,hours,minutes,seconds is also easy, but years,months,weeks / hours,minutes,seconds is where we get in trouble. So I propose that we suppot the first two and disallow the third. Thus I think that we can adhere to the spec, while still providing the functionality developers want and avoiding a whole lot of '5 months 11 minutes' type headaches. -Josh ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
"Josh Berkus" <josh@agliodbs.com> writes: > Hmmm ... does this mean that I couldn't divide '1 year' by '1 week'? That's exactly what it says. regards, tom lane
Tom, > > Hmmm ... does this mean that I couldn't divide '1 year' by '1 > week'? > > That's exactly what it says. If that's the case, we'd need to create some sort of function to specify the time unit to output timestamp operation into: to_weeks(current_timestamp - hire_date) / '2 weeks' ... otherwise division and multiplication operators for time values don't do us much good, as we'd be forced to integer-ize all intervals before we can perform any operations on them at all. -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete informationtechnology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Josh, Thanks for your explanation. I'd like to get hold of a copy of SQL99/PKG001 to see what they have actually defined. I think the INTERVAL type sux :-) Long rant follows - consider hitting the delete button now. In this area, there are a number of different concepts that it would make sense to separate. Let's call one UTIME. It is time defined as (the time-coordinate of) an event in the history of the Universe. [We'll ignoreEinstein for the following.] It has physical meaning. Let call another LDATE. It is what you and I normally call a date. Specifically, it is a legal (or social or religious)representation of a UTIME. You can make a contract (legal with man, or religiously with God) using LDATEs. Let's invent a CALENDAR. It translates between UTIME and LDATE. It changes all the time ;-( Well, maybe not *all* thetime but it is sufficient unstable to be a problem for some applications. There is not only the problem of Julian vsGregorian vs some other calendar. Consider the issue of changing the rules governing summer-time. It is not long agothat Britain changed the rule for ending summer time from being the fourth Sunday in October to being the last Sundayin October. Some countries seem to decide on a year to year basis if they want to have summer time. Finally, theinternational committee that governs the CALENDAR only have to give eight weeks notice when it introduces a leap-second. (Leap seconds are the reason that struct tm.tm_sec has a range of 0,...,61.) This means that the CALENDAR is, strictly speaking, only known eight weeks in advance. It also highlights the differencebetween LDATE and UTIME: if we have a contract for me to start a machine on a specific LDATE and a leap secondis introduced between now and then, then I'd better change that sleep(n) statement in my control program to sleep(n+1)or the factory will start too early. I once spent an unhappy week debugging a problem related to this :-( It really would have started the factory one hour toolate. Now of course you can define deltas. Unix systems kind of keep UTIME using a delta: a variable of type time_t holds thenumber of seconds since a specific event (defined as a given LDATE for a fixed CALENDAR). Let's call them DUTIME andDLDATE. They are conceptually different: One day of DLDATE may be 23, 24, or 25 hours of DUTIME, depending on summer time rules. It may be 86,400 or 86,401 (rarely: 86,402) seconds depending on leap seconds. Important: There is no meaningful translation between DUTIME and DLDATE except for a fixed (start- or end-) UTIME (or DLDATE). (No, really!!) Let's introduce a final concept: a unit of UTIME. Maybe we call it TIMEU. You need to measure UTIME is something, maybeseconds is the fundamental unit (as in SI) and you have other, derived units. We don't need a similar concept for LDATE - it is effectively provided by the CALENDAR. It is it that which defines years,months, or whatever the Incas used to use... Now BACK ON-TOPIC: What's an INTERVAL supposed to be? DUTIME, DLDATE, TIMEU or something that really belongs to the CALENDAR? All of the above? Methinks SQL99 (or PostgreSQL's implementation) is going for the latter option. Methinks it is confusing. But then, I'mthick. :-) Apologies for the interruption -- We now continue the regular scheduled program.... Allan.