Thread: A proposal for Interval Math
Developers: What follows is a proposal for a spec for PostgreSQL handling of Interval multiplication and divsion. I freely admit that I can't do much to help implement this spec (my C skills are infantile). However, I think it would be helpful to at least have a roadmap for eventual Interval map support. In theory, all of the following calculations should be possible using Postgres data types: INTERVAL * NUMBER INTERVAL / NUMBER INTERVAL / INTERVAL However, this has yet to be implemented due partly to the difficult definition of the above in the SQL92 standard, in which intervals should be divisible by floats, and the data type should be "sub-typed", with multiplication and division being possible only for like interval units. As far as I know, no RDBMS has been able to implement the full SQL92 standard for Intervals. Were I on the committee, I would argue that it is conceptually improbable, and not particularly useful. Are they accepting applicants? :-) Therefore, I am going to argue for a slightly altered implementation, with adjustments intended to overcome the essential atomic nature of date units (i.e. the fact that "0.4 Months" cannot be accurately measured in smaller interval units). My proposal is this: that Intervals should be treated in the same way that Integer math is handled: with automatic rounding and remainders. Further, Intervals would be multiplied only by Integers, not by floats or numeric values. This solves a lot of the problems of date unit division by simply refusing to perform it. Thus: INTERVAL / INTEGER = INTERVAL + ( INTERVAL % INTEGER ) INTERVAL * INTEGER = INTERVAL INTERVAL / INTERVAL = INTEGER + ( INTERVAL % INTERVAL ) In other words: '7 months' / 2 = '3 months' '7 months' % 2 = '1 month' '7 months' * 2 = '14 months' '7 months' / '3 months' = 2 '7 months' % '3 months' = 1 Ah, you ask, but what about '7 months' / '3 days'? This requires the addtion of "Interval Rounding" to the Postgresql system. Any division of unlike Interval increments would require Interval Rounding to the smallest interval unit. It would be understood that such rounding carries inherent inaccuracies, in the the same way that Float rounding does. In a "rounded" Interval value, all of the following would be true: 1 minute = 60 seconds 1 hour = 60 minutes 1 day = 24 hours 1 week = 7 days 1 month = 30 days 1 year = 365 days = 12 months = 52 weeks Thus: '7 months' / '3 days' = 70 ... even though this is frequently off by +/- 1 on real calendars. Of course, this would require a builtin function interval_round(INTERVAL, Interval Unit). Also, the Interval data type would have to be able to hold counts of interval units higher than the threshold for the next unit increment, i.e. '345 hours' would have to be a valid INTERVAL on its own without being automatically rounded into days + hours by the system. Comments, please? -- -Josh Berkus
On Sun, 2002-05-19 at 21:36, Josh Berkus wrote: > Ah, you ask, but what about '7 months' / '3 days'? This requires the addtion > of "Interval Rounding" to the Postgresql system. As I understand the standard, this is an illegal operation, because the two intervals are of incompatible types and a meaningful result is not possible. I do not think anyone should try to implement it. -- Oliver Elphick Oliver.Elphick@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C "But as it is written, Eye hath not seen, nor ear heard, neither have entered into the heart of man, the thingswhich God hath prepared for them that love him." I Corinthians 2:9
Oliver, > As I understand the standard, this is an illegal operation, because the > two intervals are of incompatible types and a meaningful result is not > possible. I do not think anyone should try to implement it. The reason I propose it is that regardless of the standard, PostgreSQL users need to, for example, calculate the number of 2-week intervals in a given interval column all the time. Right now, everybody is using the Date data type and integers, which has some significant shortcomings. I'm a big proponent of standards, and you will find me on this and other forums pushing sticking to the standard as a #1 priority. However, in this case, the standard leaves DBAs with no alternative. SQL92 neither allows me to divide unlike interval units, nor to convert unlike interval units into like units. Either I'm misreading it, or the ANSI committee really painted us into a corner on this one. I would be quite positive to an argument that elimintates the second restriction but not the first, for example: '7 months' / '3 days' = ERROR, but interval_round('7 months','days') / '3 days' = 70 However, I'll continue pushing for some solution that allows me to calculate pay periods in real intervals. -- -Josh Berkus
On Sun, 2002-05-19 at 22:21, Josh Berkus wrote: > However, I'll continue pushing for some solution that allows me to calculate > pay periods in real intervals. But this is a matter for your local business rules. If someone is paid monthly and leaves one week into the month, there must be a local rule to determine what proportion of his monthly pay he gets - is it always 7/28 or 7/30 or is it 7/{28,29,30,31} according to which month it is? You are proposing a mathematical operation that looks exact but actually has a fuzzy result and the reason it is fuzzy is that you are trying to shortcut the process of deciding what rule to apply. You mentioned calculating the number of 2-week intervals in an interval column: if you need to do this, the column should not be defined as YEAR TO MONTH but as DAY (of course, PostgreSQL hasn't yet implemented these subtypes of INTERVAL). If you *have* to do it with a YEAR TO MONTH column, you have implicitly defined a local rule that averages out the difference in month lengths, because in a YEAR-MONTH interval, you don't know how long a year or a month is. You gave the example interval_round('7 months','days') / '3 days' = 70 which seems to assume that a month is 30 days. On a different basis the result would be 365.2425 * 7 / 12 / 3 = 71.019375 or 365 * 7 / 12 / 3 = 70.972222. Both of those round to 71 days. Your 7 months interval is anything from 209 to 213 days depending on which months it includes, which is unknowable without a context. So if you define interval_round() you must also include some means of configuring what rule to use, because that is not necessarily going to be the same in every organisation. It might not even be the same between different applications in one organisation. -- Oliver Elphick Oliver.Elphick@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C "For all that is in the world, the lust of the flesh, and the lust of the eyes, and the pride of life, is notof the Father, but is of the world." I John 2:17
Oliver, > You are proposing a mathematical operation that looks exact but > actually > has a fuzzy result and the reason it is fuzzy is that you are trying > to > shortcut the process of deciding what rule to apply. But it is a fuzzy process. Three months is not the same number of days from January-March as it is from June-August. For that matter, March 7th 2002 has 25 hours in California, but only 24 hours in Arizona.While an ideal database would allow accounting for allof these rules, it is practically impossible. > You mentioned calculating the number of 2-week intervals in an > interval > column: if you need to do this, the column should not be defined as > YEAR > TO MONTH but as DAY (of course, PostgreSQL hasn't yet implemented > these > subtypes of INTERVAL). If you *have* to do it with a YEAR TO MONTH > column, you have implicitly defined a local rule that averages out > the > difference in month lengths, because in a YEAR-MONTH interval, you > don't > know how long a year or a month is. Precisely. You have to estimate. And the database system should allow for either implicit or explicit estimation. >You gave the example > > interval_round('7 months','days') / '3 days' = 70 > > which seems to assume that a month is 30 days. On a different basis > the > result would be 365.2425 * 7 / 12 / 3 = 71.019375 or 365 * 7 / 12 / > 3 > = 70.972222. Both of those round to 71 days. Your 7 months interval > is > anything from 209 to 213 days depending on which months it includes, > which is unknowable without a context. Absolutely. > So if you define interval_round() you must also include some means of > configuring what rule to use, because that is not necessarily going > to > be the same in every organisation. It might not even be the same > between different applications in one organisation. This makes sense but it would be harder to implement. I suppose it all depends on what programmmers have time, yes? I take it you're proposing: interval_round(interval_value, interval_unit, interval_rule) ? Given that you seem to have given this some thought as well, I'd like to hear your ideas on a workable solution for interval math. Also, I notice that you did not comment on my proposal that we allow interval multiplication only by integers. Does that mean you agree? -Josh Berkus
On Mon, 2002-05-20 at 17:03, Josh Berkus wrote: > Oliver, > But it is a fuzzy process. Three months is not the same number of days > from January-March as it is from June-August. For that matter, March > 7th 2002 has 25 hours in California, but only 24 hours in Arizona. > While an ideal database would allow accounting for all of these rules, > it is practically impossible. > > > You mentioned calculating the number of 2-week intervals in an > > interval > > column: if you need to do this, the column should not be defined as > > YEAR > > TO MONTH but as DAY (of course, PostgreSQL hasn't yet implemented > > these > > subtypes of INTERVAL). If you *have* to do it with a YEAR TO MONTH > > column, you have implicitly defined a local rule that averages out > > the > > difference in month lengths, because in a YEAR-MONTH interval, you > > don't > > know how long a year or a month is. > > Precisely. You have to estimate. And the database system should allow > for either implicit or explicit estimation. I don't think I am convinced that this needs to be provided by the database. If you say that your application just has to divide frogs by oranges (naturally the result is going to be somewhat fuzzy) I might request further justification! I think I am right in being suspicious of any operation on fuzzy data that purports to give a firm result. Why do we have to do it this way? doesn't the fact that you want to do this demonstrate that there is something wrong with the database design or with the business rules? ... > > So if you define interval_round() you must also include some means of > > configuring what rule to use, because that is not necessarily going > > to > > be the same in every organisation. It might not even be the same > > between different applications in one organisation. > > This makes sense but it would be harder to implement. I suppose it all > depends on what programmmers have time, yes? I take it you're > proposing: > interval_round(interval_value, interval_unit, interval_rule) ? Or a SETtable parameter -- IF we go so far as to do this! However, since I see we can do this: junk=# select cast('7 months' as interval) / 3; ?column? ----------------2 mons 10 days (1 row) my objections are beside the point and we might as well do it.. > Given that you seem to have given this some thought as well, I'd like > to hear your ideas on a workable solution for interval math. Also, I > notice that you did not comment on my proposal that we allow interval > multiplication only by integers. Does that mean you agree? INTERVAL * {INTEGER|FLOAT|NUMERIC} would be meaningful; why restrict it to integer? For example, one might very well want INTERVAL * 1.5. The result of any of these would be INTERVAL of the same subtype as the multiplied interval. I agree that it doesn't make much sense to multiply intervals together. -- Oliver Elphick Oliver.Elphick@lfix.co.uk Isle of Wight http://www.lfix.co.uk/oliver GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C "For all that is in the world, the lust of the flesh, and the lust of the eyes, and the pride of life, is notof the Father, but is of the world." I John 2:17
Oliver, Thomas Lockhart, the Postgres Date/Time maintainer, has requested that we move this discussion to PGSQL-HACKERS so that he can participate.You game? -Josh Berkus