Thread: A proposal for Interval Math

A proposal for Interval Math

From
Josh Berkus
Date:
Developers:

What follows is a proposal for a spec for PostgreSQL handling of Interval
multiplication and divsion.  I freely admit that I can't do much to help
implement this spec (my C skills are infantile).  However, I think it would
be helpful to at least have a roadmap for eventual Interval map support.

In theory, all of the following calculations should be possible using Postgres
data types:
INTERVAL * NUMBER
INTERVAL / NUMBER
INTERVAL / INTERVAL
However, this has yet to be implemented due partly to the difficult definition
of the above in the SQL92 standard, in which intervals should be divisible by
floats, and the data type should be "sub-typed", with multiplication and
division being possible only for like interval units.
As far as I know, no RDBMS has been able to implement the full SQL92 standard
for Intervals.  Were I on the committee, I would argue that it is
conceptually improbable, and not particularly useful.  Are they accepting
applicants?  :-)

Therefore, I am going to argue for a slightly altered implementation, with
adjustments intended to overcome the essential atomic nature of date units
(i.e. the fact that "0.4 Months" cannot be accurately measured in smaller
interval units).

My proposal is this:  that Intervals should be treated in the same way that
Integer math is handled: with automatic rounding and remainders.  Further,
Intervals would be multiplied only by Integers, not by floats or numeric
values.  This solves a lot of the problems of date unit division by simply
refusing to perform it.

Thus:
INTERVAL / INTEGER = INTERVAL + ( INTERVAL % INTEGER )
INTERVAL * INTEGER = INTERVAL
INTERVAL / INTERVAL = INTEGER + ( INTERVAL % INTERVAL )

In other words:
'7 months' / 2 = '3 months'
'7 months' % 2 = '1 month'
'7 months' * 2 = '14 months'
'7 months' / '3 months' = 2
'7 months' % '3 months' = 1

Ah, you ask, but what about '7 months' / '3 days'?  This requires the addtion
of "Interval Rounding" to the Postgresql system.    Any division of unlike
Interval increments would require Interval Rounding to the smallest interval
unit.  It would be understood that such rounding carries inherent
inaccuracies, in the the same way that Float rounding does.  In a "rounded"
Interval value, all of the following would be true:
1 minute = 60 seconds
1 hour = 60 minutes
1 day = 24 hours
1 week = 7 days
1 month = 30 days
1 year = 365 days = 12 months = 52 weeks

Thus:
'7 months' / '3 days' = 70
... even though this is frequently off by +/- 1 on real calendars.

Of course, this would require a builtin function interval_round(INTERVAL,
Interval Unit).  Also, the Interval data type would have to be able to hold
counts of interval units higher than the threshold for the next unit
increment, i.e. '345 hours' would have to be a valid INTERVAL on its own
without being automatically rounded into days + hours by the system.

Comments, please?

--
-Josh Berkus



Re: A proposal for Interval Math

From
Oliver Elphick
Date:
On Sun, 2002-05-19 at 21:36, Josh Berkus wrote:

> Ah, you ask, but what about '7 months' / '3 days'?  This requires the addtion
> of "Interval Rounding" to the Postgresql system.

As I understand the standard, this is an illegal operation, because the
two intervals are of incompatible types and a meaningful result is not
possible.  I do not think anyone should try to implement it.

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
    "But as it is written, Eye hath not seen, nor ear      heard, neither have entered into the heart of man, the
thingswhich God hath prepared for them that love      him."      I Corinthians 2:9  

Re: A proposal for Interval Math

From
Josh Berkus
Date:
Oliver,

> As I understand the standard, this is an illegal operation, because the
> two intervals are of incompatible types and a meaningful result is not
> possible.  I do not think anyone should try to implement it.

The reason I propose it is that regardless of the standard, PostgreSQL users
need to, for example, calculate the number of 2-week intervals in a given
interval column all the time.   Right now, everybody is using the Date data
type and integers, which has some significant shortcomings.

I'm  a big proponent of standards, and you will find me on this and other
forums pushing sticking to the standard as a #1 priority.   However, in this
case, the standard leaves DBAs with no alternative.  SQL92 neither allows me
to divide unlike interval units, nor to convert unlike interval units into
like units.  Either I'm misreading it, or the ANSI committee really painted
us into a corner on this one.

I would be quite positive to an argument that elimintates the second
restriction but not the first, for example:
'7 months' / '3 days' = ERROR,   but
interval_round('7 months','days') / '3 days' = 70

However, I'll continue pushing for some solution that allows me to calculate
pay periods in real intervals.

--
-Josh Berkus



Re: A proposal for Interval Math

From
Oliver Elphick
Date:
On Sun, 2002-05-19 at 22:21, Josh Berkus wrote:

> However, I'll continue pushing for some solution that allows me to calculate
> pay periods in real intervals.

But this is a matter for your local business rules.  If someone is paid
monthly and leaves one week into the month, there must be a local rule
to determine what proportion of his monthly pay he gets - is it always
7/28 or 7/30 or is it 7/{28,29,30,31} according to which month it is?
You are proposing a mathematical operation that looks exact but actually
has a fuzzy result and the reason it is fuzzy is that you are trying to
shortcut the process of deciding what rule to apply.

You mentioned calculating the number of 2-week intervals in an interval
column: if you need to do this, the column should not be defined as YEAR
TO MONTH but as DAY (of course, PostgreSQL hasn't yet implemented these
subtypes of INTERVAL).  If you *have* to do it with a YEAR TO MONTH
column, you have implicitly defined a local rule that averages out the
difference in month lengths, because in a YEAR-MONTH interval, you don't
know how long a year or a month is.  You gave the example
 interval_round('7 months','days') / '3 days' = 70

which seems to assume that a month is 30 days.  On a different basis the
result would be 365.2425 * 7 / 12 / 3  = 71.019375 or 365 * 7 / 12 / 3
= 70.972222.  Both of those round to 71 days.  Your 7 months interval is
anything from 209 to 213 days depending on which months it includes,
which is unknowable without a context.

So if you define interval_round() you must also include some means of
configuring what rule to use, because that is not necessarily going to
be the same in every organisation.  It might not even be the same
between different applications in one organisation.
--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
    "For all that is in the world, the lust of the flesh,      and the lust of the eyes, and the pride of life, is
notof the Father, but is of the world."                                   I John 2:17 

Re: A proposal for Interval Math

From
"Josh Berkus"
Date:
Oliver,

> You are proposing a mathematical operation that looks exact but
> actually
> has a fuzzy result and the reason it is fuzzy is that you are trying
> to
> shortcut the process of deciding what rule to apply.

But it is a fuzzy process.  Three months is not the same number of days
from January-March as it is from June-August.    For that matter, March
7th 2002 has 25 hours in California, but only 24 hours in Arizona.While an ideal database would allow accounting for
allof these rules,
 
it is practically impossible.

> You mentioned calculating the number of 2-week intervals in an
> interval
> column: if you need to do this, the column should not be defined as
> YEAR
> TO MONTH but as DAY (of course, PostgreSQL hasn't yet implemented
> these
> subtypes of INTERVAL).  If you *have* to do it with a YEAR TO MONTH
> column, you have implicitly defined a local rule that averages out
> the
> difference in month lengths, because in a YEAR-MONTH interval, you
> don't
> know how long a year or a month is.  

Precisely.  You have to estimate.  And the database system should allow
for either implicit or explicit estimation.

>You gave the example
> 
>   interval_round('7 months','days') / '3 days' = 70
> 
> which seems to assume that a month is 30 days.  On a different basis
> the
> result would be 365.2425 * 7 / 12 / 3  = 71.019375 or 365 * 7 / 12 /
> 3 
> = 70.972222.  Both of those round to 71 days.  Your 7 months interval
> is
> anything from 209 to 213 days depending on which months it includes,
> which is unknowable without a context.

Absolutely.

> So if you define interval_round() you must also include some means of
> configuring what rule to use, because that is not necessarily going
> to
> be the same in every organisation.  It might not even be the same
> between different applications in one organisation.

This makes sense but it would be harder to implement.  I suppose it all
depends on what programmmers have time, yes?  I take it you're
proposing:
interval_round(interval_value, interval_unit, interval_rule)  ?

Given that you seem to have given this some thought as well, I'd like
to hear your ideas on a workable solution for interval math.  Also, I
notice that you did not comment on my proposal that we allow interval
multiplication only by integers.  Does that mean you agree?

-Josh Berkus


Re: A proposal for Interval Math

From
Oliver Elphick
Date:
On Mon, 2002-05-20 at 17:03, Josh Berkus wrote:
> Oliver,
> But it is a fuzzy process.  Three months is not the same number of days
> from January-March as it is from June-August.    For that matter, March
> 7th 2002 has 25 hours in California, but only 24 hours in Arizona.
>  While an ideal database would allow accounting for all of these rules,
> it is practically impossible.
>
> > You mentioned calculating the number of 2-week intervals in an
> > interval
> > column: if you need to do this, the column should not be defined as
> > YEAR
> > TO MONTH but as DAY (of course, PostgreSQL hasn't yet implemented
> > these
> > subtypes of INTERVAL).  If you *have* to do it with a YEAR TO MONTH
> > column, you have implicitly defined a local rule that averages out
> > the
> > difference in month lengths, because in a YEAR-MONTH interval, you
> > don't
> > know how long a year or a month is.
>
> Precisely.  You have to estimate.  And the database system should allow
> for either implicit or explicit estimation.

I don't think I am convinced that this needs to be provided by the
database.  If you say that your application just has to divide frogs by
oranges (naturally the result is going to be somewhat fuzzy) I might
request further justification!  I think I am right in being suspicious
of any operation on fuzzy data that purports to give a firm result.  Why
do we have to do it this way? doesn't the fact that you want to do this
demonstrate that there is something wrong with the database design or
with the business rules?

...
> > So if you define interval_round() you must also include some means of
> > configuring what rule to use, because that is not necessarily going
> > to
> > be the same in every organisation.  It might not even be the same
> > between different applications in one organisation.
>
> This makes sense but it would be harder to implement.  I suppose it all
> depends on what programmmers have time, yes?  I take it you're
> proposing:
> interval_round(interval_value, interval_unit, interval_rule)  ?

Or a SETtable parameter -- IF we go so far as to do this!

However, since I see we can do this:

junk=# select cast('7 months' as interval) / 3;                            ?column?
----------------2 mons 10 days
(1 row)

my objections are beside the point and we might as well do it..

> Given that you seem to have given this some thought as well, I'd like
> to hear your ideas on a workable solution for interval math.  Also, I
> notice that you did not comment on my proposal that we allow interval
> multiplication only by integers.  Does that mean you agree?

INTERVAL * {INTEGER|FLOAT|NUMERIC} would be meaningful; why restrict it
to integer?  For example, one might very well want INTERVAL * 1.5.  The
result of any of these would be INTERVAL of the same subtype as the
multiplied interval.  I agree that it doesn't make much sense to
multiply intervals together.

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
    "For all that is in the world, the lust of the flesh,      and the lust of the eyes, and the pride of life, is
notof the Father, but is of the world."                                   I John 2:17 

Re: A proposal for Interval Math

From
"Josh Berkus"
Date:
Oliver,

Thomas Lockhart, the Postgres Date/Time maintainer, has requested that
we move this discussion to PGSQL-HACKERS so that he can participate.You game?

-Josh Berkus