Timestamp/Interval proposals: Part 2 - Mailing list pgsql-hackers

From Josh Berkus
Subject Timestamp/Interval proposals: Part 2
Date
Msg-id 200206071634.34385.josh@agliodbs.com
Whole thread Raw
List pgsql-hackers
Developers,

Here's part to of my proposal to enhance, improve, and fix Timestamp and
Interval in PostgreSQL.   Part I is included after Part II in case everyone
has forgotten it.

Please give me feedback on this.   My interest is that I develop calendaring
apps based on Postgresql, and the current Timestamp + Interval limitations
and wierdnesses are giving me tsuris.    Thus I'm not particularly attached
to the specifics of my proposals, so long as we do *something* to fix the
issues.

Part II

Interval
-----------------------------------

There are a few problems currently with the Interval data type.  The biggest
is that the current rules give us no clear path for implementation of a full
set of operators.  The SQL92 standard is no help here; its implementation is
unintuitive and extremely limited ... more limited, in fact, than the current
incomplete implementation in PostgreSQL.


Proposal #3:  We should support the addition of "whole days".

Description: Interval should support a "Weeks to Days" increment which is
atomic per day, and not as a aggregate of hours.

Reason: Currently, the "days" increment in Interval is treated as "x 24 hours"
and not as whole days.  This can cause some confusion when date calculations
break over a DST change; users do *not* expect events to get an hour earlier
or later in the fall or the spring.  The current result is that a lot of
users give up on utilizing time zones because they can't deal with the time
shift in calendar applications.


Proposal #4:  Create to_char(INTERVAL, 'format string') Function.

Description: We could really use a built-in function that supports output
formatting of Intervals.

Reason:  self-evident, I think.


Proposal #5:  Two alternate proposals for overhaul of the interval data type.

Description:  Interval needs some radical changes to calculations and
operators.

Reason:  Currently, it is nearly impossible to conceive of an implementation
for a full set of operators for the interval data type ( + - / * ) because of
the variability of conversions from one interval increment to another.  For
example, what exactly should be the result of '3 months' / '4 days'?  Here
are two alternatives.

Alternative #1: Treat Interval Increments as Atomic, and Round

If we implemented this, each of the 3 sub-types of Interval (Year to Month,
Week to Day, and Hour to Millesecond per proposal #3) would be treated as
"atomic" and not renderable in terms of smaller increments, in the same way
that integers are not divisible beyond a prime.  In fact, rather than
expressing remainders in smaller increments, the modulo ( % ) operator would
be used to express the remainder.

Further, we would need to create a set of casting functions that allows for
the conversion of one interval subtype into another, using rounding by
approximates, such as 1 year = 365 days, 1 month = 30 days, 1 day = 24 hours,
etc.  This is not that different from how the DATE data type works.  If users
attempt multiplication and division with intervals of different subtypes, an
implicit cast would be made into the subtype of the smallest value.

Finally, multiplication and division by floats would be disallowed and
replaced by multiplication and division by integers.  Thus:

'1 month' + '33 days' = '1 month 33 days'
'1 month 33 days'::INTERVAL WEEK TO DAY = '63 days'
'1 month' + '33 days'::INTERVAL YEAR TO MONTH = '2 months'
'5 months' / '2 months' = 2
'5 months' % '2 months' = '1 month'
'5 months' / 2 = '2 months'
'5 months' % 2 = '1 month'
'9 months' / '2 weeks' = '270 days' / '14 days' = 19
'15 hours' * 20 = '300 hours' (not '12 days 12 hours')
etc.

Pros:  It's simple and relatively intuitive.   This approach also is similar
to the SQL92 spec, which focuses on interval subtypes.
Cons:  It requires an annoying implementation of subtypes, which is cumbersome
and difficult to manage when you have mixed intervals (e.g. '4 days 8 hours 9
minutes').  And, with every operation, rounding is being used which can
result in some ghastly inequalities:
'1 year'/12 --> '1 month'::INTERVAL WEEK TO DAY --> '30 days' * 12
--> '360 days' / '1 year' = 0


Alternative #2: Tie Intervals to a Specific Timestamp

This is the most robust interval implementation I can imagine.  The basic idea
is this:  instead of intervals being an "absolute" value, they would be
rooted in a specific timestamp.  For example, rather than:INTERVAL '45 Days'
We would use:INTERVAL '2002-03-30 +45 days'
This would allow us to ground our intervals in the real calendar, and any
subtype conversion problems could be eliminated by resorting to the calendar.
We would know, for example, that:'2002-05-30 +2 months' /  '2002-05-30 +2 weeks' = 4.35714...
and even that'2002-05-30 +2 months' / 14 = '2002-05-30 +4 days 8 hours 34 min 17 sec ...'

For simplicity, users would be allowed to use intervals which did not state a
start date.  In this case, the start date would be assumed to be a default
start date, such as '2000-01-01 00:00:00'.  Also, start dates could be
assumed from timestamp math:

'2002-07-30' - '2002-05-30' = '2002-07-30 -61 days'

Of course, this does not get us entirely away from subtyping.  For example, if
we did arithmatic with disparate dates, increments would have to be applied
per subtype.  That is:

'2002-05-30 +61 days' = '2002-05-30 +2 months'
but '2002-05-30 +2 months' + '2002-01-28' = '2002-01-28 +2 months' < '2002-01-28 +61 days'

Also, interval to interval math would no longer be commutative, becuase we
would need to use the start date of the first interval in the case of
disparate start dates:

'2002-05-30 + 61 days' + '2002-01-28 +59 days' = '2002-05-30 +120 days' < '2002-05-30 + 4 months'
even though '2002-05-30 + 61 days' = '2002-05-30 + 2 months'
and '2002-01-28 +59 days' = '2002-01-28 +2 months'

Pros: The most accurate interval calculations possible.
Cons:  How the heck would we implement it?  And *explain* it?  And it's pretty
darn far from the SQL92 implementation.

---------------------------------------------
And, a re-hash of Part I:

PROPOSAL FOR ADJUSTMENTS OF POSTGRESQL TIMESTAMP AND INTERVAL HANDLING
Draft 0.2

Timestamp
------------------------------
Proposal #1:  TIMESTAMP WITHOUT TIME ZONE as default

Description:  Currently, the data type invoked when users select TIMESTAMP is
TIMESTAMP WITH TIME ZONE.  We should change this so that TIMESTAMP defaults to
TIMESTAMP WITHOUT TIME ZONE unless WITH TIME ZONE is specificied.

Reason:  Handling time zones is tricky and non-intuitive for the beginning
user.  TIMESTAMP WITH TIME ZONE should be reserved for DBAs who know what
they're doing.

Resolution:  Taken care of in 7.3.


Proposal #2: We need more time zones.

Description:  We need to add, or be able to add, many new time zones to
Postgresql.  Ideal would be some kind of "create time zone" statement.

Reason:  Current included time zones do not cover all real-world time zones,
and the situation is likely to get worse as various governments play with
their calendars.  For example, there is no current time zone which would be
appropriate for the state of Arizona, i.e. "Central Standard Time without
Daylight Savings Time".

Further:  A CREATE TIME ZONE statement would have the following syntax:
CREATE TIME ZONE GMT_adjustment, abbreviation, uses_DST, DST_starts
(optional),
DST_ends (optional)
This would allow, to some degree, DBA creation of time zones to take into
account local laws and wierdnesses.

Alternative:  We can allow users to designate timezones according to GMT
offset and whether or not they support DST. Example "-8:00 DST" for PST/PDT,
and "-7:00 NDS" for the Arizona example above.


--
-Josh BerkusTechdocs Writer



pgsql-hackers by date:

Previous
From: Rachit Siamwalla
Date:
Subject: Question whether this is a known problem in 7.1.2
Next
From: Bruce Momjian
Date:
Subject: Internals question about buffers