Timestamp/Interval proposals: Part 2 - Mailing list pgsql-hackers
From | Josh Berkus |
---|---|
Subject | Timestamp/Interval proposals: Part 2 |
Date | |
Msg-id | 200206071634.34385.josh@agliodbs.com Whole thread Raw |
List | pgsql-hackers |
Developers, Here's part to of my proposal to enhance, improve, and fix Timestamp and Interval in PostgreSQL. Part I is included after Part II in case everyone has forgotten it. Please give me feedback on this. My interest is that I develop calendaring apps based on Postgresql, and the current Timestamp + Interval limitations and wierdnesses are giving me tsuris. Thus I'm not particularly attached to the specifics of my proposals, so long as we do *something* to fix the issues. Part II Interval ----------------------------------- There are a few problems currently with the Interval data type. The biggest is that the current rules give us no clear path for implementation of a full set of operators. The SQL92 standard is no help here; its implementation is unintuitive and extremely limited ... more limited, in fact, than the current incomplete implementation in PostgreSQL. Proposal #3: We should support the addition of "whole days". Description: Interval should support a "Weeks to Days" increment which is atomic per day, and not as a aggregate of hours. Reason: Currently, the "days" increment in Interval is treated as "x 24 hours" and not as whole days. This can cause some confusion when date calculations break over a DST change; users do *not* expect events to get an hour earlier or later in the fall or the spring. The current result is that a lot of users give up on utilizing time zones because they can't deal with the time shift in calendar applications. Proposal #4: Create to_char(INTERVAL, 'format string') Function. Description: We could really use a built-in function that supports output formatting of Intervals. Reason: self-evident, I think. Proposal #5: Two alternate proposals for overhaul of the interval data type. Description: Interval needs some radical changes to calculations and operators. Reason: Currently, it is nearly impossible to conceive of an implementation for a full set of operators for the interval data type ( + - / * ) because of the variability of conversions from one interval increment to another. For example, what exactly should be the result of '3 months' / '4 days'? Here are two alternatives. Alternative #1: Treat Interval Increments as Atomic, and Round If we implemented this, each of the 3 sub-types of Interval (Year to Month, Week to Day, and Hour to Millesecond per proposal #3) would be treated as "atomic" and not renderable in terms of smaller increments, in the same way that integers are not divisible beyond a prime. In fact, rather than expressing remainders in smaller increments, the modulo ( % ) operator would be used to express the remainder. Further, we would need to create a set of casting functions that allows for the conversion of one interval subtype into another, using rounding by approximates, such as 1 year = 365 days, 1 month = 30 days, 1 day = 24 hours, etc. This is not that different from how the DATE data type works. If users attempt multiplication and division with intervals of different subtypes, an implicit cast would be made into the subtype of the smallest value. Finally, multiplication and division by floats would be disallowed and replaced by multiplication and division by integers. Thus: '1 month' + '33 days' = '1 month 33 days' '1 month 33 days'::INTERVAL WEEK TO DAY = '63 days' '1 month' + '33 days'::INTERVAL YEAR TO MONTH = '2 months' '5 months' / '2 months' = 2 '5 months' % '2 months' = '1 month' '5 months' / 2 = '2 months' '5 months' % 2 = '1 month' '9 months' / '2 weeks' = '270 days' / '14 days' = 19 '15 hours' * 20 = '300 hours' (not '12 days 12 hours') etc. Pros: It's simple and relatively intuitive. This approach also is similar to the SQL92 spec, which focuses on interval subtypes. Cons: It requires an annoying implementation of subtypes, which is cumbersome and difficult to manage when you have mixed intervals (e.g. '4 days 8 hours 9 minutes'). And, with every operation, rounding is being used which can result in some ghastly inequalities: '1 year'/12 --> '1 month'::INTERVAL WEEK TO DAY --> '30 days' * 12 --> '360 days' / '1 year' = 0 Alternative #2: Tie Intervals to a Specific Timestamp This is the most robust interval implementation I can imagine. The basic idea is this: instead of intervals being an "absolute" value, they would be rooted in a specific timestamp. For example, rather than:INTERVAL '45 Days' We would use:INTERVAL '2002-03-30 +45 days' This would allow us to ground our intervals in the real calendar, and any subtype conversion problems could be eliminated by resorting to the calendar. We would know, for example, that:'2002-05-30 +2 months' / '2002-05-30 +2 weeks' = 4.35714... and even that'2002-05-30 +2 months' / 14 = '2002-05-30 +4 days 8 hours 34 min 17 sec ...' For simplicity, users would be allowed to use intervals which did not state a start date. In this case, the start date would be assumed to be a default start date, such as '2000-01-01 00:00:00'. Also, start dates could be assumed from timestamp math: '2002-07-30' - '2002-05-30' = '2002-07-30 -61 days' Of course, this does not get us entirely away from subtyping. For example, if we did arithmatic with disparate dates, increments would have to be applied per subtype. That is: '2002-05-30 +61 days' = '2002-05-30 +2 months' but '2002-05-30 +2 months' + '2002-01-28' = '2002-01-28 +2 months' < '2002-01-28 +61 days' Also, interval to interval math would no longer be commutative, becuase we would need to use the start date of the first interval in the case of disparate start dates: '2002-05-30 + 61 days' + '2002-01-28 +59 days' = '2002-05-30 +120 days' < '2002-05-30 + 4 months' even though '2002-05-30 + 61 days' = '2002-05-30 + 2 months' and '2002-01-28 +59 days' = '2002-01-28 +2 months' Pros: The most accurate interval calculations possible. Cons: How the heck would we implement it? And *explain* it? And it's pretty darn far from the SQL92 implementation. --------------------------------------------- And, a re-hash of Part I: PROPOSAL FOR ADJUSTMENTS OF POSTGRESQL TIMESTAMP AND INTERVAL HANDLING Draft 0.2 Timestamp ------------------------------ Proposal #1: TIMESTAMP WITHOUT TIME ZONE as default Description: Currently, the data type invoked when users select TIMESTAMP is TIMESTAMP WITH TIME ZONE. We should change this so that TIMESTAMP defaults to TIMESTAMP WITHOUT TIME ZONE unless WITH TIME ZONE is specificied. Reason: Handling time zones is tricky and non-intuitive for the beginning user. TIMESTAMP WITH TIME ZONE should be reserved for DBAs who know what they're doing. Resolution: Taken care of in 7.3. Proposal #2: We need more time zones. Description: We need to add, or be able to add, many new time zones to Postgresql. Ideal would be some kind of "create time zone" statement. Reason: Current included time zones do not cover all real-world time zones, and the situation is likely to get worse as various governments play with their calendars. For example, there is no current time zone which would be appropriate for the state of Arizona, i.e. "Central Standard Time without Daylight Savings Time". Further: A CREATE TIME ZONE statement would have the following syntax: CREATE TIME ZONE GMT_adjustment, abbreviation, uses_DST, DST_starts (optional), DST_ends (optional) This would allow, to some degree, DBA creation of time zones to take into account local laws and wierdnesses. Alternative: We can allow users to designate timezones according to GMT offset and whether or not they support DST. Example "-8:00 DST" for PST/PDT, and "-7:00 NDS" for the Arizona example above. -- -Josh BerkusTechdocs Writer
pgsql-hackers by date: