Re: Redhat 7.3 time manipulation bug - Mailing list pgsql-hackers

From cbbrowne@cbbrowne.com
Subject Re: Redhat 7.3 time manipulation bug
Date
Msg-id 20020525003724.B11FD35B0F@cbbrowne.com
In response to Re: Redhat 7.3 time manipulation bug  (Thomas Lockhart <lockhart@fourpalms.org>)
List pgsql-hackers
> > > The last phase could be extending the API to allow multiple simultaneous
> > > time zones, detection of bad time zones, etc etc. This would involve API
> > > changes or extensions, and breaks compatibility with system-supplied
> > > infrastructure.
> > One thing that wasn't clear to me, but could use investigation: if so
> > many systems are using the same underlying timezone database info, maybe
> > there is some commonality at a level below the ISO mktime/tzset/etc API.
> > If we could make use of the system-provided TZ database at a lower level
> > while still using our own APIs not tied to time_t, it'd answer the issue
> > of compatibility with the surrounding system.  (Which is a real issue,
> > I agree --- we should be able to accept the system's standard TZ setting
> > if possible.)

> The fundamental problem (which of course can have a fundamental
> solution ;) is that a time zone database built with a 32-bit time_t
> will have time zone info through 2038 only (it is a binary file with
> 32-bit time fields -- almost certainly anyway). So if we have an
> extended time zone infrastructure using something different for time_t
> we would need to handle the case of reading non-extended time zone
> databases, which puts us back to having limitations.

Ah, but the database in question _doesn't_ consist of 32 bit time_t
values.

It consists of things like:

# @(#)zone.tab    1.26
#
# TZ zone descriptions
#
# From Paul Eggert <eggert@twinsun.com> (1996-08-05):
#
# This file contains a table with the following columns:
# 1.  ISO 3166 2-character country code.  See the file `iso3166.tab'.
# 2.  Latitude and longitude of the zone's principal location
#     in ISO 6709 sign-degrees-minutes-seconds format,
#     either +-DDMM+-DDDMM or +-DDMMSS+-DDDMMSS,
#     first latitude (+ is north), then longitude (+ is east).
# 3.  Zone name used in value of TZ environment variable.
# 4.  Comments; present if and only if the country has multiple rows.
#
# Columns are separated by a single tab.
# The table is sorted first by country, then an order within the country that
# (1) makes some geographical sense, and
# (2) puts the most populous zones first, where that does not contradict (1).
#
# Lines beginning with `#' are comments.
#
#country-
#code    coordinates    TZ            comments
AD    +4230+00131    Europe/Andorra
AE    +2518+05518    Asia/Dubai
AF    +3431+06912    Asia/Kabul
AG    +1703-06148    America/Antigua
AI    +1812-06304    America/Anguilla
AL    +4120+01950    Europe/Tirane
AM    +4011+04430    Asia/Yerevan
AN    +1211-06900    America/Curacao
AO    -0848+01314    Africa/Luanda
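
To make the column layout concrete, here is a minimal sketch of reading such rows, assuming tab-separated fields as the header comment describes. The names ZoneTabEntry, parse_coordinates and parse_zone_tab are made up for illustration and are not part of any existing library.

```python
from typing import NamedTuple, Optional


class ZoneTabEntry(NamedTuple):
    country: str            # ISO 3166 two-character code
    latitude: float         # decimal degrees, + is north
    longitude: float        # decimal degrees, + is east
    tz: str                 # value usable in the TZ environment variable
    comment: Optional[str]  # only present for multi-zone countries


def _parse_angle(text: str, deg_digits: int) -> float:
    """Convert +-DDMM[SS] or +-DDDMM[SS] into signed decimal degrees."""
    sign = -1.0 if text[0] == "-" else 1.0
    digits = text[1:]
    degrees = int(digits[:deg_digits])
    minutes = int(digits[deg_digits:deg_digits + 2])
    seconds = int(digits[deg_digits + 2:] or 0)
    return sign * (degrees + minutes / 60 + seconds / 3600)


def parse_coordinates(coords: str) -> tuple:
    # The longitude begins at the last sign character in the field.
    split = max(coords.rfind("+"), coords.rfind("-"))
    return _parse_angle(coords[:split], 2), _parse_angle(coords[split:], 3)


def parse_zone_tab(text: str) -> list:
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        lat, lon = parse_coordinates(fields[1])
        comment = fields[3] if len(fields) > 3 else None
        entries.append(ZoneTabEntry(fields[0], lat, lon, fields[2], comment))
    return entries


SAMPLE = (
    "AD\t+4230+00131\tEurope/Andorra\n"
    "AG\t+1703-06148\tAmerica/Antigua\n"
    "AO\t-0848+01314\tAfrica/Luanda\n"
)
for entry in parse_zone_tab(SAMPLE):
    print(entry)
```

Nothing here involves a time_t: the file is plain text mapping a place to a zone name.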

Then a "leapseconds" table, looking like:
# The correction (+ or -) is made at the given time, so lines
# will typically look like:
#    Leap    YEAR    MON    DAY    23:59:60    +    R/S
# or
#    Leap    YEAR    MON    DAY    23:59:59    -    R/S

# If the leapsecond is Rolling (R) the given time is local time
# If the leapsecond is Stationary (S) the given time is UTC

# Leap    YEAR    MONTH    DAY    HH:MM:SS    CORR    R/S
Leap    1972    Jun    30    23:59:60    +    S
Leap    1972    Dec    31    23:59:60    +    S
Leap    1973    Dec    31    23:59:60    +    S
Leap    1974    Dec    31    23:59:60    +    S
Leap    1975    Dec    31    23:59:60    +    S
Leap    1976    Dec    31    23:59:60    +    S
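
The leap second lines are just as easy to read as text. A rough sketch, assuming whitespace-separated fields exactly as in the excerpt above; parse_leap_lines is a hypothetical helper name:

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_leap_lines(text: str) -> list:
    """Return (date, correction, is_rolling) tuples for each Leap line."""
    leaps = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) != 7 or fields[0] != "Leap":
            continue
        _, year, mon, day, _time, corr, kind = fields
        leaps.append((
            date(int(year), MONTHS[mon], int(day)),
            1 if corr == "+" else -1,   # seconds added or removed
            kind == "R",                # Rolling (local) vs Stationary (UTC)
        ))
    return leaps


SAMPLE = """\
# Leap    YEAR    MONTH    DAY    HH:MM:SS    CORR    R/S
Leap    1972    Jun    30    23:59:60    +    S
Leap    1972    Dec    31    23:59:60    +    S
Leap    1973    Dec    31    23:59:60    +    S
"""
leaps = parse_leap_lines(SAMPLE)
print(sum(corr for _, corr, _ in leaps), "leap seconds accumulated")
```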

And then a set of rules about timezone adjustments for all sorts of
localities, including the following:

# Rule    NAME    FROM    TO    TYPE    IN    ON    AT    SAVE    LETTER/S
# Summer Time Act, 1916
Rule    GB-Eire    1916    only    -    May    21    2:00s    1:00    BST
Rule    GB-Eire    1916    only    -    Oct     1    2:00s    0    GMT
# S.R.&O. 1917, No. 358
Rule    GB-Eire    1917    only    -    Apr     8    2:00s    1:00    BST
Rule    GB-Eire    1917    only    -    Sep    17    2:00s    0    GMT
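
As an illustration of how little machinery a Rule line needs, here is a sketch that splits one into named fields. The Rule tuple and parse_rule are invented names, and only numeric years plus "only" are handled; other keywords used elsewhere in the tz source are out of scope here.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    name: str
    from_year: int
    to_year: int
    in_month: str
    on_day: str
    at_time: str   # a trailing "s" means the time is local standard time
    save: str      # amount added to standard time while the rule is in effect
    letters: str   # text substituted into the zone abbreviation


def parse_rule(line: str) -> Optional[Rule]:
    fields = line.split("#", 1)[0].split()  # drop comments
    if len(fields) != 10 or fields[0] != "Rule":
        return None
    _, name, frm, to, _type, in_month, on_day, at_time, save, letters = fields
    from_year = int(frm)
    # "only" means the rule applies to the FROM year alone.
    to_year = from_year if to == "only" else int(to)
    return Rule(name, from_year, to_year, in_month, on_day,
                at_time, save, letters)


SAMPLE = """\
# Summer Time Act, 1916
Rule    GB-Eire    1916    only    -    May    21    2:00s    1:00    BST
Rule    GB-Eire    1916    only    -    Oct     1    2:00s    0    GMT
"""
rules = [r for r in map(parse_rule, SAMPLE.splitlines()) if r is not None]
for rule in rules:
    print(rule)
```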


# Zone    NAME        GMTOFF    RULES    FORMAT    [UNTIL]
Zone Antarctica/Casey    0    -    zzz    1969
            8:00    -    WST    # Western (Aus) Standard Time
Zone Antarctica/Davis    0    -    zzz    1957 Jan 13
            7:00    -    DAVT    1964 Nov # Davis Time
            0    -    zzz    1969 Feb
            7:00    -    DAVT
Zone Antarctica/Mawson    0    -    zzz    1954 Feb 13
            6:00    -    MAWT    # Mawson Time
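
In the tz source, a Zone entry can run over several lines: each continuation line is indented and drops the "Zone NAME" prefix, and the UNTIL column says when that line stops applying. A sketch of collecting those lines per zone, with parse_zones as a hypothetical name:

```python
def parse_zones(text: str) -> dict:
    """Map each zone name to its list of (gmtoff, rules, format, until)."""
    zones = {}
    current = None
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments
        if not fields:
            continue
        if fields[0] == "Zone":
            current = fields[1]
            zones[current] = []
            fields = fields[2:]
        elif not raw[0].isspace() or current is None:
            current = None  # some other record type ends the zone
            continue
        gmtoff, rules, fmt = fields[:3]
        until = " ".join(fields[3:]) or None  # last line has no UNTIL
        zones[current].append((gmtoff, rules, fmt, until))
    return zones


SAMPLE = """\
Zone Antarctica/Davis    0    -    zzz    1957 Jan 13
            7:00    -    DAVT    1964 Nov # Davis Time
            0    -    zzz    1969 Feb
            7:00    -    DAVT
Zone Antarctica/Mawson    0    -    zzz    1954 Feb 13
            6:00    -    MAWT    # Mawson Time
"""
zones = parse_zones(SAMPLE)
for name, periods in zones.items():
    print(name, periods)
```

The UNTIL values are calendar dates written out as text, so their range is limited only by whatever reads them.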

> I'm guessing that a better approach might be to have our time zone
> stuff inside our own API, which then could choose to call, for
> example, mktime() or pg_mktime(), which could each have different
> signatures.  Then the heuristics for matching one to the other are
> isolated to our thin API implementation, not to the underlying system-
> or pg-provided libraries.

> matching "stringy time zones" to numeric offsets for input date/times.
> The time zone databases themselves don't lend themselves to this,
> since the tables have those stringy zones somewhere on the right hand
> side of each row of information and the fields can change from year to
> year.

The ultimate goal would seem likely to be to store dates internally in
some form like UTC, with some reasonably huge dynamic range, that is,
not limited to 32 bit timestamps, but rather using something like a
proleptic Gregorian calendar (per _Calendrical Calculations_, page 50).

Some reasonable treatments would include:
 - 32 bits holding a signed int indicating the number of days since
   GREG_EPOCH, where logical epochs would include January 1, 1,
   January 1, 1900, or perhaps even something in the future, such as
   January 1, 2038.
 - 8 bits indicating the month; 8 bits indicating the day of month;
   16 bits providing a range of years from -32768 to 32767.

Both have merits...
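
A small sketch of the two layouts, for comparison. It leans on Python's datetime.date, which already uses a proleptic Gregorian calendar but only covers years 1 through 9999, so it cannot exercise the full range of either layout. The function names are made up for illustration.

```python
import struct
from datetime import date


def to_day_number(d: date, epoch: date = date(1, 1, 1)) -> int:
    """Days since the chosen epoch; fits a signed 32-bit int with room to spare."""
    return d.toordinal() - epoch.toordinal()


def from_day_number(n: int, epoch: date = date(1, 1, 1)) -> date:
    return date.fromordinal(n + epoch.toordinal())


def pack_ymd(year: int, month: int, day: int) -> bytes:
    # ">hBB": signed 16-bit year, unsigned 8-bit month, unsigned 8-bit day.
    return struct.pack(">hBB", year, month, day)


def unpack_ymd(blob: bytes) -> tuple:
    return struct.unpack(">hBB", blob)


today = date(2002, 5, 25)
n = to_day_number(today, epoch=date(1900, 1, 1))
print(n, from_day_number(n, epoch=date(1900, 1, 1)))
print(unpack_ymd(pack_ymd(2002, 5, 25)))
```

With the day count, date differences are a plain subtraction; with the packed fields, pulling out the year, month or day needs no calendar arithmetic at all.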

Timestamps would then forcibly expand things by _at least_ 24 bits, the
minimum needed to express 1/100ths of seconds (a day holds 8,640,000 of
them, which is more than 2^23).  Might as well head on to 32 bits for
the time and so have something that can easily represent values down to
well below a millisecond.
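
The sizing is easy to check with a few lines of arithmetic, assuming the time of day is stored as a tick count since midnight. By this count, hundredths of a second need 24 bits, and 32 bits give ticks of roughly 20 microseconds. The helper names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bits_needed(ticks_per_second: int) -> int:
    """Bits required to number every tick in one day, starting at 0."""
    return (SECONDS_PER_DAY * ticks_per_second - 1).bit_length()


def finest_resolution(bits: int) -> float:
    """Smallest tick in seconds, using a whole number of ticks per second."""
    ticks_per_second = 2 ** bits // SECONDS_PER_DAY
    return 1 / ticks_per_second


print("bits for 1/100 s:", bits_needed(100))
print("tick size with 32 bits (s):", finest_resolution(32))
```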

The "stringy stuff" indicates how values are to be displayed or parsed.
It does nothing about what is stored internally, or at least shouldn't.
--
(reverse (concatenate 'string "gro.gultn@" "enworbbc"))
http://www.cbbrowne.com/info/emacs.html
In the name of the Lord-High mutant, we sacrifice this suburban girl
-- `Future Schlock'

