Thread: Fixing DetermineLocalTimeZone, this time for sure

Fixing DetermineLocalTimeZone, this time for sure

From
Tom Lane
Date:
I noticed today that we have once again managed to break PG's handling
of ambiguous timestamps during a DST backwards transition.  For example,
in EST5EDT time zone, '2004-10-31 01:30:00' is ambiguous.  The intention
of the code for a long time has been to treat such times as being local
standard time (so, EST in this case), but 7.4 and CVS tip get it wrong :-(
(I coulda sworn I tested this case last time we touched
DetermineLocalTimeZone; but anyway, what's done is done.)

If you feel like weighing in on what the behavior *should* be, please
add to the thread over in pgsql-general:
http://archives.postgresql.org/pgsql-general/2004-10/msg01546.php
What I want to talk about in this thread is implementation issues.

We have spent enough effort fruitlessly trying to make this code work,
first atop mktime() and then atop localtime(), to make me think it's
time for a fresh approach.  In particular, now that we have control of
our own timezone library, we should think about offering a new API that
would help out DetermineLocalTimeZone.

What I'm toying with is a function to determine the next DST transition
time, perhaps along the lines of

extern int pg_next_dst_boundary(const pg_time_t *timep,            long int *before_gmtoff,            int
*before_isdst,           pg_time_t *boundary,            long int *after_gmtoff,            int *after_isdst);
 

*timep is the input value, the other parameters are return values.

When the return value is 1, *boundary is set to the time_t
representation of the next DST transition time at or after *timep,
*before_gmtoff and *before_isdst are set to the GMT offset and isdst
state prevailing just before that boundary, and after_gmtoff and
after_isdst are set to the state prevailing just after that boundary.

When the return value is 0, there is no known DST transition at or
after *timep, but *before_gmtoff and *before_isdst indicate the GMT
offset and isdst state prevailing at *timep.  (This would occur in
DST-less time zones, for example.)

A return value of -1 could indicate failure (this would not actually
occur in our current implementation).

The way DetermineLocalTimeZone would use this is to compute a GMT-zone
time_t value from its input timestamp fields, subtract say 24 hours,
and call pg_next_dst_boundary.  Assuming no timezone has DST transitions
less than two days apart, this is guaranteed to find the closest
relevant DST transition.  If the result is zero, we're done:
before_gmtoff is the desired timezone offset.  If the result is one,
we can subtract before_gmtoff and after_gmtoff from the GMT-zone time_t
to determine two candidate actual timestamp interpretations of the
input.  If these are both less than the boundary, the desired zone is
before_gmtoff; if both greater, the desired zone is after_gmtoff;
otherwise we have an impossible or ambiguous timestamp input, which we
can resolve by choosing whichever gmtoff is standard time according to
the isdst flags.  (Or we can raise error, according to the suggestion
made in the other thread, or perhaps do something else --- we now have
enough information to implement any of several possible algorithms.)

Aside from giving us the opportunity to get the right answer reliably,
this should be substantially faster than our existing code, since
pg_next_dst_boundary is essentially just the first step of pg_localtime()
whereas the existing code requires several pg_localtime() calls plus
additional arithmetic.

Does this seem too ugly to anyone?  Do you have an idea for a cleaner
implementation?
        regards, tom lane