Thread: tzcode update

tzcode update

From
"Heikki Linnakangas"
Date:
We included the public domain timezone library by Arthur David Olson
back in 2004 into our source tree, but we haven't kept it up to date
with the upstream changes since.

We've made a number of small changes to our version of the library,
including:
- formatting changes, mostly thanks to pgindent
- widened time_t to 8 bytes
- we only include the parts of the library that we need

which means that we can't just take the new library version and drop it
in the src/timezone directory. We can debate whether those changes were
a good idea or not, given that they make the merging harder, but my take
is that it's not that bad given how seldom and little the upstream code
changes.

I was not able to find anything like release notes that would list the
differences between tzcode2003e, which I believe is the version that we
included back then, and the latest version tzcode2007k. So I just took a
diff between those, and deduced from there what has changed.

The main difference between those version seems to be support for 64-bit
time_t, including
   - extended data file format, with 64-bit time values
   - extrapolation of DST transitions in a 400 year cycle, for values
not directly covered by the transition table.

In addition to that, they've added "public domain" notice to the top of
ialloc.c, and some other cosmetic changes. We already had such a notice
in our version of the file, but the original did not.

I merged the upstream changes, mostly manually, trying to be faithful to
the original diff to make future merges easier. But I also made some
whitespace/comment changes, like we've done before to our version of the
code: no double-stars in comments, braces on lines of their own.
Attached is the resulting patch against Postgres CVS HEAD.

In addition to manually merging the patch, I had to teach
pg_next_dst_boundary how to extrapolate the DST transitions. The code
there is copy-pasted from localsub, so I copy-pasted that change from
there as well. Also, they now use a binary search instead of linear
search in localsub, and I copied that change to pg_next_dst_bounday as
well (that is why I started looking at this in the first place,
http://archives.postgresql.org/pgsql-patches/2007-09/msg00291.php)

I don't really know how to test this. It passes the regression tests,
after the fixes to pg_dst_next_boundary, but I was expecting there to be
some failures now that we support timezones for timestamps outside the
32-bit time_t range. Apparently our tests don't cover that?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: tzcode update

From
Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> I was not able to find anything like release notes that would list the
> differences between tzcode2003e, which I believe is the version that we
> included back then, and the latest version tzcode2007k. So I just took a
> diff between those, and deduced from there what has changed.

Oh good, this has been on my to-do list for awhile ... but I'm happy
to let you do it ;-)

> I don't really know how to test this. It passes the regression tests,
> after the fixes to pg_dst_next_boundary, but I was expecting there to be
> some failures now that we support timezones for timestamps outside the
> 32-bit time_t range. Apparently our tests don't cover that?

Unless the 64-bit extension changed the semantics a lot more than I
think, any given compiled tzdata file covers only a finite range of
years.  The extension makes it *possible* for the data to extend outside
the time_t range, but you won't actually see a difference in behavior
unless (a) you do extend the range (what's zic's default now?) and
(b) you test a date falling within the extended range.  So I'm not
too surprised that there are no cases in the regression tests that
notice.  We should probably add some reaching out to 2100 or so.

            regards, tom lane

Re: tzcode update

From
"Heikki Linnakangas"
Date:
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> I was not able to find anything like release notes that would list the
>> differences between tzcode2003e, which I believe is the version that we
>> included back then, and the latest version tzcode2007k. So I just took a
>> diff between those, and deduced from there what has changed.
>
> Oh good, this has been on my to-do list for awhile ... but I'm happy
> to let you do it ;-)
>
>> I don't really know how to test this. It passes the regression tests,
>> after the fixes to pg_dst_next_boundary, but I was expecting there to be
>> some failures now that we support timezones for timestamps outside the
>> 32-bit time_t range. Apparently our tests don't cover that?
>
> Unless the 64-bit extension changed the semantics a lot more than I
> think, any given compiled tzdata file covers only a finite range of
> years.  The extension makes it *possible* for the data to extend outside
> the time_t range, but you won't actually see a difference in behavior
> unless (a) you do extend the range (what's zic's default now?) and
> (b) you test a date falling within the extended range.  So I'm not
> too surprised that there are no cases in the regression tests that
> notice.  We should probably add some reaching out to 2100 or so.

zic does extend the range by default now, as witnessed by:

postgres=# set timezone = 'Europe/London';
SET
postgres=# SELECT '2090-07-01 15:00:00'::timestamptz;
       timestamptz
------------------------
  2090-07-01 15:00:00+01
(1 row)

Their new format is best described by this mail by Arhur David Olson
(the author of the library):

http://article.gmane.org/gmane.comp.time.tz/474/match=64+bit

> The name of the game would be to produce binary files with:
>     1. data with 32-bit time values through 2037 (for use by old
> systems)
>     2. data with 64-bit time values for "historic" years
>     3. a newline enclosed, POSIX-conforming
> TZ-environment-variable-style string
>        telling what to do in years beyond those covered by 2 above.
>
> There are a few places (Cairo, Godthab, Chile) that can't be represented by
> POSIX-conforming strings; for these we'd punt to the "write 400 years worth
> of data and work modulo 400" approach (leaving off the trailing TZ string).

I'll add some tests to cover timestamps > 2038.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: tzcode update

From
"Heikki Linnakangas"
Date:
Heikki Linnakangas wrote:
> I'll add some tests to cover timestamps > 2038.

Attached is a new patch, with a couple of new regression tests. No other
changes.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: tzcode update

From
"Heikki Linnakangas"
Date:
I just noticed that I had accidentally reverted this change in the patch:

>     /*
>      * Note: the point of adding 4800 is to ensure we make the same
>      * assumptions as Postgres' Julian-date routines about the placement of
>      * leap years in centuries BC, at least back to 4713BC which is as far as
>      * we'll go. This is effectively extending Gregorian timekeeping into
>      * pre-Gregorian centuries, which is a tad bogus but it conforms to the
>      * SQL spec...
>      */
> #define LEAPS_THRU_END_OF(y)    (((y) + 4800) / 4 - ((y) + 4800) / 100 + ((y) + 4800) / 400)

vs original in tzcode2003e:

> #define LEAPS_THRU_END_OF(y)    ((y) / 4 - (y) / 100 + (y) / 400)

Looking closer, I don't understand how that change was supposed to do
anything. If I'm doing my math right, our +4800 version of the code can
be written as: "y/4 - y/100 - y/400 + 1164". The only place this macro
is used is this:

>         days -= ((int64) (newy - y)) * DAYSPERNYEAR +
>             LEAPS_THRU_END_OF(newy - 1) -
>             LEAPS_THRU_END_OF(y - 1);

AFAICS, the constant " + 1164" is always cancelled out, making the
exercise of adding 4800 a waste of time.

Am I missing something?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: tzcode update

From
Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Looking closer, I don't understand how that change was supposed to do
> anything.

The point of that patch is to avoid an off-by-one result for years BC.
The direction of rounding in integer division with a negative numerator
is undefined in C (or at least used to be --- did C99 tighten this up?).

            regards, tom lane

Re: tzcode update

From
"Heikki Linnakangas"
Date:
Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Looking closer, I don't understand how that change was supposed to do
>> anything.
>
> The point of that patch is to avoid an off-by-one result for years BC.
> The direction of rounding in integer division with a negative numerator
> is undefined in C (or at least used to be --- did C99 tighten this up?).

Oh, I see. In that case we're good. The corresponding new code in tzcode
  actually looks like this:

> ! static int
> ! leaps_thru_end_of(const int y)
> ! {
> !       return (y >= 0) ? (y / 4 - y / 100 + y / 400) :
> !               -(leaps_thru_end_of(-(y + 1)) + 1);
> ! }


--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: tzcode update

From
Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Heikki Linnakangas wrote:
>> I'll add some tests to cover timestamps > 2038.

> Attached is a new patch, with a couple of new regression tests. No other
> changes.

Applied with minor revisions --- I found a couple of portability
problems while testing here.  int64_fast_t is already defined in
some system headers and causes a conflict, so I just made it be int64
instead.  Also there were undefined references to INT32_MIN and
INT32_MAX, which could possibly have been fixed with more #include's,
but I thought replacing them with a cast-based overflow check was at
least as good.

            regards, tom lane

Re: tzcode update

From
Bjorn Munch
Date:
On 13/02 10.51, Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> >I'll add some tests to cover timestamps > 2038.
>
> Attached is a new patch, with a couple of new regression tests. No other
> changes.

Ouch!  This fails on our Solaris builds, because we build with the
Solaris timezone files.  And these apparently don't work beyond 2038
and don't know that Finland plans to have DST also in the summer of
2050...

I haven't studied this in depth but I'm afraid the answer is "fix your
own timezone files"?

--
Bjorn Munch
Database Technology Group, Sun Microsystems
Trondheim, Norway



Re: tzcode update

From
Magnus Hagander
Date:
On Mon, Feb 18, 2008 at 04:32:58PM +0100, Bjorn Munch wrote:
> On 13/02 10.51, Heikki Linnakangas wrote:
> > Heikki Linnakangas wrote:
> > >I'll add some tests to cover timestamps > 2038.
> >
> > Attached is a new patch, with a couple of new regression tests. No other
> > changes.
>
> Ouch!  This fails on our Solaris builds, because we build with the
> Solaris timezone files.  And these apparently don't work beyond 2038
> and don't know that Finland plans to have DST also in the summer of
> 2050...
>
> I haven't studied this in depth but I'm afraid the answer is "fix your
> own timezone files"?

Or use the ones shipping with pg ;-)

This sounds exactly like the problem scenario brought forward when the
patch to use OS timezonedata was added. But I'd assume there are plans for
Solaris to include this data as well in the future, no? You'll have to
eventually....

//Magnus

Re: tzcode update

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Feb 18, 2008 at 04:32:58PM +0100, Bjorn Munch wrote:
>> Ouch!  This fails on our Solaris builds, because we build with the
>> Solaris timezone files.  And these apparently don't work beyond 2038
>> and don't know that Finland plans to have DST also in the summer of
>> 2050...
>>
>> I haven't studied this in depth but I'm afraid the answer is "fix your
>> own timezone files"?

> Or use the ones shipping with pg ;-)

Yeah.  I included the far-future cases in the committed patch
specifically to make it obvious if you were using tz files that lacked
post-2038 support.

I can't imagine that fixing this isn't pretty darn high on the Solaris
to-do list, anyway.  Financial apps doing, say, 30-year mortgage
projections are broken *today* on platforms without post-Y2038 calendar
support.

            regards, tom lane

Re: tzcode update

From
Bjorn Munch
Date:
On 18/02 14.22, Tom Lane wrote:
> I can't imagine that fixing this isn't pretty darn high on the Solaris
> to-do list, anyway.  Financial apps doing, say, 30-year mortgage
> projections are broken *today* on platforms without post-Y2038 calendar
> support.

Well, it IS mentioned in the BUGS section of at least one man page
(localtime), so it's clearly a known problem.

I don't know about 30-year mortgage, I don't think such financial apps
care whether the day 30 days into the future has DST or not. :-)  And
apart from that, the date was correct.

- Bjorn Munch

Re: tzcode update

From
"Heikki Linnakangas"
Date:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Mon, Feb 18, 2008 at 04:32:58PM +0100, Bjorn Munch wrote:
>>> Ouch!  This fails on our Solaris builds, because we build with the
>>> Solaris timezone files.  And these apparently don't work beyond 2038
>>> and don't know that Finland plans to have DST also in the summer of
>>> 2050...
>>>
>>> I haven't studied this in depth but I'm afraid the answer is "fix your
>>> own timezone files"?
>
>> Or use the ones shipping with pg ;-)
>
> Yeah.  I included the far-future cases in the committed patch
> specifically to make it obvious if you were using tz files that lacked
> post-2038 support.

Agreed, we want to give people a notice.

What we could do is to add comments to the expected output file, that
would show up in regression.diffs, that explain what the failure means.

> I can't imagine that fixing this isn't pretty darn high on the Solaris
> to-do list, anyway.

Yep, and PostgreSQL 8.4 won't be released for quite some time.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com