Thread: Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Arnold Mavromatis
Date:
Hi Stephan and everyone

We have tried PG 7.3.4 (the bug still presents itself)

These are the following results..

Welcome to psql 7.3.4, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

db2=> select cast(timestamptz '1901/12/13' as date);
    date
------------
 1901-12-13
(1 row)

db2=> select cast(timestamptz '1901/12/14' as date);
    date
------------
 1901-12-13
(1 row)

db2=> select cast(timestamptz '1901/12/15' as date);
    date
------------
 1901-12-15
(1 row)


Compiler switches..

./configure --prefix=/adamdb/postgres/pg_admin/pgsql_7.3.4 --enable-debug
--with-pgport=5433 --without-readline --without-zlib
--enable-integer-datetimes

Everything else was left at default settings

Readline and zlib were bypassed because they are not available.

HP-UX 11.11i

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Stephan Szabo
Date:
On Thu, 21 Aug 2003, Arnold Mavromatis wrote:

> Hi Stephan and everyone

Hmm, I just got my machine to give a similar failure mode with
a slightly wacky input.

sszabo=> select timestamptz '1901/12/13 23:59:59.99999';
        timestamptz
---------------------------
 1901-12-13 23:59:59.99999
(1 row)

sszabo=> select timestamptz '1901/12/13 23:59:59.999999999';
      timestamptz
------------------------
 1901-12-13 16:00:00-08
(1 row)

sszabo=> select timestamptz '1901/12/14 0:0:0';
      timestamptz
------------------------
 1901-12-14 00:00:00-08
(1 row)

This happens on both 7.3.4 and 7.4.

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> Hmm, I just got my machine to give a similar failure mode with
> a slightly wacky input.

Perhaps more to the point:

regression=# select timestamptz '1901/12/13 0:0:0';
     timestamptz
---------------------
 1901-12-13 00:00:00
(1 row)

regression=# select timestamptz '1901/12/14 0:0:0';
      timestamptz
------------------------
 1901-12-14 00:00:00-05
(1 row)

Note the lack of timezone in the first output.

It looks like 1901/12/14 is the oldest date for which the system will
return timezone information; IIRC, this is the oldest date representable
as a 32-bit time_t.  PG implicitly assumes that timestamps before that
are always GMT.

This still doesn't explain why Arnold sees a failure with to_date and
we don't, though.

            regards, tom lane

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Stephan Szabo
Date:
On Thu, 21 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > Hmm, I just got my machine to give a similar failure mode with
> > a slightly wacky input.
>
> Perhaps more to the point:
>
> regression=# select timestamptz '1901/12/13 0:0:0';
>      timestamptz
> ---------------------
>  1901-12-13 00:00:00
> (1 row)
>
> regression=# select timestamptz '1901/12/14 0:0:0';
>       timestamptz
> ------------------------
>  1901-12-14 00:00:00-05
> (1 row)
>
> Note the lack of timezone in the first output.
>
> It looks like 1901/12/14 is the oldest date for which the system will
> return timezone information; IIRC, this is the oldest date representable
> as a 32-bit time_t.  PG implicitly assumes that timestamps before that
> are always GMT.

In my case the 23:59:59.99999 vs .99999999999 means that in one case the
system correctly determines that there's no timezone.  In the latter, it
thinks there's no timezone on input (because it hasn't added the
fractional seconds), but that ends up rounding up so that on output it
thinks it's on the 14th and therefore has timezone info (the
IS_VALID_UTIME check) and does the timezone thus giving back a time on the
13th with a timezone.  It's basically a wierd edge case we get wrong.

> This still doesn't explain why Arnold sees a failure with to_date and
> we don't, though.

I think it comes from (from his machine)

db1=> select timestamptz '1901/12/14';
     timestamptz
---------------------
 1901-12-13 13:00:00
(1 row)

I'm not sure what that's happening though.  He may need to go through with
the debugger.

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Stephan Szabo
Date:
On Thu, 21 Aug 2003, Stephan Szabo wrote:

> On Thu, 21 Aug 2003, Tom Lane wrote:
>
> > Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > > Hmm, I just got my machine to give a similar failure mode with
> > > a slightly wacky input.
> >
> > Perhaps more to the point:
> >
> > regression=# select timestamptz '1901/12/13 0:0:0';
> >      timestamptz
> > ---------------------
> >  1901-12-13 00:00:00
> > (1 row)
> >
> > regression=# select timestamptz '1901/12/14 0:0:0';
> >       timestamptz
> > ------------------------
> >  1901-12-14 00:00:00-05
> > (1 row)
> >
> > Note the lack of timezone in the first output.
> >
> > It looks like 1901/12/14 is the oldest date for which the system will
> > return timezone information; IIRC, this is the oldest date representable
> > as a 32-bit time_t.  PG implicitly assumes that timestamps before that
> > are always GMT.
>
> In my case the 23:59:59.99999 vs .99999999999 means that in one case the
> system correctly determines that there's no timezone.  In the latter, it
> thinks there's no timezone on input (because it hasn't added the
> fractional seconds), but that ends up rounding up so that on output it
> thinks it's on the 14th and therefore has timezone info (the
> IS_VALID_UTIME check) and does the timezone thus giving back a time on the
> 13th with a timezone.  It's basically a wierd edge case we get wrong.
>
> > This still doesn't explain why Arnold sees a failure with to_date and
> > we don't, though.
>
> I think it comes from (from his machine)
>
> db1=> select timestamptz '1901/12/14';
>      timestamptz
> ---------------------
>  1901-12-13 13:00:00
> (1 row)

Wait, he's in australia, what if he's getting the edge case the other way.
It starts out on the 14th, does the timezone conversion.  But then it
looks like it's on the 13th which doesn't have timezone info and doesn't
do the timezone conversion back.

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
"ir. F.T.M. van Vugt bc."
Date:
> > > Hmm, I just got my machine to give a similar failure mode with
> > > a slightly wacky input.

Will you allow me to add to this wackyness with the following on CVS tip from
this morning:

free4testing=# select version();
                             version
------------------------------------------------------------------
 PostgreSQL 7.4beta1 on i586-pc-linux-gnu, compiled by GCC 2.95.3
(1 row)

free4testing=# select timestamptz '1901/12/13 23:59:59.99999';
        timestamptz
---------------------------
 1901-12-13 23:59:59.99999
(1 row)

free4testing=# select timestamptz '1901/12/13 23:59:59.999999999';
        timestamptz
---------------------------
 1901-12-14 00:19:00+00:19
(1 row)

free4testing=# select timestamptz '2001/12/14 0:0:0';
      timestamptz
------------------------
 2001-12-14 00:00:00+01
(1 row)


Yes, that's a new timezone on the second case, indeed, probably a kind of
'floating' one ;-)





Frank.

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Stephan Szabo
Date:
On Thu, 21 Aug 2003, Stephan Szabo wrote:

> On Thu, 21 Aug 2003, Stephan Szabo wrote:
>
> > On Thu, 21 Aug 2003, Tom Lane wrote:
> >
> > > Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > > > Hmm, I just got my machine to give a similar failure mode with
> > > > a slightly wacky input.
> > >
> > > Perhaps more to the point:
> > >
> > > regression=# select timestamptz '1901/12/13 0:0:0';
> > >      timestamptz
> > > ---------------------
> > >  1901-12-13 00:00:00
> > > (1 row)
> > >
> > > regression=# select timestamptz '1901/12/14 0:0:0';
> > >       timestamptz
> > > ------------------------
> > >  1901-12-14 00:00:00-05
> > > (1 row)
> > >
> > > Note the lack of timezone in the first output.
> > >
> > > It looks like 1901/12/14 is the oldest date for which the system will
> > > return timezone information; IIRC, this is the oldest date representable
> > > as a 32-bit time_t.  PG implicitly assumes that timestamps before that
> > > are always GMT.
> >
> > In my case the 23:59:59.99999 vs .99999999999 means that in one case the
> > system correctly determines that there's no timezone.  In the latter, it
> > thinks there's no timezone on input (because it hasn't added the
> > fractional seconds), but that ends up rounding up so that on output it
> > thinks it's on the 14th and therefore has timezone info (the
> > IS_VALID_UTIME check) and does the timezone thus giving back a time on the
> > 13th with a timezone.  It's basically a wierd edge case we get wrong.
> >
> > > This still doesn't explain why Arnold sees a failure with to_date and
> > > we don't, though.
> >
> > I think it comes from (from his machine)
> >
> > db1=> select timestamptz '1901/12/14';
> >      timestamptz
> > ---------------------
> >  1901-12-13 13:00:00
> > (1 row)
>
> Wait, he's in australia, what if he's getting the edge case the other way.
> It starts out on the 14th, does the timezone conversion.  But then it
> looks like it's on the 13th which doesn't have timezone info and doesn't
> do the timezone conversion back.

And, in fact, when I set my machines timezone to an australian one (not
postgres because that seems to follow a separate path) I get precisely
that behavior.  The timezone conversion is done on input but not on
output.  I'm not really sure how to fix it though.

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>>> This still doesn't explain why Arnold sees a failure with to_date and
>>> we don't, though.

> Wait, he's in australia, what if he's getting the edge case the other way.
> It starts out on the 14th, does the timezone conversion.  But then it
> looks like it's on the 13th which doesn't have timezone info and doesn't
> do the timezone conversion back.

Bingo.

regression=# show time zone;
 TimeZone
----------
 EST5EDT
(1 row)

regression=# select to_date('1901/12/14', 'YYYY/MM/DD');
  to_date
------------
 1901-12-14
(1 row)

regression=# set time zone 'CST-9:30CDT';
SET
regression=# select to_date('1901/12/14', 'YYYY/MM/DD');
  to_date
------------
 1901-12-13
(1 row)


It looks like the same result occurs in any time zone east of
Greenwich.

Looking at the code, the problem seems to be that to_date is built as
    timestamptz_date(to_timestamp(str,fmt))

The initial step yields

regression=# select to_timestamp('1901/12/14', 'YYYY/MM/DD');
    to_timestamp
---------------------
 1901-12-13 23:00:00
(1 row)

and then timestamptz_date quite reasonably yields 1901-12-13.

I'm inclined to fix to_date by decomposing the code differently ---
it should avoid the coercion to timestamp, which is a waste of cycles
anyway.  But is to_timestamp (and more generally timestamp's input
converter) broken?  If so, how can we do better?  I don't think we can
entirely avoid the problem of a transition between local and GMT time.

            regards, tom lane

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Tom Lane
Date:
"ir. F.T.M. van Vugt bc." <ftm.van.vugt@foxi.nl> writes:
> Will you allow me to add to this wackyness with the following on CVS tip from
> this morning:

What's your time zone setting?  Also, are you using
--enable-integer-datetimes?

            regards, tom lane

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Stephan Szabo
Date:
On Thu, 21 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> >>> This still doesn't explain why Arnold sees a failure with to_date and
> >>> we don't, though.
>
> > Wait, he's in australia, what if he's getting the edge case the other way.
> > It starts out on the 14th, does the timezone conversion.  But then it
> > looks like it's on the 13th which doesn't have timezone info and doesn't
> > do the timezone conversion back.
>
> Bingo.
>
> regression=# show time zone;
>  TimeZone
> ----------
>  EST5EDT
> (1 row)
>
> regression=# select to_date('1901/12/14', 'YYYY/MM/DD');
>   to_date
> ------------
>  1901-12-14
> (1 row)
>
> regression=# set time zone 'CST-9:30CDT';
> SET
> regression=# select to_date('1901/12/14', 'YYYY/MM/DD');
>   to_date
> ------------
>  1901-12-13
> (1 row)
>
>
> It looks like the same result occurs in any time zone east of
> Greenwich.
>
> Looking at the code, the problem seems to be that to_date is built as
>     timestamptz_date(to_timestamp(str,fmt))
>
> The initial step yields
>
> regression=# select to_timestamp('1901/12/14', 'YYYY/MM/DD');
>     to_timestamp
> ---------------------
>  1901-12-13 23:00:00
> (1 row)
>
> and then timestamptz_date quite reasonably yields 1901-12-13.
>
> I'm inclined to fix to_date by decomposing the code differently ---
> it should avoid the coercion to timestamp, which is a waste of cycles
> anyway.  But is to_timestamp (and more generally timestamp's input
> converter) broken?  If so, how can we do better?  I don't think we can
> entirely avoid the problem of a transition between local and GMT time.

Yes.  Timestamp with timezone is broken on the same boundaries in general.
I'm not really sure how to do better without some work, it seems we end up
with multiple different input values getting the same internal
representation so we can differentiate which version of the input was used
to get there (whether the user said 1901-12-13 23:00 or 1901-12-14).

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Frank van Vugt
Date:
> What's your time zone setting?
> Also, are you using --enable-integer-datetimes?

I'm running Slack v8 (with some updates, but still), so TZ is not defined as
environment variable, but it's using 'Europe/Amsterdam'.

I've verified whether setting TZ made a difference (stop/starting the server),
it didn't.

Also, it was NOT configured with integer datetimes:

~$head /usr/src/postgresql/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by PostgreSQL configure 7.4beta1, which was
generated by GNU Autoconf 2.53.  Invocation command line was

  $ ./configure --with-openssl --with-perl --with-python --with-tcl


I've verified this behaviour on a second machine (AMD instead of Intel) which
was configured with the same options and got the same result

select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 7.4beta1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66




Best,




Frank.

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Frank van Vugt
Date:
> > What's your time zone setting?

I'm baffled....

I disconnected my psql-client, reconnected a while later to demonstrate this
problem to someone on the server I stop/started earlier, and the problem
wasn't there anymore..... This is the one I stopped, set TZ and started
again. However, since I didn't notice any difference, I stopped the server,
unset TZ and started again a second time. Now while restarting the server
(and verifying whether setting TZ made a difference), I did keep my
psql-connection 'open', so it reconnected automatically, maybe this had
something to do with it?

At this moment, I still have that second server that's still showing the
problem, anything I can verify on that one to help?



Best,






Frank.

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
> At this moment, I still have that second server that's still showing the
> problem, anything I can verify on that one to help?

"show time zone"?

            regards, tom lane

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Frank van Vugt
Date:
> > anything I can verify on that one to help?
>
> "show time zone"?

free4testing=# show time zone;
 TimeZone
----------
 unknown
(1 row)

free4testing=# select now();
              now
-------------------------------
 2003-08-22 14:36:17.994049+02
(1 row)



The latter is what I meant by 'it seems to be using the correct timezone'.

V7.3.3 on the other Slackware machine shows exactly the same output.

V7.4cvs on the machine where I originally tested setting TZ explicitly had TZ
set to nothing after the test, so as per docs it reverted to UTC. After
unsetting TZ and restarting the postmaster I got the same old behaviour back
(my 19 minute wide timezone) with output equal to the one above.






Frank.

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Frank van Vugt
Date:
> anything I can verify on that one to help?

free4testing=# select timestamptz('1901-12-14 0:0:0');
     timestamptz
---------------------
 1901-12-13 23:40:32
(1 row)




Frank.

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Tom Lane
Date:
"ir. F.T.M. van Vugt bc." <ftm.van.vugt@foxi.nl> writes:
> free4testing=# select timestamptz '1901/12/13 23:59:59.999999999';
>         timestamptz
> ---------------------------
>  1901-12-14 00:19:00+00:19
> (1 row)

> Yes, that's a new timezone on the second case, indeed, probably a kind of
> 'floating' one ;-)

This is apparently because the Europe/Amsterdam time zone file actually
tries to reflect the wacky local time used back then:

#
# Amsterdam Mean Time was +00:19:32.13 exactly, but the .13 is omitted
# below because the current format requires GMTOFF to be an integer.
# Zone    NAME        GMTOFF    RULES    FORMAT    [UNTIL]
Zone Europe/Amsterdam    0:19:32 -    LMT    1835
            0:19:32    Neth    %s    1937 Jul  1
            0:20    Neth    NE%sT    1940 May 16 0:00 # Dutch Time
            1:00    C-Eur    CE%sT    1945 Apr  2 2:00
            1:00    Neth    CE%sT    1977
            1:00    EU    CE%sT

The display *ought* to be 00:19:32+00:19, but Postgres drops the seconds
part because it is not expecting the timezone offset to have a seconds
component --- which it is doing to work around a bug that may or may not
still exist in the wild:

            tm->tm_hour = tx->tm_hour;
            tm->tm_min = tx->tm_min;
#if NOT_USED
/* XXX HACK
 * Argh! My Linux box puts in a 1 second offset for dates less than 1970
 *    but only if the seconds field was non-zero. So, don't copy the seconds
 *    field and instead carry forward from the original - thomas 97/06/18
 * Note that GNU/Linux uses the standard freeware zic package as do
 *    many other platforms so this may not be GNU/Linux/ix86-specific.
 * Still shows a problem on my up to date Linux box - thomas 2001-01-17
 */
            tm->tm_sec = tx->tm_sec;
#endif

I can't reproduce the bug Thomas mentions on an RH 8.0 system, but I'm
afraid to remove the workaround, as it could affect a lot of people in
order to fix a case that's not of much practical interest anymore...

            regards, tom lane

Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Thu, 21 Aug 2003, Tom Lane wrote:
>> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>>> Wait, he's in australia, what if he's getting the edge case the other way.
>> 
>> I'm inclined to fix to_date by decomposing the code differently ---
>> it should avoid the coercion to timestamp, which is a waste of cycles
>> anyway.  But is to_timestamp (and more generally timestamp's input
>> converter) broken?  If so, how can we do better?  I don't think we can
>> entirely avoid the problem of a transition between local and GMT time.

> Yes.  Timestamp with timezone is broken on the same boundaries in general.
> I'm not really sure how to do better without some work, it seems we end up
> with multiple different input values getting the same internal
> representation so we can differentiate which version of the input was used
> to get there (whether the user said 1901-12-13 23:00 or 1901-12-14).

I've fixed to_date() along the above lines, but the general problem of
how timestamp I/O should behave remains.

I've come to the conclusion that there isn't any really consistent
behavior if we want to stick with the current definition that
"timestamps outside the Unix date range are always UTC".  If we do that,
then there is a set of timestamps at one end of the date range that are
ambiguous (they could be taken as either UTC or local), while at the
other end of the range there is a set of timestamps that can't be
validly converted as either one.  This is essentially the same problem
we have during daylight-savings transition hours: when you "spring
forward" there is no local time 02:30, and when you "fall back" there
are two of 'em.

The solution we've adopted for DST transitions is to interpret invalid
or ambiguous local times as "always standard time".  We could possibly
do the same for the questionable times at the ends of the Unix date
range, ie, always interpret them as UTC (although I've been fooling with
the code for a couple hours now trying to get it to do that, without
much success).

Plan B would be to get rid of the discontinuity by abandoning the rule
that timestamps outside the Unix range are UTC.  We could instead say
that the local time zone offset that mktime() reports for the first date
of the Unix range applies to all prior dates, and similarly the offset
for the last date of the range applies to all later dates.

I'm unsure which of these is a better answer.  Any thoughts?
        regards, tom lane