Thread: Re: postgresql 7.3.2 bug on date '1901-12-13' and '1901-12
Hi Stephan and everyone We have tried PG 7.3.4 (the bug still presents itself) These are the following results.. Welcome to psql 7.3.4, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help on internal slash commands \g or terminate with semicolon to execute query \q to quit db2=> select cast(timestamptz '1901/12/13' as date); date ------------ 1901-12-13 (1 row) db2=> select cast(timestamptz '1901/12/14' as date); date ------------ 1901-12-13 (1 row) db2=> select cast(timestamptz '1901/12/15' as date); date ------------ 1901-12-15 (1 row) Compiler switches.. ./configure --prefix=/adamdb/postgres/pg_admin/pgsql_7.3.4 --enable-debug --with-pgport=5433 --without-readline --without-zlib --enable-integer-datetimes Everything else was left at default settings Readline and zlib were bypassed because they are not available. HP-UX 11.11i
On Thu, 21 Aug 2003, Arnold Mavromatis wrote: > Hi Stephan and everyone Hmm, I just got my machine to give a similar failure mode with a slightly wacky input. sszabo=> select timestamptz '1901/12/13 23:59:59.99999'; timestamptz --------------------------- 1901-12-13 23:59:59.99999 (1 row) sszabo=> select timestamptz '1901/12/13 23:59:59.999999999'; timestamptz ------------------------ 1901-12-13 16:00:00-08 (1 row) sszabo=> select timestamptz '1901/12/14 0:0:0'; timestamptz ------------------------ 1901-12-14 00:00:00-08 (1 row) This happens on both 7.3.4 and 7.4.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > Hmm, I just got my machine to give a similar failure mode with > a slightly wacky input. Perhaps more to the point: regression=# select timestamptz '1901/12/13 0:0:0'; timestamptz --------------------- 1901-12-13 00:00:00 (1 row) regression=# select timestamptz '1901/12/14 0:0:0'; timestamptz ------------------------ 1901-12-14 00:00:00-05 (1 row) Note the lack of timezone in the first output. It looks like 1901/12/14 is the oldest date for which the system will return timezone information; IIRC, this is the oldest date representable as a 32-bit time_t. PG implicitly assumes that timestamps before that are always GMT. This still doesn't explain why Arnold sees a failure with to_date and we don't, though. regards, tom lane
On Thu, 21 Aug 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > Hmm, I just got my machine to give a similar failure mode with > > a slightly wacky input. > > Perhaps more to the point: > > regression=# select timestamptz '1901/12/13 0:0:0'; > timestamptz > --------------------- > 1901-12-13 00:00:00 > (1 row) > > regression=# select timestamptz '1901/12/14 0:0:0'; > timestamptz > ------------------------ > 1901-12-14 00:00:00-05 > (1 row) > > Note the lack of timezone in the first output. > > It looks like 1901/12/14 is the oldest date for which the system will > return timezone information; IIRC, this is the oldest date representable > as a 32-bit time_t. PG implicitly assumes that timestamps before that > are always GMT. In my case the 23:59:59.99999 vs .99999999999 means that in one case the system correctly determines that there's no timezone. In the latter, it thinks there's no timezone on input (because it hasn't added the fractional seconds), but that ends up rounding up so that on output it thinks it's on the 14th and therefore has timezone info (the IS_VALID_UTIME check) and does the timezone thus giving back a time on the 13th with a timezone. It's basically a wierd edge case we get wrong. > This still doesn't explain why Arnold sees a failure with to_date and > we don't, though. I think it comes from (from his machine) db1=> select timestamptz '1901/12/14'; timestamptz --------------------- 1901-12-13 13:00:00 (1 row) I'm not sure what that's happening though. He may need to go through with the debugger.
On Thu, 21 Aug 2003, Stephan Szabo wrote: > On Thu, 21 Aug 2003, Tom Lane wrote: > > > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > > Hmm, I just got my machine to give a similar failure mode with > > > a slightly wacky input. > > > > Perhaps more to the point: > > > > regression=# select timestamptz '1901/12/13 0:0:0'; > > timestamptz > > --------------------- > > 1901-12-13 00:00:00 > > (1 row) > > > > regression=# select timestamptz '1901/12/14 0:0:0'; > > timestamptz > > ------------------------ > > 1901-12-14 00:00:00-05 > > (1 row) > > > > Note the lack of timezone in the first output. > > > > It looks like 1901/12/14 is the oldest date for which the system will > > return timezone information; IIRC, this is the oldest date representable > > as a 32-bit time_t. PG implicitly assumes that timestamps before that > > are always GMT. > > In my case the 23:59:59.99999 vs .99999999999 means that in one case the > system correctly determines that there's no timezone. In the latter, it > thinks there's no timezone on input (because it hasn't added the > fractional seconds), but that ends up rounding up so that on output it > thinks it's on the 14th and therefore has timezone info (the > IS_VALID_UTIME check) and does the timezone thus giving back a time on the > 13th with a timezone. It's basically a wierd edge case we get wrong. > > > This still doesn't explain why Arnold sees a failure with to_date and > > we don't, though. > > I think it comes from (from his machine) > > db1=> select timestamptz '1901/12/14'; > timestamptz > --------------------- > 1901-12-13 13:00:00 > (1 row) Wait, he's in australia, what if he's getting the edge case the other way. It starts out on the 14th, does the timezone conversion. But then it looks like it's on the 13th which doesn't have timezone info and doesn't do the timezone conversion back.
> > > Hmm, I just got my machine to give a similar failure mode with > > > a slightly wacky input. Will you allow me to add to this wackyness with the following on CVS tip from this morning: free4testing=# select version(); version ------------------------------------------------------------------ PostgreSQL 7.4beta1 on i586-pc-linux-gnu, compiled by GCC 2.95.3 (1 row) free4testing=# select timestamptz '1901/12/13 23:59:59.99999'; timestamptz --------------------------- 1901-12-13 23:59:59.99999 (1 row) free4testing=# select timestamptz '1901/12/13 23:59:59.999999999'; timestamptz --------------------------- 1901-12-14 00:19:00+00:19 (1 row) free4testing=# select timestamptz '2001/12/14 0:0:0'; timestamptz ------------------------ 2001-12-14 00:00:00+01 (1 row) Yes, that's a new timezone on the second case, indeed, probably a kind of 'floating' one ;-) Frank.
On Thu, 21 Aug 2003, Stephan Szabo wrote: > On Thu, 21 Aug 2003, Stephan Szabo wrote: > > > On Thu, 21 Aug 2003, Tom Lane wrote: > > > > > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > > > > Hmm, I just got my machine to give a similar failure mode with > > > > a slightly wacky input. > > > > > > Perhaps more to the point: > > > > > > regression=# select timestamptz '1901/12/13 0:0:0'; > > > timestamptz > > > --------------------- > > > 1901-12-13 00:00:00 > > > (1 row) > > > > > > regression=# select timestamptz '1901/12/14 0:0:0'; > > > timestamptz > > > ------------------------ > > > 1901-12-14 00:00:00-05 > > > (1 row) > > > > > > Note the lack of timezone in the first output. > > > > > > It looks like 1901/12/14 is the oldest date for which the system will > > > return timezone information; IIRC, this is the oldest date representable > > > as a 32-bit time_t. PG implicitly assumes that timestamps before that > > > are always GMT. > > > > In my case the 23:59:59.99999 vs .99999999999 means that in one case the > > system correctly determines that there's no timezone. In the latter, it > > thinks there's no timezone on input (because it hasn't added the > > fractional seconds), but that ends up rounding up so that on output it > > thinks it's on the 14th and therefore has timezone info (the > > IS_VALID_UTIME check) and does the timezone thus giving back a time on the > > 13th with a timezone. It's basically a wierd edge case we get wrong. > > > > > This still doesn't explain why Arnold sees a failure with to_date and > > > we don't, though. > > > > I think it comes from (from his machine) > > > > db1=> select timestamptz '1901/12/14'; > > timestamptz > > --------------------- > > 1901-12-13 13:00:00 > > (1 row) > > Wait, he's in australia, what if he's getting the edge case the other way. > It starts out on the 14th, does the timezone conversion. But then it > looks like it's on the 13th which doesn't have timezone info and doesn't > do the timezone conversion back. And, in fact, when I set my machines timezone to an australian one (not postgres because that seems to follow a separate path) I get precisely that behavior. The timezone conversion is done on input but not on output. I'm not really sure how to fix it though.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: >>> This still doesn't explain why Arnold sees a failure with to_date and >>> we don't, though. > Wait, he's in australia, what if he's getting the edge case the other way. > It starts out on the 14th, does the timezone conversion. But then it > looks like it's on the 13th which doesn't have timezone info and doesn't > do the timezone conversion back. Bingo. regression=# show time zone; TimeZone ---------- EST5EDT (1 row) regression=# select to_date('1901/12/14', 'YYYY/MM/DD'); to_date ------------ 1901-12-14 (1 row) regression=# set time zone 'CST-9:30CDT'; SET regression=# select to_date('1901/12/14', 'YYYY/MM/DD'); to_date ------------ 1901-12-13 (1 row) It looks like the same result occurs in any time zone east of Greenwich. Looking at the code, the problem seems to be that to_date is built as timestamptz_date(to_timestamp(str,fmt)) The initial step yields regression=# select to_timestamp('1901/12/14', 'YYYY/MM/DD'); to_timestamp --------------------- 1901-12-13 23:00:00 (1 row) and then timestamptz_date quite reasonably yields 1901-12-13. I'm inclined to fix to_date by decomposing the code differently --- it should avoid the coercion to timestamp, which is a waste of cycles anyway. But is to_timestamp (and more generally timestamp's input converter) broken? If so, how can we do better? I don't think we can entirely avoid the problem of a transition between local and GMT time. regards, tom lane
"ir. F.T.M. van Vugt bc." <ftm.van.vugt@foxi.nl> writes: > Will you allow me to add to this wackyness with the following on CVS tip from > this morning: What's your time zone setting? Also, are you using --enable-integer-datetimes? regards, tom lane
On Thu, 21 Aug 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > >>> This still doesn't explain why Arnold sees a failure with to_date and > >>> we don't, though. > > > Wait, he's in australia, what if he's getting the edge case the other way. > > It starts out on the 14th, does the timezone conversion. But then it > > looks like it's on the 13th which doesn't have timezone info and doesn't > > do the timezone conversion back. > > Bingo. > > regression=# show time zone; > TimeZone > ---------- > EST5EDT > (1 row) > > regression=# select to_date('1901/12/14', 'YYYY/MM/DD'); > to_date > ------------ > 1901-12-14 > (1 row) > > regression=# set time zone 'CST-9:30CDT'; > SET > regression=# select to_date('1901/12/14', 'YYYY/MM/DD'); > to_date > ------------ > 1901-12-13 > (1 row) > > > It looks like the same result occurs in any time zone east of > Greenwich. > > Looking at the code, the problem seems to be that to_date is built as > timestamptz_date(to_timestamp(str,fmt)) > > The initial step yields > > regression=# select to_timestamp('1901/12/14', 'YYYY/MM/DD'); > to_timestamp > --------------------- > 1901-12-13 23:00:00 > (1 row) > > and then timestamptz_date quite reasonably yields 1901-12-13. > > I'm inclined to fix to_date by decomposing the code differently --- > it should avoid the coercion to timestamp, which is a waste of cycles > anyway. But is to_timestamp (and more generally timestamp's input > converter) broken? If so, how can we do better? I don't think we can > entirely avoid the problem of a transition between local and GMT time. Yes. Timestamp with timezone is broken on the same boundaries in general. I'm not really sure how to do better without some work, it seems we end up with multiple different input values getting the same internal representation so we can differentiate which version of the input was used to get there (whether the user said 1901-12-13 23:00 or 1901-12-14).
> What's your time zone setting? > Also, are you using --enable-integer-datetimes? I'm running Slack v8 (with some updates, but still), so TZ is not defined as environment variable, but it's using 'Europe/Amsterdam'. I've verified whether setting TZ made a difference (stop/starting the server), it didn't. Also, it was NOT configured with integer datetimes: ~$head /usr/src/postgresql/config.log This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. It was created by PostgreSQL configure 7.4beta1, which was generated by GNU Autoconf 2.53. Invocation command line was $ ./configure --with-openssl --with-perl --with-python --with-tcl I've verified this behaviour on a second machine (AMD instead of Intel) which was configured with the same options and got the same result select version(); version ------------------------------------------------------------------------ PostgreSQL 7.4beta1 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66 Best, Frank.
> > What's your time zone setting? I'm baffled.... I disconnected my psql-client, reconnected a while later to demonstrate this problem to someone on the server I stop/started earlier, and the problem wasn't there anymore..... This is the one I stopped, set TZ and started again. However, since I didn't notice any difference, I stopped the server, unset TZ and started again a second time. Now while restarting the server (and verifying whether setting TZ made a difference), I did keep my psql-connection 'open', so it reconnected automatically, maybe this had something to do with it? At this moment, I still have that second server that's still showing the problem, anything I can verify on that one to help? Best, Frank.
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > At this moment, I still have that second server that's still showing the > problem, anything I can verify on that one to help? "show time zone"? regards, tom lane
> > anything I can verify on that one to help? > > "show time zone"? free4testing=# show time zone; TimeZone ---------- unknown (1 row) free4testing=# select now(); now ------------------------------- 2003-08-22 14:36:17.994049+02 (1 row) The latter is what I meant by 'it seems to be using the correct timezone'. V7.3.3 on the other Slackware machine shows exactly the same output. V7.4cvs on the machine where I originally tested setting TZ explicitly had TZ set to nothing after the test, so as per docs it reverted to UTC. After unsetting TZ and restarting the postmaster I got the same old behaviour back (my 19 minute wide timezone) with output equal to the one above. Frank.
> anything I can verify on that one to help? free4testing=# select timestamptz('1901-12-14 0:0:0'); timestamptz --------------------- 1901-12-13 23:40:32 (1 row) Frank.
"ir. F.T.M. van Vugt bc." <ftm.van.vugt@foxi.nl> writes: > free4testing=# select timestamptz '1901/12/13 23:59:59.999999999'; > timestamptz > --------------------------- > 1901-12-14 00:19:00+00:19 > (1 row) > Yes, that's a new timezone on the second case, indeed, probably a kind of > 'floating' one ;-) This is apparently because the Europe/Amsterdam time zone file actually tries to reflect the wacky local time used back then: # # Amsterdam Mean Time was +00:19:32.13 exactly, but the .13 is omitted # below because the current format requires GMTOFF to be an integer. # Zone NAME GMTOFF RULES FORMAT [UNTIL] Zone Europe/Amsterdam 0:19:32 - LMT 1835 0:19:32 Neth %s 1937 Jul 1 0:20 Neth NE%sT 1940 May 16 0:00 # Dutch Time 1:00 C-Eur CE%sT 1945 Apr 2 2:00 1:00 Neth CE%sT 1977 1:00 EU CE%sT The display *ought* to be 00:19:32+00:19, but Postgres drops the seconds part because it is not expecting the timezone offset to have a seconds component --- which it is doing to work around a bug that may or may not still exist in the wild: tm->tm_hour = tx->tm_hour; tm->tm_min = tx->tm_min; #if NOT_USED /* XXX HACK * Argh! My Linux box puts in a 1 second offset for dates less than 1970 * but only if the seconds field was non-zero. So, don't copy the seconds * field and instead carry forward from the original - thomas 97/06/18 * Note that GNU/Linux uses the standard freeware zic package as do * many other platforms so this may not be GNU/Linux/ix86-specific. * Still shows a problem on my up to date Linux box - thomas 2001-01-17 */ tm->tm_sec = tx->tm_sec; #endif I can't reproduce the bug Thomas mentions on an RH 8.0 system, but I'm afraid to remove the workaround, as it could affect a lot of people in order to fix a case that's not of much practical interest anymore... regards, tom lane
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > On Thu, 21 Aug 2003, Tom Lane wrote: >> Stephan Szabo <sszabo@megazone.bigpanda.com> writes: >>> Wait, he's in australia, what if he's getting the edge case the other way. >> >> I'm inclined to fix to_date by decomposing the code differently --- >> it should avoid the coercion to timestamp, which is a waste of cycles >> anyway. But is to_timestamp (and more generally timestamp's input >> converter) broken? If so, how can we do better? I don't think we can >> entirely avoid the problem of a transition between local and GMT time. > Yes. Timestamp with timezone is broken on the same boundaries in general. > I'm not really sure how to do better without some work, it seems we end up > with multiple different input values getting the same internal > representation so we can differentiate which version of the input was used > to get there (whether the user said 1901-12-13 23:00 or 1901-12-14). I've fixed to_date() along the above lines, but the general problem of how timestamp I/O should behave remains. I've come to the conclusion that there isn't any really consistent behavior if we want to stick with the current definition that "timestamps outside the Unix date range are always UTC". If we do that, then there is a set of timestamps at one end of the date range that are ambiguous (they could be taken as either UTC or local), while at the other end of the range there is a set of timestamps that can't be validly converted as either one. This is essentially the same problem we have during daylight-savings transition hours: when you "spring forward" there is no local time 02:30, and when you "fall back" there are two of 'em. The solution we've adopted for DST transitions is to interpret invalid or ambiguous local times as "always standard time". We could possibly do the same for the questionable times at the ends of the Unix date range, ie, always interpret them as UTC (although I've been fooling with the code for a couple hours now trying to get it to do that, without much success). Plan B would be to get rid of the discontinuity by abandoning the rule that timestamps outside the Unix range are UTC. We could instead say that the local time zone offset that mktime() reports for the first date of the Unix range applies to all prior dates, and similarly the offset for the last date of the range applies to all later dates. I'm unsure which of these is a better answer. Any thoughts? regards, tom lane