Thread: Daylight Savings Time handling on persistent connections

Daylight Savings Time handling on persistent connections

From
Randall Nortman
Date:
I assume I'm not the first person to have encountered this, but I
couldn't find anything in the FAQ or on the mailing lists recently.
My apologies if this is already documented somewhere...

My application logs data to a Postgres table continuously (once every
15 seconds), maintaining a persistent connection.  Each datum is
logged with a time stamp (Postgres type "timestamp with time zone").
The application does not explicitly set the time zone, and does not
specify it when inserting the records.  So everything just defaults to
the local time zone configured for the system, which is "US/Eastern".
This has been working fine all summer.

Until this morning, of course, when DST ended and "US/Eastern"
switched from GMT+4 to GMT+5.  Everything logged fine up to 01:59 EDT
(05:59 UTC).  Then the clock ticked to 01:00 EST (06:00 UTC), and I
got a unique constraint violation, because the database incorrectly
computed that I was trying to insert another record at 01:00 EDT
(05:00 UTC).  I restarted the application when I noticed the problem
this morning, and now everything is working correctly.

My suspicion is that Postgres calculates the local offset from UTC
only once per session, during session initialization.  Therefore, it
fails to notice when the local offset changes as a result of DST,
causing the problem I just described.  It's hard for me to test this,
because I don't have a system I can freely muck with the clock on, but
it would completely explain this behavior.

Is this what's happening?  Is it considered a bug?  I can see making
the case for not changing the offset mid-session, but in that case it
should be explained more thoroughly in the documentation.

In my case, I think I'll have my app convert all times to UTC before
inserting them.  This should avoid all such problems in the future.

PostgreSQL version (client and server) is 7.4.5, on i686 Debian sarge.
The client app is in python 2.3.4 using psycopg.

Thanks,

Randall Nortman

Re: Daylight Savings Time handling on persistent connections

From
Tom Lane
Date:
Randall Nortman <postgreslists@wonderclown.com> writes:
> My suspicion is that Postgres calculates the local offset from UTC
> only once per session, during session initialization.

This is demonstrably not so.  We might be able to figure out what
actually went wrong, if you would show us the exact commands your
application issued.

            regards, tom lane

Re: Daylight Savings Time handling on persistent connections

From
Randall Nortman
Date:
On Sun, Oct 31, 2004 at 11:52:03AM -0500, Tom Lane wrote:
> Randall Nortman <postgreslists@wonderclown.com> writes:
> > My suspicion is that Postgres calculates the local offset from UTC
> > only once per session, during session initialization.
>
> This is demonstrably not so.  We might be able to figure out what
> actually went wrong, if you would show us the exact commands your
> application issued.

I can't reproduce the error without messing up my clock, but from my
logs, here's the text of the SQL sent to the server:

insert into sensor_readings_numeric (sensor_id, reading_ts, reading,
min, max) values (3, '2004-10-31 01:00:00', 0.540602, 0.519071837254,
0.551811824539)

And this came back:

ERROR:  duplicate key violates unique constraint "sensor_readings_numeric_pkey"

Table definition:

                               Table "public.sensor_readings_numeric"
   Column   |           Type           |
   Modifiers
------------+--------------------------+-------------------------------------------------------------
 sensor_id  | integer                  | not null
 reading_ts | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 reading    | numeric                  | not null
 min        | numeric                  |
 max        | numeric                  |
Indexes:
    "sensor_readings_numeric_pkey" primary key, btree (reading_ts, sensor_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (sensor_id) REFERENCES sensors(sensor_id)


I'll try to set up a system where I can play around with the clock to
see if I can reproduce the error, but it'll probably be a few days at
least before I can do that.  There's no hurry for me, since this won't
happen again until next year.

Thanks,

Randall Nortman

Re: Daylight Savings Time handling on persistent connections

From
Tom Lane
Date:
Randall Nortman <postgreslists@wonderclown.com> writes:
> I can't reproduce the error without messing up my clock, but from my
> logs, here's the text of the SQL sent to the server:

> insert into sensor_readings_numeric (sensor_id, reading_ts, reading,
> min, max) values (3, '2004-10-31 01:00:00', 0.540602, 0.519071837254,
> 0.551811824539)

> And this came back:
> ERROR:  duplicate key violates unique constraint "sensor_readings_numeric_pkey"

Hmm ... and you were generating that timestamp string how exactly?
I suspect that you actually sent the same timestamp string twice, one
hour apart, in which case I'd have to call this an application bug.
You really need to include the timezone specification in order to
have an unambiguous timestamp string.  It doesn't have to be UTC as you
previously suggested, but it does have to be labeled with the intended
zone.

            regards, tom lane

Re: Daylight Savings Time handling on persistent connections

From
Randall Nortman
Date:
On Sun, Oct 31, 2004 at 12:47:31PM -0500, Tom Lane wrote:
> Randall Nortman <postgreslists@wonderclown.com> writes:
> > I can't reproduce the error without messing up my clock, but from my
> > logs, here's the text of the SQL sent to the server:
>
> > insert into sensor_readings_numeric (sensor_id, reading_ts, reading,
> > min, max) values (3, '2004-10-31 01:00:00', 0.540602, 0.519071837254,
> > 0.551811824539)
>
> > And this came back:
> > ERROR:  duplicate key violates unique constraint "sensor_readings_numeric_pkey"
>
> Hmm ... and you were generating that timestamp string how exactly?
> I suspect that you actually sent the same timestamp string twice, one
> hour apart, in which case I'd have to call this an application bug.
> You really need to include the timezone specification in order to
> have an unambiguous timestamp string.  It doesn't have to be UTC as you
> previously suggested, but it does have to be labeled with the intended
> zone.

Ah, I see now.  PostgreSQL is behaving a bit differently than I
expected.  The timestamp string above is ambiguous in the timezone
US/Eastern -- it could be EST or EDT.  I was expecting PostgreSQL to
resolve this ambiguity based on the current time when the SQL
statement is processed -- if it's currently EST, then the server would
assume that EST was intended, but if it's currently EDT, then it would
assume EDT.  If this were the case, my code would be correct -- yes, I
tried to insert the same timestamp value twice, but the inserts were
issued when my local timezone was in different offsets from UTC.

But it appears that PostgreSQL always assumes EDT in this case,
regardless of the current time?  I can see that being a good idea, for
its predictability.  For example, a client running on a different host
than the server might have its clock off by a few minutes; this could
then cause the server to make a different assumption about the correct
time zone than the client.  Even running on the same host, a delay
between the client issuing a command and the server processing it
could cause this problem.

So yeah, I see the wisdom of always specifying a time zone explicitly
in the query.  In my case, it will probably be easiest to specify UTC,
because otherwise I have to figure out myself whether or not DST was
in effect when the sensor reading was generated.  In my code, in fact,
timestamps are recorded as seconds since the epoch, in UTC, so it
makes little sense to convert to local time anyway.  Right now,
psycopg (the python module I'm using for postgres access) is
generating the timestamp string for me (via
psycopg.TimestampFromTicks()).  I just need to figure out how to get
it to generate the string with an explicit time zone, which I'm sure
is possible.  And if not, I'll just generate the string myself.

Re: Daylight Savings Time handling on persistent connections

From
Tom Lane
Date:
Randall Nortman <postgreslists@wonderclown.com> writes:
> Ah, I see now.  PostgreSQL is behaving a bit differently than I
> expected.  The timestamp string above is ambiguous in the timezone
> US/Eastern -- it could be EST or EDT.  I was expecting PostgreSQL to
> resolve this ambiguity based on the current time when the SQL
> statement is processed

I think this would be a very bad thing for it to do.  It might seem
to make sense for a timestamp representing "now", but as soon as you
consider a timestamp that isn't "now" it becomes a sure way to shoot
yourself in the foot.

> But it appears that PostgreSQL always assumes EDT in this case,
> regardless of the current time?

Actually, the intended and documented behavior is that it should
interpret an ambiguous time as local standard time (e.g., EST not EDT).
That seems to be broken at the moment :-(, which is odd because I'm
quite certain I tested it last time we touched the relevant subroutine.

We have had varying and often platform-specific behaviors on this point
in past releases, but in 8.0 it should be possible to ensure consistent
results now that we are no longer at the mercy of the local libc's
timezone code.

Before I go off and try to fix it, does anyone have any objection to
the rule "interpret an ambiguous time as local standard time"?
This would normally mean picking the later of the two possible
interpretations, which might be the wrong choice for some applications.
(I notice that HPUX's cron is documented to choose the earlier
interpretation in comparable situations.)

> In my code, in fact, timestamps are recorded as seconds since the
> epoch, in UTC, so it makes little sense to convert to local time
> anyway.  Right now, psycopg (the python module I'm using for postgres
> access) is generating the timestamp string for me (via
> psycopg.TimestampFromTicks()).  I just need to figure out how to get
> it to generate the string with an explicit time zone, which I'm sure
> is possible.  And if not, I'll just generate the string myself.

Actually, your best bet is to forgo the conversion altogether.  The
recommended way to get from a Unix epoch value to a timestamp is

    'epoch'::timestamptz + NNNNN * '1 second'::interval

For example:

regression=# select 'epoch'::timestamptz + 1099251435 * '1 second'::interval;
        ?column?
------------------------
 2004-10-31 14:37:15-05
(1 row)

Or you can do

    select 'epoch'::timestamptz + '1099251435 seconds'::interval;

which saves a couple microseconds at execution but requires assembling
the query string as a string.  The latter is probably easy for your
application, but if say you were extracting the numeric value from a
database column, the former would be easier.

            regards, tom lane

Re: Daylight Savings Time handling on persistent connections

From
Randall Nortman
Date:
On Sun, Oct 31, 2004 at 02:44:51PM -0500, Tom Lane wrote:
> Randall Nortman <postgreslists@wonderclown.com> writes:
> > Ah, I see now.  PostgreSQL is behaving a bit differently than I
> > expected.  The timestamp string above is ambiguous in the timezone
> > US/Eastern -- it could be EST or EDT.  I was expecting PostgreSQL to
> > resolve this ambiguity based on the current time when the SQL
> > statement is processed
>
> I think this would be a very bad thing for it to do.  It might seem
> to make sense for a timestamp representing "now", but as soon as you
> consider a timestamp that isn't "now" it becomes a sure way to shoot
> yourself in the foot.

Yes, I absolutely see your point.


> > But it appears that PostgreSQL always assumes EDT in this case,
> > regardless of the current time?
>
> Actually, the intended and documented behavior is that it should
> interpret an ambiguous time as local standard time (e.g., EST not EDT).
> That seems to be broken at the moment :-(, which is odd because I'm
> quite certain I tested it last time we touched the relevant subroutine.

It certainly seems that way, but as I've said I can't reproduce the
bug without mucking with my clock, which is not an option right now.
But looking at the data which was generated overnight in UTC, I see
continuous data all the way up to 05:59.  If the server had started
converting to EST at 01:00EDT, there would be a gap in the data from
05:00UTC to 06:00UTC as the server switched from a +4 offset to +5,
and then data would have been logged with a timestamp one hour in the
future through 06:59UTC, and then I would have gotten a unique
constraint violation when the actual switch happened.


> Before I go off and try to fix it, does anyone have any objection to
> the rule "interpret an ambiguous time as local standard time"?
> This would normally mean picking the later of the two possible
> interpretations, which might be the wrong choice for some applications.
> (I notice that HPUX's cron is documented to choose the earlier
> interpretation in comparable situations.)

I'm finding it hard to see how either way is likely to generate good
results in *any* application, much less in a majority of applications.
So in a way, perhaps the most correct thing to do would be to spit out
an error if the timestamp is ambiguous.  Any application which deals
with timestamps in anything other than UTC should really be handling
the disambiguation itself, because the server can't possibly know what
the application means to do.  Not generating an error is likely to
allow an application bug to go unnoticed, especially if the database
does not have a unique constraint on timestamps (as mine does).

Then again, it's not up to the database to expose bugs in the client,
so perhaps it's best to just stick with the current intended behavior
of always choosing local standard time.  Or maybe we should write our
legislative representatives and get them to abolish DST.  ;)


> Actually, your best bet is to forgo the conversion altogether.  The
> recommended way to get from a Unix epoch value to a timestamp is
>
>     'epoch'::timestamptz + NNNNN * '1 second'::interval

At first glance, that seems to me to be really inefficient, but that's
just because my brain tends to associate verbosity in code with
runtime overhead.  In this case, it's probably just as fast as letting
the Python library do the math required to convert the Unix timestamp
to a date/time string.  And if Postgres stores timestamps as some unit
of time since an epoch, then it would be quite a bit more efficient.
Of course, all these calculations happen in the blink of an eye, and
I'm only logging data every 15 seconds, so I suppose it doesn't matter
anyway.  So thanks for the tip!  That will be much easier and more
reliable than the way I'm currently doing it.  (I just hope that
nobody ever gets the idea of changing the Unix epoch.)

Thanks for all your help,

Randall Nortman

Re: Daylight Savings Time handling on persistent connections

From
Tom Lane
Date:
Randall Nortman <postgreslists@wonderclown.com> writes:
> On Sun, Oct 31, 2004 at 02:44:51PM -0500, Tom Lane wrote:
>> Actually, the intended and documented behavior is that it should
>> interpret an ambiguous time as local standard time (e.g., EST not EDT).

> I'm finding it hard to see how either way is likely to generate good
> results in *any* application, much less in a majority of applications.
> So in a way, perhaps the most correct thing to do would be to spit out
> an error if the timestamp is ambiguous.  Any application which deals
> with timestamps in anything other than UTC should really be handling
> the disambiguation itself, because the server can't possibly know what
> the application means to do.  Not generating an error is likely to
> allow an application bug to go unnoticed, especially if the database
> does not have a unique constraint on timestamps (as mine does).

That line of argument leads directly to the conclusion that we shouldn't
allow timezone-less input strings at all, since it's unlikely that
anyone would code their app to append a timezone spec only during the
two hours a year when it actually matters.  And wouldn't you rather have
had the problem pointed out immediately on testing the app, rather than
waiting till 1AM on a fall Sunday morning to find out it's broken?

However, I am not prepared to buy into requiring explicit TZ specs
always... it's just too much of a PITA.

For human users, there would be some value in acting this way, since
it would serve to remind them of the issue only when it actually
matters.  Comments anyone?

            regards, tom lane

Re: Daylight Savings Time handling on persistent connections

From
Martijn van Oosterhout
Date:
On Sun, Oct 31, 2004 at 04:14:52PM -0500, Tom Lane wrote:
> That line of argument leads directly to the conclusion that we shouldn't
> allow timezone-less input strings at all, since it's unlikely that
> anyone would code their app to append a timezone spec only during the
> two hours a year when it actually matters.  And wouldn't you rather have
> had the problem pointed out immediately on testing the app, rather than
> waiting till 1AM on a fall Sunday morning to find out it's broken?

The only issue is storing a timezone-less timestamp into a field that
is timestamp with timezone. Every other combination is easy to handle.
If you're doing this, isn't it an indication that your field is the
wrong type? It's just plain ambiguous no matter which way you put it.
Assuming UTC would be almost as valid.

We don't promote integers to other types of numbers, so maybe this
shouldn't be allowed either.

> For human users, there would be some value in acting this way, since
> it would serve to remind them of the issue only when it actually
> matters.  Comments anyone?

Except that means your program will work all the time except for one or
two hours per year where it breaks. Chances are your testing is not
going to trip it...

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Daylight Savings Time handling on persistent connections

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Sun, Oct 31, 2004 at 04:14:52PM -0500, Tom Lane wrote:
>> That line of argument leads directly to the conclusion that we shouldn't
>> allow timezone-less input strings at all, since it's unlikely that
>> anyone would code their app to append a timezone spec only during the
>> two hours a year when it actually matters.

>> For human users, there would be some value in acting this way, since
>> it would serve to remind them of the issue only when it actually
>> matters.  Comments anyone?

> Except that means your program will work all the time except for one or
> two hours per year where it breaks. Chances are your testing is not
> going to trip it...

ISTM basically we have to make a tradeoff between convenience for
human-driven data entry and reliability for program-driven data entry.
Refusing TZ-less data input would certainly force programmers to write
their programs more safely, but is it worth the inconvenience for
interpreting human-generated input strings?  I doubt it.  We already
allow a great variety of input syntaxes, some would say more than we
should, in order to make the timestamp input converters useful for
interpreting hand-entered strings.

I'm inclined to think that rejecting impossible or ambiguous input
without a zone is reasonable (and it would go along with the changes
we made in 7.4 to tighten up datetime field order assumptions).
But I don't want to take away the convenience of leaving off the
zone altogether.

One point here is that timestamp-to-timestamptz datatype conversion will
be affected by whatever we choose.  While it's easy to say "reject it"
for data coming into a database, it's less easy to say that a coercion
function should fail on some inputs it didn't use to fail on.

            regards, tom lane

Re: Daylight Savings Time handling on persistent connections

From
Vinko Vrsalovic
Date:
On Sun, Oct 31, 2004 at 05:55:23PM -0500, Tom Lane wrote:
[...]
> I'm inclined to think that rejecting impossible or ambiguous input
> without a zone is reasonable (and it would go along with the changes
> we made in 7.4 to tighten up datetime field order assumptions).
> But I don't want to take away the convenience of leaving off the
> zone altogether.
>
> One point here is that timestamp-to-timestamptz datatype conversion will
> be affected by whatever we choose.  While it's easy to say "reject it"
> for data coming into a database, it's less easy to say that a coercion
> function should fail on some inputs it didn't use to fail on.

What about letting the user decide on the behaviour through a config
option? I really missed this when the integer parsing changed.

The default could be to reject ambiguous input, allowing the user to
choose the assumed zone if he wants to, in a global and per-connection
basis.

--
Vinko Vrsalovic <el[|-@-|]vinko.cl>

Re: Daylight Savings Time handling on persistent connections

From
Martijn van Oosterhout
Date:
On Mon, Nov 01, 2004 at 01:57:38PM -0300, Vinko Vrsalovic wrote:
> On Sun, Oct 31, 2004 at 05:55:23PM -0500, Tom Lane wrote:
> > One point here is that timestamp-to-timestamptz datatype conversion will
> > be affected by whatever we choose.  While it's easy to say "reject it"
> > for data coming into a database, it's less easy to say that a coercion
> > function should fail on some inputs it didn't use to fail on.
>
> What about letting the user decide on the behaviour through a config
> option? I really missed this when the integer parsing changed.
>
> The default could be to reject ambiguous input, allowing the user to
> choose the assumed zone if he wants to, in a global and per-connection
> basis.

It's not that simple. In this case the conversion will now produce a
different datatype, which means that perfectly valid pl/pgsql may now
be invalid. Columns defined unique will now have a different criteria
for uniqueness. Even how timestamps are stored will be different. By
switching a config option, you may have just invalidated your entire
database.

For the parsing integer issue it may have worked, but this is another
kettle of fish. I don't think you can do this as a simple switch, it
would have to set during the initdb and not allowed to be changed
afterwards. I don't know if that something that can be supported.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Daylight Savings Time handling on persistent connections

From
Vinko Vrsalovic
Date:
On Mon, Nov 01, 2004 at 07:08:39PM +0100, Martijn van Oosterhout wrote:

> For the parsing integer issue it may have worked, but this is another
> kettle of fish. I don't think you can do this as a simple switch, it
> would have to set during the initdb and not allowed to be changed
> afterwards. I don't know if that something that can be supported.

I suspected it wasn't that easy. Anyhow, I strongly believe
that when no reasonable defaults can be deduced, the software
should give the user the ability to decide what he wants to do.

Of course technical (implementation, maintenance, etc.) issues are
highly relevant and if it can't reasonably be done, well, tough luck,
but I think (and I don't have a clue about the internals of
PostgreSQL, so take this with two grains of salt) a solution such
as the one you mention should be given consideration.

--
Vinko Vrsalovic <el[|-@-|]vinko.cl>

Re: Daylight Savings Time handling on persistent connections

From
Steve Crawford
Date:
On Sunday 31 October 2004 11:44 am, Tom Lane wrote:
> Randall Nortman <postgreslists@wonderclown.com> writes:
> > Ah, I see now.  PostgreSQL is behaving a bit differently than I
> > expected.  The timestamp string above is ambiguous in the
> > timezone US/Eastern -- it could be EST or EDT.  I was expecting
> > PostgreSQL to resolve this ambiguity based on the current time
> > when the SQL statement is processed
>
> I think this would be a very bad thing for it to do.  It might seem
> to make sense for a timestamp representing "now", but as soon as
> you consider a timestamp that isn't "now" it becomes a sure way to
> shoot yourself in the foot.

Would it help to add the PG locale TZ to the insert statement? For
example the following queries return the TZ as text.

select to_char(now(),'tz');
 to_char
---------
 pst

select to_char(now()-'3 days'::interval,'tz');
 to_char
---------
 pdt

So the following might fix this particular situation:
insert into sensor_readings_numeric (...) values (...,'2004-10-31
01:00:00 ' || to_char(now(),'tz'),...)

I realize that it assumes that the data is being inserted at the time
it was taken so a reading taken just before DST changes and inserted
just after will be incorrect but it may work for this particular app.

Of course the better solution is to have the application generate a
fully-qualified timestamp with time zone. Generating all the
timestamps in UTC and explicitly specifying that in the insert is
probably the easiest way to go. Your queries will still have your
local-appropriate TZ:

select '2004-10-31 00:00:00+00'::timestamptz;
      timestamptz
------------------------
 2004-10-30 17:00:00-07

select '2004-11-01 00:00:00+00'::timestamptz;
      timestamptz
------------------------
 2004-10-31 16:00:00-08


Cheers,
Steve