Thread: Comment on timezone and interval types

Comment on timezone and interval types

From
Bruno Wolff III
Date:
Recently there has been some discussion about attaching a timezone to
a timestamp and some other discussion about including a 'day' part
in the interval type. These two features impact each other, since
if you add a 'day' to a timestamp the result can depend on what timezone
the timestamp is supposed to be in. It probably makes more sense to use
a timezone associated with the timestamp than say the timezone GUC or the
fixed timezone UTC.

Re: Comment on timezone and interval types

From
Martijn van Oosterhout
Date:
On Sat, Oct 23, 2004 at 06:49:15PM -0500, Bruno Wolff III wrote:
> Recently there has been some discussion about attaching a timezone to
> a timestamp and some other discussion about including a 'day' part
> in the interval type. These two features impact each other, since
> if you add a 'day' to a timestamp the result can depend on what timezone
> the timestamp is supposed to be in. It probably makes more sense to use
> a timezone associated with the timestamp than say the timezone GUC or the
> fixed timezone UTC.

I agree. One issue I can think of is that if you store each timestamp
as a (seconds,timezone) pair, the storage requirements will balloon,
since timezone can be something like "Australia/Sydney" and this will
be repeated for every value in the table. I don't know how to deal
easily with this since there is no unique identifier to timezones and
no implicit order.

The only solution I can think of is to have initdb create a pg_timezones
table which assigns an OID to each timezone it finds. Then the type can
use that.

I think this is a good solution actually, any thoughts?
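
To make the storage argument concrete, here is a minimal Python sketch of the idea: zone names are interned once and each value stores only a small integer ID. The `TimezoneRegistry` class, the 8+4 byte layout, and the sample epoch value are illustrative assumptions, not anything PostgreSQL implements.

```python
import struct


class TimezoneRegistry:
    """Hypothetical stand-in for the proposed pg_timezones table."""

    def __init__(self):
        self._id_by_name = {}
        self._name_by_id = []

    def intern(self, name):
        """Return the integer ID for a zone name, assigning one if it is new."""
        if name not in self._id_by_name:
            self._id_by_name[name] = len(self._name_by_id)
            self._name_by_id.append(name)
        return self._id_by_name[name]

    def name(self, zone_id):
        """Look the zone name back up from its ID."""
        return self._name_by_id[zone_id]


def pack_timestamp(registry, epoch_seconds, zone_name):
    """Encode (seconds, zone) as 8 + 4 bytes instead of repeating the name."""
    return struct.pack("<qI", epoch_seconds, registry.intern(zone_name))


def unpack_timestamp(registry, raw):
    """Decode the 12-byte value back into (seconds, zone name)."""
    epoch_seconds, zone_id = struct.unpack("<qI", raw)
    return epoch_seconds, registry.name(zone_id)


registry = TimezoneRegistry()
raw = pack_timestamp(registry, 1098575355, "Australia/Sydney")  # arbitrary value
print(len(raw), unpack_timestamp(registry, raw))
```

Each stored value is 12 bytes however long the zone name is, at the cost of a lookup whenever the value is displayed.
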
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


Re: Comment on timezone and interval types

From
Thomas Hallgren
Date:
Martijn,
> I agree. One issue I can think of is that if you store each timestamp
> as a (seconds,timezone) pair, the storage requirements will balloon,
> since timezone can be something like "Australia/Sydney" and this will
> be repeated for every value in the table. I don't know how to deal
> easily with this since there is no unique identifier to timezones and
> no implicit order.
>
> The only solution I can think of is to have initdb create a pg_timezones
> table which assigns an OID to each timezone it finds. Then the type can
> use that.
>
> I think this is a good solution actually, any thoughts?

Using OID's is a good idea, but I think a canonical list of known
timezone to OID mappings must be maintained and shipped with the
PostgreSQL core.

If OID's are generated at initdb time, there's a great risk that the
OID's will differ between databases using different versions of
PostgreSQL. That in turn will have some negative implications for data
exchange.

Regards,
Thomas Hallgren

Re: Comment on timezone and interval types

From
Martijn van Oosterhout
Date:
On Wed, Oct 27, 2004 at 09:21:39AM +0200, Thomas Hallgren wrote:
> Martijn,
> >I agree. One issue I can think of is that if you store each timestamp
> >as a (seconds,timezone) pair, the storage requirements will balloon,
> >since timezone can be something like "Australia/Sydney" and this will
> >be repeated for every value in the table. I don't know how to deal
> >easily with this since there is no unique identifier to timezones and
> >no implicit order.
> >
> >The only solution I can think of is to have initdb create a pg_timezones
> >table which assigns an OID to each timezone it finds. Then the type can
> >use that.
> >
> >I think this is a good solution actually, any thoughts?
>
> Using OID's is a good idea, but I think a canonical list of known
> timezone to OID mappings must be maintained and shipped with the
> PostgreSQL core.

How can there be a "canonical list of known timezones" if every
operating system has its own list? Maybe you can provide a base list,
but you have to allow for people to make their own.

> If OID's are generated at initdb time, there's a great risk that the
> OID's will differ between databases using different versions of
> PostgreSQL. That in turn will have some negative implications for data
> exchange.

I doubt it; the OIDs would never be output. Types, triggers, functions,
etc. all have OIDs that never appear in any output anywhere, so why
should these? Since PostgreSQL doesn't support copying any part
of the raw data files between different installations, let alone
different versions, I think the issues with data exchange are not a
problem.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/


Re: Comment on timezone and interval types

From
Michael Glaesemann
Date:
On Oct 27, 2004, at 6:00 PM, Martijn van Oosterhout wrote:

> On Wed, Oct 27, 2004 at 09:21:39AM +0200, Thomas Hallgren wrote:
>>
>> Using OID's is a good idea, but I think a canonical list of known
>> timezone to OID mappings must be maintained and shipped with the
>> PostgreSQL core.
>
> How can there be a "canonical list of known timezones" if every
> operating system has its own list? Maybe you can provide a base list,
> but you have to allow for people to make their own.

My understanding is that with the addition of the zic time zone data to
the PostgreSQL server, there's no longer any need to rely on OS time
zone data. Some areas may still use OS time zone data--I'm not sure if
all the niggling pieces have been converted yet. One could then
produce a canonical list, based on the zic data.

Corrections welcome if I've misunderstood something.

Regards,

Michael Glaesemann
grzm myrealbox com


Re: Comment on timezone and interval types

From
"Marco Ferretti"
Date:
On Wed, 2004-10-27 at 09:21 +0200, Thomas Hallgren wrote:

> Using OID's is a good idea, but I think a canonical list of known
> timezone to OID mappings must be maintained and shipped with the
> PostgreSQL core.
>
> If OID's are generated at initdb time, there's a great risk that the
> OID's will differ between databases using different versions of
> PostgreSQL. That in turn will have some negative implications for data
> exchange.
>
> Regards,
> Thomas Hallgren
>

I definitely agree with Thomas. The fact that OIDs are generated at
initdb time really scares me since we have different versions of the
database engine running; it would really be a nightmare if the OIDs were
different from machine to machine.


Re: Comment on timezone and interval types

From
Tom Lane
Date:
Michael Glaesemann <grzm@myrealbox.com> writes:
> On Oct 27, 2004, at 6:00 PM, Martijn van Oosterhout wrote:
>> How can there be a "canonical list of known timezones" if every
>> operating system has its own list? Maybe you can provide a base list,
>> but you have to allow for people to make their own.

> My understanding is that with the addition of the zic time zone data to
> the PostgreSQL server, there's no longer any need to rely on OS time
> zone data.

Correct, but it is still the case that different installations will need
to have slightly different timezone lists.  Consider for example the
australian_timezones kluge we have now, and consider that there are
several known cases of zone name conflicts that are not covered by
australian_timezones (the one I remember at the moment is IST which both
the Israelis and the Indians use; but I think there are some others).
I think the most reasonable way to solve this will be to invent a
configuration file that lets people list the zone abbreviations they
want to use and the corresponding UTC offsets.  We will need a mapping
method that can cope with changes in such a file.
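
As a rough illustration of such a configuration file, the Python sketch below parses a made-up format (one abbreviation and its UTC offset in hours per line) so that each installation can decide what IST means. The file format, `SITE_CONFIG`, and `load_abbreviations` are assumptions invented for this example.

```python
from datetime import timedelta, timezone

# Hypothetical file contents: "ABBREV  offset-in-hours", '#' starts a comment.
SITE_CONFIG = """
# Indian installation: IST means India Standard Time
IST   5.5
EST  -5
"""


def load_abbreviations(text):
    """Parse the sketch format into {abbreviation: datetime.timezone}."""
    zones = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        abbrev, hours = line.split()
        zones[abbrev] = timezone(timedelta(hours=float(hours)), abbrev)
    return zones


zones = load_abbreviations(SITE_CONFIG)
print(zones["IST"].utcoffset(None))  # 5:30:00
```

An Israeli site would ship a file with `IST 2` instead, and the same abbreviation would resolve to UTC+2 there.
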

But having said that, I concur with Martijn that there is no problem,
because the OIDs (or whatever numeric ID we use) are inside the database
and will never be visible outside it.  There is no more portability risk
here than there is in using platform-native byte order in integers.

            regards, tom lane

Re: Comment on timezone and interval types

From
Stuart Bishop
Date:
Bruno Wolff III wrote:
| Recently there has been some discussion about attaching a timezone to
| a timestamp and some other discussion about including a 'day' part
| in the interval type. These two features impact each other, since
| if you add a 'day' to a timestamp the result can depend on what timezone
| the timestamp is supposed to be in. It probably makes more sense to use
| a timezone associated with the timestamp than say the timezone GUC or the
| fixed timezone UTC.

If you add a 'day' to a timestamp, it should be identical to adding 24
hours. Any other interpretation leads to all sorts of weird ambiguities.
For example, what is '2am April 3rd 2004 US/Eastern + 1 day'? 2am on
April 4th 2004 didn't exist in that timezone because the clocks were put
forward and that hour skipped. If you round up to the nearest existent
time, you then have the issue that '2am April 3rd + 1 day == 3am April
3rd + 1 day'.
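
The collision described here can be reproduced with Python's standard `zoneinfo` module, which is used only as a convenient stand-in and says nothing about how PostgreSQL behaves. The sketch assumes the system has the IANA time zone database installed and uses `America/New_York`, the current name for US/Eastern.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")

# Wall-clock "+ 1 day" from two different starting times on April 3rd.
from_2am = datetime(2004, 4, 3, 2, 0, tzinfo=eastern) + timedelta(days=1)
from_3am = datetime(2004, 4, 3, 3, 0, tzinfo=eastern) + timedelta(days=1)

# from_2am carries the wall time 02:00 on April 4th, which was skipped.
# Converting to UTC shows which real instant each value denotes.
print(from_2am.astimezone(timezone.utc))
print(from_3am.astimezone(timezone.utc))
```

Both lines print `2004-04-04 07:00:00+00:00`: the nonexistent 2am is resolved with the pre-transition offset, so two different starting times land on the same instant.
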

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Re: Comment on timezone and interval types

From
Tom Lane
Date:
Stuart Bishop <stuart@stuartbishop.net> writes:
> If you add a 'day' to a timestamp, it should be identical to adding 24
> hours.

No, it should not --- at least not when the addition traverses a DST
switchover time.

> For example, what is '2am April 3rd 2004 US/Eastern + 1 day'? 2am on
> April 4th 2004 didn't exist in that timezone because the clocks were put
> forward and that hour skipped.

The times right at the DST transition are questionable no matter what
we do, but that does not justify your claim that we do not need to fix
this.  For instance, I think 10pm April 3rd (EST) plus '24 hours' ought
to be 11pm April 4th (EDT), but adding '1 day' ought to yield 10pm EDT.
There isn't really any ambiguity about what people will consider the
right answer there.
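
The two results in this example can be checked with a short Python sketch. It relies on the standard `zoneinfo` module (assuming the IANA zone data is installed); the helper names `add_elapsed` and `add_calendar_days` are made up for illustration and do not describe PostgreSQL's implementation.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")
start = datetime(2004, 4, 3, 22, 0, tzinfo=eastern)  # 10pm EST


def add_elapsed(ts, delta):
    """Add a fixed amount of elapsed time, like '24 hours'."""
    return (ts.astimezone(timezone.utc) + delta).astimezone(ts.tzinfo)


def add_calendar_days(ts, days):
    """Move the calendar date and keep the wall-clock time, like '1 day'."""
    return ts + timedelta(days=days)  # same-zone arithmetic is wall-clock


print(add_elapsed(start, timedelta(hours=24)))  # 2004-04-04 23:00:00-04:00
print(add_calendar_days(start, 1))              # 2004-04-04 22:00:00-04:00
```

The first result is 11pm EDT and the second is 10pm EDT, matching the two answers given above.
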

I think your example has about as much validity as claiming that we
shouldn't support "+ '1 month'" because it's not clear what to do when
adding '1 month' to 'Jan 31'.  Yes, you end up having to define some
corner-case behaviors, but that doesn't render the main cases worthless.
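
One possible corner-case rule for month arithmetic is to clamp to the last day of the target month. The Python sketch below shows that rule only as an example of "defining the corner case"; `add_months` is a hypothetical helper, and other rules (such as rolling over into the following month) are equally conceivable.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Add calendar months, clamping to the last day of the target month."""
    index = d.year * 12 + (d.month - 1) + months  # months since year 0
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


print(add_months(date(2004, 1, 31), 1))  # 2004-02-29 (2004 is a leap year)
print(add_months(date(2004, 3, 15), 1))  # 2004-04-15
```

Only dates near the end of a month hit the clamping branch; the common case simply keeps the day of the month.
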

            regards, tom lane

Re: Comment on timezone and interval types

From
Guy Fraser
Date:
Yes. For example:

MST = GMT - 7 hours
MDT = GMT - 6 hours

The GMT time remains constant no matter if it is or is not
daylight savings time.

You still want to bill someone for 1 hour of usage from
02:00 MDT to 02:00 MST, but you don't want to bill an
hour from 02:00 MST to 03:00 MDT.
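
The billing arithmetic can be verified with fixed-offset zones in Python. The calendar dates are assumptions chosen to match the 2004 North American transitions (no dates are given above), and `billable` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

MST = timezone(timedelta(hours=-7), "MST")
MDT = timezone(timedelta(hours=-6), "MDT")


def billable(start, end):
    """Elapsed time between two zone-qualified timestamps."""
    return end - start  # different zones, so both are compared as instants


# Autumn: clocks fall back, so 02:00 MDT to 02:00 MST is a real hour.
fall = billable(datetime(2004, 10, 31, 2, 0, tzinfo=MDT),
                datetime(2004, 10, 31, 2, 0, tzinfo=MST))

# Spring: 02:00 MST and 03:00 MDT are the same instant.
spring = billable(datetime(2004, 4, 4, 2, 0, tzinfo=MST),
                  datetime(2004, 4, 4, 3, 0, tzinfo=MDT))

print(fall, spring)  # 1:00:00 0:00:00
```

Because each timestamp carries its own offset, the wall-clock readings never need to be compared directly.
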

Unless you are using GMT or another timezone that does not
use daylight savings, you should always include the timezone
with the time.

1 day should always be calculated as 24 hours, just as an hour
is calculated as 60 minutes...

Since interval does not store an actual time range, it is not sensitive to
daylight savings.

Where problems occur is when you try to use units larger than a week
because they vary in the number of days per unit depending on the date
range.

I would prefer to see interval state time in:

Days:Hours:Minutes:Seconds.Microseconds

Rather than:

Years Months Days Hours:Minutes:Seconds.Microseconds

Since months and years are not a constant number of days it does not
seem reasonable to use them in calculations to determine days, unless
it is qualified with a start or stop time and date including the time zone.

Since I don't need to account for microseconds or durations larger
than +/- 68 years, I usually use an int4 to store time usage in seconds.
Since int4 can be cast into reltime, it is simple to calculate the
beginning or end of the interval with one timestamp with timezone and
an int4 duration. The storage required for this is 16 bytes: 12 for the
timestamp and 4 for the int4 (integer). If you need more accuracy
you could use a timestamp and an interval, but the storage required
would be 24 bytes IIRC.
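
The "+/- 68 years" figure follows from the range of a signed 32-bit integer counted in seconds. A quick Python check, assuming an average year of 365.25 days; the start time and duration are arbitrary sample values:

```python
from datetime import datetime, timedelta, timezone

INT4_MAX = 2**31 - 1                 # largest signed 32-bit value
SECONDS_PER_YEAR = 365.25 * 86400    # average year length, an approximation

max_years = INT4_MAX / SECONDS_PER_YEAR
print(round(max_years, 2))

# A start timestamp plus an integer duration in seconds gives the end time.
start = datetime(2004, 10, 29, 17, 0, tzinfo=timezone.utc)  # arbitrary sample
duration_seconds = 3600
end = start + timedelta(seconds=duration_seconds)
print(end)
```

The first value comes out at a little over 68, which is where the limit quoted above comes from.
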

Stuart Bishop wrote:

> Bruno Wolff III wrote:
> | Recently there has been some discussion about attaching a timezone to
> | a timestamp and some other discussion about including a 'day' part
> | in the interval type. These two features impact each other, since
> | if you add a 'day' to a timestamp the result can depend on what timezone
> | the timestamp is supposed to be in. It probably makes more sense to use
> | a timezone associated with the timestamp than say the timezone GUC or the
> | fixed timezone UTC.
>
> If you add a 'day' to a timestamp, it should be identical to adding 24
> hours. Any other interpretation leads to all sorts of weird ambiguities.
> For example, what is '2am April 3rd 2004 US/Eastern + 1 day'? 2am on
> April 4th 2004 didn't exist in that timezone because the clocks were put
> forward and that hour skipped. If you round up to the nearest existent
> time, you then have the issue that '2am April 3rd + 1 day == 3am April
> 3rd + 1 day'.

--
Guy Fraser
Network Administrator
The Internet Centre
780-450-6787 , 1-888-450-6787

There is a fine line between genius and lunacy, fear not, walk the
line with pride. Not all things will end up as you wanted, but you
will certainly discover things the meek and timid will miss out on.




Re: Comment on timezone and interval types

From
Bruno Wolff III
Date:
On Fri, Oct 29, 2004 at 11:14:31 -0600,
  Guy Fraser <guy@incentre.net> wrote:
>
> 1 day should always be calculated as 24 hours, just as an hour
> is calculated as 60 minutes...

If you want 24 hours you can use 24 hours. Days are not constant length,
just like months aren't constant length.

> Since interval does not store an actual time range, it is not sensitive to
> daylight savings.

When intervals are added to or subtracted from timestamps there is an actual
time range which makes DST transitions relevant.

Re: Comment on timezone and interval types

From
Stuart Bishop
Date:
Bruno Wolff III wrote:
| On Fri, Oct 29, 2004 at 11:14:31 -0600,
|   Guy Fraser <guy@incentre.net> wrote:
|
|>1 day should always be calculated as 24 hours, just as an hour
|>is calculated as 60 minutes...
|
|
| If you want 24 hours you can use 24 hours. Days are not constant length,
| just like months aren't constant length.

Days *are* of constant length - check your nearest dictionary, which
will define it as 24 hours or the period of rotation of the earth. If
people see 'day', they think '24 hours' because that is the definition
they have been using since preschool. This breeds sleeping bugs that
nobody notices until the DST transition kicks in and events happen an
hour late or not at all.

What you are talking about is useful, but should be called calendar_day
or something that makes it obvious that it isn't using the traditional
definition.

People are used to months being ambiguous so it is less likely to cause
upsets, although it still bites people because their toolkit's definition
of 'month' does not match their business rules of 'month' (which might
be 30 days, 31 days, 4 weeks, calendar month rounded down).

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Re: Comment on timezone and interval types

From
Michael Glaesemann
Date:
On Nov 5, 2004, at 5:38 PM, Stuart Bishop wrote:

> Bruno Wolff III wrote:
> | On Fri, Oct 29, 2004 at 11:14:31 -0600,
> |   Guy Fraser <guy@incentre.net> wrote:
> |
> |>1 day should always be calculated as 24 hours, just as an hour
> |>is calculated as 60 minutes...
> |
> |
> | If you want 24 hours you can use 24 hours. Days are not constant length,
> | just like months aren't constant length.
>
> Days *are* of constant length - check your nearest dictionary, which
> will define it as 24 hours or the period of rotation of the earth. If
> people see 'day', they think '24 hours' because that is the definition
> they have been using since preschool. This breeds sleeping bugs that
> nobody notices until the DST transition kicks in and events happen an
> hour late or not at all.
>
> What you are talking about is useful, but should be called calendar_day
> or something that makes it obvious that it isn't using the traditional
> definition.

Could you expand on this a bit? I'm not quite sure what you're getting
at. I think most people would say the period from noon one day until
noon the next would be 1 day. If that day spans a DST change, it will
definitely not be 24 hours, and people might agree if they're asked "is
the period from noon til noon over DST 24 hours?". They'd most likely
say no, I think. Yet, if they're asked if the same period is one day, I
think they'd answer yes. I think this is what Bruno is getting at.
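
The noon-to-noon case can be measured directly. This Python sketch uses the standard `zoneinfo` module (assuming the IANA zone data is installed) and picks the 2004 US spring transition as the sample date, which is an assumption for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")
noon_before = datetime(2004, 4, 3, 12, 0, tzinfo=eastern)
noon_after = datetime(2004, 4, 4, 12, 0, tzinfo=eastern)

# Same-zone subtraction in Python ignores the zone and compares wall clocks.
wall_clock = noon_after - noon_before

# Subtracting in UTC measures the time that actually elapsed.
elapsed = noon_after.astimezone(timezone.utc) - noon_before.astimezone(timezone.utc)

print(wall_clock, elapsed)  # 1 day, 0:00:00 23:00:00
```

The same pair of timestamps is "one day" by the calendar and 23 hours by the clock, which is the distinction being drawn here.
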

Regards,

Michael Glaesemann
grzm myrealbox com


Re: Comment on timezone and interval types

From
Tom Lane
Date:
Stuart Bishop <stuart@stuartbishop.net> writes:
> | If you want 24 hours you can use 24 hours. Days are not constant length,
> | just like months aren't constant length.

> Days *are* of constant length - check your nearest dictionary, which
> will define it as 24 hours or the period of rotation of the earth.

This is about as relevant to our problems as claiming that we should
ignore leap years because years are really of constant length.

We are trying to emulate the common civil calendar here, and in places
that observe DST, days are *not* of constant length.  If you don't like
this, why are you using the timestamp-with-time-zone datatype (or at
least, why are you using it with a DST-aware zone setting)?

timestamp-without-time-zone will continue to behave as it always has,
so that seems to me to offer a sufficient out for people who really
truly do not want DST-aware calculations.

            regards, tom lane