Thread: Comment on timezone and interval types
Recently there has been some discussion about attaching a timezone to a timestamp and some other discussion about including a 'day' part in the interval type. These two features impact each other, since if you add a 'day' to a timestamp the result can depend on what timezone the timestamp is supposed to be in. It probably makes more sense to use a timezone associated with the timestamp than say the timezone GUC or the fixed timezone UTC.
On Sat, Oct 23, 2004 at 06:49:15PM -0500, Bruno Wolff III wrote: > Recently there has been some discussion about attaching a timezone to > a timestamp and some other discussion about including a 'day' part > in the interval type. These two features impact each other, since > if you add a 'day' to a timestamp the result can depend on what timezone > the timestamp is supposed to be in. It probably makes more sense to use > a timezone associated with the timestamp than say the timezone GUC or the > fixed timezone UTC. I agree. One issue I can think of is that if you store each timestamp as a (seconds,timezone) pair, the storage requirements will balloon, since timezone can be something like "Australia/Sydney" and this will be repeated for every value in the table. I don't know how to deal easily with this since there is no unique identifier to timezones and no implicit order. The only solution I can think of is have initdb create a pg_timezones table which assigns an OID to each timezone it finds. Then the type can use that. I think this is a good solution actually, any thoughts? -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn, > I agree. One issue I can think of is that if you store each timestamp > as a (seconds,timezone) pair, the storage requirements will balloon, > since timezone can be something like "Australia/Sydney" and this will > be repeated for every value in the table. I don't know how to deal > easily with this since there is no unique identifier to timezones and > no implicit order. > > The only solution I can think of is have initdb create a pg_timezones > table which assigns an OID to each timezone it finds. Then the type can > use that. > > I think this is a good solution actually, any thoughts? Using OIDs is a good idea, but I think a canonical list of known timezone-to-OID mappings must be maintained and shipped with the PostgreSQL core. If OIDs are generated at initdb time, there's a great risk that the OIDs will differ between databases using different versions of PostgreSQL. That in turn will have some negative implications for data exchange. Regards, Thomas Hallgren
On Wed, Oct 27, 2004 at 09:21:39AM +0200, Thomas Hallgren wrote: > Martijn, > >I agree. One issue I can think of is that if you store each timestamp > >as a (seconds,timezone) pair, the storage requirements will balloon, > >since timezone can be something like "Australia/Sydney" and this will > >be repeated for every value in the table. I don't know how to deal > >easily with this since there is no unique identifier to timezones and > >no implicit order. > > > >The only solution I can think of is have initdb create a pg_timezones > >table which assigns an OID to each timezone it finds. Then the type can > >use that. > > > >I think this is a good solution actually, any thoughts? > > Using OIDs is a good idea, but I think a canonical list of known > timezone to OID mappings must be maintained and shipped with the > PostgreSQL core. How can there be a "canonical list of known timezones" if every operating system has its own list? Maybe you can provide a base list, but you have to allow for people to make their own. > If OIDs are generated at initdb time, there's a great risk that the > OIDs will differ between databases using different versions of > PostgreSQL. That in turn might have some negative implications for data > exchange. I doubt it; the OIDs would never be output. Types, triggers, functions etc. all have OIDs that never appear in any output anywhere, so why should these? Since PostgreSQL doesn't support copying any part of the raw data files between different installations, let alone different versions, I think the issues with data exchange are not a problem. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
On Oct 27, 2004, at 6:00 PM, Martijn van Oosterhout wrote: > On Wed, Oct 27, 2004 at 09:21:39AM +0200, Thomas Hallgren wrote: >> >> Using OIDs is a good idea, but I think a canonical list of known >> timezone to OID mappings must be maintained and shipped with the >> PostgreSQL core. > > How can there be a "canonical list of known timezones" if every > operating system has its own list? Maybe you can provide a base list, > but you have to allow for people to make their own. My understanding is that with the addition of the zic time zone data to the PostgreSQL server, there's no longer any need to rely on OS time zone data. Some areas may still use OS time zone data; I'm not sure if all the niggling pieces have been converted yet. One could then produce a canonical list, based on the zic data. Corrections welcome if I've misunderstood something. Regards, Michael Glaesemann grzm myrealbox com
On Wed, 2004-10-27 at 09:00 +0200, Thomas Hallgren wrote: > Using OIDs is a good idea, but I think a canonical list of known > timezone to OID mappings must be maintained and shipped with the > PostgreSQL core. > > If OIDs are generated at initdb time, there's a great risk that the > OIDs will differ between databases using different versions of > PostgreSQL. That in turn will have some negative implications for data > exchange. > > Regards, > Thomas Hallgren I definitely agree with Thomas. The fact that OIDs are generated at initdb time really scares me, since we have different versions of the database engine running; it would really be a nightmare if the OIDs were different from machine to machine.
Michael Glaesemann <grzm@myrealbox.com> writes: > On Oct 27, 2004, at 6:00 PM, Martijn van Oosterhout wrote: >> How can there be a "canonical list of known timezones" if every >> operating system has its own list? Maybe you can provide a base list, >> but you have to allow for people to make their own. > My understanding is that with the addition of the zic time zone data to > the PostgreSQL server, there's no longer any need to rely on OS time > zone data. Correct, but it is still the case that different installations will need to have slightly different timezone lists. Consider for example the australian_timezones kluge we have now, and consider that there are several known cases of zone name conflicts that are not covered by australian_timezones (the one I remember at the moment is IST, which both the Israelis and the Indians use; but I think there are some others). I think the most reasonable way to solve this will be to invent a configuration file that lets people list the zone abbreviations they want to use and the corresponding UTC offsets. We will need a mapping method that can cope with changes in such a file. But having said that, I concur with Martijn that there is no problem, because the OIDs (or whatever numeric ID we use) are inside the database and will never be visible outside it. There is no more portability risk here than there is in using platform-native byte order in integers. regards, tom lane
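[Editorial sketch] The argument that internal IDs never leak can be illustrated with a toy model. This is only a sketch under stated assumptions: ZoneCatalog, intern, store, dump and load are hypothetical names standing in for the proposed pg_timezones table, and nothing here reflects how PostgreSQL actually stores anything. Each installation hands out its own numeric IDs, while anything leaving the database carries the zone name.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneCatalog:
    """Hypothetical stand-in for a per-installation pg_timezones table."""
    ids_by_name: dict = field(default_factory=dict)
    names_by_id: dict = field(default_factory=dict)

    def intern(self, name: str) -> int:
        # Assign the next free local ID the first time a zone name is seen.
        if name not in self.ids_by_name:
            zone_id = len(self.ids_by_name) + 1
            self.ids_by_name[name] = zone_id
            self.names_by_id[zone_id] = name
        return self.ids_by_name[name]

    def store(self, seconds: int, name: str) -> tuple:
        # On-disk form: (seconds, small local ID) instead of the full name.
        return (seconds, self.intern(name))

    def dump(self, stored: tuple) -> tuple:
        # External form: the ID is translated back to the zone name.
        seconds, zone_id = stored
        return (seconds, self.names_by_id[zone_id])

    def load(self, dumped: tuple) -> tuple:
        seconds, name = dumped
        return self.store(seconds, name)


# Two installations whose local IDs disagree.
site_a = ZoneCatalog()
site_b = ZoneCatalog()
site_b.intern("Asia/Kolkata")  # site_b has already used ID 1

row_a = site_a.store(1098835200, "Australia/Sydney")
row_b = site_b.load(site_a.dump(row_a))
```

Here row_a and row_b carry different local IDs, yet both dump to the same (seconds, name) pair, which is the sense in which the ID assignment need not be canonical.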
Bruno Wolff III wrote: | Recently there has been some discussion about attaching a timezone to | a timestamp and some other discussion about including a 'day' part | in the interval type. These two features impact each other, since | if you add a 'day' to a timestamp the result can depend on what timezone | the timestamp is supposed to be in. It probably makes more sense to use | a timezone associated with the timestamp than say the timezone GUC or the | fixed timezone UTC. If you add a 'day' to a timestamp, it should be identical to adding 24 hours. Any other interpretation leads to all sorts of weird ambiguities. For example, what is '2am April 3rd 2004 US/Eastern + 1 day'? 2am on April 4th 2004 didn't exist in that timezone because the clocks were put forward and that hour skipped. If you round up to the nearest existent time, you then have the issue that '2am April 3rd + 1 day == 3am April 3rd + 1 day'. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
Stuart Bishop <stuart@stuartbishop.net> writes: > If you add a 'day' to a timestamp, it should be identical to adding 24 > hours. No, it should not --- at least not when the addition traverses a DST switchover time. > For example, what is '2am April 3rd 2004 US/Eastern + 1 day'? 2am on > April 4th 2004 didn't exist in that timezone because the clocks were put > forward and that hour skipped. The times right at the DST transition are questionable no matter what we do, but that does not justify your claim that we do not need to fix this. For instance, I think 10pm April 3rd (EST) plus '24 hours' ought to be 11pm April 4th (EDT), but adding '1 day' ought to yield 10pm EDT. There isn't really any ambiguity about what people will consider the right answer there. I think your example has about as much validity as claiming that we shouldn't support "+ '1 month'" because it's not clear what to do when adding '1 month' to 'Jan 31'. Yes, you end up having to define some corner-case behaviors, but that doesn't render the main cases worthless. regards, tom lane
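[Editorial sketch] The 10pm April 3rd example above can be worked through in a few lines. This is a minimal sketch, not PostgreSQL's implementation: to stay self-contained it hardcodes the two US/Eastern offsets (EST = UTC-5, EDT = UTC-4) as fixed-offset zones rather than consulting a timezone database, and the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for US/Eastern on either side of the
# 2004 spring-forward transition (02:00 on April 4th).
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

start = datetime(2004, 4, 3, 22, 0, tzinfo=EST)  # 10pm April 3rd (EST)

# "+ 24 hours": add elapsed time, then display the result in the offset
# that is in force on the evening of April 4th.
plus_24_hours = (start + timedelta(hours=24)).astimezone(EDT)

# "+ 1 day": advance the calendar date and keep the same wall-clock time,
# then attach the offset in force on the new date.
wall_clock = start.replace(tzinfo=None) + timedelta(days=1)
plus_1_day = wall_clock.replace(tzinfo=EDT)

# Elapsed time actually covered by that "1 day".
elapsed = plus_1_day - start

print(plus_24_hours.strftime("%Y-%m-%d %H:%M %Z"))
print(plus_1_day.strftime("%Y-%m-%d %H:%M %Z"))
print(elapsed)
```

Under these assumptions '24 hours' lands on 11pm EDT and '1 day' on 10pm EDT, with the calendar day spanning only 23 elapsed hours.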
Yes. For example: MST = GMT - 7 hours; MDT = GMT - 6 hours. The GMT time remains constant no matter whether it is or is not daylight savings time. You still want to bill someone for 1 hour of usage from 02:00 MDT to 02:00 MST, but you don't want to bill an hour from 02:00 MST to 03:00 MDT. Unless you are using GMT or another timezone that does not use daylight savings, you should always include the timezone with the time. 1 day should always be calculated as 24 hours, just as an hour is calculated as 60 minutes... Since interval does not store an actual time range, it is not sensitive to daylight savings. Where problems occur is when you try to use units larger than a week, because they vary in the number of days per unit depending on the date range. I would prefer to see interval state time in: Days:Hours:Minutes:Seconds.Microseconds rather than: Years Months Days Hours:Minutes:Seconds.Microseconds. Since months and years are not a constant number of days, it does not seem reasonable to use them in calculations to determine days, unless it is qualified with a start or stop time and date including the time zone. Since I don't need to account for microseconds or durations larger than +/- 68 years, I usually use an int4 to store time usage in seconds. Since int4 can be cast into reltime, it is simple to calculate the beginning or end of the interval with one timestamp with timezone and an int4 duration. The storage required for this is 16 bytes: 12 for the timestamp and 4 for the int4 {integer}. If you need more accuracy you could use a timestamp and an interval, but the storage required would be 24 bytes IIRC. Stuart Bishop wrote: > Bruno Wolff III wrote: > | Recently there has been some discussion about attaching a timezone to > | a timestamp and some other discussion about including a 'day' part > | in the interval type. These two features impact each other, since > | if you add a 'day' to a timestamp the result can depend on what timezone > | the timestamp is supposed to be in. It probably makes more sense to use > | a timezone associated with the timestamp than say the timezone GUC or the > | fixed timezone UTC. > > If you add a 'day' to a timestamp, it should be identical to adding 24 > hours. Any other interpretation leads to all sorts of weird ambiguities. > For example, what is '2am April 3rd 2004 US/Eastern + 1 day'? 2am on > April 4th 2004 didn't exist in that timezone because the clocks were put > forward and that hour skipped. If you round up to the nearest existent > time, you then have the issue that '2am April 3rd + 1 day == 3am April > 3rd + 1 day'. > > -- > Stuart Bishop <stuart@stuartbishop.net> > http://www.stuartbishop.net/ -- Guy Fraser Network Administrator The Internet Centre 780-450-6787 , 1-888-450-6787 There is a fine line between genius and lunacy, fear not, walk the line with pride. Not all things will end up as you wanted, but you will certainly discover things the meek and timid will miss out on.
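[Editorial sketch] The billing arithmetic in this message can be checked with zone-qualified timestamps. A minimal sketch, assuming the fixed offsets stated above (MST = GMT - 7, MDT = GMT - 6) and assuming the 2004 transition dates of April 4th and October 31st, which are not given in the message; billable_seconds is a hypothetical helper name. It also checks the "+/- 68 years" range of a signed 32-bit count of seconds.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets as given in the message.
MST = timezone(timedelta(hours=-7), "MST")
MDT = timezone(timedelta(hours=-6), "MDT")


def billable_seconds(start: datetime, end: datetime) -> int:
    """Elapsed seconds between two zone-qualified timestamps.

    Subtracting aware datetimes compares them as instants on the GMT
    timeline, so the wall-clock labels do not matter.
    """
    return int((end - start).total_seconds())


# Fall back (assumed date 2004-10-31): 02:00 MDT to 02:00 MST is a real hour.
fall_back = billable_seconds(
    datetime(2004, 10, 31, 2, 0, tzinfo=MDT),
    datetime(2004, 10, 31, 2, 0, tzinfo=MST),
)

# Spring forward (assumed date 2004-04-04): 02:00 MST and 03:00 MDT are the
# same instant, so nothing should be billed.
spring_forward = billable_seconds(
    datetime(2004, 4, 4, 2, 0, tzinfo=MST),
    datetime(2004, 4, 4, 3, 0, tzinfo=MDT),
)

# Range of an int4 duration in seconds, in years of 365.25 days.
INT4_MAX = 2**31 - 1
int4_range_years = INT4_MAX / (365.25 * 86400)

print(fall_back, spring_forward, round(int4_range_years, 2))
```

Both billing cases depend on the offset being stored with each timestamp; with bare wall-clock times the two intervals would look like 0 hours and 1 hour respectively.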
On Fri, Oct 29, 2004 at 11:14:31 -0600, Guy Fraser <guy@incentre.net> wrote: > > 1 day should always be calculated as 24 hours, just as an hour > is calculated as 60 minutes... If you want 24 hours you can use 24 hours. Days are not constant length, just like months aren't constant length. > Since interval does not store an actual time range, it is not sensitive to > daylight savings. When intervals are added to or subtracted from timestamps, there is an actual time range, which makes DST transitions relevant.
Bruno Wolff III wrote: | On Fri, Oct 29, 2004 at 11:14:31 -0600, | Guy Fraser <guy@incentre.net> wrote: | |>1 day should always be calculated as 24 hours, just as an hour |>is calculated as 60 minutes... | | | If you want 24 hours you can use 24 hours. Days are not constant length, | just like months aren't constant length. Days *are* of constant length - check your nearest dictionary, which will define it as 24 hours or the period of rotation of the earth. If people see 'day', they think '24 hours' because that is the definition they have been using since preschool. This breeds sleeping bugs that nobody notices until the DST transition kicks in and events happen an hour late or not at all. What you are talking about is useful, but should be called calendar_day or something that makes it obvious that it isn't using the traditional definition. People are used to months being ambiguous so it is less likely to cause upsets, although it still bites people because their toolkit's definition of 'month' does not match their business rules of 'month' (which might be 30 days, 31 days, 4 weeks, calendar month rounded down). -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
On Nov 5, 2004, at 5:38 PM, Stuart Bishop wrote: > Bruno Wolff III wrote: > | On Fri, Oct 29, 2004 at 11:14:31 -0600, > | Guy Fraser <guy@incentre.net> wrote: > | > |>1 day should always be calculated as 24 hours, just as an hour > |>is calculated as 60 minutes... > | > | > | If you want 24 hours you can use 24 hours. Days are not constant > length, > | just like months aren't constant length. > > Days *are* of constant length - check your nearest dictionary, which > will define it as 24 hours or the period of rotation of the earth. If > people see 'day', they think '24 hours' because that is the definition > they have been using since preschool. This breeds sleeping bugs that > nobody notices until the DST transition kicks in and events happen an > hour late or not at all. > > What you are talking about is useful, but should be called calendar_day > or something that makes it obvious that it isn't using the traditional > definition. Could you expand on this a bit? I'm not quite sure what you're getting at. I think most people would say the period from noon one day until noon the next would be 1 day. If that day spans a DST change, it will definitely not be 24 hours, and people might agree if they're asked "is the period from noon til noon over DST 24 hours?". They'd most likely say no, I think. Yet, if they're asked if the same period is one day, I think they'd answer yes. I think this is what Bruno is getting at. Regards, Michael Glaesemann grzm myrealbox com
Stuart Bishop <stuart@stuartbishop.net> writes: > | If you want 24 hours you can use 24 hours. Days are not constant length, > | just like months aren't constant length. > Days *are* of constant length - check your nearest dictionary, which > will define it as 24 hours or the period of rotation of the earth. This is about as relevant to our problems as claiming that we should ignore leap years because years are really of constant length. We are trying to emulate the common civil calendar here, and in places that observe DST, days are *not* of constant length. If you don't like this, why are you using the timestamp-with-time-zone datatype (or at least, why are you using it with a DST-aware zone setting)? timestamp-without-time-zone will continue to behave as it always has, so that seems to me to offer a sufficient out for people who really truly do not want DST-aware calculations. regards, tom lane