Thread: storing TZ along timestamps
Hi, One of our customers is interested in being able to store original timezone along with a certain timestamp. It is currently possible to store a TZ in a separate column, but this is a bit wasteful and not very convenient anyway. There are all sorts of UI issues that need to be resolved in order for this to be a complete feature proposal, but the first thing that we discussed was what is the storage going to look like. Of course, one thing we don't want is to store the complete TZ name as text. So the first thing is cataloguing timezone names, and assigning an ID to each (maybe an OID). If we do that, then we can store the OID of the timezone name along the int64/float8 of the actual timestamp value. Right now we rely on the tzdata files on disk for things like pg_timezone_names and other accesses of TZ data; so the files are the authoritative source of TZ info. So we need to ensure that whenever the files are updated, the catalogs are updated as well. I think we could make this work if we "refreshed" the catalog from the files on SIGHUP if the directory changes (say, a new timezone is created). Note that I am currently proposing to store only the zone names in the catalog, not the full TZ data. Are there objections to the general idea? If not, I'll flesh a more complete proposal. -- Álvaro Herrera <alvherre@commandprompt.com>
Alvaro Herrera <alvherre@commandprompt.com> writes: > Right now we rely on the tzdata files on disk for things like > pg_timezone_names and other accesses of TZ data; so the files are the > authoritative source of TZ info. So we need to ensure that whenever the > files are updated, the catalogs are updated as well. > I think we could make this work if we "refreshed" the catalog from the > files on SIGHUP if the directory changes (say, a new timezone is > created). (1) SIGHUP processing normally occurs outside any transaction. (2) The only obvious way to ensure the "refresh" is done once, and not once per backend, is to have the postmaster do it ... which is a nonstarter for many reasons. I'd suggest instead considering something like the pg_collations approach: load up the catalog once at initdb. If the user really needs to add to the set of accessible TZ names later, give him a tool to do that. But it's 100% not worth either the implementation pain or the cycles to try to auto-update the catalog, especially not as often as once per SIGHUP. BTW, what will you do about pg_upgrade? Ensuring the OID mapping doesn't change seems like loads of fun. regards, tom lane
On May 27, 2011, at 1:43 PM, Alvaro Herrera wrote: > Right now we rely on the tzdata files on disk for things like > pg_timezone_names and other accesses of TZ data; so the files are the > authoritative source of TZ info. So we need to ensure that whenever the > files are updated, the catalogs are updated as well. > I think we could make this work if we "refreshed" the catalog from the > files on SIGHUP if the directory changes (say, a new timezone is > created). Note that I am currently proposing to store only the zone > names in the catalog, not the full TZ data. > > Are there objections to the general idea? If not, I'll flesh a more > complete proposal. I like it, but what do you do when a TZ has been renamed or has ceased to exist. Or, worse, existed last week, so last week'sdates might still use it, but next week's must not? Best, David
"David E. Wheeler" <david@kineticode.com> wrote: > I like it, but what do you do when a TZ has been renamed or has > ceased to exist. Or, worse, existed last week, so last week's > dates might still use it, but next week's must not? I think the key thing is that the timestamp portion of it would be identical to our current TIMESTAMP WITH TIME ZONE -- always store it in the value UTC zone. That way comparisons and math between timestamps could remain sane. The stored time zone portion would be what it would be the display format, if usable. In an extreme situation like you describe above, I guess you could fall back on what we do now for display of a timestamptz value. Personally, I think it would be good to move a bit closer to the standard by including a time zone in a TIMESTAMP WITH TIME ZONE value. The biggest problem I can see is how to try to do this in a standard conforming fashion without breaking existing code. It would seem more than a little odd to support the standard semantics with nonstandard syntax and vice versa. -Kevin
"David E. Wheeler" <david@kineticode.com> writes: > I like it, but what do you do when a TZ has been renamed or has ceased > to exist. As far as that goes, I think "nothing" is a sufficient answer. There's no requirement that an OID in the mapping table correspond to a live TZ. It's just a more compact way of storing a string name. regards, tom lane
On May 27, 2011, at 2:35 PM, Tom Lane wrote: > "David E. Wheeler" <david@kineticode.com> writes: >> I like it, but what do you do when a TZ has been renamed or has ceased >> to exist. > > As far as that goes, I think "nothing" is a sufficient answer. There's > no requirement that an OID in the mapping table correspond to a live TZ. > It's just a more compact way of storing a string name. Well then you'd just want to be sure to never delete TZs. I think the issue of trying this week to use a TZ that was removed last week might be more problematic. I mean, we couldjust let the user use it, but that hardly seems wise… Best, David
On Fri, May 27, 2011 at 2:32 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > I think the key thing is that the timestamp portion of it would be > identical to our current TIMESTAMP WITH TIME ZONE -- always store it > in the value UTC zone. Fwiw our timestamp with time zone stores seconds since the epoch. This is a quantity which is independent of timezones entirely. Excluding relativistic effects there have been the same number of time zones since that point in time regardless of where you stand relative to the sun. My question for Alvarro is whether he really wants the text label for the time zone at all, or just the offset which was used to enter it. That is, if I enter "12:00pm" with my current time zone set to GMT and later update the tzdata on the machine to start summer time on a earlier date should the data type now show "1:00pm BST" or should it still display "12:00pm +000" and leave it up to the reader to decide whether why I entered it in a weird time zone for that time of year? -- greg
On 05/27/2011 01:43 PM, Alvaro Herrera wrote: > Hi, > > One of our customers is interested in being able to store original > timezone along with a certain timestamp. I am very interested in the use-case for this (in part as I'm working on a PG related time talk). My experience thus far is that people who want this do not fully understand the nature of date-time calculations and variables in PG. > It is currently possible to store a TZ in a separate column, but this is > a bit wasteful and not very convenient anyway. > > Are there objections to the general idea? If not, I'll flesh a more > complete proposal. I'm not crazy about it. Although time-stamp-with-time-zone is, perhaps, a bad name for what is actually a "point in time", a point-in-time is what timestamptz represents. I can enter it and allow my defaults to take over, specify abbreviations, explicit offsets or long names none of which change the actual point in time. Likewise, I can display said point-in-time in any of dozens of ways according to my needs. steve=# select '2011-05-27 12:34'::timestamptz; timestamptz ------------------------ 2011-05-27 12:34:00-07 steve=# select '2011-05-27 12:34-07'::timestamptz; timestamptz ------------------------ 2011-05-27 12:34:00-07 steve=# select '2011-05-27 12:34 PDT'::timestamptz; timestamptz ------------------------ 2011-05-27 12:34:00-07 steve=# select '2011-05-27 11:34 PST'::timestamptz; timestamptz ------------------------ 2011-05-27 12:34:00-07 steve=# select '2011-05-27 15:34 US/Eastern'::timestamptz; timestamptz ------------------------ 2011-05-27 12:34:00-07 select now() - '02:58:54.605041'::interval; ?column? ------------------------------- 2011-05-27 12:34:00.394959-07 Granted, I'm a random sample of 1, but I've never found anyone with a real need for this feature - especially since the capability already exists to achieve the requested result, and much more flexibly, by either a separate column or a user-defined type. Questions: What would be the storage impact (tables, indexes and backups) for those of use with tens-of-millions of pieces of timestamp data? What type of timestamp would be stored? Abbreviated/offset (PST, -07), full (US/Eastern) or a mix? Is there an expectation that the stored time zone information would be used for any calculation purposes? If so, how would rules be applied? Would there be any form of error-checking? Currently PG accepts non-existent time zones but maps them to UTC: steve=# select '2011-05-27 15:34'::timestamptz at time zone 'US/f00'; timezone --------------------- 2011-05-27 15:34:00 Would there be any impact to existing queries? How would dump/restore issues be handled - especially if the time-zone info changes in between? More as I think of them. Cheers, Steve
On Fri, May 27, 2011 at 4:13 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote: > I am very interested in the use-case for this (in part as I'm working on a > PG related time talk). My experience thus far is that people who want this > do not fully understand the nature of date-time calculations and variables > in PG. The use cases I recall having been mentioned in the past were accurate data retention and calendaring applications. Accurate data retention for things like drug trials need to guarantee they retain precisely what the user entered, not an equivalent value. If you run a report on a drug trial you need to see that the event was recorded as occuring at 1:00pm EST not 6:00pm GMT even if you happen to run the report in London. And calendaring apps want to know what timezone is attached to an event, not only the point in time at which it occurs. If your plane flight departs at 12:00pm GMT and lands at 2:00pm EST you need to know that to book your taxi at 2:30pm EST -- not 7:30pm GMT. Both of these two cases can be handled differently. The former by storing the raw text inputs and then storing the interpreted value as a derived column separetly, and the latter by storing the local time zone to use for display as an additional attribute along with the local address and other attributes of the calendar event. -- greg
On 05/27/2011 04:29 PM, Greg Stark wrote: > On Fri, May 27, 2011 at 4:13 PM, Steve Crawford > <scrawford@pinpointresearch.com> wrote: >> I am very interested in the use-case for this (in part as I'm working on a >> PG related time talk). My experience thus far is that people who want this >> do not fully understand the nature of date-time calculations and variables >> in PG. > The use cases I recall having been mentioned in the past were accurate > data retention and calendaring applications. > > Accurate data retention for things like drug trials need to guarantee > they retain precisely what the user entered, not an equivalent value. > If you run a report on a drug trial you need to see that the event was > recorded as occuring at 1:00pm EST not 6:00pm GMT even if you happen > to run the report in London. > > And calendaring apps want to know what timezone is attached to an > event, not only the point in time at which it occurs. If your plane > flight departs at 12:00pm GMT and lands at 2:00pm EST you need to know > that to book your taxi at 2:30pm EST -- not 7:30pm GMT. > > Both of these two cases can be handled differently. The former by > storing the raw text inputs and then storing the interpreted value as > a derived column separetly, and the latter by storing the local time > zone to use for display as an additional attribute along with the > local address and other attributes of the calendar event. > So the proposed change does not handle the first case as you need to capture the raw input. And the second case is already well handled. In fact calendaring is a great example. I enter the time for the teleconference and PG nicely uses my default timezone to store the point-in-time. When you retrieve it, it is shown in your timezone and we both pick up the phone at the correct time. And if I know I'll be somewhere else at that time, I just ask for the data in that zone. Altering the data type gains nothing. Cheers, Steve
On May 27, 2011, at 6:29 PM, Greg Stark wrote: > Both of these two cases can be handled differently. The former by > storing the raw text inputs and then storing the interpreted value as > a derived column separetly, and the latter by storing the local time > zone to use for display as an additional attribute along with the > local address and other attributes of the calendar event. Which means you're back to a very cumbersome method that involves another field. That's a tremendous amount of extra code. We run multiple businesses around the globe. Each business operates in it's own timezone, and 90% of the time we want thingshandled in that timezone. The wheels fall off the wagon if we try and combine data from multiple locations into a singledatabase; there's no reasonable way to say: give me the data in this field *at the timezone that was originally entered*,except for not storing timezone data at all. If we don't store timezone data at all, then it's impossible to determinean actual point in time that something happened at. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, May 27, 2011 at 04:43:28PM -0400, Alvaro Herrera wrote: > Hi, > > One of our customers is interested in being able to store original > timezone along with a certain timestamp. I've felt that pain here and there too... > So the first thing is cataloguing timezone names, and assigning an ID to > each (maybe an OID). If we do that, then we can store the OID of the > timezone name along the int64/float8 of the actual timestamp value. > > Right now we rely on the tzdata files on disk for things like > pg_timezone_names and other accesses of TZ data; so the files are the > authoritative source of TZ info. So we need to ensure that whenever the > files are updated, the catalogs are updated as well. Problem with this approach (mapping external time zone names to OIDs) is: dump/restore would only be meaningful if you "carry over" the time zone data, right? That is: two independent systems are likely to have different mappings (even if at some point they have the "same" TZ data?) What would be a solution to that? (a) A central, "official" catalog, with only additions, never deletions (perhaps with some space carved out for "local"additions, to minimize conflicts)?(b) A hash of the time zone name? Both not very good ideas, I know. Although (a) might be less bad than it seems. Most Unixoids (including OSX) seem to have basically Olson's. Don't know about Windows, but it might seem feasible to make some mapping (or union). Only important rule: no backtrack :-) Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFN4JWtBcgs9XrR2kYRAt+mAJ0atx3u6pll50+s4vVwCKZUjqmnSQCffWNe gzSFgRCFUvsd8pbH1Qm/ho4= =FVhO -----END PGP SIGNATURE-----
On fre, 2011-05-27 at 16:57 -0700, Steve Crawford wrote: > And the second case is already well handled. In fact calendaring is a > great example. I enter the time for the teleconference and PG nicely > uses my default timezone to store the point-in-time. When you > retrieve > it, it is shown in your timezone and we both pick up the phone at the > correct time. And if I know I'll be somewhere else at that time, I > just > ask for the data in that zone. Altering the data type gains nothing. How about a recurring appointment that happens every Tuesday whenever it is 9:00am in California, independent of DST (in California or where ever the participant actually is). I'm not sure how to solve that within the SQL framework. You might need to use time with time zone with a placeholder timezone, and then a rule that date + time with time zone creates a timestamp with time zone that resolves the time zone for that particular day.
On 05/28/2011 02:58 PM, Peter Eisentraut wrote: > On fre, 2011-05-27 at 16:57 -0700, Steve Crawford wrote: >> And the second case is already well handled. In fact calendaring is a >> great example. I enter the time for the teleconference and PG nicely >> uses my default timezone to store the point-in-time. When you >> retrieve >> it, it is shown in your timezone and we both pick up the phone at the >> correct time. And if I know I'll be somewhere else at that time, I >> just >> ask for the data in that zone. Altering the data type gains nothing. > How about a recurring appointment that happens every Tuesday whenever it > is 9:00am in California, independent of DST (in California or where ever > the participant actually is). I'm not sure how to solve that within the > SQL framework. You might need to use time with time zone with a > placeholder timezone, and then a rule that date + time with time zone > creates a timestamp with time zone that resolves the time zone for that > particular day. > > Interval math is pretty smart about that: select '2011-05-31 09:00'::timestamp at time zone 'PST8PDT' + ('7 days'::interval * generate_series(1,60)); ------------------------ 2011-06-07 09:00:00-07 2011-06-14 09:00:00-07 2011-06-21 09:00:00-07 2011-06-28 09:00:00-07 2011-07-0509:00:00-07 2011-07-12 09:00:00-07 2011-07-19 09:00:00-07 2011-07-26 09:00:00-07 2011-08-02 09:00:00-07 2011-08-0909:00:00-07 2011-08-16 09:00:00-07 2011-08-23 09:00:00-07 2011-08-30 09:00:00-07 2011-09-06 09:00:00-07 2011-09-1309:00:00-07 2011-09-20 09:00:00-07 2011-09-27 09:00:00-07 2011-10-04 09:00:00-07 2011-10-11 09:00:00-07 2011-10-1809:00:00-07 2011-10-25 09:00:00-07 2011-11-01 09:00:00-07 2011-11-08 09:00:00-08 2011-11-15 09:00:00-08 2011-11-2209:00:00-08 2011-11-29 09:00:00-08 2011-12-06 09:00:00-08 2011-12-13 09:00:00-08 2011-12-20 09:00:00-08 2011-12-2709:00:00-08 2012-01-03 09:00:00-08 2012-01-10 09:00:00-08 2012-01-17 09:00:00-08 2012-01-24 09:00:00-08 2012-01-3109:00:00-08 2012-02-07 09:00:00-08 2012-02-14 09:00:00-08 2012-02-21 09:00:00-08 2012-02-28 09:00:00-08 2012-03-0609:00:00-08 2012-03-13 09:00:00-07 2012-03-20 09:00:00-07 2012-03-27 09:00:00-07 2012-04-03 09:00:00-07 2012-04-1009:00:00-07 2012-04-17 09:00:00-07 2012-04-24 09:00:00-07 2012-05-01 09:00:00-07 2012-05-08 09:00:00-07 2012-05-1509:00:00-07 2012-05-22 09:00:00-07 2012-05-29 09:00:00-07 ... Or if you have to call in from London (notice the blips between 4pm and 5pm due to London and California switching to/from DST on different dates): select ('2011-05-31 09:00'::timestamp at time zone 'PST8PDT' + ('7 days'::interval * generate_series(1,60))) at time zone 'Europe/London'; --------------------- 2011-06-07 17:00:00 2011-06-14 17:00:00 2011-06-21 17:00:00 2011-06-28 17:00:00 2011-07-05 17:00:002011-07-12 17:00:00 2011-07-19 17:00:00 2011-07-26 17:00:00 2011-08-02 17:00:00 2011-08-09 17:00:00 2011-08-16 17:00:002011-08-23 17:00:00 2011-08-30 17:00:00 2011-09-06 17:00:00 2011-09-13 17:00:00 2011-09-20 17:00:00 2011-09-27 17:00:002011-10-04 17:00:00 2011-10-11 17:00:00 2011-10-18 17:00:00 2011-10-25 17:00:00 2011-11-01 16:00:00 2011-11-08 17:00:002011-11-15 17:00:00 2011-11-22 17:00:00 2011-11-29 17:00:00 2011-12-06 17:00:00 2011-12-13 17:00:00 2011-12-20 17:00:002011-12-27 17:00:00 2012-01-03 17:00:00 2012-01-10 17:00:00 2012-01-17 17:00:00 2012-01-24 17:00:00 2012-01-31 17:00:002012-02-07 17:00:00 2012-02-14 17:00:00 2012-02-21 17:00:00 2012-02-28 17:00:00 2012-03-06 17:00:00 2012-03-13 16:00:002012-03-20 16:00:00 2012-03-27 17:00:00 2012-04-03 17:00:00 2012-04-10 17:00:00 2012-04-17 17:00:00 2012-04-24 17:00:002012-05-01 17:00:00 2012-05-08 17:00:00 ... Cheers, Steve
Excerpts from Jeff Davis's message of mié jun 01 19:57:40 -0400 2011: > On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote: > > Hi, > > > > One of our customers is interested in being able to store original > > timezone along with a certain timestamp. > > I assume that you're talking about a new data type, not augmenting the > current types, correct? Yes -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote: > Hi, > > One of our customers is interested in being able to store original > timezone along with a certain timestamp. I assume that you're talking about a new data type, not augmenting the current types, correct? Regards,Jeff Davis
On Wed, Jun 1, 2011 at 8:18 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Excerpts from Jeff Davis's message of mié jun 01 19:57:40 -0400 2011: >> On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote: >> > Hi, >> > >> > One of our customers is interested in being able to store original >> > timezone along with a certain timestamp. >> >> I assume that you're talking about a new data type, not augmenting the >> current types, correct? > > Yes why not use a composite type for that? performance maybe? merlin
On 06/01/2011 05:18 PM, Alvaro Herrera wrote: > Excerpts from Jeff Davis's message of mié jun 01 19:57:40 -0400 2011: >> On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote: >>> Hi, >>> >>> One of our customers is interested in being able to store original >>> timezone along with a certain timestamp. >> I assume that you're talking about a new data type, not augmenting the >> current types, correct? > Yes > That eliminates many of my issues - I just didn't want the type changed underneath me. But some considerations remain - including some new that have crossed my mind: 1. How would the time-zone be defined in this composite? Offset from GMT? Timezone (well, link thereto) with all DST rules intact? Would "extract" need to be modified to include the ability to grab the timezone? 2. What would be the precedence for defining originating timezone? Default? Set timezone to? ...at time zone...? Based on the timestamp (2011-06-02 12:34:56-07)? 3. Would indexing/sorting include the originating zone? If so, how would time zones collate (base offset, actual offset based on the timestamp, name)? 4. What would be the corresponding type when used with Perl/PHP/Python/... applications - would they require special non-standard handling? Since this isn't going to alter my current beloved timestamptz and I don't have a use-case I leave the decisions on the above to others. But in my imagined use-cases I still see the originating zone as a separate piece of information better handled as a different column - for example sorting by timestamp plus priority or selecting everything for a specific time zone. Cheers, Steve
Excerpts from Merlin Moncure's message of mié jun 01 21:36:32 -0400 2011: > On Wed, Jun 1, 2011 at 8:18 PM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: > > Excerpts from Jeff Davis's message of mié jun 01 19:57:40 -0400 2011: > >> On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote: > >> > Hi, > >> > > >> > One of our customers is interested in being able to store original > >> > timezone along with a certain timestamp. > >> > >> I assume that you're talking about a new data type, not augmenting the > >> current types, correct? > > > > Yes > > why not use a composite type for that? performance maybe? To avoid having to implement all the operators and lookup tables (of timezones) in userland, mainly. Probably performance would be affected too, not sure, but that's not the main point. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, 2011-06-02 at 18:46 +0000, Christopher Browne wrote: > > 1. How would the time-zone be defined in this composite? Offset from GMT? > > Timezone (well, link thereto) with all DST rules intact? Would "extract" > > need to be modified to include the ability to grab the timezone? > > That doesn't seem appropriate, because timezones are not always > represented by strict offsets from GMT. Some frequently-used > timezones represent variable offsets. ("EDT/EST", I'm looking at > you!) In conjunction with a specific timestamp, a timezone does strictly map to a single offset. That is, unless it's a timestamp in the future, and someone decides to adjust a timezone before the timestamp actually occurs. But that's a problem with the current timestamptz implementation anyway... > > Since this isn't going to alter my current beloved timestamptz and I don't > > have a use-case I leave the decisions on the above to others. But in my > > imagined use-cases I still see the originating zone as a separate piece of > > information better handled as a different column - for example sorting by > > timestamp plus priority or selecting everything for a specific time zone. I have a similar inclination. ">" seems like the fundamental operation you'd want to perform on any timestamp (perhaps more so than equality), and that's not well-defined if there is no total order (but several meaningful partial orders). However, I do see some nice benefits, too. The main one is that you can easily get either local time or GMT out of it. So you can answer queries such as "which of these activities occurred outside of normal business hours" as well as "which of these events happened first". It would take a little care to use properly, however. Regards,Jeff Davis
On Thu, Jun 2, 2011 at 6:06 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote: > On 06/01/2011 05:18 PM, Alvaro Herrera wrote: >> >> Excerpts from Jeff Davis's message of mié jun 01 19:57:40 -0400 2011: >>> >>> On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote: >>>> >>>> Hi, >>>> >>>> One of our customers is interested in being able to store original >>>> timezone along with a certain timestamp. >>> >>> I assume that you're talking about a new data type, not augmenting the >>> current types, correct? >> >> Yes >> > That eliminates many of my issues - I just didn't want the type changed > underneath me. But some considerations remain - including some new that have > crossed my mind: > > 1. How would the time-zone be defined in this composite? Offset from GMT? > Timezone (well, link thereto) with all DST rules intact? Would "extract" > need to be modified to include the ability to grab the timezone? That doesn't seem appropriate, because timezones are not always represented by strict offsets from GMT. Some frequently-used timezones represent variable offsets. ("EDT/EST", I'm looking at you!) > 2. What would be the precedence for defining originating timezone? Default? > Set timezone to? ...at time zone...? Based on the timestamp (2011-06-02 > 12:34:56-07)? > > 3. Would indexing/sorting include the originating zone? If so, how would > time zones collate (base offset, actual offset based on the timestamp, > name)? Some timezones contain discontinuities, so that the notion of sorting them seems implausible, as there isn't properly an "ordering." > 4. What would be the corresponding type when used with Perl/PHP/Python/... > applications - would they require special non-standard handling? > > Since this isn't going to alter my current beloved timestamptz and I don't > have a use-case I leave the decisions on the above to others. But in my > imagined use-cases I still see the originating zone as a separate piece of > information better handled as a different column - for example sorting by > timestamp plus priority or selecting everything for a specific time zone. I'd tend to think that this is best captured by having two pieces of information: a) The timestamp in UTC terms, so that it's a totally stable value, which is amenable to comparison against other timestamps (irrespective of timezone) b) A symbolic representation of the timezone, perhaps its name. It's not at all obvious that these ought to be treated as a singular data type. -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Fri, 2011-05-27 at 16:43 -0400, Alvaro Herrera wrote: > One of our customers is interested in being able to store original > timezone along with a certain timestamp. Another thing to consider is that this will eliminate any useful total order. You could define an arbitrary total order, of course, just to allow BTrees for equality searches. However, I don't think you should define ">" (and other non-equality comparator operators) according to that total order -- they should be more hidden like "~>~". ">" should not exist as an operator over this type at all. I also do not like the idea of having "=" mean "equivalent after timezone adjustment". If we're making a distinction between "2000-01-01 10:00:00 +03" and "2000-01-01 9:00:00 +02", then "=" should not obscure that distinction. Regards,Jeff Davis
On Thu, Jun 2, 2011 at 12:55 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: >> >> > One of our customers is interested in being able to store original >> >> > timezone along with a certain timestamp. >> >> >> >> I assume that you're talking about a new data type, not augmenting the >> >> current types, correct? >> > >> > Yes >> >> why not use a composite type for that? performance maybe? > > To avoid having to implement all the operators and lookup tables (of > timezones) in userland, mainly. Probably performance would be affected > too, not sure, but that's not the main point. right -- I see where you are going with this. ok, some random questions: *) what about making a 'timezone' type in addition to (or even instead of) the timezonetz_inputtz? Then you could in theory treat the your proposed type as a composite of timezonetz and timezone, just as timestamptz is a 'composite' of date and timetz. (note I'm not necessarily arguing against the creation of a specific unified type -- performance is important for time types). *) in/out formats...what would be the wire formats of your type -- in particular, the binary format? *) do you see your type interacting with various datetime function (like extract) or will a cast be required? Interval math? *) how does ordering and uniqueness apply to same timestamps with unique input time zones? merlin
On Thu, Jun 2, 2011 at 3:02 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Thu, 2011-06-02 at 18:46 +0000, Christopher Browne wrote: >> > 1. How would the time-zone be defined in this composite? Offset from GMT? >> > Timezone (well, link thereto) with all DST rules intact? Would "extract" >> > need to be modified to include the ability to grab the timezone? >> >> That doesn't seem appropriate, because timezones are not always >> represented by strict offsets from GMT. Some frequently-used >> timezones represent variable offsets. ("EDT/EST", I'm looking at >> you!) > > In conjunction with a specific timestamp, a timezone does strictly map > to a single offset. But that doesn't seem like enough, because if someone adds '1 day', knowing the offset isn't sufficient to figure out the answer. You have to know where the DST boundary is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, 2011-06-02 at 20:28 -0400, Robert Haas wrote: > But that doesn't seem like enough, because if someone adds '1 day', > knowing the offset isn't sufficient to figure out the answer. You > have to know where the DST boundary is. Good point, I guess the timezone itself needs to be stored. That's a little unfortunate, because timezones are somewhat of a moving target (which I think was Tom's point). That means that we'd need an entire history (and future?) of timezone definitions, and apply the timezone definition as of the associated timestamp to get the offset. Or, should we apply the timezone definition as of the "real" time the value was entered? Regards,Jeff Davis
On Jun 2, 2011, at 7:48 PM, Jeff Davis wrote: > On Thu, 2011-06-02 at 20:28 -0400, Robert Haas wrote: >> But that doesn't seem like enough, because if someone adds '1 day', >> knowing the offset isn't sufficient to figure out the answer. You >> have to know where the DST boundary is. > > Good point, I guess the timezone itself needs to be stored. That's a > little unfortunate, because timezones are somewhat of a moving target > (which I think was Tom's point). > > That means that we'd need an entire history (and future?) of timezone > definitions, and apply the timezone definition as of the associated > timestamp to get the offset. Or, should we apply the timezone definition > as of the "real" time the value was entered? As someone else mentioned, timestamptz suffers the exact same problems. I'm torn between whether the type should store the original time or the original time converted to GMT. I believe you wouldhave the most accuracy if you stored the original time... but then indexing becomes problematic. I don't know if thisdata quality issue can be solved by anything short of somehow storing the actual timezone rule that was in place whenthe data was set. Speaking of input; someone asked what timezone should be used as the "original" timezone. We should use whatever timezonewas passed in with the value, and if one wasn't passed in we should use whatever the timezone GUC is set to (I'massuming that's what timestamptz does). -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On tor, 2011-06-02 at 22:58 -0500, Jim Nasby wrote: > I'm torn between whether the type should store the original time or > the original time converted to GMT. I believe you would have the most > accuracy if you stored the original time... but then indexing becomes > problematic. I don't know if this data quality issue can be solved by > anything short of somehow storing the actual timezone rule that was in > place when the data was set. > > Speaking of input; someone asked what timezone should be used as the > "original" timezone. We should use whatever timezone was passed in > with the value, and if one wasn't passed in we should use whatever the > timezone GUC is set to (I'm assuming that's what timestamptz does). I think all of that comes down to business rules. Train and airline companies etc. have probably figured this out for themselves, not necessarily consistent with each other. So it's doubtful whether a single solution that we can hash out here is going to work well in practice. I think making it easier to implement particular business rules in this area in userspace (composite types, etc.) might go a longer way.
On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby <jim@nasby.net> wrote: > > I'm torn between whether the type should store the original time or the original time converted to GMT. This is the wrong way to think about it. We *never* store time "converted to GMT". When we want to represent a point in time we represent it as seconds since the epoch. This is a quantity that is the same everywhere in the world (modulo relativistic effects...) ie. it's a concept which is completely independent of timezones. The question here is how to represent more complex concepts than simply points in time. I think the two concepts under discussion are a) a composite type representing a point in time and a timezone it should be interpreted in for operations and display and b) the original input provided which is a text string with the constraint that it's a valid input which can be interpreted as a point in time. -- greg
On Jun 4, 2011, at 3:56 AM, Greg Stark wrote: > On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby <jim@nasby.net> wrote: >> >> I'm torn between whether the type should store the original time or the original time converted to GMT. > > This is the wrong way to think about it. We *never* store time > "converted to GMT". When we want to represent a point in time we > represent it as seconds since the epoch. Right. Sorry, my bad. > The question here is how to represent more complex concepts than > simply points in time. I think the two concepts under discussion are > a) a composite type representing a point in time and a timezone it > should be interpreted in for operations and display and b) the > original input provided which is a text string with the constraint > that it's a valid input which can be interpreted as a point in time. My fear with A is that something could change that would make it impossible to actually get back to the time that was originallyentered. For example, a new version of the timezone database could change something. Though, that problem alsoexists for timestamptz today, so presumably if it was much of an issue we'd have gotten complaints by now. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Hi, On May 27, 2011, at 11:43 PM, Alvaro Herrera wrote: > > One of our customers is interested in being able to store original > timezone along with a certain timestamp. > > It is currently possible to store a TZ in a separate column, but this is > a bit wasteful and not very convenient anyway. > > There are all sorts of UI issues that need to be resolved in order for > this to be a complete feature proposal, but the first thing that we > discussed was what is the storage going to look like. Of course, one > thing we don't want is to store the complete TZ name as text. > > So the first thing is cataloguing timezone names, and assigning an ID to > each (maybe an OID). If we do that, then we can store the OID of the > timezone name along the int64/float8 of the actual timestamp value. So, I'd think there are 2 reasonable approaches to storing the timezone part: 1. Store the timezone abbreviation (i.e. 'EST' along w/ the timestamp data). 2. Assign OID to each of the timezones and store it w/ the timestamp. The first option seem to avoid the necessity of creating a new system catalog for timezone information and the burden of updating it, because current implementation is already capable of translating abbreviations to useful timezone information. The question is, whether just a TZ abbreviation is sufficient to uniquely identify the timezone and get the offset and DST rules. If it's not sufficient, how conflicting TZ short names are handled in the current code (i.e. 'AT TIME ZONE ...')? The second choice doesn't avoids the issue of ambiguous names, although it requires moving TZ information inside the database and providing some means to update it. There were mentions of potential problems w/ pg_upgrade and pg_dump, if we add a massive amount of oids for the timezones. What are these problems specifically? I'd thing that storing TZ abbreviations is more straightforward and easier to implement, unless there are too ambiguous to identify the timezone correctly. > Note that I am currently proposing to store only the zone > names in the catalog, not the full TZ data. Where would we store other bits of timezone information? Wouldn't it be inconvenient to update names in system catalogs and DST rules elsewhere? Alexey. -- Command Prompt, Inc. http://www.CommandPrompt.com PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Jun 6, 2011 at 7:50 AM, Jim Nasby <jim@nasby.net> wrote: > On Jun 4, 2011, at 3:56 AM, Greg Stark wrote: >> On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby <jim@nasby.net> wrote: >>> >>> I'm torn between whether the type should store the original time or the original time converted to GMT. >> >> This is the wrong way to think about it. We *never* store time >> "converted to GMT". When we want to represent a point in time we >> represent it as seconds since the epoch. > Right. Sorry, my bad. > >> The question here is how to represent more complex concepts than >> simply points in time. I think the two concepts under discussion are >> a) a composite type representing a point in time and a timezone it >> should be interpreted in for operations and display and b) the >> original input provided which is a text string with the constraint >> that it's a valid input which can be interpreted as a point in time. > > My fear with A is that something could change that would make it impossible to actually get back to the time that was originallyentered. For example, a new version of the timezone database could change something. Though, that problem alsoexists for timestamptz today, so presumably if it was much of an issue we'd have gotten complaints by now. The common problem is daylight savings time being declared or cancelled. This happens numerous times throughout the year, often with short notice. If you want to store '6pm July 3rd 2014 Pacific/Fiji', and want that to keep meaning 6pm Fiji time no matter what decisions the Fijian government makes over the next two years, you need to store the wallclock (local) time and the timezone. The wallclock time remains fixed, but the conversion to UTC may float. If you are storing an point in time that remains stable no matter future political decisions, you store UTC time and an offset. The conversion to wallclock time may float, and your 6pm Fiji time meeting might change to 5pm or 7pm depending on the policical edicts. If you are only storing past events, its not normally an issue but timezone information does occasionally get changed retroactively if errors are discovered. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
On Jul 6, 2011, at 9:24 AM, Alexey Klyukin wrote: > So, I'd think there are 2 reasonable approaches to storing the > timezone part: > > 1. Store the timezone abbreviation (i.e. 'EST' along w/ the timestamp > data). > 2. Assign OID to each of the timezones and store it w/ the timestamp. > > The first option seem to avoid the necessity of creating a new system > catalog for timezone information and the burden of updating it, > because current implementation is already capable of translating > abbreviations to useful timezone information. The question is, whether > just a TZ abbreviation is sufficient to uniquely identify the timezone > and get the offset and DST rules. If it's not sufficient, how > conflicting TZ short names are handled in the current code (i.e. 'AT > TIME ZONE ...')? It's not enough. See the timezone_abbreviations setting. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Fri, Jul 8, 2011 at 11:13 AM, Stuart Bishop <stuart@stuartbishop.net> wrote: > On Mon, Jun 6, 2011 at 7:50 AM, Jim Nasby <jim@nasby.net> wrote: >> On Jun 4, 2011, at 3:56 AM, Greg Stark wrote: >>> On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby <jim@nasby.net> wrote: >>>> >>>> I'm torn between whether the type should store the original time or the original time converted to GMT. >>> >>> This is the wrong way to think about it. We *never* store time >>> "converted to GMT". When we want to represent a point in time we >>> represent it as seconds since the epoch. >> Right. Sorry, my bad. >> >>> The question here is how to represent more complex concepts than >>> simply points in time. I think the two concepts under discussion are >>> a) a composite type representing a point in time and a timezone it >>> should be interpreted in for operations and display and b) the >>> original input provided which is a text string with the constraint >>> that it's a valid input which can be interpreted as a point in time. >> >> My fear with A is that something could change that would make it impossible to actually get back to the time that wasoriginally entered. For example, a new version of the timezone database could change something. Though, that problem alsoexists for timestamptz today, so presumably if it was much of an issue we'd have gotten complaints by now. > > The common problem is daylight savings time being declared or > cancelled. This happens numerous times throughout the year, often with > short notice. > > If you want to store '6pm July 3rd 2014 Pacific/Fiji', and want that > to keep meaning 6pm Fiji time no matter what decisions the Fijian > government makes over the next two years, you need to store the > wallclock (local) time and the timezone. The wallclock time remains > fixed, but the conversion to UTC may float. > > If you are storing an point in time that remains stable no matter > future political decisions, you store UTC time and an offset. The > conversion to wallclock time may float, and your 6pm Fiji time meeting > might change to 5pm or 7pm depending on the policical edicts. This is a pretty good point. You would want the first of these (probably) for the time a meeting is scheduled to start, and the second for the time of a meteor shower (or some other natural event that doesn't care what the politicians decide). I feel like the second of these is pretty well handled by our existing timestamptz data type. Granted, you can't store the intended display time zone, but putting it in a separate column is not really a problem: at least, it has the right semantics. So maybe the first is the one we should be aiming at. If so, storing a counter and a time zone is the wrong approach: you need to record something like <year, month, day, hour, minute, second, tzname>. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Jul 18, 2011, at 12:29 AM, Robert Haas wrote: > On Fri, Jul 8, 2011 at 11:13 AM, Stuart Bishop <stuart@stuartbishop.net> wrote: >> On Mon, Jun 6, 2011 at 7:50 AM, Jim Nasby <jim@nasby.net> wrote: >>> On Jun 4, 2011, at 3:56 AM, Greg Stark wrote: >>>> On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby <jim@nasby.net> wrote: >>>>> >>>>> I'm torn between whether the type should store the original time or the original time converted to GMT. >>>> >>>> This is the wrong way to think about it. We *never* store time >>>> "converted to GMT". When we want to represent a point in time we >>>> represent it as seconds since the epoch. >>> Right. Sorry, my bad. >>> >>>> The question here is how to represent more complex concepts than >>>> simply points in time. I think the two concepts under discussion are >>>> a) a composite type representing a point in time and a timezone it >>>> should be interpreted in for operations and display and b) the >>>> original input provided which is a text string with the constraint >>>> that it's a valid input which can be interpreted as a point in time. >>> >>> My fear with A is that something could change that would make it impossible to actually get back to the time that wasoriginally entered. For example, a new version of the timezone database could change something. Though, that problem alsoexists for timestamptz today, so presumably if it was much of an issue we'd have gotten complaints by now. >> >> The common problem is daylight savings time being declared or >> cancelled. This happens numerous times throughout the year, often with >> short notice. >> >> If you want to store '6pm July 3rd 2014 Pacific/Fiji', and want that >> to keep meaning 6pm Fiji time no matter what decisions the Fijian >> government makes over the next two years, you need to store the >> wallclock (local) time and the timezone. The wallclock time remains >> fixed, but the conversion to UTC may float. >> >> If you are storing an point in time that remains stable no matter >> future political decisions, you store UTC time and an offset. The >> conversion to wallclock time may float, and your 6pm Fiji time meeting >> might change to 5pm or 7pm depending on the policical edicts. > > This is a pretty good point. You would want the first of these > (probably) for the time a meeting is scheduled to start, and the > second for the time of a meteor shower (or some other natural event > that doesn't care what the politicians decide). > > I feel like the second of these is pretty well handled by our existing > timestamptz data type. Granted, you can't store the intended display > time zone, but putting it in a separate column is not really a > problem: at least, it has the right semantics. So maybe the first is > the one we should be aiming at. If so, storing a counter and a time > zone is the wrong approach: you need to record something like <year, > month, day, hour, minute, second, tzname>. Right; you need a timestamp and you need to know what timezone that timestamp was entered in. That means you can always convertthat time to whatever timezone you'd like (like timestamptz), but you also know what time was originally entered,and what timezone it was entered in. Technically you can do that with a separate field, but that seems really uglyto me. So what we're proposing is a new data type that stores a timestamp as well as the timezone that that time was originallyentered in. We can't just store a 3 letter timezone abbreviation, because the mapping from 3 letter TZs to actualTZs is not fixed (see the timezone_abbreviations GUC). I believe the best way to handle this is a system table thatstores the name of every timezone that the database has ever loaded from the timezone data files, along with an OID.That means that the storage for this is just a timestamp and an OID. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Jim, > Right; you need a timestamp and you need to know what timezone that timestamp was entered in. That means you can alwaysconvert that time to whatever timezone you'd like (like timestamptz), but you also know what time was originally entered,and what timezone it was entered in. Technically you can do that with a separate field, but that seems really uglyto me. I disagree. It's a good mapping of the actual data. The timestamp and the timezone in which that timestamp was entered are two separate pieces of data and *ought* to be in two separate fields. For one thing, the question of "what timezone was this entered in" is an application-specific question, since you have three different potential timezones: * the actual client timezone * the actual server timezone * the application timezone if the application has configurable timezones In a builtin data type, which of those three would you pick? Only the application knows. Additionally, if you have your timestamp-with-original-timezone data type, then you're going to need to recode every single timestamp-handling function and operator to handle the new type. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Excerpts from Josh Berkus's message of lun jul 18 18:37:15 -0400 2011: > The timestamp and the timezone in which that timestamp was entered are > two separate pieces of data and *ought* to be in two separate fields. > For one thing, the question of "what timezone was this entered in" is an > application-specific question, since you have three different potential > timezones: > > * the actual client timezone > * the actual server timezone > * the application timezone if the application has configurable timezones > > In a builtin data type, which of those three would you pick? Only the > application knows. I think this whole discussion is built on the assumption that the client timezone and the application timezone are one thing and the same; and the server timezone is not relevant at all. If the app TZ is not the client TZ, then the app will need fixed. > Additionally, if you have your timestamp-with-original-timezone data > type, then you're going to need to recode every single > timestamp-handling function and operator to handle the new type. I have my doubts about that, and I hope not. These details haven't been discussed at all; I only started this thread to get community approval on cataloguing the TZs. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Josh Berkus <josh@agliodbs.com> wrote: > The timestamp and the timezone in which that timestamp was entered > are two separate pieces of data and *ought* to be in two separate > fields. So, if you're grabbing a timestamp and the time zone for it, how do you ensure you've done that atomically if you're at the boundary of a DST change? The difficulty of grabbing both such that they are guaranteed to correspond suggests to me that they really form a single logical value. > For one thing, the question of "what timezone was this entered in" > is an application-specific question, since you have three > different potential timezones: > > * the actual client timezone > * the actual server timezone > * the application timezone if the application has configurable > timezones > > In a builtin data type, which of those three would you pick? Well clearly the only one *PostgreSQL* would "pick" is one assigned within the database server; otherwise, for a data type like this the value coming over the wire should specify it. If I want the client side value (in Java) it's easy enough to get such a value. "new GregorianCalendar()" is described thusly: | Constructs a default GregorianCalendar using the current time in | the default time zone with the default locale. How does Java assign those defaults? Why should PostgreSQL care? It's got the means to do so for itself. The point is, people can easily establish such a value on the client side; why not on the server side? > Only the application knows [whether it should pick the value or > let the database pick it]. When are things otherwise? Obviously the application will assign it or choose to let the server assign it (if that's the right thing). > Additionally, if you have your timestamp-with-original-timezone > data type, then you're going to need to recode every single > timestamp-handling function and operator to handle the new type. Why? I think you'd want to add some *new* casts and operators for the new data type; I don't see why any existing ones would need to be modified. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Josh Berkus <josh@agliodbs.com> wrote: >> The timestamp and the timezone in which that timestamp was entered >> are two separate pieces of data and *ought* to be in two separate >> fields. > So, if you're grabbing a timestamp and the time zone for it, how do > you ensure you've done that atomically if you're at the boundary of > a DST change? In my view of the world, the timezone that you are in is not an object that changes across a DST boundary. So the above is a red herring. It is only a problem if you insist on a broken concept of what a timezone is. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Josh Berkus <josh@agliodbs.com> wrote: >>> The timestamp and the timezone in which that timestamp was >>> entered are two separate pieces of data and *ought* to be in two >>> separate fields. > >> So, if you're grabbing a timestamp and the time zone for it, how >> do you ensure you've done that atomically if you're at the >> boundary of a DST change? > > In my view of the world, the timezone that you are in is not an > object that changes across a DST boundary. You're right -- the moment in time should be fixed like in the current PostgreSQL "timestamp with time zone", and the time zone doesn't change with DST. Not an intentional read herring, but definitely some muddy thinking there. That weakens the argument for such a data type. Even with that, I suspect that its value as a convenience for application programmers would be sufficient that an extension to provide such functionality would get used. Essentially the current timestamptz bundled with a time zone and which is, by default, displayed "at time zone" of the attached time zone on output. -Kevin
On 19 July 2011 17:11, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>> Josh Berkus <josh@agliodbs.com> wrote: >>>> The timestamp and the timezone in which that timestamp was >>>> entered are two separate pieces of data and *ought* to be in two >>>> separate fields. >> >>> So, if you're grabbing a timestamp and the time zone for it, how >>> do you ensure you've done that atomically if you're at the >>> boundary of a DST change? >> >> In my view of the world, the timezone that you are in is not an >> object that changes across a DST boundary. > > You're right -- the moment in time should be fixed like in the > current PostgreSQL "timestamp with time zone", and the time zone > doesn't change with DST. Not an intentional read herring, but > definitely some muddy thinking there. There was an earlier point made that if someone puts eg 5pm local time two years in the future into the database, and then the DST boundary gets moved subsequently, some applications would like the value to still say 5pm local time, even though that means it now refers to a different point in absolute time - this potentially seems like a useful feature. Retroactive timezone changes wouldn't make a lot of sense in this case though... I guess there are three concepts of time here - an absolute fixed time with no reference to a timezone, a time with a timezone that is still set as a fixed point in time, or a local time in a specific timezone that would move if the timezone definition changed. Ian
Alvaro, Kevin, >> In a builtin data type, which of those three would you pick? Only the >> application knows. > > I think this whole discussion is built on the assumption that the client > timezone and the application timezone are one thing and the same; and > the server timezone is not relevant at all. If the app TZ is not the > client TZ, then the app will need fixed. Not at all. Consider a hosted webapp where the user is allowed to set their own timezone, but you use pooled connections. In that case, the app is going to be handling user timezones with an AT TIME ZONE, not with a SET TIMEZONE="" > I have my doubts about that, and I hope not. These details haven't been > discussed at all; I only started this thread to get community approval > on cataloguing the TZs. I am strongly in favor of having a *timezone* data type and some system whereby we can uniquely identify timezones in the Zic database. That would be tremendously useful for all sorts of things. I'm just asserting that those who want a composite timestamp+saved-time-zone data type have not thought about all of the complications involved. > So, if you're grabbing a timestamp and the time zone for it, how do > you ensure you've done that atomically if you're at the boundary of > a DST change? The difficulty of grabbing both such that they are > guaranteed to correspond suggests to me that they really form a > single logical value. Not relevant, given that (hopefully) the conception of a time zone should exist independantly of whether it's currently in DST or not. That is, the time zone is NOT "-07". The time zone is "US/Pacific". > Why? I think you'd want to add some *new* casts and operators for > the new data type; I don't see why any existing ones would need to > be modified. That would work too. What I'm pointing out is that we can't implement the new type using just one-line modifications to the old operators and functions. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Jul 19, 2011, at 2:06 PM, Josh Berkus wrote: > I am strongly in favor of having a *timezone* data type and some system > whereby we can uniquely identify timezones in the Zic database. CREATE OR REPLACE FUNCTION is_timezone( tz CITEXT ) RETURNS BOOLEAN LANGUAGE plpgsql STABLE AS $$ BEGIN PERFORM NOW() AT TIME ZONE tz; RETURN TRUE; EXCEPTION WHEN invalid_parameter_value THEN RETURN FALSE; END; $$; CREATE DOMAIN timezone AS CITEXT CHECK ( is_timezone( VALUE ) ); Best, David
On Jul 19, 2011, at 4:20 PM, David E. Wheeler wrote: > On Jul 19, 2011, at 2:06 PM, Josh Berkus wrote: > >> I am strongly in favor of having a *timezone* data type and some system >> whereby we can uniquely identify timezones in the Zic database. > > CREATE OR REPLACE FUNCTION is_timezone( > tz CITEXT > ) RETURNS BOOLEAN LANGUAGE plpgsql STABLE AS $$ ... > CREATE DOMAIN timezone AS CITEXT CHECK ( is_timezone( VALUE ) ); Storing giant globs of text with every timestamp field is really ugly. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Jul 19, 2011, at 11:22 AM, Ian Caulfield wrote: > On 19 July 2011 17:11, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>>> Josh Berkus <josh@agliodbs.com> wrote: >>>>> The timestamp and the timezone in which that timestamp was >>>>> entered are two separate pieces of data and *ought* to be in two >>>>> separate fields. >>> >>>> So, if you're grabbing a timestamp and the time zone for it, how >>>> do you ensure you've done that atomically if you're at the >>>> boundary of a DST change? >>> >>> In my view of the world, the timezone that you are in is not an >>> object that changes across a DST boundary. >> >> You're right -- the moment in time should be fixed like in the >> current PostgreSQL "timestamp with time zone", and the time zone >> doesn't change with DST. Not an intentional read herring, but >> definitely some muddy thinking there. > > There was an earlier point made that if someone puts eg 5pm local time > two years in the future into the database, and then the DST boundary > gets moved subsequently, some applications would like the value to > still say 5pm local time, even though that means it now refers to a > different point in absolute time - this potentially seems like a > useful feature. Retroactive timezone changes wouldn't make a lot of > sense in this case though... Right; and timezone's aren't supposed to change retroactively. The ZIC database is specifically setup so that it knows thehistory of TZ changes and deals with the past correctly. > I guess there are three concepts of time here - an absolute fixed time > with no reference to a timezone, a time with a timezone that is still > set as a fixed point in time, or a local time in a specific timezone > that would move if the timezone definition changed. Or, another way to put the third class: a timestamp that remembers what it's original timezone was so that you can referto it a common timezone (such as UTC), OR you can refer to it at it's original, local time. That's our exact need forthis: we have different businesses that operate in different timezones. Generally, we only care about things in localtime, but there are cases (such as event logging) where we could care about local *OR* unified time. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Jul 21, 2011, at 2:39 PM, Jim Nasby wrote: >> CREATE OR REPLACE FUNCTION is_timezone( >> tz CITEXT >> ) RETURNS BOOLEAN LANGUAGE plpgsql STABLE AS $$ > ... >> CREATE DOMAIN timezone AS CITEXT CHECK ( is_timezone( VALUE ) ); > > Storing giant globs of text with every timestamp field is really ugly. You work with what you've got. David
On Jul 19, 2011, at 4:06 PM, Josh Berkus wrote: >> I have my doubts about that, and I hope not. These details haven't been >> discussed at all; I only started this thread to get community approval >> on cataloguing the TZs. > > I am strongly in favor of having a *timezone* data type and some system > whereby we can uniquely identify timezones in the Zic database. That > would be tremendously useful for all sorts of things. I'm just > asserting that those who want a composite timestamp+saved-time-zone data > type have not thought about all of the complications involved. Having to deal with timezone's completely separate from their timestamps is a huge PITA. That said, if we had a timezonedatatype there's at least the possibility of using a composite type to deal with all of this. Or at least we canjust create a custom datatype using existing tools... the only part of this that I see that actually requires closer coresupport is the timezone data itself. So if the community is OK with adding a timezone datatype then we can focus on that and leave the timestamptztz data typeas an add-on (at least assuming we don't run into any gotchas). Alvaro, please speak up if there's any technical issues here that I've missed? -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Jim Nasby <jim@nasby.net> writes: > On Jul 19, 2011, at 11:22 AM, Ian Caulfield wrote: >> There was an earlier point made that if someone puts eg 5pm local time >> two years in the future into the database, and then the DST boundary >> gets moved subsequently, some applications would like the value to >> still say 5pm local time, even though that means it now refers to a >> different point in absolute time - this potentially seems like a >> useful feature. Retroactive timezone changes wouldn't make a lot of >> sense in this case though... > Right; and timezone's aren't supposed to change retroactively. The ZIC database is specifically setup so that it knowsthe history of TZ changes and deals with the past correctly. You haven't noticed that at least two or three times a year, there are "historical corrections" in the ZIC database? The mapping between local time and UTC might be less likely to change for a time instant in the past than one in the future, but it would be folly to assume that it's immutable in either direction. regards, tom lane
On Thu, Jul 21, 2011 at 5:48 PM, Jim Nasby <jim@nasby.net> wrote: > On Jul 19, 2011, at 4:06 PM, Josh Berkus wrote: >>> I have my doubts about that, and I hope not. These details haven't been >>> discussed at all; I only started this thread to get community approval >>> on cataloguing the TZs. >> >> I am strongly in favor of having a *timezone* data type and some system >> whereby we can uniquely identify timezones in the Zic database. That >> would be tremendously useful for all sorts of things. I'm just >> asserting that those who want a composite timestamp+saved-time-zone data >> type have not thought about all of the complications involved. > > Having to deal with timezone's completely separate from their timestamps is a huge PITA. That said, if we had a timezonedatatype there's at least the possibility of using a composite type to deal with all of this. Or at least we canjust create a custom datatype using existing tools... the only part of this that I see that actually requires closer coresupport is the timezone data itself. > > So if the community is OK with adding a timezone datatype then we can focus on that and leave the timestamptztz data typeas an add-on (at least assuming we don't run into any gotchas). As I have been watching this whole thread, my inclination has been to look at this from a "Prolog" perspective, where we think about the database as indicating a series of assertions about facts, from which we then try to reason. I suspect that determining what *really* needs to get recorded depends on this. And it seems to me that trying to head down the path of defining oid-based lookups of timezone names may be putting the cart before the horse. There are a number of facts about a timestamp: 1. What time did the database server think it was? SELECT NOW(); captures the database's concept of what time it was, complete with: a) The time, based, I think, on UT1. With the caveat that there's no certainty that the DB server's time is necessarily correct. b) An encoding of the timezone offset based on the value of the TimeZone GUC for this connection. If one is running an NTP daemon, pointing to a decently-connected network of NTP servers, then it's likely that this time is pretty accurate. And most of the time, I'd be inclined to treat this as authoritative, and contend that anything else is likely to be less correct and less easy to work with. The goal of this discussion thread is to record another timestamp with a different basis. It's not entirely clear what is its basis. I'll suggest one, which mostly underlines my contention that it's likely "less correct" and "less easy to work with" than having a column defined as... some_timestamp timestamp with timezone default NOW() 2. Client-based timestamp, comprising two things: a) A time, ascertained by the client. b) A timezone, ascertained by the client. Note that timezones are pretty open-ended. There is an authoritative encoding defined in the tz database, but there are other values used out there. We had to patch Slony-I to have it use 'ISO' timestamps, and recommend running in GMT/UTC, because there are values that blow things up. For instance, on AIX, there is a habit for boxes to set TZ=CUT0, out of the box, which isn't on what PostgreSQL considers to be the "official list." On the more whimsical side of things, Joey Hess, a Debian developer noted for such things as ikiwiki, etckeeper, git-annex, decided to create his very own custom timezone, "JEST", because he was irritated about DST. http://kitenet.net/~joey/blog/entry/howto_create_your_own_time_zone/ http://kitenet.net/~joey/blog/entry/JEST_results/ That "whimsical" entry won't be going into tzdata, and while we could discount this case as whimsy, it's not "out there" for organizations such as nation states to decide to legislate their own things, that we can't be certain will necessarily get into tzdata. There are enough aliases and possibilities of local national decisions to make it at least somewhat troublesome to treat this as something that can be considered fixed down to the OID level. My conclusion would be that if someone is really, really, really keen on capturing their own notion of timezone, then this fits with the notion that, if they want to have something that could be treated as remotely authoritative, they should capture a multiplicity of pieces of datestamp information, and actively accept that this will be pretty duplicative. - I'd commend capturing NOW() in a timestamptz field. That gives you: 1. What time the DB server thought it was, in terms of UT1 2. What timezone it thought was tied to that connection. - Also, I'd be inclined to capture, in plain text form: 3. A client-recorded timestamp. I'm agnostic as to whether this has *any* validation done on it; I'd think it plausible that this is simply a text field, that might require a human to interpret it. 4. A client-recorded timezone. This would be a plain text field, and I'm not certain it's of any particular value to try to validate it against any would-be authoritative list. Why shouldn't Joey be able to use JEST? -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Jul 21, 2011, at 5:30 PM, Christopher Browne wrote: > - I'd commend capturing NOW() in a timestamptz field. That gives you: > 1. What time the DB server thought it was, in terms of UT1 > 2. What timezone it thought was tied to that connection. Except that it doesn't, and that's exactly the problem I'm trying to solve here. I want to know what timezone we were usingwhen we put a value into timestamptz, which then got converted to UT1. Without a reliable way to store what the timezone*was* at that time, we have no way to go back to it. Now, we can debate whether it makes more sense to store the original time without conversion to UT1, or whether we shouldstore the time after converting it to UT1 (or whether we should offer both options), but that debate is pointless withouta good way to remember what timezone it started out in. Arguably, we could just create an add-on data type for storing that timezone information, but that seems pretty daft to me:you're stuck either storing raw text which takes what should be a 12 byte datatype up to a 20-30 byte type (8 byte timestamp+ varlena + text of timezone name), or you end up with major problems trying to keep an enum in sync with what thedatabase has available in it's ZIC database. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Thu, Jul 21, 2011 at 7:05 PM, Jim Nasby <jim@nasby.net> wrote: > On Jul 21, 2011, at 5:30 PM, Christopher Browne wrote: >> - I'd commend capturing NOW() in a timestamptz field. That gives you: >> 1. What time the DB server thought it was, in terms of UT1 >> 2. What timezone it thought was tied to that connection. > > Except that it doesn't, and that's exactly the problem I'm trying to solve here. I want to know what timezone we were usingwhen we put a value into timestamptz, which then got converted to UT1. Without a reliable way to store what the timezone*was* at that time, we have no way to go back to it. > > Now, we can debate whether it makes more sense to store the original time without conversion to UT1, or whether we shouldstore the time after converting it to UT1 (or whether we should offer both options), but that debate is pointless withouta good way to remember what timezone it started out in. > > Arguably, we could just create an add-on data type for storing that timezone information, but that seems pretty daft tome: you're stuck either storing raw text which takes what should be a 12 byte datatype up to a 20-30 byte type (8 bytetimestamp + varlena + text of timezone name), or you end up with major problems trying to keep an enum in sync with whatthe database has available in it's ZIC database. You have those same problems trying to include the time zone information in some new timestampreallyhasthetz data type, though. This problem reminds me a great deal of the problems associated with managing security labels for SE-Linux. There aren't that many distinct values, so ideally it would be nice to store an OID -> string mapping somewhere and just store the OIDs in the main table. But a new security label can appear at any time, and it doesn't work to have the transaction that discovers it do the insert into the mapping table. Time zones have the same problem, more or less. Now, maybe if we had non-transactional tables like Alvaro keeps muttering about... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of vie jul 22 11:33:09 -0400 2011: > This problem reminds me a great deal of the problems associated with > managing security labels for SE-Linux. There aren't that many > distinct values, so ideally it would be nice to store an OID -> string > mapping somewhere and just store the OIDs in the main table. But a > new security label can appear at any time, and it doesn't work to have > the transaction that discovers it do the insert into the mapping > table. Time zones have the same problem, more or less. Now, maybe if > we had non-transactional tables like Alvaro keeps muttering about... Oh, I wasn't relating these problems to non transactional tables ... Hmm. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Jim, > Arguably, we could just create an add-on data type for storing that timezone information, but that seems pretty daft tome: you're stuck either storing raw text which takes what should be a 12 byte datatype up to a 20-30 byte type (8 bytetimestamp + varlena + text of timezone name), or you end up with major problems trying to keep an enum in sync with whatthe database has available in it's ZIC database. Sure, although there's no getting around the portability issues. The moment you move that data between servers, you risk having specific timezones not be available on the second server. Or worse, be available but have a different definition -- if, for example, you're running a more/less updated PostgreSQL on the second server. I'm not saying that this isn't worth solving. I could really use a timezone datatype which was synched with ZIC in some way, and so could a lot of other users, whether or not a timestamp + original timezone type is available as well. But don't underestimate the scope of the problem. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Jul 22, 2011, at 10:33 AM, Robert Haas wrote: > On Thu, Jul 21, 2011 at 7:05 PM, Jim Nasby <jim@nasby.net> wrote: >> On Jul 21, 2011, at 5:30 PM, Christopher Browne wrote: >>> - I'd commend capturing NOW() in a timestamptz field. That gives you: >>> 1. What time the DB server thought it was, in terms of UT1 >>> 2. What timezone it thought was tied to that connection. >> >> Except that it doesn't, and that's exactly the problem I'm trying to solve here. I want to know what timezone we wereusing when we put a value into timestamptz, which then got converted to UT1. Without a reliable way to store what thetimezone *was* at that time, we have no way to go back to it. >> >> Now, we can debate whether it makes more sense to store the original time without conversion to UT1, or whether we shouldstore the time after converting it to UT1 (or whether we should offer both options), but that debate is pointless withouta good way to remember what timezone it started out in. >> >> Arguably, we could just create an add-on data type for storing that timezone information, but that seems pretty daft tome: you're stuck either storing raw text which takes what should be a 12 byte datatype up to a 20-30 byte type (8 bytetimestamp + varlena + text of timezone name), or you end up with major problems trying to keep an enum in sync with whatthe database has available in it's ZIC database. > > You have those same problems trying to include the time zone > information in some new timestampreallyhasthetz data type, though. > > This problem reminds me a great deal of the problems associated with > managing security labels for SE-Linux. There aren't that many > distinct values, so ideally it would be nice to store an OID -> string > mapping somewhere and just store the OIDs in the main table. But a > new security label can appear at any time, and it doesn't work to have > the transaction that discovers it do the insert into the mapping > table. Time zones have the same problem, more or less. Now, maybe if > we had non-transactional tables like Alvaro keeps muttering about... Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to it dynamically all the time? Perhaps we can enforcethat we'll only recognize new TZ info as part of a config reload? Josh Berkus also made a good point that this does introduce the risk that you could end up moving data to a different server,that has a different ZIC database (perhaps via replication); at which point the fit could hit the shan (or the excrementcould impact the cooling device...). So perhaps the only reasonable way to handle this is to actually load ZIC datainto the database itself. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby <jim@nasby.net> wrote: > Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to it dynamically all the time? Perhaps we canenforce that we'll only recognize new TZ info as part of a config reload? Hmm. That might work in theory, but I don't see any good way to update every database's tz table on each reload. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Jul 25, 2011, at 6:22 PM, Robert Haas wrote: > On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby <jim@nasby.net> wrote: >> Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to it dynamically all the time? Perhaps we canenforce that we'll only recognize new TZ info as part of a config reload? > > Hmm. That might work in theory, but I don't see any good way to > update every database's tz table on each reload. I'm assuming that the issue here is that multiple backends could be connected to the same database, and we don't want allof them to try to actually do the updates, only the first one that discovers the change. If that's the problem you foreseethen perhaps it's a non-issue... if each backend only updates things that have actually changed, and they do thatwith race-free 'merge' logic, then only the first backend to attempt the update would end up finding actual work to do. Or are you seeing a problem I'm missing? -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Tue, Jul 26, 2011 at 6:45 PM, Jim Nasby <jim@nasby.net> wrote: > On Jul 25, 2011, at 6:22 PM, Robert Haas wrote: >> On Mon, Jul 25, 2011 at 6:26 PM, Jim Nasby <jim@nasby.net> wrote: >>> Hrm, don't we only pull in ZIC info on a reload? Or do we actually refer to it dynamically all the time? Perhaps we canenforce that we'll only recognize new TZ info as part of a config reload? >> >> Hmm. That might work in theory, but I don't see any good way to >> update every database's tz table on each reload. > > I'm assuming that the issue here is that multiple backends could be connected to the same database, and we don't want allof them to try to actually do the updates, only the first one that discovers the change. If that's the problem you foreseethen perhaps it's a non-issue... if each backend only updates things that have actually changed, and they do thatwith race-free 'merge' logic, then only the first backend to attempt the update would end up finding actual work to do. > > Or are you seeing a problem I'm missing? What if 4 backends concurrently are the "first ones" to try to simultaneously add "South America/Ruritania", and... 1. Connection #1 came in 'first', but rolls back its transaction. 2. Connection #2 came in 'second', and also winds up rolling back its transaction because the connection fails due to a network problem. 3. Connection #3 actually completes. But doesn't commit until after #4. 4. Connection #4 started last, but turns out to COMMIT first. The "merge" is a pretty bad one. They all have to try to succeed, and in some way that doesn't block things. Perhaps the TZ values need to not be UNIQUE, but some process can come in afterwards and rewrite to drop out non-unique values. That's not very nice either; that means you can't use a FK reference against the TZ table. Or you need to have something that comes in afterwards and repoints tuples to the *real* TZ entry, which seems likely to be troublesome. This just gets so ugly so fast; the attempt to save space by storing a pointer to the TZ value is just filled with trouble (and potential #fail). -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Jul 26, 2011, at 5:56 PM, Christopher Browne wrote: >> I'm assuming that the issue here is that multiple backends could be connected to the same database, and we don't wantall of them to try to actually do the updates, only the first one that discovers the change. If that's the problem youforesee then perhaps it's a non-issue... if each backend only updates things that have actually changed, and they do thatwith race-free 'merge' logic, then only the first backend to attempt the update would end up finding actual work to do. >> >> Or are you seeing a problem I'm missing? > > What if 4 backends concurrently are the "first ones" to try to > simultaneously add "South America/Ruritania", and... > > 1. Connection #1 came in 'first', but rolls back its transaction. > 2. Connection #2 came in 'second', and also winds up rolling back its > transaction because the connection fails due to a network problem. > 3. Connection #3 actually completes. But doesn't commit until after #4. > 4. Connection #4 started last, but turns out to COMMIT first. Ugh, I didn't realize that a reload would take effect in the middle of a transaction. That certainly kills what I proposed. Though, now I'm wondering why this info would need to be in every database anyway... certainly this should be treated asglobal data, and if that's the case then only one process needs to update it. Though I'm not sure if it's possible forglobal data to be ACID. Anyway, rather than continue this on-list, I'm going to get Alvaro to think about it in more detail and see what he comesup with. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net