Thread: Storage sizes for dates/times (documentation bug?)
I was just looking at the 8.3.1 documentation on the postgresql web site. According to the docs, timestamp with time zone takes less space than time with time zone with the same resolution. Is this a documentation bug? It makes no sense to me that by storing the date with the time you can save 4 bytes. __ Marc
Attachment
Marc Munro <marc@bloodnok.com> writes: > According to the docs, timestamp with time zone takes less space than > time with time zone with the same resolution. Is this a documentation > bug? Nope. regression=# select typname,typlen from pg_type where typname like 'time%tz'; typname | typlen -------------+-------- timestamptz | 8 timetz | 12 (2 rows) > It makes no sense to me that by storing the date with the time you > can save 4 bytes. Actually, your mistake is in imagining that timestamptz represents the timezone explicitly ... it doesn't. If it did, it'd be at least as large as timetz. regards, tom lane
Tom Lane wrote: > Marc Munro <marc@bloodnok.com> writes: > > It makes no sense to me that by storing the date with the time you > > can save 4 bytes. > > Actually, your mistake is in imagining that timestamptz represents the > timezone explicitly ... it doesn't. If it did, it'd be at least as > large as timetz. Hmm, so timetz does that? Amazing ... I didn't know that. Which means that storing date + timetz in two separate columns is not quite the same as storing a timestamptz. Oops. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Which means that storing date + timetz in two separate columns is not > quite the same as storing a timestamptz. Oops. Quite so. Our docs already point out that timetz is really a completely brain-damaged concept, anyway. There's been some talk of adding an explicit zone representation to timestamptz, but so far I haven't been convinced that it's worth doubling the storage requirement (which is what it would take, considering alignment...). ISTM that we have defined timestamptz in such a way that it solves many real-world problems, and timestamp also solves real-world problems, but the use-case for a timestamp plus an explicit time zone is much less clear. regards, tom lane
On Mon, Apr 14, 2008 at 09:10:52PM -0400, Tom Lane wrote: > Actually, your mistake is in imagining that timestamptz represents the > timezone explicitly ... it doesn't. Which really is a great pity :-( Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Mon, Apr 14, 2008 at 09:39:57PM -0400, Tom Lane wrote: > considering alignment...). ISTM that we have defined timestamptz > in such a way that it solves many real-world problems, and timestamp > also solves real-world problems, No doubt about it. > but the use-case for a timestamp plus an explicit time > zone is much less clear. GNUmed is a medical record. We store data about patients. Our public database is accessed from within different time zones. Imagine a user from Los Angeles and another one from San Francisco. Later on it is not readily apparent wherefrom a particular entry was made unless we store the originating timezone and/or location (which we do). Even if the location is stored one cannot *easily* derive the appropriate timezone from it (and thereby the local time of entering data). This is particularly important in medicine - expected hormone levels (say, cortisol) are markedly different at different times of the day. Say, when a doctor in London enters a cortisol level measured at 7:00am his time which is later displayed in a location B with local time = GMT + 8 hours the level will appear to have been taken at 3:00pm -- for which the level is out of bounds. Of course, the actual time stored in the database in UTC is quite correct - it was indeed 3pm in location B when it was 7am in London. But we need to know the original local time (and also be able to know UTC since we want to correlate times). What we do now is to explicitely store the original timezone with the timestamp for individual doctor-patient encounters which is a bit of leap-of-faith but works (one would actually have to store the timezone for each and every timestamptz field). We then use that information to say "SELECT ... AT TIME ZONE ... FROM ...". Another scenario: Recently Germany underwent the annual DST change. Suddenly times in the EMR entered before DST appeared shifted one hour forward. Which, technically, is correct - it's the same UTC time as before the DST onset, just shifted by one more hour. They'll appear shifted back to correct times when we go back to non-DST time. But then DST-entered times will appear shifted back, too, until DST starts again. Of course, all this is solvable by explicitely keeping track of which timestamps mean what but it'd be a whole lot easier if one could just say: select value, ts_blood_drawn at original time zone from lab_results where type = 'cortisol'; I (for one) would happily store more bytes if that's what it takes to reliably get at correct results (given the above circumstances). Yes, I know about tagged types but have shyed away from them so far courtesy of them not being adjustable after the fact. Should I be using a custom domain for this ? Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Tue, Apr 15, 2008 at 02:46:14PM +0200, Karsten Hilbert wrote: > Of course, the actual time stored in the database in UTC is > quite correct - it was indeed 3pm in location B when it was > 7am in London. But we need to know the original local time > (and also be able to know UTC since we want to correlate > times). I was under the impression that "timestamp without time zone" does precisely this. It would be nicer if the docs highlighted the differences, and reasons behind, the semantics between the two, instead of focusing so much on the syntax. The "WITH TIME ZONE" variant is described nicely: For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's timezone parameter, and is converted to UTC using the offset for the timezone zone. When a timestamp with time zone value is output, it is always converted from UTC to the current timezone zone, and displayed as local time in that zone. To see the time in another time zone, either change timezone or use the AT TIME ZONE construct (see Section 9.9.3). But there doesn't seem to be any similar description of the "WITHOUT TIME ZONE" option. It mentions: the date/time fields in the input value [...] is not adjusted for time zone. But that's about all I could find. I think that the actual semantics should be described and maybe a paragraph should be written highlighting differences with an example. I'd be happy to write this if people agree. My reasoning goes something like this: The WITH and WITHOUT clauses seem to be the opposite of my naive understanding of their purpose. I'd think that if you specify WITH TIME ZONE then it means that the timezone is important to me, and I want to deal with it myself. Whereas, the WITHOUT TIME ZONE clause would suggest that the timezone isn't important to me, and anything the database can do to make the problem go away the better. What the spec says, and PG does, is actually the opposite. The fact that this confusion can occur (and seems to occur reasonably often based on previous posts to the mailing lists) suggests that the docs should highlight the differences more clearly. I'd also hazard a guess that we don't hear about it more because most people just work within a single time zone and hence don't even notice the difference between the two. Sam
On Tue, Apr 15, 2008 at 02:46:14PM +0200, Karsten Hilbert wrote: > Yes, I know about tagged types but have shyed away from them > so far courtesy of them not being adjustable after the fact. What do you mean by this? Adjustable in what way? Truly, taggedtypes are a really useful feature but I think the chance of them being in the main tree approximatly nil, which is enough reason to stay away from them. > Should I be using a custom domain for this ? You just need to store the pair (time,zone), I don't think domains are flexible enough for that. A complex type maybe (but then you've just reinvented taggedtypes, inefficiently :) ) Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
On Tuesday 15 April 2008 6:31 am, Sam Mason wrote: > On Tue, Apr 15, 2008 at 02:46:14PM +0200, Karsten Hilbert wrote: > > Of course, the actual time stored in the database in UTC is > > quite correct - it was indeed 3pm in location B when it was > > 7am in London. But we need to know the original local time > > (and also be able to know UTC since we want to correlate > > times). > > I was under the impression that "timestamp without time zone" does > precisely this. It would be nicer if the docs highlighted the > differences, and reasons behind, the semantics between the two, instead > of focusing so much on the syntax. The "WITH TIME ZONE" variant is > described nicely: > > For timestamp with time zone, the internally stored value is always > in UTC (Universal Coordinated Time, traditionally known as Greenwich > Mean Time, GMT). An input value that has an explicit time zone > specified is converted to UTC using the appropriate offset for that > time zone. If no time zone is stated in the input string, then it is > assumed to be in the time zone indicated by the system's timezone > parameter, and is converted to UTC using the offset for the timezone > zone. > > When a timestamp with time zone value is output, it is always > converted from UTC to the current timezone zone, and displayed as > local time in that zone. To see the time in another time zone, either > change timezone or use the AT TIME ZONE construct (see Section > 9.9.3). > > But there doesn't seem to be any similar description of the "WITHOUT > TIME ZONE" option. It mentions: > > the date/time fields in the input value [...] is not adjusted for time > zone. > > But that's about all I could find. I think that the actual semantics > should be described and maybe a paragraph should be written highlighting > differences with an example. I'd be happy to write this if people > agree. > > My reasoning goes something like this: The WITH and WITHOUT clauses > seem to be the opposite of my naive understanding of their purpose. I'd > think that if you specify WITH TIME ZONE then it means that the timezone > is important to me, and I want to deal with it myself. Whereas, the > WITHOUT TIME ZONE clause would suggest that the timezone isn't important > to me, and anything the database can do to make the problem go away the > better. What the spec says, and PG does, is actually the opposite. The > fact that this confusion can occur (and seems to occur reasonably often > based on previous posts to the mailing lists) suggests that the docs > should highlight the differences more clearly. > > I'd also hazard a guess that we don't hear about it more because most > people just work within a single time zone and hence don't even notice > the difference between the two. My only comment is on this assertion. Any location that has DST rules has two time zones. For instance I live in US PST/PDT. Without timezone support doing date/time math across time zone boundaries is asking for problems. > > > Sam -- Adrian Klaver aklaver@comcast.net
On Tue, Apr 15, 2008 at 02:31:22PM +0100, Sam Mason wrote: > On Tue, Apr 15, 2008 at 02:46:14PM +0200, Karsten Hilbert wrote: > > Of course, the actual time stored in the database in UTC is > > quite correct - it was indeed 3pm in location B when it was > > 7am in London. But we need to know the original local time > > (and also be able to know UTC since we want to correlate > > times). > > I was under the impression that "timestamp without time zone" does > precisely this. It doesn't. It keeps the time *value* untouched. But it doesn't even store *any* timezone information with it. So, unless I *know* the original timezone by any other means I don't have *any* clue as to what point in time a particular timestamp value is. It less useful than "with time zone". The latter at least allows me to know the true (UTC-adjusted) time of an event without jumping through any hoops. > I'd also hazard a guess that we don't hear about it more because most > people just work within a single time zone and hence don't even notice > the difference between the two. Any DST change will highlight the difference quite clearly. I don't even have to change locations. Any tstz stored before a DST changeover will (quite logically) show up as shifted one hour after the changeover. This happens twice a year. A different angle: Customer orders item at 23:15 on March 30. Item is on special offer March 30th only. DST change happens on March 30 to March-31. Dealer looks at orders and sees "item ordered March 31st 0:15" and does NOT apply the rebate for March 30th. Of course, it's the app developers fault, but the use case for keeping the original timezone (so it can be reapplied) is clearly there. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Tue, Apr 15, 2008 at 03:43:05PM +0200, Martijn van Oosterhout wrote: > On Tue, Apr 15, 2008 at 02:46:14PM +0200, Karsten Hilbert wrote: > > Yes, I know about tagged types but have shyed away from them > > so far courtesy of them not being adjustable after the fact. > > What do you mean by this? Adjustable in what way? This svana.org/kleptog/pgsql/taggedtypes.html is currently down for me but Google has a cache. I was probably wrongly remembering this: When you delete a tagged type some cruft is left behind (not in the system catalog though). Perhaps I confuse this with some limitation of a previous implementation of the enum type. Also perhaps I was misguided into thinking tags cannot be modified by the "don't delete from table of tags" part. > Truly, taggedtypes are a really useful feature but I think the chance > of them being in the main tree approximatly nil, which is enough reason > to stay away from them. Agree. Another one is non-indexability which I'd truly need. > > Should I be using a custom domain for this ? > > You just need to store the pair (time,zone), I don't think domains are > flexible enough for that. A complex type maybe (but then you've just > reinvented taggedtypes, inefficiently :) ) Which I'd rather avoid. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Tue, Apr 15, 2008 at 04:04:20PM +0200, Karsten Hilbert wrote: > On Tue, Apr 15, 2008 at 02:31:22PM +0100, Sam Mason wrote: > > On Tue, Apr 15, 2008 at 02:46:14PM +0200, Karsten Hilbert wrote: > > > Of course, the actual time stored in the database in UTC is > > > quite correct - it was indeed 3pm in location B when it was > > > 7am in London. But we need to know the original local time > > > (and also be able to know UTC since we want to correlate > > > times). > > > > I was under the impression that "timestamp without time zone" does > > precisely this. > It doesn't. It keeps the time *value* untouched. But it > doesn't even store *any* timezone information with it. But I was under the impression that you didn't want any time zone information. You wanted to know that that an appointment was at 3PM at the patients local time, attempting to correct this for the local time zone of any analyst is invalid. > So, > unless I *know* the original timezone by any other means I > don't have *any* clue as to what point in time a particular > timestamp value is. It less useful than "with time zone". The > latter at least allows me to know the true (UTC-adjusted) > time of an event without jumping through any hoops. I must be missing something then, can you explain why the original time zone matters? > A different angle: > > Customer orders item at 23:15 on March 30. Item is on > special offer March 30th only. DST change happens on March > 30 to March-31. Dealer looks at orders and sees "item > ordered March 31st 0:15" and does NOT apply the rebate for > March 30th. Except it probably wouldn't quite happen like this :) your time zone is normally recorded as something like (for me) "Europe/London" which then says that between these two dates this time zone applies, and within these other two dates this other correction applies. So, in your example, I'm not sure if there would be any visible difference between with and without time zones. If you actually hardcoded your timezone as GMT+6, or whatever, then yes it may be different. But only if you went around at midnight March 31st, changing computers to be GMT+5 (or would it be 7, I can never remember!). > Of course, it's the app developers fault, but the use case > for keeping the original timezone (so it can be reapplied) > is clearly there. In some cases yes I'd agree, but I have a feeling the number of cases is surprisingly small in practise. Sam
On Tue, Apr 15, 2008 at 06:48:27AM -0700, Adrian Klaver wrote: > On Tuesday 15 April 2008 6:31 am, Sam Mason wrote: > > My reasoning goes something like this: The WITH and WITHOUT clauses > > seem to be the opposite of my naive understanding of their purpose. I'd > > think that if you specify WITH TIME ZONE then it means that the timezone > > is important to me, and I want to deal with it myself. Whereas, the > > WITHOUT TIME ZONE clause would suggest that the timezone isn't important > > to me, and anything the database can do to make the problem go away the > > better. What the spec says, and PG does, is actually the opposite. The > > fact that this confusion can occur (and seems to occur reasonably often > > based on previous posts to the mailing lists) suggests that the docs > > should highlight the differences more clearly. > > > > I'd also hazard a guess that we don't hear about it more because most > > people just work within a single time zone and hence don't even notice > > the difference between the two. > > My only comment is on this assertion. Any location that has DST rules has two > time zones. For instance I live in US PST/PDT. Without timezone support > doing date/time math across time zone boundaries is asking for problems. Yes, knowing the difference between the two variants is needed to write correct code. In practise I'd guess that people don't do much testing across DST changes and if they do the calcs are generally on the order of days so an hour probably doesn't go amiss. The code will do all the right things for them, but yes, it isn't correct. Sam
On Tue, Apr 15, 2008 at 04:23:42PM +0200, Karsten Hilbert wrote: > When you delete a tagged type some cruft is left behind > (not in the system catalog though). > > Perhaps I confuse this with some limitation of a previous > implementation of the enum type. Also perhaps I was > misguided into thinking tags cannot be modified by the > "don't delete from table of tags" part. Oh, it means that if you DROP the type it will leave some stuff behind. You can ofcourse handle *value* of that type just like any other value. The 'tag table' in this case would be the list of timezones. I'll see if I can clarify it. > > Truly, taggedtypes are a really useful feature but I think the chance > > of them being in the main tree approximatly nil, which is enough reason > > to stay away from them. > Agree. Another one is non-indexability which I'd truly need. Well, you can index them ofcourse, but you need to indicate explicitly what you want to index: the timestamp or the timestamp shifted to the timezone. I felt the module couldn't make this decision on its own. Indexing the type directly would be more work, not sure if there's enough demand for it. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
On Tue, Apr 15, 2008 at 03:31:46PM +0100, Sam Mason wrote: > But I was under the impression that you didn't want any time zone > information. Wrong impression. > You wanted to know that that an appointment was at 3PM at > the patients local time, ... plus "what does local time mean". > attempting to correct this for the local time > zone of any analyst is invalid. Sure, there can be valid and invalid normalisations. > I must be missing something then, can you explain why the original time > zone matters? a) I want to be able to display when a patient's appointment happened in local time. b) I must be able to aggregate appointments from different time zones into a coherent EMR. For that I need to be able to map them onto, say, UTC. Taken together this could be served nicely by a UTC-storing, local-tz-remembering timestamp. > If you actually hardcoded your timezone as > GMT+6, or whatever, then yes it may be different. But only if you went > around at midnight March 31st, changing computers to be GMT+5 The machines do that by themselves. > In some cases yes I'd agree, but I have a feeling the number of cases is > surprisingly small in practise. The sampling may not be that large but when the problem is there it is painful. Basically, akin to "there's no such thing as plain text" there should be "there's no such thing as a timezone-less timestamp". Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Tue, Apr 15, 2008 at 05:29:09PM +0200, Martijn van Oosterhout wrote: > > Perhaps I confuse this with some limitation of a previous > > implementation of the enum type. Also perhaps I was > > misguided into thinking tags cannot be modified by the > > "don't delete from table of tags" part. > > Oh, it means that if you DROP the type it will leave some stuff behind. > You can ofcourse handle *value* of that type just like any other value. > The 'tag table' in this case would be the list of timezones. I'll see > if I can clarify it. That'd help, thanks. > > Agree. Another one is non-indexability which I'd truly need. > > Well, you can index them ofcourse, but you need to indicate explicitly > what you want to index: the timestamp or the timestamp shifted to the > timezone. Oh, OK. > I felt the module couldn't make this decision on its own. True enough. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Wed, Apr 16, 2008 at 05:09:56PM +0200, Karsten Hilbert wrote: > On Tue, Apr 15, 2008 at 03:31:46PM +0100, Sam Mason wrote: > > But I was under the impression that you didn't want any time zone > > information. > Wrong impression. Doh, yes. > > I must be missing something then, can you explain why the original time > > zone matters? > > a) I want to be able to display when a patient's appointment > happened in local time. > > b) I must be able to aggregate appointments from different > time zones into a coherent EMR. For that I need to be able > to map them onto, say, UTC. Hum, what's an "EMR"? > Taken together this could be served nicely by a UTC-storing, > local-tz-remembering timestamp. Why not do: CREATE TYPE tstz AS ( ts TIMESTAMP WITH TIME ZONE, tz TEXT ); And use this instead? > > If you actually hardcoded your timezone as > > GMT+6, or whatever, then yes it may be different. But only if you went > > around at midnight March 31st, changing computers to be GMT+5 > The machines do that by themselves. What sort of machines do this? With computers I've used, if its time zone is set to the local time of some specific location then yes it will. If you set it to some specific offset then no it won't. These are independant cases, and not the one I was drawing your attention to above. These cases are also independant of the original problem as well. If it's adjusting for local time, then it'll know when to apply DST offsets. I don't think a timestamp should ever change just because you're looking at from different DST values. I think this is why TIMEs are somewhat awkward beasts, I've tried to stay away from them because I can't build a consistant model of how they should function. If TIME values are only used to store values and not to perform any calculations on then I see some utility. > > In some cases yes I'd agree, but I have a feeling the number of cases is > > surprisingly small in practise. > The sampling may not be that large but when the problem is > there it is painful. Yes, I'm sure it is! > Basically, akin to "there's no such thing as plain text" > there should be "there's no such thing as a timezone-less > timestamp". Or maybe, a programming language should allow you to define your own abstractions if the defaults don't fit. Sam
On Wed, Apr 16, 2008 at 08:21:15PM +0100, Sam Mason wrote: > Hum, what's an "EMR"? Sorry, Electronic Medical Record. > Why not do: > > CREATE TYPE tstz AS ( ts TIMESTAMP WITH TIME ZONE, tz TEXT ); > > And use this instead? That should work. At the time (a couple of years ago) I wasn't aware of all the implications. Indexability, operator availability, computability ... I'm still not sure I'd know all the pitfalls. > What sort of machines do this? With computers I've used, if its time > zone is set to the local time of some specific location then yes it > will. If you set it to some specific offset then no it won't. These > are independant cases, and not the one I was drawing your attention to > above. These cases are also independant of the original problem as > well. All true. I misunderstood what you said. > > Basically, akin to "there's no such thing as plain text" > > there should be "there's no such thing as a timezone-less > > timestamp". > Or maybe, a programming language should allow you to define your own > abstractions if the defaults don't fit. Surely so and both Python and PostgreSQL have both been very helpful in this regard. Karsten -- GPG key ID E4071346 @ wwwkeys.pgp.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346