Thread: Floating-point timestamps versus Range Types
I'm working on the design for Range Types for 9.1: http://wiki.postgresql.org/wiki/RangeTypes But I think that floating-point timestamps may pose a problem. In this thread: http://archives.postgresql.org/pgsql-bugs/2010-08/msg00378.php I pointed out that floating-point timestamps can become a different value when dumped and reloaded (that is, the type input and output functions are not perfect inverses). This is a problem now, in theory, with UNIQUE indexes because the data might be inserted once, but then a dump/reload may violate the constraint. But it's not very common to include a high-precision timestamp in a UNIQUE index, so it's not much of a practical problem. However, with Exclusion Constraints and Range Types, I expect this problem may become a practical problem, because Exclusion Constraints are designed to work with things like timestamps. Also, with Range Types, I expect people to generally care more about the precise boundaries; so basic sanity (like input and output functions that are inverses) seems more important. What should be done? I see a few options: 1. Do nothing. Floating-point timestamps aren't the default, and the bug reports are likely to be few and far between (but those that encounter the bug are likely to be very frustrated). 2. Fix the input/output functions in a special mode for dump/reload, to make them true inverses. This isn't easy to do by just adding extra digits. But perhaps we could hack up the text format to have a special escape that is really an encoded binary representation, or maybe resembling float8. 3. Somehow deprecate floating point timestamps or make them unusable in conjunction with Range Types. I'm not sure if there is demand to keep them alive or not. Thoughts? Regards,Jeff Davis
On Oct 17, 2010, at 9:56 AM, Jeff Davis wrote: > 3. Somehow deprecate floating point timestamps or make them unusable in > conjunction with Range Types. I'm not sure if there is demand to keep > them alive or not. +1 David
On Sun, 2010-10-17 at 10:00 -0700, David E. Wheeler wrote: > On Oct 17, 2010, at 9:56 AM, Jeff Davis wrote: > > > 3. Somehow deprecate floating point timestamps or make them unusable in > > conjunction with Range Types. I'm not sure if there is demand to keep > > them alive or not. > This seems the best solution. We finally got around to making --integer-datetimes the default in 8.4. The only major distribution that I know of that ships the deprecated configuration is RedHat/Fedora. I don't know when that will change. JD -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Jeff Davis <pgsql@j-davis.com> writes: > What should be done? I see a few options: > 1. Do nothing. Floating-point timestamps aren't the default, and the bug > reports are likely to be few and far between (but those that encounter > the bug are likely to be very frustrated). I'm for that one. Anybody working with fractional float timestamps should already understand that they aren't exact. I can't see the value of expending any great amount of effort on this. There is maybe some argument for removing the float timestamp code altogether, but I think that that's probably premature. They were still the default in 8.3, and we are still supporting in-place upgrade from 8.3. regards, tom lane
"Joshua D. Drake" <jd@commandprompt.com> writes: > The only major distribution that I know of that ships the deprecated > configuration is RedHat/Fedora. I don't know when that will change. Red Hat switched to integer datetimes as of 8.4 ... just like upstream. Please don't imagine that you can complain that Red Hat is behind the curve on that. regards, tom lane
On Sun, 2010-10-17 at 16:17 -0400, Tom Lane wrote: > I'm for that one. Anybody working with fractional float timestamps > should already understand that they aren't exact. I can't see the value > of expending any great amount of effort on this. OK. > There is maybe some argument for removing the float timestamp code > altogether, but I think that that's probably premature. They were > still the default in 8.3, and we are still supporting in-place upgrade > from 8.3. Regarding Josh Drake's comment, do you have any insight about when Redhat will start to ship with integer timestamps? That seems like the determining factor for when we can deprecate floating-point timestamps. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Sun, 2010-10-17 at 16:17 -0400, Tom Lane wrote: >> There is maybe some argument for removing the float timestamp code >> altogether, but I think that that's probably premature. They were >> still the default in 8.3, and we are still supporting in-place upgrade >> from 8.3. > Regarding Josh Drake's comment, do you have any insight about when > Redhat will start to ship with integer timestamps? That seems like the > determining factor for when we can deprecate floating-point timestamps. At the earliest, we could consider dropping them when we drop support for in-place upgrade from 8.3 --- not only direct upgrade, but through multiple pg_upgrade steps. That's assuming that we think there are no users who are depending on float timestamps for functionality (they have a wider range than int timestamps don't they?). I don't believe that Red Hat's choices enter into this in the slightest: they aren't doing anything different from users who compile from source. Anyway the short answer seems to be that we can consider dropping them when we next break on-disk compatibility. regards, tom lane
On 10/17/2010 04:40 PM, Tom Lane wrote: > At the earliest, we could consider dropping them when we drop support > for in-place upgrade from 8.3 --- not only direct upgrade, but through > multiple pg_upgrade steps. That's assuming that we think there are > no users who are depending on float timestamps for functionality (they > have a wider range than int timestamps don't they?). Yes, they do. Maybe we need to look at providing a bigtimestamp type or similar at some stage. Or maybe the demand for it would be so low it should be an add-on module. cheers andrew
On Sun, 2010-10-17 at 16:27 -0400, Tom Lane wrote: > "Joshua D. Drake" <jd@commandprompt.com> writes: > > The only major distribution that I know of that ships the deprecated > > configuration is RedHat/Fedora. I don't know when that will change. > > Red Hat switched to integer datetimes as of 8.4 ... just like upstream. > Please don't imagine that you can complain that Red Hat is behind the > curve on that. I wasn't imagining please don't get your boxers in a bunch. I was misinformed and I apologize. JD > > regards, tom lane > -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
On Sun, Oct 17, 2010 at 12:03 PM, Joshua D. Drake <jd@commandprompt.com> wrote: > The only major distribution that I know of that ships the deprecated > configuration is RedHat/Fedora. I don't know when that will change. > If only we knew someone in Redhat :) iirc the issue was binary upgrades. So I suspect the answer will be "the next time we break pg_upgrade and require a dump/reload". This is the cost we chose to pay for binary upgrades. -- greg
On Sun, 2010-10-17 at 15:52 -0700, Greg Stark wrote: > On Sun, Oct 17, 2010 at 12:03 PM, Joshua D. Drake <jd@commandprompt.com> wrote: > > The only major distribution that I know of that ships the deprecated > > configuration is RedHat/Fedora. I don't know when that will change. > > > > If only we knew someone in Redhat :) Fortunately for those in the know, you can get --integer-datetimes with Red Hat. My information was a little outdated. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Andrew Dunstan <andrew@dunslane.net> writes: > On 10/17/2010 04:40 PM, Tom Lane wrote: >> ... That's assuming that we think there are >> no users who are depending on float timestamps for functionality (they >> have a wider range than int timestamps don't they?). > Yes, they do. > Maybe we need to look at providing a bigtimestamp type or similar at > some stage. Or maybe the demand for it would be so low it should be an > add-on module. [ rechecks the code & docs ... ] In principle float timestamps could have a ridiculously wide range, on the order of 140 million years if you assume that 1-second precision is sufficient. In practice they are constrained by our use of nonnegative 32-bit integers for Julian Day counts, which restricts the range to be from 4713 BC to 5 million years and change AD. 64-bit-int timestamps have a theoretical range of about plus or minus 300 thousand years, which again is restricted on the BC side by the Julian Day code. We could push out the 5M AD limit by converting the JD code to 64-bit ints, but it's not clear there's any interest in that given that it won't do a thing for the integer timestamp case (and I'm not sure if the equations are really correct so far out, anyway). So the bottom line question is whether somebody has a use for Gregorian calendar dates between 300K AD and 5M AD, while not needing to go back before 4K BC. I should think that the BC-side limit pretty much renders this datatype pointless for astronomers and geologists, even if they wanted to count in Gregorian dates; and I can't think of any other communities that are going to care much about dates that far out. So, if there's a use-case at all, it's not interesting enough to include in core. IOW I don't think the range argument holds much water for keeping float timestamps alive. The on-disk-compatibility argument does, though. regards, tom lane
On Mon, Oct 18, 2010 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > IOW I don't think the range argument holds much water for keeping float > timestamps alive. The on-disk-compatibility argument does, though. Right. I think your argument that we should "do nothing" upthread is exactly right. Deprecating float timestamps doesn't solve any real problem. As of today, we can assume that anyone who is still using float timestamps is doing so because they are doing in-place upgrade from an older version. If we do nothing, the worst thing that can possibly happen is that MAYBE they will have some difficulties if they use floating timestamps in combination with the range types Jeff is proposing to implement. Or, we can remove integer date time support and categorically prevent them from using pg_upgrade whether they care about range types or not, and whether they actually would have experienced problems with them or not. AFAICS, that's just being unfriendly to no purpose. A more interesting question is whether and how we can ease the migration path from float timestamps to integer timestamps. Even without range types, if someone does have a UNIQUE index on a timestamp column, could they get an error if they dump from a float-timestamp version of PG and restore onto an integer-timestamp version? How would we recommend that they recover from that situation? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 2010-10-18 at 14:06 -0400, Robert Haas wrote: > Right. I think your argument that we should "do nothing" upthread is > exactly right. OK. > A more interesting question is whether and how we can ease the > migration path from float timestamps to integer timestamps. Even > without range types, if someone does have a UNIQUE index on a > timestamp column, could they get an error if they dump from a > float-timestamp version of PG and restore onto an integer-timestamp > version? Yes. They could also get an error if they dump from a float-timestamp version and restore into a float-timestamp version. [ That's because for float-timestamps typinput(typoutput(VALUE)) may not equal VALUE. ] > How would we recommend that they recover from that > situation? COPY the data out in binary mode, or they have already lost data (if using float-timestamps). A reasonable conversion path might be to offer integer timestamps using a different type name (e.g. inttimestamp) that always means integer timestamps. Then, they could convert using ALTER TABLE, then do an in-place upgrade. We could even make pg_upgrade optionally convert inttimestamp to timestamp in O(1) on an integer-timestamps build. Regards,Jeff Davis
Robert Haas <robertmhaas@gmail.com> writes: > A more interesting question is whether and how we can ease the > migration path from float timestamps to integer timestamps. Even > without range types, if someone does have a UNIQUE index on a > timestamp column, could they get an error if they dump from a > float-timestamp version of PG and restore onto an integer-timestamp > version? In principle yes, but I think the risk is pretty hypothetical. Currently (2010, ten years out from the internal epoch) the effective resolution of IEEE-float-based timestamps is about a tenth of a microsecond. Thus for example, on 8.3 I get regression=# select '2010-10-18 14:35:14.6164431-04'::timestamptz = '2010-10-18 14:35:14.6164432-04'::timestamptz;?column? ----------f (1 row) regression=# select '2010-10-18 14:35:14.6164431-04'::timestamptz = '2010-10-18 14:35:14.6164431-04'::timestamptz;?column? ----------t (1 row) regression=# select '2010-10-18 14:35:14.6164431-04'::timestamptz = '2010-10-18 14:35:14.61644311-04'::timestamptz;?column? ----------t (1 row) whereas an int-timestamp build sees these inputs as all the same. Thus, to get into trouble you'd need to have a unique index on data that conflicts at the microsecond scale but not at the tenth-of-a-microsecond scale. This seems implausible. In particular, you didn't get any such data from now(), which relies on Unix APIs that don't go below microsecond precision. You might conceivably have entered such data externally, as I did above, but you'd have to not notice/care that it wasn't coming back out at the same precision. And you'd have to never have dumped/reloaded using pg_dump, or the low order digits would have vanished already. And you'd have to not be dealing with data outside a range of roughly 1900-2100, or the precision of floats would actually be worse than ints. So the argument seems academic to me ... regards, tom lane
On Mon, Oct 18, 2010 at 2:29 PM, Jeff Davis <pgsql@j-davis.com> wrote: > A reasonable conversion path might be to offer integer timestamps using > a different type name (e.g. inttimestamp) that always means integer > timestamps. Then, they could convert using ALTER TABLE, then do an > in-place upgrade. We could even make pg_upgrade optionally convert > inttimestamp to timestamp in O(1) on an integer-timestamps build. I think in retrospect it would certainly have been better to make integer timestamps and float timestamps two separate data types, rather than two versions of the same data type. Whether it's worth providing that now after the fact is not clear to me. I'd be inclined to wait and see whether we get many complaints... One problem with changing types in pg_upgrade is that type OIDs can get embedded in the on-disk representation - I believe that this happens for arrays, for instance. So I think it's practical for pg_upgrade to change type names during a version upgrade, but not type OIDs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 2010-10-18 at 14:49 -0400, Tom Lane wrote: > whereas an int-timestamp build sees these inputs as all the same. > Thus, to get into trouble you'd need to have a unique index on data that > conflicts at the microsecond scale but not at the tenth-of-a-microsecond > scale. This seems implausible. In particular, you didn't get any such > data from now(), which relies on Unix APIs that don't go below > microsecond precision. You might conceivably have entered such data > externally, as I did above, but you'd have to not notice/care that it > wasn't coming back out at the same precision. You can also get there via interval math, like multiplying by a numeric. That seems slightly more plausible. > So the argument seems academic to me ... With UNIQUE indexes I agree completely. If nothing else, who puts a UNIQUE index on high-precision timestamps? And the problem has existed for a long time already, it's nothing new. With Exclusion Constraints, it's slightly less academic, and it's a new addition. Still pretty far-fetched; but at least plausible, which is why I brought it up. However, I won't argue with the "don't do anything" approach to float-timestamps. Regards,Jeff Davis
Robert Haas wrote: > On Mon, Oct 18, 2010 at 2:29 PM, Jeff Davis <pgsql@j-davis.com> wrote: > > A reasonable conversion path might be to offer integer timestamps using > > a different type name (e.g. inttimestamp) that always means integer > > timestamps. Then, they could convert using ALTER TABLE, then do an > > in-place upgrade. We could even make pg_upgrade optionally convert > > inttimestamp to timestamp in O(1) on an integer-timestamps build. > > I think in retrospect it would certainly have been better to make > integer timestamps and float timestamps two separate data types, > rather than two versions of the same data type. Whether it's worth > providing that now after the fact is not clear to me. I'd be inclined > to wait and see whether we get many complaints... > > One problem with changing types in pg_upgrade is that type OIDs can > get embedded in the on-disk representation - I believe that this > happens for arrays, for instance. So I think it's practical for > pg_upgrade to change type names during a version upgrade, but not type > OIDs. One thing we have talked about is converting the page on read-in from the backend. Since the timestamps are the same size as float or integer, that might be possible. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Oct 21, 2010 at 4:49 PM, Bruce Momjian <bruce@momjian.us> wrote: > One thing we have talked about is converting the page on read-in from > the backend. Since the timestamps are the same size as float or > integer, that might be possible. Did we have a solution for the problem that understanding which columns are timestamps requires having a tuple descriptor and parsing the every tuple? That seems like it would a) be slow and b) require a lot of high level code in the middle of a low-level codepath. -- greg
Greg Stark wrote: > On Thu, Oct 21, 2010 at 4:49 PM, Bruce Momjian <bruce@momjian.us> wrote: > > One thing we have talked about is converting the page on read-in from > > the backend. ?Since the timestamps are the same size as float or > > integer, that might be possible. > > Did we have a solution for the problem that understanding which > columns are timestamps requires having a tuple descriptor and parsing > the every tuple? That seems like it would a) be slow and b) require a > lot of high level code in the middle of a low-level codepath. Yep, that's what it requires. It would rewrite in the new format. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes: > Greg Stark wrote: >> Did we have a solution for the problem that understanding which >> columns are timestamps requires having a tuple descriptor and parsing >> the every tuple? That seems like it would a) be slow and b) require a >> lot of high level code in the middle of a low-level codepath. > Yep, that's what it requires. It would rewrite in the new format. In the case of the recent hstore fixes, we were able to put the burden on the hstore functions themselves to do any necessary conversion. I wonder if it'd be possible to do something similar here? I haven't chased the bits in any detail, but I'm thinking that integer timestamps in a plausible range might all look like denormalized floats, and conversely plausible float timestamps would look like ridiculously large integer timestamps. Would we be willing to make such assumptions to support in-place upgrade of timestamps? regards, tom lane
On Thu, Oct 21, 2010 at 7:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Would we be willing to make such assumptions to > support in-place upgrade of timestamps? > If something like that is true (I'm not sure it is) then we could consider doing the equivalent of what we were talking about doing for changes that require more space in the new version. 1. Backpatch a change that refuses to create new values of the prohibited type 2. Provide a tool which scans all the values in your old database and ensures there are no values of the prohibited type 3. Only allow pg_migrator from the version that includes the backpatched check We would also have to make sure the new version's integer timestamp doesn't write out any instances that look valid float timestamps. We could have a guc to disable this check if you have a tool that scans all the pages and rewrites any old values. I think #1 would be feasible if it's really as simple as checking the high bit which I imagine is what you're hoping it is. I supose it would require writing out a 0 or larger denormalized value which would mean we wouldn't be able to handle values close to the epoch properly. I suppose for timestamps that doesn't really matter since that's precision we never really have anyways. I'm not sure if the tool to rewrite all existing values is so feasible though. Considering that the values could be stuck in the middle of arrays or records or even custom data types. Also there's tintervals and so on to worry about too. -- greg
On Thu, Oct 21, 2010 at 10:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bruce Momjian <bruce@momjian.us> writes: >> Greg Stark wrote: >>> Did we have a solution for the problem that understanding which >>> columns are timestamps requires having a tuple descriptor and parsing >>> the every tuple? That seems like it would a) be slow and b) require a >>> lot of high level code in the middle of a low-level codepath. > >> Yep, that's what it requires. It would rewrite in the new format. > > In the case of the recent hstore fixes, we were able to put the burden > on the hstore functions themselves to do any necessary conversion. > I wonder if it'd be possible to do something similar here? I haven't > chased the bits in any detail, but I'm thinking that integer timestamps > in a plausible range might all look like denormalized floats, and > conversely plausible float timestamps would look like ridiculously large > integer timestamps. Would we be willing to make such assumptions to > support in-place upgrade of timestamps? This seems like it might not be entirely reliable, which would make me disinclined to do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>>>>> "JD" == Jeff Davis <pgsql@j-davis.com> writes: JD> 2. Fix the input/output functions in a special mode for dump/reload, JD> to make them true inverses. That can be done by supporting the %A printf(3)/scanf(3) format. -JimC -- James Cloos <cloos@jhcloos.com> OpenPGP: 1024D/ED7DAEA6
On Mon, 2010-10-25 at 13:54 -0400, James Cloos wrote: > >>>>> "JD" == Jeff Davis <pgsql@j-davis.com> writes: > > JD> 2. Fix the input/output functions in a special mode for dump/reload, > JD> to make them true inverses. > > That can be done by supporting the %A printf(3)/scanf(3) format. I don't happen to see a %A format in the man page, but I doubt the output would look like a timestamp. And if it doesn't look like a timestamp, it violates the spirit of a logical dump of the data. We could just base64-encode the binary and print that, but that doesn't maintain the spirit of a logical data dump either. Regards,Jeff Davis
>>>>> "JD" == Jeff Davis <pgsql@j-davis.com> writes: JD> 2. Fix the input/output functions in a special mode for dump/reload, JD> to make them true inverses. JC> That can be done by supporting the %A printf(3)/scanf(3) format. JD> I don't happen to see a %A format in the man page, but I doubt the JD> output would look like a timestamp. Modern printf(1) also supports it, so an easy example: :; printf '%A\n' 3.1415926535897932384626433832795029 0XC.90FDAA22168C235P-2 %a is the same, but with miniscule letters. It is the hex format of the float types, and defaults to exactly enough precision. The length modifier L makes %a expect a long double. JD> And if it doesn't look like a timestamp, it violates the spirit of a JD> logical dump of the data. Point taken. Had I read the whole thread before replying I would have been reminded that the float timestamps were archaic; that avoids any need of %A for timestamps. That said, the possiblity of hex i/o format for the float datatypes would be welcome. -JimC -- James Cloos <cloos@jhcloos.com> OpenPGP: 1024D/ED7DAEA6
James Cloos <cloos@jhcloos.com> writes: > That said, the possiblity of hex i/o format for the float datatypes > would be welcome. It's unportable, for two different reasons: 1. pg_dump output would become platform-specific. This is highly undesirable. 2. The printf specifiers you want us to rely on are not standard. Cf Single Unix Spec, http://www.opengroup.org/onlinepubs/007908799/xsh/fprintf.html which does not define either %A or %a. regards, tom lane
>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes: JC> That said, the possiblity of hex i/o format for the float datatypes JC> would be welcome. TL> It's unportable, for two different reasons: TL> 2. The printf specifiers you want us to rely on are not standard. They are in C99. TL> 1. pg_dump output would become platform-specific. This is highly TL> undesirable. It is true that pg would have to test for them in configure and supply alternative code wherever libc fails to support them. I can readily accept that there are many more pressing needs. But would such a patch for master be rejected? -JimC -- James Cloos <cloos@jhcloos.com> OpenPGP: 1024D/ED7DAEA6
On Mon, Oct 25, 2010 at 4:35 PM, James Cloos <cloos@jhcloos.com> wrote: >>>>>> "TL" == Tom Lane <tgl@sss.pgh.pa.us> writes: > > JC> That said, the possiblity of hex i/o format for the float datatypes > JC> would be welcome. > > TL> It's unportable, for two different reasons: > > TL> 2. The printf specifiers you want us to rely on are not standard. > > They are in C99. > > TL> 1. pg_dump output would become platform-specific. This is highly > TL> undesirable. > > It is true that pg would have to test for them in configure and supply > alternative code wherever libc fails to support them. > > I can readily accept that there are many more pressing needs. > > But would such a patch for master be rejected? Let me back up a few steps and ask why you want this in the first place. If there's a real problem here, we should solve it, either in the way you've proposed or in some other manner. But you haven't really said what problem you're trying to solve - just that it would be "welcome", which leaves much to my (not very good) imagination. This is quite a bit OT for this thread so if you want to pursue this I'd suggest starting a new thread in which you start by laying out your case for doing this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company