Thread: fixes for date_part micro/millisecond precision
Hi all, This patch fixes a problem where extract/date_part returns bad values for microsecond and millisecond parts. Patch is attached. I'm not sure if the solution is correct, but it cures this symptom. before: ======== brent=# select date_part('milliseconds','2001-11-24 13:49:12.826833-05'::timestamp); date_part ------------------ 826.833000000001 (1 row) brent=# select date_part('microseconds','2001-11-24 13:49:12.826833-05'::timestamp); date_part ------------------ 826833.000000001 (1 row) after: ======== brent=# select date_part('milliseconds','2001-11-24 13:49:12.826833-05'::timestamp); date_part ----------- 826.833 (1 row) brent=# select date_part('microseconds','2001-11-24 13:49:12.826833-05'::timestamp); date_part ----------- 826833 (1 row) cheers. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
Attachment
Brent Verner <brent@rcfile.org> writes: > This patch fixes a problem where extract/date_part returns bad values > for microsecond and millisecond parts. I'm not convinced the existing behavior is wrong. Timestamps are floats in PG, and therefore not infinitely precise. Your patch seems to be trying to cover this up, but I fear it will only succeed in introducing other problems. regards, tom lane
On 24 Nov 2001 at 14:24 (-0500), Tom Lane wrote: | Brent Verner <brent@rcfile.org> writes: | > This patch fixes a problem where extract/date_part returns bad values | > for microsecond and millisecond parts. | | I'm not convinced the existing behavior is wrong. Timestamps are floats | in PG, and therefore not infinitely precise. Your patch seems to be | trying to cover this up, but I fear it will only succeed in introducing | other problems. <pedantically> The current bahavior seems to be precise, just not accurate. </pedantically> I agree much that my fix is covering over the problem, but something needs to change (IMO), because if I insert a timestamp of '2001-1-1 11:11:11.12341234-05' I /really/ want to get back '12341234' when I ask for the microseconds that I stored, so I can't see how the current behavior isn't asking for problems in applications that use these features. After many hours of looking at how to solve this, the only way I could find to ensure "what I put in is what I get out," was the change in timestamptz_part(). Anyone have any idea of how (better) to fix this, short of changing the internal representation of Timestamp to a struct? Is there a better place to 'force' the accuracy of the fractional second part that would be less prone to introduce other problems? thanks. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
Brent Verner <brent@rcfile.org> writes: > ... if I insert a timestamp of > '2001-1-1 11:11:11.12341234-05' I /really/ want to get back '12341234' > when I ask for the microseconds that I stored, You cannot expect to get that back exactly, because *the precision is not there* (and yes, the term here is precision not accuracy). Right now (late 2001) we have about seven digits of precision to the right of the decimal point in a timestamp. As we get further away from the 2000-01-01 origin, precision will drop; it'll be six digits or less by 2010. The further you go from 2000 in either direction, the worse the precision. This is an inherent property of float arithmetic and can't be papered over with display hacks, at least not without losing more than you gain. > Is there a better place to 'force' the accuracy of the fractional > second part that would be less prone to introduce other problems? You cannot force accuracy that isn't there. regards, tom lane
On 24 Nov 2001 at 15:32 (-0500), Tom Lane wrote: | Brent Verner <brent@rcfile.org> writes: | > ... if I insert a timestamp of | > '2001-1-1 11:11:11.12341234-05' I /really/ want to get back '12341234' | > when I ask for the microseconds that I stored, | | You cannot expect to get that back exactly, because *the precision is | not there* (and yes, the term here is precision not accuracy). Right | now (late 2001) we have about seven digits of precision to the right | of the decimal point in a timestamp. As we get further away from the | 2000-01-01 origin, precision will drop; it'll be six digits or less | by 2010. The further you go from 2000 in either direction, the worse | the precision. ah, I see. I wasn't thinking about the limited range that could be accurately stored in the double... :-( | This is an inherent property of float arithmetic and can't be papered | over with display hacks, at least not without losing more than you gain. | | > Is there a better place to 'force' the accuracy of the fractional | > second part that would be less prone to introduce other problems? | | You cannot force accuracy that isn't there. thanks, and sorry for the hassle. wandering-off-to-try-making-Timestamp-a-struct-ly yours, brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
On Sat, Nov 24, 2001 at 02:24:09PM -0500, Tom Lane wrote: > Brent Verner <brent@rcfile.org> writes: > > This patch fixes a problem where extract/date_part returns bad values > > for microsecond and millisecond parts. > > I'm not convinced the existing behavior is wrong. Timestamps are floats > in PG, and therefore not infinitely precise. Your patch seems to be > trying to cover this up, but I fear it will only succeed in introducing > other problems. Tom is right. If you need non-float microsecond/millisecond you can use to_char() -- it's news in 7.2: test=# select to_char('2001-11-24 13:49:12.826833-05'::timestamp, 'MS'); to_char --------- 827 test=# select to_char('2001-11-24 13:49:12.826833-05'::timestamp, 'US'); to_char --------- 826833 Karel -- Karel Zak <zakkr@zf.jcu.cz> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
On 26 Nov 2001 at 13:38 (+0100), Karel Zak wrote: | On Sat, Nov 24, 2001 at 02:24:09PM -0500, Tom Lane wrote: | > Brent Verner <brent@rcfile.org> writes: | > > This patch fixes a problem where extract/date_part returns bad values | > > for microsecond and millisecond parts. | > | > I'm not convinced the existing behavior is wrong. Timestamps are floats | > in PG, and therefore not infinitely precise. Your patch seems to be | > trying to cover this up, but I fear it will only succeed in introducing | > other problems. | | Tom is right. If you need non-float microsecond/millisecond you can | use to_char() -- it's news in 7.2: | | test=# select to_char('2001-11-24 13:49:12.826833-05'::timestamp, 'MS'); | to_char | --------- | 827 Also, casting to float4 gives the 'expected' results. The behavior still seems wrong, since it gives me back something that I did not put in. Additionally, I can get back a value that I /can't/ put back in the database. I do agree with Tom that my solution was simply a cover-up. The SQL-99 spec says nothing of MILLI or MICROSECONDS, so I'm curious if there is interest in really fixing the date_part behavior WRT fractional seconds. Comments Thomas? The only way I can see to fix it is to make Timestamp a struct something like... typedef struct _pgTimestamp { unsigned long ts_epoch; /* seconds */ unsigned int ts_frac; /* microseconds */ signed char ts_tzo; /* GMT offset in hours */ unsigned char ts_precision; /* (???) precision of fractional part */ } Timestamp; I'll probably try to implement this new Timestamp as a learning exercise. If my hacking goes well and there is a desire to fix the current Timestamp limitations, I'll submit a patch. cheers. brent -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
> ... The behavior > still seems wrong, since it gives me back something that I did not > put in. Additionally, I can get back a value that I /can't/ put back > in the database. I do agree with Tom that my solution was simply > a cover-up. The SQL-99 spec says nothing of MILLI or MICROSECONDS, > so I'm curious if there is interest in really fixing the date_part > behavior WRT fractional seconds. Comments Thomas? You are exploring the downside of using float8 for timestamp storage. This has not been a major issue until now (and may not be for most users) but since we now allow more than two decimal places in the output format it becomes more obvious. > The only way I can see to fix it is to make Timestamp a struct > something like... Yuck. That *may* be how other databases do it, but having to carry along multiple fields complicates the math and comparison functions, and slows down the code. I have been thinking of implementing timestamp (and other related types) as 8 byte integers, which are not slow on some platforms. doubles are fast on most platforms nowadays. The tradeoff will be repeatability vs range, since we will not get the quasi-infinite range we have now when using a fixed decimal type. We could have a third option, which uses 4 byte integers in structures, which you proposed. But that seems the least desirable for the long run. If I do this it will be as a configurable option, at least at first. - Thomas
Thomas Lockhart <lockhart@fourpalms.org> writes: > I have been thinking of implementing timestamp (and other related types) > as 8 byte integers, which are not slow on some platforms. doubles are > fast on most platforms nowadays. OTOH, 8 byte integers fail to exist at all on some other platforms. I'd be in favor of reimplementing timestamps as int8 if it weren't for the portability issue. But I'm not sure I want to blow off int8-less platforms quite yet. > The tradeoff will be repeatability vs > range, since we will not get the quasi-infinite range we have now when > using a fixed decimal type. If we didn't mind increasing the storage requirement, we could do "float8 for the integral seconds, plus an int4 for the nanoseconds", which would work perfectly out to about 2^52 seconds either way from the epoch, and continue to work with reduced accuracy until the heat death of the universe. But this would be 12 bytes, plus 4 bytes alignment padding on some platforms, which might be excessive just to guarantee nanosecond precision out to the next geological era. regards, tom lane
Thomas Lockhart <lockhart@fourpalms.org> writes: > Yes. Mixing the float with the int seems to invite math troubles though True, any multi-member struct will be a huge PITA for arithmetic operations. Using int8 if available or float8 if not might work pretty nicely now that I think about it, at least from a coding point of view. You could hide a lot of the differences in the PG_GET and PG_RETURN macros. But do we want to have such a radical difference in accuracy and range across platforms? regards, tom lane
> > I have been thinking of implementing timestamp (and other related types) > > as 8 byte integers, which are not slow on some platforms. doubles are > > fast on most platforms nowadays. > OTOH, 8 byte integers fail to exist at all on some other platforms. Sure. It would be selectable at configure/build time. Failing to support int64 would mean you don't get the integer alternative for timestamp either. > > The tradeoff will be repeatability vs > > range, since we will not get the quasi-infinite range we have now when > > using a fixed decimal type. > If we didn't mind increasing the storage requirement, we could do > "float8 for the integral seconds, plus an int4 for the nanoseconds", > which would work perfectly out to about 2^52 seconds either way from the > epoch, and continue to work with reduced accuracy until the heat death > of the universe. But this would be 12 bytes, plus 4 bytes alignment > padding on some platforms, which might be excessive just to guarantee > nanosecond precision out to the next geological era. Yes. Mixing the float with the int seems to invite math troubles though (as does the struct solution of using two ints, since roundoff and carryover might be troublesome). Just a thought for now, but the int8 alternative may be pretty easy to do. - Thomas