Thread: Interval aggregate regression failure (expected seems wrong)
Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I get a regression failure in the interval tests. I am no export for the interval type, but the expected "9 days 28 hours" seem wrong, don't they? The actual value seems to be the same. Is it possible that this is broken on the platform where the expected results were generated? *** ./expected/interval.out Tue Oct 25 19:13:07 2005 --- ./results/interval.out Mon Nov 7 09:11:27 2005 *************** *** 218,224 **** select avg(f1) from interval_tbl; avg ------------------------------------------------- ! @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs (1 row) -- test long interval input --- 218,224 ---- select avg(f1) from interval_tbl; avg ------------------------------------------------- ! @ 4 years 1 mon 10 days 4 hours 18 mins 23 secs (1 row) -- test long interval input The last commit to interval.out seems to be this one, and it changed exactly this line. revision 1.14 date: 2005/10/25 17:13:07; author: tgl; state: Exp; lines: +1 -1 Well, this is CVS tip, so there is a chance this is fixed in REL_8_1_STABLE which has a 1.14.0.2. At least the release tarball should be rebuilt, no? Sorry, if this is just noise. Just wanted to be sure you know about it. Best Regards, Michael
On Nov 7, 2005, at 17:24 , Michael Paesold wrote: > Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I > get a regression failure in the interval tests. I am no export for > the interval type, but the expected "9 days 28 hours" seem wrong, > don't they? The actual value seems to be the same. > > Is it possible that this is broken on the platform where the > expected results were generated? What platform are you testing on? With or without integer-datetimes? I just ran make check on for PostgreSQL 8.1.0 on Mac OS X 10.4.3 test=# select version(); version ------------------------------------------------------------------------ ---------------------------------------------------------------------- PostgreSQL 8.1.0 on powerpc-apple-darwin8.3.0, compiled by GCC powerpc-apple-darwin8-gcc-4.0.0 (GCC) 4.0.0 (Apple Computer, Inc. build 5026) I didn't have any regression failures. I'd also expect we'd see a lot more failures on the build farm if it were the case that it was broken just on the platform that the expected results were generated on. From a quick look at the current build farm failures on HEAD and REL8_1_STABLE, it doesn't look like any of the failures are failing here. Michael Glaesemann grzm myrealbox com
Michael Glaesemann wrote: > > On Nov 7, 2005, at 17:24 , Michael Paesold wrote: > >> Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I >> get a regression failure in the interval tests. I am no export for >> the interval type, but the expected "9 days 28 hours" seem wrong, >> don't they? The actual value seems to be the same. >> >> Is it possible that this is broken on the platform where the expected >> results were generated? > > What platform are you testing on? With or without integer-datetimes? Ok, forgot. This is *without* integer-datetimes, RHEL 3 (Linux 2.4.21, glibc 2.3.2, gcc 3.2.3 20030502) on i686 (Xeon without x86-64). > I just ran make check on for PostgreSQL 8.1.0 on Mac OS X 10.4.3 [snip] > I didn't have any regression failures. I'd also expect we'd see a lot > more failures on the build farm if it were the case that it was broken > just on the platform that the expected results were generated on. From > a quick look at the current build farm failures on HEAD and > REL8_1_STABLE, it doesn't look like any of the failures are failing here. I just started to wonder about buildfarm, too, but found that most build farm members have --enable-integer-datetimes. Could that be an explanation? Is it possible that the code is wrong with --enable-integer-datetimes? So what do you have in results/interval.out? @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs seems wrong to me, no? Tom wrote for that commit: revision 1.14 date: 2005/10/25 17:13:07; author: tgl; state: Exp; lines: +1 -1 Remove justify_hours call from interval_mul and interval_div, and make some small stylistic improvements in these functions. Also fix several places where TMODULO() was being used with wrong-sized quotient argument, creating a risk of overflow --- interval2tm was actually capable of going into an infinite loop because of this. Perhaps it is an intended behavior? If so, it still fails without integer-datetimes. Best Regards, Michael Paesold
Michael Paesold wrote: >> On Nov 7, 2005, at 17:24 , Michael Paesold wrote: >> >>> Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I >>> get a regression failure in the interval tests. I am no export for >>> the interval type, but the expected "9 days 28 hours" seem wrong, >>> don't they? The actual value seems to be the same. >>> >>> Is it possible that this is broken on the platform where the >>> expected results were generated? > Perhaps it is an intended behavior? If so, it still fails without > integer-datetimes. Well, no, it also fails with integer-datetimes for me in the same way. pg_config output below. And yes, I did "cvs up; make distclean; ./configure... ; make ; make install ; make check". Could this be DST-related? I thought plain interval was not affected by DST changes. BINDIR = /usr/local/postgresql-8cvs/bin DOCDIR = /usr/local/postgresql-8cvs/doc INCLUDEDIR = /usr/local/postgresql-8cvs/include PKGINCLUDEDIR = /usr/local/postgresql-8cvs/include INCLUDEDIR-SERVER = /usr/local/postgresql-8cvs/include/server LIBDIR = /usr/local/postgresql-8cvs/lib PKGLIBDIR = /usr/local/postgresql-8cvs/lib LOCALEDIR = MANDIR = /usr/local/postgresql-8cvs/man SHAREDIR = /usr/local/postgresql-8cvs/share SYSCONFDIR = /usr/local/postgresql-8cvs/etc PGXS = /usr/local/postgresql-8cvs/lib/pgxs/src/makefiles/pgxs.mk CONFIGURE = '--prefix=/usr/local/postgresql-8cvs' '--with-pgport=54321' '--with-perl' 'CFLAGS=-O2 -mcpu=pentium4 -march=pentium4' '--enable-casserts' '--enable-debug' '--enable-integer-datetimes' CC = gcc CPPFLAGS = -D_GNU_SOURCE CFLAGS = -O2 -mcpu=pentium4 -march=pentium4 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -fno-strict-aliasing -g CFLAGS_SL = -fpic LDFLAGS = -Wl,-rpath,/usr/local/postgresql-8cvs/lib LDFLAGS_SL = LIBS = -lpgport -lz -lreadline -lncurses -lcrypt -lresolv -lnsl -ldl -lm -lbsd VERSION = PostgreSQL 8.2devel Best Regards, Michael Paesold
On Nov 7, 2005, at 18:28 , Michael Paesold wrote: > Ok, forgot. This is *without* integer-datetimes, RHEL 3 (Linux > 2.4.21, glibc 2.3.2, gcc 3.2.3 20030502) on i686 (Xeon without > x86-64). > >> I just ran make check on for PostgreSQL 8.1.0 on Mac OS X 10.4.3 Heh. I forgot, too ;) My test was also without integer-datetimes. > [snip] >> I didn't have any regression failures. I'd also expect we'd see a >> lot more failures on the build farm if it were the case that it >> was broken just on the platform that the expected results were >> generated on. From a quick look at the current build farm >> failures on HEAD and REL8_1_STABLE, it doesn't look like any of >> the failures are failing here. > > I just started to wonder about buildfarm, too, but found that most > build farm members have --enable-integer-datetimes. Could that be > an explanation? Is it possible that the code is wrong with --enable- > integer-datetimes? > > So what do you have in results/interval.out? > @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs seems wrong to me, no? > select avg(f1) from interval_tbl; avg ------------------------------------------------- @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs (1 row) The point of the change to the interval datatype in 8.1 is to keep track of months, days, and seconds (which in turn are represented as hours, minutes and seconds). Previous releases tracked only months and seconds. This has advantages for using intervals with dates and timestamps that involve daylight saving time changes. Admittedly, it looks odd at first, but it falls out of the change in behavior of the interval datatype. There are two new functions, justify_days and justify_hours, that you can use to put intervals into more traditional forms. http://developer.postgresql.org/docs/postgres/functions-datetime.html Doesn't explain why you're getting a regression failure though. Michael Glaesemann grzm myrealbox com
Michael Glaesemann wrote: >> So what do you have in results/interval.out? >> @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs seems wrong to me, no? >> > > select avg(f1) from interval_tbl; > avg > ------------------------------------------------- > @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs > (1 row) > > The point of the change to the interval datatype in 8.1 is to keep > track of months, days, and seconds (which in turn are represented as > hours, minutes and seconds). Previous releases tracked only months and > seconds. This has advantages for using intervals with dates and > timestamps that involve daylight saving time changes. Admittedly, it > looks odd at first, but it falls out of the change in behavior of the > interval datatype. There are two new functions, justify_days and > justify_hours, that you can use to put intervals into more traditional > forms. > > http://developer.postgresql.org/docs/postgres/functions-datetime.html Thank you very much for the insight. > Doesn't explain why you're getting a regression failure though. Well, I have something now. It seems to be a compiler/optimization issue. I wrote:> CFLAGS = -O2 -mcpu=pentium4 -march=pentium4 -Wall -Wmissing-prototypes> -Wpointer-arith -Winline -Wdeclaration-after-statement>-fno-strict-aliasing -g I had set CFLAGS to -O2 -mcpu=pentium4 -march=pentium4. I have been using these settings for testing PostgreSQL tip for some time now and never had any problems. Removing the cpu and architecture optimization part changes the behavior of the interval aggrate, so the results/interval.out now also looks like the expected output. select avg(f1) from interval_tbl; avg ------------------------------------------------- @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs (1 row) Switching -mcpu=pentium4 -march=pentium4 back on, results in wrong output. This is 100% reproducable. Can somebody with more knowledge explain why the compiler should stumble over just this? Pure luck? I have tested these combination of CFLAGS: -O2 OK -O2 -mcpu=i686 -march=i686 OK (good, RPMS are built with these) -O2 -mcpu=pentium4 -march=i686 OK -O2 -mcpu=pentium4 -march=pentium4 fails I am definatly not going to use -march=pentium4 in any production system. Should I open a bug report with RedHat (gcc vendor)? Best Regards, Michael Paesold
Michael Paesold <mpaesold@gmx.at> writes: > I have tested these combination of CFLAGS: > -O2 OK > -O2 -mcpu=i686 -march=i686 OK (good, RPMS are built with these) > -O2 -mcpu=pentium4 -march=i686 OK > -O2 -mcpu=pentium4 -march=pentium4 fails > I am definatly not going to use -march=pentium4 in any production > system. Should I open a bug report with RedHat (gcc vendor)? Yeah, but they'll probably want a smaller test case than "Postgres fails its regression tests" :-( My guess is that the problem is misoptimization of the interval_div() function (look in utils/adt/timestamp.c), and that you could probably construct a nice small test case for them out of that function. regards, tom lane
I wrote: > Michael Paesold <mpaesold@gmx.at> writes: >> I am definatly not going to use -march=pentium4 in any production >> system. Should I open a bug report with RedHat (gcc vendor)? > Yeah, but they'll probably want a smaller test case than "Postgres fails > its regression tests" :-( I have just confirmed that the problem still exists in FC4's current compiler (gcc 4.0.1, gcc-4.0.1-4.fc4), which probably will boost up the priority of the complaint quite a long way in Red Hat's eyes. I've also confirmed that the problem is in interval_div; you can reproduce the failure with select '41 years 1 mon 11 days'::interval / 10; which should give '4 years 1 mon 9 days 26:24:00', but when timestamp.o is compiled with "-mcpu=pentium4 -march=pentium4", you get '4 years 1 mon 10 days 02:24:00'. --enable-integer-datetimes is not relevant because the interesting part is all double/integer arithmetic. Looking at this, though, I wonder if the pentium4 answer isn't "more correct". If I'm doing the math by hand correctly, what we end up with is having to cascade 3/10ths of a month down into the days field, and since the conversion factor is supposed to be 30 days/month, that should be exactly 9 days. Plus the one full day from the 11/10 days gives 10 days. I think what is happening on all the non-Pentium platforms is that (3.0/10.0)*30.0 is producing something just a shade less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly due to rearrangement of the calculation. I think we can still file this as a compiler bug, because I'm pretty sure the C spec does not allow rearrangement of floating-point calculations ... but we might want to think about tweaking the code's roundoff behavior just a bit. An example that's a little easier to look at is select '41 years 1 mon'::interval / 10; I get '4 years 1 mon 9 days' with the pentium4 optimization, and '4 years 1 mon 8 days 24:00:00' without, and the former seems more correct... regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > I think we can still file this as a compiler bug, because I'm pretty sure > the C spec does not allow rearrangement of floating-point calculations ... It may have more to do with whether the floating point value can stay in a floating point register long enough to complete the calculation. IIRC, floating point registers are actually longer than a double so if the entire calculation is done in registers and then the result rounded off to store in memory it may get the right answer. Whereas if it loses the extra bits on the intermediate values (the infinite repeating fractions) that might be where you get the imprecise results. It makes some sense that -mcpu and -march give the compiler enough information to keep the intermediate results in registers more effectively. -- greg
c.f.: `-ffloat-store' Do not store floating point variables in registers, and inhibit other options that might change whethera floating point value is taken from a register or memory. This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the68881) keep more precision than a `double' is supposed to have. Similarly for the x86 architecture. For most programs,the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use `-ffloat-store' for such programs, after modifying them to store all pertinent intermediate computations into variables. -- greg
On 07 Nov 2005 14:22:37 -0500, Greg Stark <gsstark@mit.edu> wrote: > IIRC, floating point registers are actually longer than a double so if the > entire calculation is done in registers and then the result rounded off to > store in memory it may get the right answer. Whereas if it loses the extra > bits on the intermediate values (the infinite repeating fractions) that might > be where you get the imprecise results. Hm. I thought -march=pentium4 -mcpu=pentium4 implies -mfpmath=sse. SSE is a much better choice on P4 for performance reasons, and never has excess precision. I'm guessing from the above that I'm incorrect, in which case we should always be compiled with -mfpmath=sse -msse2 when we are complied -march=pentium4, this should remove problems caused by excess precision. The same behavior can be had on non sse platforms with -ffloat-store.
Gregory Maxwell wrote: > On 07 Nov 2005 14:22:37 -0500, Greg Stark <gsstark@mit.edu> wrote: > >>IIRC, floating point registers are actually longer than a double so if the >>entire calculation is done in registers and then the result rounded off to >>store in memory it may get the right answer. Whereas if it loses the extra >>bits on the intermediate values (the infinite repeating fractions) that might >>be where you get the imprecise results. > > > Hm. I thought -march=pentium4 -mcpu=pentium4 implies -mfpmath=sse. > SSE is a much better choice on P4 for performance reasons, and never > has excess precision. I'm guessing from the above that I'm incorrect, > in which case we should always be compiled with -mfpmath=sse -msse2 > when we are complied -march=pentium4, this should remove problems > caused by excess precision. The same behavior can be had on non sse > platforms with -ffloat-store. Just for the record (and those interested): using 'CFLAGS=-O2 -mcpu=pentium4 -march=pentium4 -mfpmath=sse -msse2' actually passes the regression tests. Best Regards, Michael Paesold
Tom Lane wrote: > I wrote: > >>Michael Paesold <mpaesold@gmx.at> writes: >> >>>I am definatly not going to use -march=pentium4 in any production >>>system. Should I open a bug report with RedHat (gcc vendor)? > > >>Yeah, but they'll probably want a smaller test case than "Postgres fails >>its regression tests" :-( > > > I have just confirmed that the problem still exists in FC4's current > compiler (gcc 4.0.1, gcc-4.0.1-4.fc4), which probably will boost up the > priority of the complaint quite a long way in Red Hat's eyes. > > I've also confirmed that the problem is in interval_div; you can > reproduce the failure with > > select '41 years 1 mon 11 days'::interval / 10; [snip] Would you mind reporting this to RedHat Bugzilla? I believe a bug report from you would have more weight then mine, because you actually understand what's going on here. :-) Otherwise I am going to do do my best... Best Regards, Michael Paesold
Michael Paesold <mpaesold@gmx.at> writes: > Would you mind reporting this to RedHat Bugzilla? I believe a bug report > from you would have more weight then mine, because you actually > understand what's going on here. :-) Actually, given the thought that this may be an artifact of keeping an intermediate value in a wider-than-normal register rather than genuinely rearranging the computation, I'm not certain it is a compiler bug. We'd have to study it a lot more closely before filing it as one, anyway. If you accept the idea that the pentium4 answer is the right one, then what we really need to do is focus on a better rounding rule than "strict truncation". I was toying with the notion of adding the equivalent of half a microsecond to the fractional-day value before truncating it to integer. But I'm not certain that that wouldn't have some bad effects in other cases. regards, tom lane
Tom Lane wrote: > Michael Paesold <mpaesold@gmx.at> writes: > > Would you mind reporting this to RedHat Bugzilla? I believe a bug report > > from you would have more weight then mine, because you actually > > understand what's going on here. :-) > > Actually, given the thought that this may be an artifact of keeping an > intermediate value in a wider-than-normal register rather than genuinely > rearranging the computation, I'm not certain it is a compiler bug. > We'd have to study it a lot more closely before filing it as one, anyway. > > If you accept the idea that the pentium4 answer is the right one, > then what we really need to do is focus on a better rounding rule than > "strict truncation". I was toying with the notion of adding the > equivalent of half a microsecond to the fractional-day value before > truncating it to integer. But I'm not certain that that wouldn't have > some bad effects in other cases. Looking at the code, do we need additional rint() calls in there, or rint(x + 0.5)? Frankly, I am confused why interval_div() has caused such problems for us? Are we going at this the right way? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
I assume no more progress has been made on this? --------------------------------------------------------------------------- Tom Lane wrote: > I wrote: > > Michael Paesold <mpaesold@gmx.at> writes: > >> I am definatly not going to use -march=pentium4 in any production > >> system. Should I open a bug report with RedHat (gcc vendor)? > > > Yeah, but they'll probably want a smaller test case than "Postgres fails > > its regression tests" :-( > > I have just confirmed that the problem still exists in FC4's current > compiler (gcc 4.0.1, gcc-4.0.1-4.fc4), which probably will boost up the > priority of the complaint quite a long way in Red Hat's eyes. > > I've also confirmed that the problem is in interval_div; you can > reproduce the failure with > > select '41 years 1 mon 11 days'::interval / 10; > > which should give '4 years 1 mon 9 days 26:24:00', but when > timestamp.o is compiled with "-mcpu=pentium4 -march=pentium4", > you get '4 years 1 mon 10 days 02:24:00'. --enable-integer-datetimes > is not relevant because the interesting part is all double/integer > arithmetic. > > Looking at this, though, I wonder if the pentium4 answer isn't "more > correct". If I'm doing the math by hand correctly, what we end up > with is having to cascade 3/10ths of a month down into the days field, > and since the conversion factor is supposed to be 30 days/month, that > should be exactly 9 days. Plus the one full day from the 11/10 days > gives 10 days. I think what is happening on all the non-Pentium > platforms is that (3.0/10.0)*30.0 is producing something just a shade > less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly > due to rearrangement of the calculation. I think we can still file this > as a compiler bug, because I'm pretty sure the C spec does not allow > rearrangement of floating-point calculations ... but we might want to > think about tweaking the code's roundoff behavior just a bit. > > An example that's a little easier to look at is > > select '41 years 1 mon'::interval / 10; > > I get '4 years 1 mon 9 days' with the pentium4 optimization, and > '4 years 1 mon 8 days 24:00:00' without, and the former seems more > correct... > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote: > I've also confirmed that the problem is in interval_div; you can > reproduce the failure with > > select '41 years 1 mon 11 days'::interval / 10; > > which should give '4 years 1 mon 9 days 26:24:00', but when > timestamp.o is compiled with "-mcpu=pentium4 -march=pentium4", > you get '4 years 1 mon 10 days 02:24:00'. --enable-integer-datetimes > is not relevant because the interesting part is all double/integer > arithmetic. > > Looking at this, though, I wonder if the pentium4 answer isn't "more > correct". If I'm doing the math by hand correctly, what we end up > with is having to cascade 3/10ths of a month down into the days field, > and since the conversion factor is supposed to be 30 days/month, that > should be exactly 9 days. Plus the one full day from the 11/10 days > gives 10 days. I think what is happening on all the non-Pentium > platforms is that (3.0/10.0)*30.0 is producing something just a shade > less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly > due to rearrangement of the calculation. I think we can still file > this > as a compiler bug, because I'm pretty sure the C spec does not allow > rearrangement of floating-point calculations ... but we might want to > think about tweaking the code's roundoff behavior just a bit. I took a naive look at this, but have been unable to come up with a satisfactory solution. Here's an even simpler example that exhibits the behavior: # select '41 mon'::interval / 10; ?column? ------------------------ 4 mons 2 days 24:00:00 And one using a non-integer divisor: # select '2 mon'::interval / 1.5; ?column? ----------------------- 1 mon 9 days 24:00:00 Here's the relevant code in interval_div (timestamp.c c2573). month_remainder holds the fractional part of the months field of the original interval (41) divided by the divisor (10) as a double. /* fractional months full days into days */month_remainder_days = month_remainder * DAYS_PER_MONTH;result->day += (int32)month_remainder_days;/* fractional months partial days into time */day_remainder += month_remainder_days - (int32)month_remainder_days; #ifdef HAVE_INT64_TIMESTAMPresult->time = rint(span->time / factor + day_remainder * USECS_PER_DAY); #elseresult->time = span->time / factor + day_remainder * SECS_PER_DAY; #endif My understanding is that as month_remainder is a float (as is month_remainder_days), month_remainder_days may be equal to 24 hours after rounding. As we're converting from months to days, and from days to time, rather than from months to time directly, we're assuming that we should only have time less than 24 hours remaining in the month_remainder_days when it's added to day_remainder. If month_remainder_days is equal to 24 hours, we should increment result- >day rather than carrying that down into result->time. With that in mind, I produced this hack: #ifdef HAVE_INT64_TIMESTAMP month_remainder_seconds = month_remainder_day_fraction * USECS_PER_DAY; if ( USECS_PER_DAY == rint(month_remainder_seconds) ) result->day++; else if ( USECS_PER_DAY == - rint(month_remainder_seconds)) result->day--; else day_remainder += month_remainder_day_fraction; result->time = rint(span->time / factor + day_remainder * USECS_PER_DAY); #else month_remainder_seconds = month_remainder_day_fraction * SECS_PER_DAY; if ( SECS_PER_DAY == (int32) month_remainder_seconds ) result->day++; else if ( SECS_PER_DAY == - (int32) month_remainder_seconds) result->day--; else day_remainder += month_remainder_day_fraction; result->time = span->time / factor + day_remainder * SECS_PER_DAY; #endif It works okay with --enable-integer-datetimes postgres=# select '41 mon'::interval/10; ?column? --------------- 4 mons 3 days (1 row) postgres=# select '2 mon'::interval/1.5; ?column? --------------- 1 mon 10 days (1 row) It also changes the result of the aggregate test for intervals, but I think that's to be expected. *** ./expected/interval.out Tue Mar 7 07:49:17 2006 --- ./results/interval.out Thu Jun 22 22:52:41 2006 *************** *** 218,224 **** select avg(f1) from interval_tbl; avg ------------------------------------------------- ! @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs (1 row) -- test long interval input --- 218,224 ---- select avg(f1) from interval_tbl; avg ------------------------------------------------- ! @ 4 years 1 mon 10 days 4 hours 18 mins 23 secs (1 row) -- test long interval input But doesn't work without --enable-integer-datetimes. It still gives the same answer as before. I don't have a firm grasp of floating point math so I'm fully aware this might be entirely the wrong way to go about this. It definitely feels kludgy (especially my sign- handling), but I submit this in the hope it moves the discussion forward a little. If someone wanted to point me in the right direction, I'll try to finish this up, updating the regression test and adding a few more to test this more thoroughly. Thanks! Michael Glaesemann grzm myrealbox com
On Jun 23, 2006, at 9:47 , Michael Glaesemann wrote: > It also changes the result of the aggregate test for intervals, but > I think that's to be expected. My goodness. Of course it changes the aggregate test results. That was what brought this up in the first place. (*kicks self for not reading the subject line when responding*) Michael Glaesemann grzm seespotcode net
On Jun 23, 2006, at 9:47 , Michael Glaesemann wrote: > # select '41 mon'::interval / 10; > ?column? > ------------------------ > 4 mons 2 days 24:00:00 > My understanding is that as month_remainder is a float (as is > month_remainder_days), month_remainder_days may be equal to 24 > hours after rounding. As we're converting from months to days, and > from days to time, rather than from months to time directly, we're > assuming that we should only have time less than 24 hours remaining > in the month_remainder_days when it's added to day_remainder. This behavior is the same as applying justify_hours before adding the days and time components to the result. With this in mind, I rewrote interval_div to call interval_justify_hours. Good news is that -- enable-integer-datetimes works as expected. Bad news is that without --enable-integer-datetimes, still get the current behavior. I also came across something I think is odd: # select version(); version ------------------------------------------------------------------------ ---------------------------------------------------------------------- PostgreSQL 8.1.4 on powerpc-apple-darwin8.7.0, compiled by GCC powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5341) (1 row) select justify_hours(a/10) as divided_and_justified , justify_hours(b) as justified , a/10 = b as are_equal from (select '41 mon'::interval,'4 mons 2 days 24:00:00'::interval) as s(a,b); without --enable-integer-datetimes: divided_and_justified | justified | are_equal ------------------------+---------------+----------- 4 mons 2 days 24:00:00 | 4 mons 3 days | t (1 row) with --enable-integer-datetimes: divided_and_justified | justified | are_equal -----------------------+---------------+----------- 4 mons 3 days | 4 mons 3 days | t I think this just confirms that there is some kind of rounding (or lack of) in interval_div. Kind of frustrating that it's not visible in the result. Anyway, there's another data point. Michael Glaesemann grzm seespotcode net
Michael Glaesemann <grzm@seespotcode.net> writes: > ... I think this just confirms that there is some kind of rounding (or > lack of) in interval_div. Kind of frustrating that it's not visible > in the result. I think the fundamental problem is that the float8 results of division are inaccurate, and yet we're assuming that we can (for instance) coerce them to integer and get exactly the right answer. For instance, in the '41 months'/10 example, I get month_remainder_days being computed as (gdb) p month_remainder $19 = 0.099999999999999645 (gdb) s 2575 result->day += (int32) month_remainder_days; (gdb) p month_remainder_days $20 = 2.9999999999999893 The only way we can really fix this is to be willing to round off the numbers, and I think the only principled way to do that is to settle on a specific target accuracy, probably 1 microsecond. Then the thing to do would be to scale up all the intermediate float results to microseconds and apply rint(). Something like (untested) month_remainder = rint(span->month * USECS_PER_MONTH / factor);day_remainder = rint(span->day * USECS_PER_DAY / factor);result->month= (int32) (month_remainder / USECS_PER_MONTH);result->day = (int32) (day_remainder / USECS_PER_DAY);month_remainder-= result->month * USECS_PER_MONTH;day_remainder -= result->day * USECS_PER_DAY; /* * Handle any fractional parts the same way as in interval_mul. */ /* fractional months full days into days */month_remainder_days = month_remainder * DAYS_PER_MONTH;extra_days = (int32) (month_remainder_days/ USECS_PER_DAY);result->day += extra_days;/* fractional months partial days into time */day_remainder+= month_remainder_days - extra_days * USECS_PER_DAY; #ifdef HAVE_INT64_TIMESTAMPresult->time = rint(span->time / factor + day_remainder); #elseresult->time = rint(span->time * 1.0e6 / factor + day_remainder) / 1.0e6; #endif This might need a few more rint() calls --- I'm assuming that float ops with exact integral inputs will be OK, which is an assumption used pretty widely in the datetime code, but ... Per the comment, if we do this here we probably want to make interval_mul work similarly. regards, tom lane
Have we made any progress on this? --------------------------------------------------------------------------- Tom Lane wrote: > Michael Glaesemann <grzm@seespotcode.net> writes: > > ... I think this just confirms that there is some kind of rounding (or > > lack of) in interval_div. Kind of frustrating that it's not visible > > in the result. > > I think the fundamental problem is that the float8 results of division > are inaccurate, and yet we're assuming that we can (for instance) coerce > them to integer and get exactly the right answer. For instance, in the > '41 months'/10 example, I get month_remainder_days being computed as > > (gdb) p month_remainder > $19 = 0.099999999999999645 > (gdb) s > 2575 result->day += (int32) month_remainder_days; > (gdb) p month_remainder_days > $20 = 2.9999999999999893 > > The only way we can really fix this is to be willing to round off > the numbers, and I think the only principled way to do that is to > settle on a specific target accuracy, probably 1 microsecond. > Then the thing to do would be to scale up all the intermediate > float results to microseconds and apply rint(). Something like > (untested) > > month_remainder = rint(span->month * USECS_PER_MONTH / factor); > day_remainder = rint(span->day * USECS_PER_DAY / factor); > result->month = (int32) (month_remainder / USECS_PER_MONTH); > result->day = (int32) (day_remainder / USECS_PER_DAY); > month_remainder -= result->month * USECS_PER_MONTH; > day_remainder -= result->day * USECS_PER_DAY; > > /* > * Handle any fractional parts the same way as in interval_mul. > */ > > /* fractional months full days into days */ > month_remainder_days = month_remainder * DAYS_PER_MONTH; > extra_days = (int32) (month_remainder_days / USECS_PER_DAY); > result->day += extra_days; > /* fractional months partial days into time */ > day_remainder += month_remainder_days - extra_days * USECS_PER_DAY; > > #ifdef HAVE_INT64_TIMESTAMP > result->time = rint(span->time / factor + day_remainder); > #else > result->time = rint(span->time * 1.0e6 / factor + day_remainder) / 1.0e6; > #endif > > This might need a few more rint() calls --- I'm assuming that float ops > with exact integral inputs will be OK, which is an assumption used > pretty widely in the datetime code, but ... > > Per the comment, if we do this here we probably want to make > interval_mul work similarly. > > regards, tom lane -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Aug 11, 2006, at 13:48 , Bruce Momjian wrote: > > Have we made any progress on this? I made a bit of progress but am still having issues when --enable- integer-datetimes is not enabled. I need to spend some time with gdb and figure out what's going on. I probably won't be able to get time to look at it until Aug 18. Michael Glaesemann grzm seespotcode net