Re: Interval aggregate regression failure (expected seems - Mailing list pgsql-hackers
From | Michael Glaesemann |
---|---|
Subject | Re: Interval aggregate regression failure (expected seems |
Date | |
Msg-id | 17475825-9C14-468E-B405-F3E88F37B65B@myrealbox.com Whole thread Raw |
In response to | Re: Interval aggregate regression failure (expected seems (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses |
Re: Interval aggregate regression failure (expected seems
Re: Interval aggregate regression failure (expected seems |
List | pgsql-hackers |
Tom Lane wrote: > I've also confirmed that the problem is in interval_div; you can > reproduce the failure with > > select '41 years 1 mon 11 days'::interval / 10; > > which should give '4 years 1 mon 9 days 26:24:00', but when > timestamp.o is compiled with "-mcpu=pentium4 -march=pentium4", > you get '4 years 1 mon 10 days 02:24:00'. --enable-integer-datetimes > is not relevant because the interesting part is all double/integer > arithmetic. > > Looking at this, though, I wonder if the pentium4 answer isn't "more > correct". If I'm doing the math by hand correctly, what we end up > with is having to cascade 3/10ths of a month down into the days field, > and since the conversion factor is supposed to be 30 days/month, that > should be exactly 9 days. Plus the one full day from the 11/10 days > gives 10 days. I think what is happening on all the non-Pentium > platforms is that (3.0/10.0)*30.0 is producing something just a shade > less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly > due to rearrangement of the calculation. I think we can still file > this > as a compiler bug, because I'm pretty sure the C spec does not allow > rearrangement of floating-point calculations ... but we might want to > think about tweaking the code's roundoff behavior just a bit. I took a naive look at this, but have been unable to come up with a satisfactory solution. Here's an even simpler example that exhibits the behavior: # select '41 mon'::interval / 10; ?column? ------------------------ 4 mons 2 days 24:00:00 And one using a non-integer divisor: # select '2 mon'::interval / 1.5; ?column? ----------------------- 1 mon 9 days 24:00:00 Here's the relevant code in interval_div (timestamp.c c2573). month_remainder holds the fractional part of the months field of the original interval (41) divided by the divisor (10) as a double. /* fractional months full days into days */month_remainder_days = month_remainder * DAYS_PER_MONTH;result->day += (int32)month_remainder_days;/* fractional months partial days into time */day_remainder += month_remainder_days - (int32)month_remainder_days; #ifdef HAVE_INT64_TIMESTAMPresult->time = rint(span->time / factor + day_remainder * USECS_PER_DAY); #elseresult->time = span->time / factor + day_remainder * SECS_PER_DAY; #endif My understanding is that as month_remainder is a float (as is month_remainder_days), month_remainder_days may be equal to 24 hours after rounding. As we're converting from months to days, and from days to time, rather than from months to time directly, we're assuming that we should only have time less than 24 hours remaining in the month_remainder_days when it's added to day_remainder. If month_remainder_days is equal to 24 hours, we should increment result- >day rather than carrying that down into result->time. With that in mind, I produced this hack: #ifdef HAVE_INT64_TIMESTAMP month_remainder_seconds = month_remainder_day_fraction * USECS_PER_DAY; if ( USECS_PER_DAY == rint(month_remainder_seconds) ) result->day++; else if ( USECS_PER_DAY == - rint(month_remainder_seconds)) result->day--; else day_remainder += month_remainder_day_fraction; result->time = rint(span->time / factor + day_remainder * USECS_PER_DAY); #else month_remainder_seconds = month_remainder_day_fraction * SECS_PER_DAY; if ( SECS_PER_DAY == (int32) month_remainder_seconds ) result->day++; else if ( SECS_PER_DAY == - (int32) month_remainder_seconds) result->day--; else day_remainder += month_remainder_day_fraction; result->time = span->time / factor + day_remainder * SECS_PER_DAY; #endif It works okay with --enable-integer-datetimes postgres=# select '41 mon'::interval/10; ?column? --------------- 4 mons 3 days (1 row) postgres=# select '2 mon'::interval/1.5; ?column? --------------- 1 mon 10 days (1 row) It also changes the result of the aggregate test for intervals, but I think that's to be expected. *** ./expected/interval.out Tue Mar 7 07:49:17 2006 --- ./results/interval.out Thu Jun 22 22:52:41 2006 *************** *** 218,224 **** select avg(f1) from interval_tbl; avg ------------------------------------------------- ! @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs (1 row) -- test long interval input --- 218,224 ---- select avg(f1) from interval_tbl; avg ------------------------------------------------- ! @ 4 years 1 mon 10 days 4 hours 18 mins 23 secs (1 row) -- test long interval input But doesn't work without --enable-integer-datetimes. It still gives the same answer as before. I don't have a firm grasp of floating point math so I'm fully aware this might be entirely the wrong way to go about this. It definitely feels kludgy (especially my sign- handling), but I submit this in the hope it moves the discussion forward a little. If someone wanted to point me in the right direction, I'll try to finish this up, updating the regression test and adding a few more to test this more thoroughly. Thanks! Michael Glaesemann grzm myrealbox com
pgsql-hackers by date: