Re: Interval aggregate regression failure (expected seems - Mailing list pgsql-hackers

From Michael Glaesemann
Subject Re: Interval aggregate regression failure (expected seems
Date
Msg-id 17475825-9C14-468E-B405-F3E88F37B65B@myrealbox.com
Whole thread Raw
In response to Re: Interval aggregate regression failure (expected seems  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Interval aggregate regression failure (expected seems
Re: Interval aggregate regression failure (expected seems
List pgsql-hackers
Tom Lane wrote:

> I've also confirmed that the problem is in interval_div; you can
> reproduce the failure with
>
>     select '41 years 1 mon 11 days'::interval / 10;
>
> which should give '4 years 1 mon 9 days 26:24:00', but when
> timestamp.o is compiled with "-mcpu=pentium4 -march=pentium4",
> you get '4 years 1 mon 10 days 02:24:00'.  --enable-integer-datetimes
> is not relevant because the interesting part is all double/integer
> arithmetic.
>
> Looking at this, though, I wonder if the pentium4 answer isn't "more
> correct".  If I'm doing the math by hand correctly, what we end up
> with is having to cascade 3/10ths of a month down into the days field,
> and since the conversion factor is supposed to be 30 days/month, that
> should be exactly 9 days.  Plus the one full day from the 11/10 days
> gives 10 days.  I think what is happening on all the non-Pentium
> platforms is that (3.0/10.0)*30.0 is producing something just a shade
> less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly
> due to rearrangement of the calculation.  I think we can still file  
> this
> as a compiler bug, because I'm pretty sure the C spec does not allow
> rearrangement of floating-point calculations ... but we might want to
> think about tweaking the code's roundoff behavior just a bit.

I took a naive look at this, but have been unable to come up with a  
satisfactory solution. Here's an even simpler example that exhibits  
the behavior:

# select '41 mon'::interval / 10;        ?column?
------------------------
4 mons 2 days 24:00:00

And one using a non-integer divisor:

# select '2 mon'::interval / 1.5;       ?column?
-----------------------
1 mon 9 days 24:00:00

Here's the relevant code in interval_div (timestamp.c c2573).  
month_remainder holds the fractional part of the months field of the  
original interval (41) divided by the divisor (10) as a double.
/* fractional months full days into days */month_remainder_days = month_remainder * DAYS_PER_MONTH;result->day +=
(int32)month_remainder_days;/* fractional months partial days into time */day_remainder += month_remainder_days -
(int32)month_remainder_days;
 

#ifdef HAVE_INT64_TIMESTAMPresult->time = rint(span->time / factor + day_remainder *  
USECS_PER_DAY);
#elseresult->time = span->time / factor + day_remainder * SECS_PER_DAY;
#endif

My understanding is that as month_remainder is a float (as is  
month_remainder_days), month_remainder_days may be equal to 24 hours  
after rounding. As we're converting from months to days, and from  
days to time, rather than from months to time directly, we're  
assuming that we should only have time less than 24 hours remaining  
in the month_remainder_days when it's added to day_remainder. If  
month_remainder_days is equal to 24 hours, we should increment result- >day rather than carrying that down into
result->time.

With that in mind, I produced this hack:

#ifdef HAVE_INT64_TIMESTAMP    month_remainder_seconds = month_remainder_day_fraction *  
USECS_PER_DAY;
    if ( USECS_PER_DAY == rint(month_remainder_seconds) )        result->day++;    else if ( USECS_PER_DAY == -
rint(month_remainder_seconds))        result->day--;    else day_remainder += month_remainder_day_fraction;
 
result->time = rint(span->time / factor + day_remainder *  
USECS_PER_DAY);
#else    month_remainder_seconds = month_remainder_day_fraction *  
SECS_PER_DAY;
    if ( SECS_PER_DAY == (int32) month_remainder_seconds )        result->day++;    else if ( SECS_PER_DAY == - (int32)
month_remainder_seconds)       result->day--;    else day_remainder += month_remainder_day_fraction;
 
result->time = span->time / factor + day_remainder * SECS_PER_DAY;
#endif


It works okay with --enable-integer-datetimes

postgres=# select '41 mon'::interval/10;   ?column?
---------------
4 mons 3 days
(1 row)

postgres=# select '2 mon'::interval/1.5;   ?column?
---------------
1 mon 10 days
(1 row)

It also changes the result of the aggregate test for intervals, but I  
think that's to be expected.

*** ./expected/interval.out    Tue Mar  7 07:49:17 2006
--- ./results/interval.out    Thu Jun 22 22:52:41 2006
***************
*** 218,224 ****  select avg(f1) from interval_tbl;                         avg
-------------------------------------------------
!  @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs  (1 row)
  -- test long interval input
--- 218,224 ----  select avg(f1) from interval_tbl;                         avg
-------------------------------------------------
!  @ 4 years 1 mon 10 days 4 hours 18 mins 23 secs  (1 row)
  -- test long interval input


But doesn't work without --enable-integer-datetimes. It still gives  
the same answer as before. I don't have a firm grasp of floating  
point math so I'm fully aware this might be entirely the wrong way to  
go about this. It definitely feels kludgy (especially my sign- 
handling), but I submit this in the hope it moves the discussion  
forward a little. If someone wanted to point me in the right  
direction, I'll try to finish this up, updating the regression test  
and adding a few more to test this more thoroughly.

Thanks!

Michael Glaesemann
grzm myrealbox com





pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: [CORE] GPL Source and Copyright Questions
Next
From: Tom Lane
Date:
Subject: Re: [CORE] GPL Source and Copyright Questions