Thread: Interval aggregate regression failure (expected seems wrong)

Interval aggregate regression failure (expected seems wrong)

From
Michael Paesold
Date:
Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I get 
a regression failure in the interval tests. I am no export for the 
interval type, but the expected "9 days 28 hours" seem wrong, don't 
they? The actual value seems to be the same.

Is it possible that this is broken on the platform where the expected 
results were generated?

*** ./expected/interval.out Tue Oct 25 19:13:07 2005
--- ./results/interval.out  Mon Nov  7 09:11:27 2005
***************
*** 218,224 ****  select avg(f1) from interval_tbl;                         avg
-------------------------------------------------
!  @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs  (1 row)
  -- test long interval input
--- 218,224 ----  select avg(f1) from interval_tbl;                         avg
-------------------------------------------------
!  @ 4 years 1 mon 10 days 4 hours 18 mins 23 secs  (1 row)
  -- test long interval input

The last commit to interval.out seems to be this one, and it changed 
exactly this line.
revision 1.14
date: 2005/10/25 17:13:07;  author: tgl;  state: Exp;  lines: +1 -1

Well, this is CVS tip, so there is a chance this is fixed in 
REL_8_1_STABLE which has a 1.14.0.2. At least the release tarball should 
be rebuilt, no?
Sorry, if this is just noise. Just wanted to be sure you know about it.

Best Regards,
Michael



Re: Interval aggregate regression failure (expected seems wrong)

From
Michael Glaesemann
Date:
On Nov 7, 2005, at 17:24 , Michael Paesold wrote:

> Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I  
> get a regression failure in the interval tests. I am no export for  
> the interval type, but the expected "9 days 28 hours" seem wrong,  
> don't they? The actual value seems to be the same.
>
> Is it possible that this is broken on the platform where the  
> expected results were generated?

What platform are you testing on? With or without integer-datetimes?

I just ran make check on for PostgreSQL 8.1.0 on Mac OS X 10.4.3

test=# select version();                                                                    
version
------------------------------------------------------------------------ 
----------------------------------------------------------------------
PostgreSQL 8.1.0 on powerpc-apple-darwin8.3.0, compiled by GCC  
powerpc-apple-darwin8-gcc-4.0.0 (GCC) 4.0.0 (Apple Computer, Inc.  
build 5026)

I didn't have any regression failures. I'd also expect we'd see a lot  
more failures on the build farm if it were the case that it was  
broken just on the platform that the expected results were generated  
on. From a quick look at the current build farm failures on HEAD and  
REL8_1_STABLE, it doesn't look like any of the failures are failing  
here.

Michael Glaesemann
grzm myrealbox com





Re: Interval aggregate regression failure (expected seems

From
Michael Paesold
Date:
Michael Glaesemann wrote:
> 
> On Nov 7, 2005, at 17:24 , Michael Paesold wrote:
> 
>> Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I  
>> get a regression failure in the interval tests. I am no export for  
>> the interval type, but the expected "9 days 28 hours" seem wrong,  
>> don't they? The actual value seems to be the same.
>>
>> Is it possible that this is broken on the platform where the  expected 
>> results were generated?
> 
> What platform are you testing on? With or without integer-datetimes?

Ok, forgot. This is *without* integer-datetimes, RHEL 3 (Linux 2.4.21, 
glibc 2.3.2, gcc 3.2.3 20030502) on i686 (Xeon without x86-64).

> I just ran make check on for PostgreSQL 8.1.0 on Mac OS X 10.4.3
[snip]
> I didn't have any regression failures. I'd also expect we'd see a lot  
> more failures on the build farm if it were the case that it was  broken 
> just on the platform that the expected results were generated  on. From 
> a quick look at the current build farm failures on HEAD and  
> REL8_1_STABLE, it doesn't look like any of the failures are failing  here.

I just started to wonder about buildfarm, too, but found that most build 
farm members have --enable-integer-datetimes. Could that be an 
explanation? Is it possible that the code is wrong with 
--enable-integer-datetimes?

So what do you have in results/interval.out?
@ 4 years 1 mon 9 days 28 hours 18 mins 23 secs seems wrong to me, no?

Tom wrote for that commit:
revision 1.14
date: 2005/10/25 17:13:07;  author: tgl;  state: Exp;  lines: +1 -1
Remove justify_hours call from interval_mul and interval_div, and make
some small stylistic improvements in these functions.  Also fix several
places where TMODULO() was being used with wrong-sized quotient argument,
creating a risk of overflow --- interval2tm was actually capable of going
into an infinite loop because of this.

Perhaps it is an intended behavior? If so, it still fails without 
integer-datetimes.

Best Regards,
Michael Paesold


Re: Interval aggregate regression failure (expected seems

From
Michael Paesold
Date:

Michael Paesold wrote:
>> On Nov 7, 2005, at 17:24 , Michael Paesold wrote:
>>
>>> Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I  
>>> get a regression failure in the interval tests. I am no export for  
>>> the interval type, but the expected "9 days 28 hours" seem wrong,  
>>> don't they? The actual value seems to be the same.
>>>
>>> Is it possible that this is broken on the platform where the  
>>> expected results were generated?

> Perhaps it is an intended behavior? If so, it still fails without 
> integer-datetimes.

Well, no, it also fails with integer-datetimes for me in the same way.
pg_config output below. And yes, I did "cvs up; make distclean; 
./configure... ; make ; make install ; make check".

Could this be DST-related? I thought plain interval was not affected by 
DST changes.

BINDIR = /usr/local/postgresql-8cvs/bin
DOCDIR = /usr/local/postgresql-8cvs/doc
INCLUDEDIR = /usr/local/postgresql-8cvs/include
PKGINCLUDEDIR = /usr/local/postgresql-8cvs/include
INCLUDEDIR-SERVER = /usr/local/postgresql-8cvs/include/server
LIBDIR = /usr/local/postgresql-8cvs/lib
PKGLIBDIR = /usr/local/postgresql-8cvs/lib
LOCALEDIR =
MANDIR = /usr/local/postgresql-8cvs/man
SHAREDIR = /usr/local/postgresql-8cvs/share
SYSCONFDIR = /usr/local/postgresql-8cvs/etc
PGXS = /usr/local/postgresql-8cvs/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/usr/local/postgresql-8cvs' '--with-pgport=54321' 
'--with-perl' 'CFLAGS=-O2 -mcpu=pentium4 -march=pentium4' 
'--enable-casserts' '--enable-debug' '--enable-integer-datetimes'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -mcpu=pentium4 -march=pentium4 -Wall -Wmissing-prototypes 
-Wpointer-arith -Winline -Wdeclaration-after-statement 
-fno-strict-aliasing -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,/usr/local/postgresql-8cvs/lib
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -lncurses -lcrypt -lresolv -lnsl -ldl -lm 
-lbsd
VERSION = PostgreSQL 8.2devel

Best Regards,
Michael Paesold


Re: Interval aggregate regression failure (expected seems wrong)

From
Michael Glaesemann
Date:
On Nov 7, 2005, at 18:28 , Michael Paesold wrote:

> Ok, forgot. This is *without* integer-datetimes, RHEL 3 (Linux  
> 2.4.21, glibc 2.3.2, gcc 3.2.3 20030502) on i686 (Xeon without  
> x86-64).
>
>> I just ran make check on for PostgreSQL 8.1.0 on Mac OS X 10.4.3

Heh. I forgot, too ;) My test was also without integer-datetimes.

> [snip]
>> I didn't have any regression failures. I'd also expect we'd see a  
>> lot  more failures on the build farm if it were the case that it  
>> was  broken just on the platform that the expected results were  
>> generated  on. From a quick look at the current build farm  
>> failures on HEAD and  REL8_1_STABLE, it doesn't look like any of  
>> the failures are failing  here.
>
> I just started to wonder about buildfarm, too, but found that most  
> build farm members have --enable-integer-datetimes. Could that be  
> an explanation? Is it possible that the code is wrong with --enable- 
> integer-datetimes?
>
> So what do you have in results/interval.out?
> @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs seems wrong to me, no?
>

select avg(f1) from interval_tbl;                       avg
-------------------------------------------------
@ 4 years 1 mon 9 days 28 hours 18 mins 23 secs
(1 row)

The point of the change to the interval datatype in 8.1 is to keep  
track of months, days, and seconds (which in turn are represented as  
hours, minutes and seconds). Previous releases tracked only months  
and seconds. This has advantages for using intervals with dates and  
timestamps that involve daylight saving time changes. Admittedly, it  
looks odd at first, but it falls out of the change in behavior of the  
interval datatype. There are two new functions, justify_days and  
justify_hours, that you can use to put intervals into more  
traditional forms.

http://developer.postgresql.org/docs/postgres/functions-datetime.html

Doesn't explain why you're getting a regression failure though.

Michael Glaesemann
grzm myrealbox com



Re: Interval aggregate regression failure (expected seems

From
Michael Paesold
Date:
Michael Glaesemann wrote:
>> So what do you have in results/interval.out?
>> @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs seems wrong to me, no?
>>
> 
> select avg(f1) from interval_tbl;
>                        avg
> -------------------------------------------------
> @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs
> (1 row)
> 
> The point of the change to the interval datatype in 8.1 is to keep  
> track of months, days, and seconds (which in turn are represented as  
> hours, minutes and seconds). Previous releases tracked only months  and 
> seconds. This has advantages for using intervals with dates and  
> timestamps that involve daylight saving time changes. Admittedly, it  
> looks odd at first, but it falls out of the change in behavior of the  
> interval datatype. There are two new functions, justify_days and  
> justify_hours, that you can use to put intervals into more  traditional 
> forms.
> 
> http://developer.postgresql.org/docs/postgres/functions-datetime.html

Thank you very much for the insight.

> Doesn't explain why you're getting a regression failure though.

Well, I have something now. It seems to be a compiler/optimization issue.

I wrote:> CFLAGS = -O2 -mcpu=pentium4 -march=pentium4 -Wall -Wmissing-prototypes> -Wpointer-arith -Winline
-Wdeclaration-after-statement>-fno-strict-aliasing -g
 

I had set CFLAGS to -O2 -mcpu=pentium4 -march=pentium4. I have been 
using these settings for testing PostgreSQL tip for some time now and 
never had any problems.

Removing the cpu and architecture optimization part changes the behavior 
of the interval aggrate, so the results/interval.out now also looks like 
the expected output.
select avg(f1) from interval_tbl;                       avg
------------------------------------------------- @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs
(1 row)

Switching -mcpu=pentium4 -march=pentium4 back on, results in wrong 
output. This is 100% reproducable. Can somebody with more knowledge 
explain why the compiler should stumble over just this? Pure luck?

I have tested these combination of CFLAGS:
-O2                                 OK
-O2 -mcpu=i686 -march=i686          OK (good, RPMS are built with these)
-O2 -mcpu=pentium4 -march=i686      OK
-O2 -mcpu=pentium4 -march=pentium4  fails

I am definatly not going to use -march=pentium4 in any production 
system. Should I open a bug report with RedHat (gcc vendor)?

Best Regards,
Michael Paesold


Re: Interval aggregate regression failure (expected seems

From
Tom Lane
Date:
Michael Paesold <mpaesold@gmx.at> writes:
> I have tested these combination of CFLAGS:
> -O2                                 OK
> -O2 -mcpu=i686 -march=i686          OK (good, RPMS are built with these)
> -O2 -mcpu=pentium4 -march=i686      OK
> -O2 -mcpu=pentium4 -march=pentium4  fails

> I am definatly not going to use -march=pentium4 in any production 
> system. Should I open a bug report with RedHat (gcc vendor)?

Yeah, but they'll probably want a smaller test case than "Postgres fails
its regression tests" :-(

My guess is that the problem is misoptimization of the interval_div()
function (look in utils/adt/timestamp.c), and that you could probably
construct a nice small test case for them out of that function.
        regards, tom lane


Re: Interval aggregate regression failure (expected seems

From
Tom Lane
Date:
I wrote:
> Michael Paesold <mpaesold@gmx.at> writes:
>> I am definatly not going to use -march=pentium4 in any production 
>> system. Should I open a bug report with RedHat (gcc vendor)?

> Yeah, but they'll probably want a smaller test case than "Postgres fails
> its regression tests" :-(

I have just confirmed that the problem still exists in FC4's current
compiler (gcc 4.0.1, gcc-4.0.1-4.fc4), which probably will boost up the
priority of the complaint quite a long way in Red Hat's eyes.

I've also confirmed that the problem is in interval_div; you can
reproduce the failure with
select '41 years 1 mon 11 days'::interval / 10;

which should give '4 years 1 mon 9 days 26:24:00', but when
timestamp.o is compiled with "-mcpu=pentium4 -march=pentium4",
you get '4 years 1 mon 10 days 02:24:00'.  --enable-integer-datetimes
is not relevant because the interesting part is all double/integer
arithmetic.

Looking at this, though, I wonder if the pentium4 answer isn't "more
correct".  If I'm doing the math by hand correctly, what we end up
with is having to cascade 3/10ths of a month down into the days field,
and since the conversion factor is supposed to be 30 days/month, that
should be exactly 9 days.  Plus the one full day from the 11/10 days
gives 10 days.  I think what is happening on all the non-Pentium
platforms is that (3.0/10.0)*30.0 is producing something just a shade
less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly
due to rearrangement of the calculation.  I think we can still file this
as a compiler bug, because I'm pretty sure the C spec does not allow
rearrangement of floating-point calculations ... but we might want to
think about tweaking the code's roundoff behavior just a bit.

An example that's a little easier to look at is
select '41 years 1 mon'::interval / 10;

I get '4 years 1 mon 9 days' with the pentium4 optimization, and
'4 years 1 mon 8 days 24:00:00' without, and the former seems more
correct...
        regards, tom lane


Re: Interval aggregate regression failure (expected seems

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> I think we can still file this as a compiler bug, because I'm pretty sure
> the C spec does not allow rearrangement of floating-point calculations ...

It may have more to do with whether the floating point value can stay in a
floating point register long enough to complete the calculation. 

IIRC, floating point registers are actually longer than a double so if the
entire calculation is done in registers and then the result rounded off to
store in memory it may get the right answer. Whereas if it loses the extra
bits on the intermediate values (the infinite repeating fractions) that might
be where you get the imprecise results.

It makes some sense that -mcpu and -march give the compiler enough information
to keep the intermediate results in registers more effectively.

-- 
greg



Re: Interval aggregate regression failure (expected seems

From
Greg Stark
Date:
c.f.:


`-ffloat-store'    Do not store floating point variables in registers, and inhibit    other options that might change
whethera floating point value is    taken from a register or memory.
 
    This option prevents undesirable excess precision on machines such    as the 68000 where the floating registers (of
the68881) keep more    precision than a `double' is supposed to have.  Similarly for the    x86 architecture.  For most
programs,the excess precision does    only good, but a few programs rely on the precise definition of    IEEE floating
point. Use `-ffloat-store' for such programs, after    modifying them to store all pertinent intermediate computations
 into variables.
 

-- 
greg



Re: Interval aggregate regression failure (expected seems

From
Gregory Maxwell
Date:
On 07 Nov 2005 14:22:37 -0500, Greg Stark <gsstark@mit.edu> wrote:
> IIRC, floating point registers are actually longer than a double so if the
> entire calculation is done in registers and then the result rounded off to
> store in memory it may get the right answer. Whereas if it loses the extra
> bits on the intermediate values (the infinite repeating fractions) that might
> be where you get the imprecise results.

Hm. I thought -march=pentium4 -mcpu=pentium4 implies -mfpmath=sse.
SSE is a much better choice on P4 for performance reasons, and never
has excess precision. I'm guessing from the above that I'm incorrect,
in which case we should always be compiled with -mfpmath=sse -msse2
when we are complied -march=pentium4, this should remove problems
caused by excess precision. The same behavior can be had on non sse
platforms with -ffloat-store.


Re: Interval aggregate regression failure (expected seems

From
Michael Paesold
Date:
Gregory Maxwell wrote:
> On 07 Nov 2005 14:22:37 -0500, Greg Stark <gsstark@mit.edu> wrote:
> 
>>IIRC, floating point registers are actually longer than a double so if the
>>entire calculation is done in registers and then the result rounded off to
>>store in memory it may get the right answer. Whereas if it loses the extra
>>bits on the intermediate values (the infinite repeating fractions) that might
>>be where you get the imprecise results.
> 
> 
> Hm. I thought -march=pentium4 -mcpu=pentium4 implies -mfpmath=sse. 
> SSE is a much better choice on P4 for performance reasons, and never
> has excess precision. I'm guessing from the above that I'm incorrect,
> in which case we should always be compiled with -mfpmath=sse -msse2
> when we are complied -march=pentium4, this should remove problems
> caused by excess precision. The same behavior can be had on non sse
> platforms with -ffloat-store.

Just for the record (and those interested): using 'CFLAGS=-O2 
-mcpu=pentium4 -march=pentium4 -mfpmath=sse -msse2' actually passes the 
regression tests.

Best Regards,
Michael Paesold


Re: Interval aggregate regression failure (expected seems

From
Michael Paesold
Date:
Tom Lane wrote:
> I wrote:
> 
>>Michael Paesold <mpaesold@gmx.at> writes:
>>
>>>I am definatly not going to use -march=pentium4 in any production 
>>>system. Should I open a bug report with RedHat (gcc vendor)?
> 
> 
>>Yeah, but they'll probably want a smaller test case than "Postgres fails
>>its regression tests" :-(
> 
> 
> I have just confirmed that the problem still exists in FC4's current
> compiler (gcc 4.0.1, gcc-4.0.1-4.fc4), which probably will boost up the
> priority of the complaint quite a long way in Red Hat's eyes.
> 
> I've also confirmed that the problem is in interval_div; you can
> reproduce the failure with
> 
>     select '41 years 1 mon 11 days'::interval / 10;
[snip]

Would you mind reporting this to RedHat Bugzilla? I believe a bug report 
from you would have more weight then mine, because you actually 
understand what's going on here. :-)

Otherwise I am going to do do my best...

Best Regards,
Michael Paesold


Re: Interval aggregate regression failure (expected seems

From
Tom Lane
Date:
Michael Paesold <mpaesold@gmx.at> writes:
> Would you mind reporting this to RedHat Bugzilla? I believe a bug report 
> from you would have more weight then mine, because you actually 
> understand what's going on here. :-)

Actually, given the thought that this may be an artifact of keeping an
intermediate value in a wider-than-normal register rather than genuinely
rearranging the computation, I'm not certain it is a compiler bug.
We'd have to study it a lot more closely before filing it as one, anyway.

If you accept the idea that the pentium4 answer is the right one,
then what we really need to do is focus on a better rounding rule than
"strict truncation".  I was toying with the notion of adding the
equivalent of half a microsecond to the fractional-day value before
truncating it to integer.  But I'm not certain that that wouldn't have
some bad effects in other cases.
        regards, tom lane


Re: Interval aggregate regression failure (expected seems

From
Bruce Momjian
Date:
Tom Lane wrote:
> Michael Paesold <mpaesold@gmx.at> writes:
> > Would you mind reporting this to RedHat Bugzilla? I believe a bug report 
> > from you would have more weight then mine, because you actually 
> > understand what's going on here. :-)
> 
> Actually, given the thought that this may be an artifact of keeping an
> intermediate value in a wider-than-normal register rather than genuinely
> rearranging the computation, I'm not certain it is a compiler bug.
> We'd have to study it a lot more closely before filing it as one, anyway.
> 
> If you accept the idea that the pentium4 answer is the right one,
> then what we really need to do is focus on a better rounding rule than
> "strict truncation".  I was toying with the notion of adding the
> equivalent of half a microsecond to the fractional-day value before
> truncating it to integer.  But I'm not certain that that wouldn't have
> some bad effects in other cases.

Looking at the code, do we need additional rint() calls in there, or
rint(x + 0.5)?  Frankly, I am confused why interval_div() has caused
such problems for us?  Are we going at this the right way?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Interval aggregate regression failure (expected seems

From
Bruce Momjian
Date:
I assume no more progress has been made on this?

---------------------------------------------------------------------------

Tom Lane wrote:
> I wrote:
> > Michael Paesold <mpaesold@gmx.at> writes:
> >> I am definatly not going to use -march=pentium4 in any production 
> >> system. Should I open a bug report with RedHat (gcc vendor)?
> 
> > Yeah, but they'll probably want a smaller test case than "Postgres fails
> > its regression tests" :-(
> 
> I have just confirmed that the problem still exists in FC4's current
> compiler (gcc 4.0.1, gcc-4.0.1-4.fc4), which probably will boost up the
> priority of the complaint quite a long way in Red Hat's eyes.
> 
> I've also confirmed that the problem is in interval_div; you can
> reproduce the failure with
> 
>     select '41 years 1 mon 11 days'::interval / 10;
> 
> which should give '4 years 1 mon 9 days 26:24:00', but when
> timestamp.o is compiled with "-mcpu=pentium4 -march=pentium4",
> you get '4 years 1 mon 10 days 02:24:00'.  --enable-integer-datetimes
> is not relevant because the interesting part is all double/integer
> arithmetic.
> 
> Looking at this, though, I wonder if the pentium4 answer isn't "more
> correct".  If I'm doing the math by hand correctly, what we end up
> with is having to cascade 3/10ths of a month down into the days field,
> and since the conversion factor is supposed to be 30 days/month, that
> should be exactly 9 days.  Plus the one full day from the 11/10 days
> gives 10 days.  I think what is happening on all the non-Pentium
> platforms is that (3.0/10.0)*30.0 is producing something just a shade
> less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly
> due to rearrangement of the calculation.  I think we can still file this
> as a compiler bug, because I'm pretty sure the C spec does not allow
> rearrangement of floating-point calculations ... but we might want to
> think about tweaking the code's roundoff behavior just a bit.
> 
> An example that's a little easier to look at is
> 
>     select '41 years 1 mon'::interval / 10;
> 
> I get '4 years 1 mon 9 days' with the pentium4 optimization, and
> '4 years 1 mon 8 days 24:00:00' without, and the former seems more
> correct...
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
> 

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Interval aggregate regression failure (expected seems

From
Michael Glaesemann
Date:
Tom Lane wrote:

> I've also confirmed that the problem is in interval_div; you can
> reproduce the failure with
>
>     select '41 years 1 mon 11 days'::interval / 10;
>
> which should give '4 years 1 mon 9 days 26:24:00', but when
> timestamp.o is compiled with "-mcpu=pentium4 -march=pentium4",
> you get '4 years 1 mon 10 days 02:24:00'.  --enable-integer-datetimes
> is not relevant because the interesting part is all double/integer
> arithmetic.
>
> Looking at this, though, I wonder if the pentium4 answer isn't "more
> correct".  If I'm doing the math by hand correctly, what we end up
> with is having to cascade 3/10ths of a month down into the days field,
> and since the conversion factor is supposed to be 30 days/month, that
> should be exactly 9 days.  Plus the one full day from the 11/10 days
> gives 10 days.  I think what is happening on all the non-Pentium
> platforms is that (3.0/10.0)*30.0 is producing something just a shade
> less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly
> due to rearrangement of the calculation.  I think we can still file  
> this
> as a compiler bug, because I'm pretty sure the C spec does not allow
> rearrangement of floating-point calculations ... but we might want to
> think about tweaking the code's roundoff behavior just a bit.

I took a naive look at this, but have been unable to come up with a  
satisfactory solution. Here's an even simpler example that exhibits  
the behavior:

# select '41 mon'::interval / 10;        ?column?
------------------------
4 mons 2 days 24:00:00

And one using a non-integer divisor:

# select '2 mon'::interval / 1.5;       ?column?
-----------------------
1 mon 9 days 24:00:00

Here's the relevant code in interval_div (timestamp.c c2573).  
month_remainder holds the fractional part of the months field of the  
original interval (41) divided by the divisor (10) as a double.
/* fractional months full days into days */month_remainder_days = month_remainder * DAYS_PER_MONTH;result->day +=
(int32)month_remainder_days;/* fractional months partial days into time */day_remainder += month_remainder_days -
(int32)month_remainder_days;
 

#ifdef HAVE_INT64_TIMESTAMPresult->time = rint(span->time / factor + day_remainder *  
USECS_PER_DAY);
#elseresult->time = span->time / factor + day_remainder * SECS_PER_DAY;
#endif

My understanding is that as month_remainder is a float (as is  
month_remainder_days), month_remainder_days may be equal to 24 hours  
after rounding. As we're converting from months to days, and from  
days to time, rather than from months to time directly, we're  
assuming that we should only have time less than 24 hours remaining  
in the month_remainder_days when it's added to day_remainder. If  
month_remainder_days is equal to 24 hours, we should increment result- >day rather than carrying that down into
result->time.

With that in mind, I produced this hack:

#ifdef HAVE_INT64_TIMESTAMP    month_remainder_seconds = month_remainder_day_fraction *  
USECS_PER_DAY;
    if ( USECS_PER_DAY == rint(month_remainder_seconds) )        result->day++;    else if ( USECS_PER_DAY == -
rint(month_remainder_seconds))        result->day--;    else day_remainder += month_remainder_day_fraction;
 
result->time = rint(span->time / factor + day_remainder *  
USECS_PER_DAY);
#else    month_remainder_seconds = month_remainder_day_fraction *  
SECS_PER_DAY;
    if ( SECS_PER_DAY == (int32) month_remainder_seconds )        result->day++;    else if ( SECS_PER_DAY == - (int32)
month_remainder_seconds)       result->day--;    else day_remainder += month_remainder_day_fraction;
 
result->time = span->time / factor + day_remainder * SECS_PER_DAY;
#endif


It works okay with --enable-integer-datetimes

postgres=# select '41 mon'::interval/10;   ?column?
---------------
4 mons 3 days
(1 row)

postgres=# select '2 mon'::interval/1.5;   ?column?
---------------
1 mon 10 days
(1 row)

It also changes the result of the aggregate test for intervals, but I  
think that's to be expected.

*** ./expected/interval.out    Tue Mar  7 07:49:17 2006
--- ./results/interval.out    Thu Jun 22 22:52:41 2006
***************
*** 218,224 ****  select avg(f1) from interval_tbl;                         avg
-------------------------------------------------
!  @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs  (1 row)
  -- test long interval input
--- 218,224 ----  select avg(f1) from interval_tbl;                         avg
-------------------------------------------------
!  @ 4 years 1 mon 10 days 4 hours 18 mins 23 secs  (1 row)
  -- test long interval input


But doesn't work without --enable-integer-datetimes. It still gives  
the same answer as before. I don't have a firm grasp of floating  
point math so I'm fully aware this might be entirely the wrong way to  
go about this. It definitely feels kludgy (especially my sign- 
handling), but I submit this in the hope it moves the discussion  
forward a little. If someone wanted to point me in the right  
direction, I'll try to finish this up, updating the regression test  
and adding a few more to test this more thoroughly.

Thanks!

Michael Glaesemann
grzm myrealbox com





Re: Interval aggregate regression failure (expected seems

From
Michael Glaesemann
Date:
On Jun 23, 2006, at 9:47 , Michael Glaesemann wrote:

> It also changes the result of the aggregate test for intervals, but  
> I think that's to be expected.

My goodness. Of course it changes the aggregate test results. That  
was what brought this up in the first place. (*kicks self for not  
reading the subject line when responding*)

Michael Glaesemann
grzm seespotcode net





Re: Interval aggregate regression failure (expected seems

From
Michael Glaesemann
Date:
On Jun 23, 2006, at 9:47 , Michael Glaesemann wrote:

> # select '41 mon'::interval / 10;
>         ?column?
> ------------------------
> 4 mons 2 days 24:00:00

> My understanding is that as month_remainder is a float (as is  
> month_remainder_days), month_remainder_days may be equal to 24  
> hours after rounding. As we're converting from months to days, and  
> from days to time, rather than from months to time directly, we're  
> assuming that we should only have time less than 24 hours remaining  
> in the month_remainder_days when it's added to day_remainder.

This behavior is the same as applying justify_hours before adding the  
days and time components to the result. With this in mind, I rewrote  
interval_div to call interval_justify_hours. Good news is that -- 
enable-integer-datetimes works as expected. Bad news is that without  
--enable-integer-datetimes, still get the current behavior.

I also came across something I think is odd:

# select version();                                                                    
version
------------------------------------------------------------------------ 
----------------------------------------------------------------------
PostgreSQL 8.1.4 on powerpc-apple-darwin8.7.0, compiled by GCC  
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.  
build 5341)
(1 row)

select justify_hours(a/10) as divided_and_justified    , justify_hours(b) as justified    , a/10 = b as are_equal
from (select '41 mon'::interval,'4 mons 2 days 24:00:00'::interval)  
as s(a,b);

without --enable-integer-datetimes:

divided_and_justified  |   justified   | are_equal
------------------------+---------------+-----------
4 mons 2 days 24:00:00 | 4 mons 3 days | t
(1 row)

with --enable-integer-datetimes:

divided_and_justified |   justified   | are_equal
-----------------------+---------------+-----------
4 mons 3 days         | 4 mons 3 days | t

I think this just confirms that there is some kind of rounding (or  
lack of) in interval_div. Kind of frustrating that it's not visible  
in the result.

Anyway, there's another data point.

Michael Glaesemann
grzm seespotcode net






Re: Interval aggregate regression failure (expected seems

From
Tom Lane
Date:
Michael Glaesemann <grzm@seespotcode.net> writes:
> ... I think this just confirms that there is some kind of rounding (or  
> lack of) in interval_div. Kind of frustrating that it's not visible  
> in the result.

I think the fundamental problem is that the float8 results of division
are inaccurate, and yet we're assuming that we can (for instance) coerce
them to integer and get exactly the right answer.  For instance, in the
'41 months'/10 example, I get month_remainder_days being computed as

(gdb) p month_remainder
$19 = 0.099999999999999645
(gdb) s
2575            result->day += (int32) month_remainder_days;
(gdb) p month_remainder_days
$20 = 2.9999999999999893

The only way we can really fix this is to be willing to round off
the numbers, and I think the only principled way to do that is to
settle on a specific target accuracy, probably 1 microsecond.
Then the thing to do would be to scale up all the intermediate
float results to microseconds and apply rint().  Something like
(untested)
month_remainder = rint(span->month * USECS_PER_MONTH / factor);day_remainder = rint(span->day * USECS_PER_DAY /
factor);result->month= (int32) (month_remainder / USECS_PER_MONTH);result->day = (int32) (day_remainder /
USECS_PER_DAY);month_remainder-= result->month * USECS_PER_MONTH;day_remainder -= result->day * USECS_PER_DAY;
 
/* * Handle any fractional parts the same way as in interval_mul. */
/* fractional months full days into days */month_remainder_days = month_remainder * DAYS_PER_MONTH;extra_days = (int32)
(month_remainder_days/ USECS_PER_DAY);result->day += extra_days;/* fractional months partial days into time
*/day_remainder+= month_remainder_days - extra_days * USECS_PER_DAY;
 

#ifdef HAVE_INT64_TIMESTAMPresult->time = rint(span->time / factor + day_remainder);
#elseresult->time = rint(span->time * 1.0e6 / factor + day_remainder) / 1.0e6;
#endif

This might need a few more rint() calls --- I'm assuming that float ops
with exact integral inputs will be OK, which is an assumption used
pretty widely in the datetime code, but ...

Per the comment, if we do this here we probably want to make
interval_mul work similarly.
        regards, tom lane


Re: Interval aggregate regression failure (expected seems

From
Bruce Momjian
Date:
Have we made any progress on this?

---------------------------------------------------------------------------

Tom Lane wrote:
> Michael Glaesemann <grzm@seespotcode.net> writes:
> > ... I think this just confirms that there is some kind of rounding (or  
> > lack of) in interval_div. Kind of frustrating that it's not visible  
> > in the result.
> 
> I think the fundamental problem is that the float8 results of division
> are inaccurate, and yet we're assuming that we can (for instance) coerce
> them to integer and get exactly the right answer.  For instance, in the
> '41 months'/10 example, I get month_remainder_days being computed as
> 
> (gdb) p month_remainder
> $19 = 0.099999999999999645
> (gdb) s
> 2575            result->day += (int32) month_remainder_days;
> (gdb) p month_remainder_days
> $20 = 2.9999999999999893
> 
> The only way we can really fix this is to be willing to round off
> the numbers, and I think the only principled way to do that is to
> settle on a specific target accuracy, probably 1 microsecond.
> Then the thing to do would be to scale up all the intermediate
> float results to microseconds and apply rint().  Something like
> (untested)
> 
>     month_remainder = rint(span->month * USECS_PER_MONTH / factor);
>     day_remainder = rint(span->day * USECS_PER_DAY / factor);
>     result->month = (int32) (month_remainder / USECS_PER_MONTH);
>     result->day = (int32) (day_remainder / USECS_PER_DAY);
>     month_remainder -= result->month * USECS_PER_MONTH;
>     day_remainder -= result->day * USECS_PER_DAY;
> 
>     /*
>      * Handle any fractional parts the same way as in interval_mul.
>      */
> 
>     /* fractional months full days into days */
>     month_remainder_days = month_remainder * DAYS_PER_MONTH;
>     extra_days = (int32) (month_remainder_days / USECS_PER_DAY);
>     result->day += extra_days;
>     /* fractional months partial days into time */
>     day_remainder += month_remainder_days - extra_days * USECS_PER_DAY;
> 
> #ifdef HAVE_INT64_TIMESTAMP
>     result->time = rint(span->time / factor + day_remainder);
> #else
>     result->time = rint(span->time * 1.0e6 / factor + day_remainder) / 1.0e6;
> #endif
> 
> This might need a few more rint() calls --- I'm assuming that float ops
> with exact integral inputs will be OK, which is an assumption used
> pretty widely in the datetime code, but ...
> 
> Per the comment, if we do this here we probably want to make
> interval_mul work similarly.
> 
>             regards, tom lane

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Interval aggregate regression failure (expected seems

From
Michael Glaesemann
Date:
On Aug 11, 2006, at 13:48 , Bruce Momjian wrote:

>
> Have we made any progress on this?

I made a bit of progress but am still having issues when --enable- 
integer-datetimes is not enabled. I need to spend some time with gdb  
and figure out what's going on. I probably won't be able to get time  
to look at it until Aug 18.

Michael Glaesemann
grzm seespotcode net