Followup on the UnixWare Optimizer bug. - Mailing list pgsql-hackers

From Larry Rosenman
Subject Followup on the UnixWare Optimizer bug.
Date
Msg-id E1E85Qg-000082-1T@lerami.lerctr.org
Responses Re: Followup on the UnixWare Optimizer bug.  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Followup on the UnixWare Optimizer bug.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
The following is from my SCO internal contact about the bug.  It's
definitely their bug.  Towards the end of the exact diagnosis there is a
suggested work-around for now, as well as a note about a (possible)
memory leak.

-----
Dave and I were convinced that the CSE optimization was correct, and
manufactured data that we pushed through the interval_div() function with
and without the CSE optimization produced the same results.

It was only when we identified the exact input values from the interval
regression test that we were able to see the problem.

With span->month = 493     span->day   =  11     factor = 10;

The CSE optimization problem occurs with the following code from
interval_div(), and I am assuming also in interval_mul():

        /* Cascade fractions to lower units */
        /* fractional months full days into days */
        result->day += month_remainder * DAYS_PER_MONTH;
        /* fractional months partial days into time */
        day_remainder += (month_remainder * DAYS_PER_MONTH) -
                         (int)(month_remainder * DAYS_PER_MONTH);

At the point of failure:

month_remainder =  49.3 - 49 = .3   (INEXACT) and represented in the
80 bit FP register as .29999999999999999

month_remainder * 30 = 8.99999999999999997  (also inexact)

That results in result->day = 8 + 1 = 9, but day_remainder is
.999999999999997 + the .1 left from the earlier division.

The later call to interval_justify_hours() subtracts the 1 day's worth of
seconds from time (day_remainder * SECS_PER_DAY + time portion)
and bumps result->day by 1  ==> 10.

The FAILURE is because the compiler is trying to reduce the 3 FP multiplies
to 1 multiply, using the value 3 times.  The problem occurs because the
8.999999999999997 is inexact, and when the CSE value is stored as a
"double", it is rounded to 9.0.

On the 2nd and 3rd uses of the value, the 9.0 (64-bit FP data) is used; but
the 1st use is still using the 80-bit (8.9999999999999) value.  So the
integer truncation still gets 8, but the day_remainder calculation now is
(9.0 - 9) + .1 (previous day_remainder contents).
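
The mismatch described above can be reproduced by hand, without relying on
any particular compiler bug.  The following is only a sketch: the Cascade
struct and the function names cascade_mixed() and show_mixed() are made up
for illustration, and it assumes a platform where long double is wider than
double (for example the x87 80-bit format).  It deliberately truncates the
wide product but subtracts using the copy rounded to double, which is what
the miscompiled code ends up doing.

```c
#include <float.h>
#include <stdio.h>

#define DAYS_PER_MONTH 30

/* Hypothetical holder for the two values the cascade step produces. */
typedef struct
{
    int     whole_days;     /* what gets added to result->day */
    double  day_fraction;   /* what gets added to day_remainder */
} Cascade;

/*
 * Imitates the miscompiled code by hand: the first use of the product sees
 * the wide (long double) value, while the second and third uses see the
 * copy that was rounded to double.
 */
Cascade
cascade_mixed(int months, int factor)
{
    /* months / factor is integer division, i.e. the whole-month part */
    long double month_remainder = (long double) months / factor - months / factor;
    long double wide = month_remainder * DAYS_PER_MONTH;    /* register value */
    double      narrow = (double) wide;                     /* stored CSE value */
    Cascade     c;

    c.whole_days = (int) wide;                  /* 1st use: wide value */
    c.day_fraction = narrow - (int) narrow;     /* 2nd and 3rd use: narrow value */
    return c;
}

/* Prints both results so the mismatch can be inspected. */
void
show_mixed(int months, int factor)
{
    Cascade c = cascade_mixed(months, factor);

    printf("long double has %d mantissa bits\n", LDBL_MANT_DIG);
    printf("whole days = %d, day fraction = %.17g\n",
           c.whole_days, c.day_fraction);
}
```

Calling show_mixed(493, 10) on a machine with 80-bit long double should
report 8 whole days and a day fraction of 0, so almost a full day goes
missing between the two uses.  Where long double is the same as double the
two views agree and the effect does not appear.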

Our bug is that either:
   - the CSE value should have been preserved as an 80-bit long double,
     since that is how the internal calculations are being done.
     This is the "correct" fix and will take us some time to make
     certain that we haven't broken anything.
   - the CSE value should have been rounded (to 64-bit precision) before
     use at all 3 points in the code.
     This would have resulted in result->day = 9 + 1 = 10, and the
     result->time would have been correctly less than 1 day.  The
     interval_justify_hours() would make no adjustments and the result
     would be identical - as expected.

As I said, this will take us some time to work up the fix and revalidate the
compiler.  Since you have a release coming up, I want to suggest the
following work-around for a Common Subexpression Elimination (CSE) bug in
"some" compiler.......


For both interval_div() and interval_mul():

        double CSE;

        /* Cascade fractions to lower units */
        /* fractional months full days into days */
        CSE = month_remainder * DAYS_PER_MONTH;
        result->day += CSE;
        /* fractional months partial days into time */
        day_remainder += (CSE) - (int)(CSE);
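
To see why the explicit temporary helps, here is a small standalone sketch
of the same idea outside PostgreSQL.  The Cascade struct and the name
cascade_workaround() are hypothetical, and the long double arithmetic only
stands in for whatever precision the FP registers use.  The point is that
the product is rounded to double exactly once, and every later use reads
that one rounded value.

```c
#include <stdio.h>

#define DAYS_PER_MONTH 30

/* Hypothetical holder for the two values the cascade step produces. */
typedef struct
{
    int     whole_days;     /* what gets added to result->day */
    double  day_fraction;   /* what gets added to day_remainder */
} Cascade;

/*
 * Work-around shape: compute the product once, round it to double once,
 * and derive both the whole days and the fraction from that same value.
 */
Cascade
cascade_workaround(int months, int factor)
{
    /* months / factor is integer division, i.e. the whole-month part */
    long double month_remainder = (long double) months / factor - months / factor;
    double      cse = (double) (month_remainder * DAYS_PER_MONTH);
    Cascade     c;

    c.whole_days = (int) cse;           /* truncation of the rounded value */
    c.day_fraction = cse - (int) cse;   /* fraction of the same rounded value */
    return c;
}

/* Prints the two parts; together they always add back up to cse. */
void
show_workaround(int months, int factor)
{
    Cascade c = cascade_workaround(months, factor);

    printf("whole days = %d, day fraction = %.17g\n",
           c.whole_days, c.day_fraction);
}
```

Whatever the register precision, the whole days and the fraction now come
from the same number, so no part of a day can be lost or counted twice.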
 


Also note that there appears to be a memory leak in the interval_****
routines.  For example, interval_div() allocates a "result" Interval.
It eventually passes this result through to interval_justify_hours(), which
allocates another Interval "result", and that "result" is what gets passed
back to the caller of interval_div().  The 1st Interval allocated appears
to be left around.......
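
The allocation pattern in question looks roughly like the sketch below.
This is not the PostgreSQL source: the Interval struct is a simplified
stand-in, the function names are invented, and malloc()/free() are used
where PostgreSQL code would use palloc()/pfree().  It only illustrates how
the first allocation becomes unreachable unless it is released after the
second routine has copied from it.

```c
#include <stdlib.h>

#define SECS_PER_DAY 86400.0

/* Simplified stand-in for PostgreSQL's Interval; not the real definition. */
typedef struct
{
    double  time;   /* seconds */
    int     day;
    int     month;
} Interval;

/* Stand-in for interval_justify_hours(): always returns a fresh allocation. */
Interval *
justify_hours_sketch(const Interval *span)
{
    Interval   *result = malloc(sizeof(Interval));
    int         whole_days;

    if (result == NULL)
        return NULL;
    *result = *span;
    /* move whole days out of the time field and into the day field */
    whole_days = (int) (result->time / SECS_PER_DAY);
    result->time -= whole_days * SECS_PER_DAY;
    result->day += whole_days;
    return result;
}

/* Stand-in for the tail end of interval_div(). */
Interval *
div_tail_sketch(int day, double time_secs)
{
    Interval   *result = malloc(sizeof(Interval));
    Interval   *justified;

    if (result == NULL)
        return NULL;
    result->month = 0;
    result->day = day;
    result->time = time_secs;

    justified = justify_hours_sketch(result);
    free(result);   /* without this, the first allocation is orphaned */
    return justified;
}
```

The caller owns the returned Interval and releases it when done; the
intermediate one never escapes div_tail_sketch().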
------

I will get a pre-release copy of the compiler to test, but it will take a
while, since they have to revalidate it.

Comments?



-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 3535 Gaspar Drive, Dallas, TX 75220-3611 US


