Thread: Followup on the UnixWare Optimizer bug.

Followup on the UnixWare Optimizer bug.

From
"Larry Rosenman"
Date:
The following is from my SCO Internal contact about the bug.  It's
definitely their bug.  Towards the end of the
Exact diagnosis, is a suggested work-around for now, as well as a (possible)
memory leak.

-----
Dave and I were convinced that the CSE optimization was correct and
manufactured data that we pushed through the interval_div() functions with
and without the CSE optimization was producing the same results.

It was only when we identified the exact input values from the interval
regression test were we able to see the problem

With span->month = 493     span->day   =  11     factor = 10;

The CSE optimization problemd occures with the following code from
interval_div() and I am assuming also in interval_mul()
        /* Cascade fractions to lower units */        /* fractional months full days into days */        result->day +=
month_remainder* DAYS_PER_MONTH;        /* fractional months partial days into time */        day_remainder +=
(month_remainder* DAYS_PER_MONTH) -                                         (int)(month_remainder *
 
DAYS_PER_MONTH);

At the point of failure:

month_remainder =  49.3 - 49 = .3   (INEXACT) and represented in the
80 bit FP register as .29999999999999999

month_remainder * 30 = 8.99999999999999997  (also inexact)

That results in result->day = 8 + 1 = 9, but day_remainder is    .999999999999997 + the .1 left from the earlier
division.

The later call to interval_justify_hours subtracts the 1 days worth of
seconds from time ( day_mainder * SECS_PER_DAY + time portion)......
and bumps result->day by 1  ==> 10.

The FAILURE is because the compiler is trying to reduce the 3 FP multiples
to 1 multiply; using the value 3 times.  The problem occurs because the
8.999999999999997 is inexact and when the CSE values is stored as a
"double", it is rounded to 9.0.

On the 2nd and 3rd uses of the value, the  9.0 (64-bit FP data) is used; but
the 1st use is still using the 80-bit (8.9999999999999) value. So the inter
truncation still gets 8, but the calculation day_remainder now is (9.0 - 9)
+ .1 (previous day_remainder contents).

Our bug is that either:
   - the CSE value should have been preserved as an 80-bit long double     since that is how the internal calculations
arebeing done.....
 
     This is the "correct" fix and will take us some time to make     certain that we haven't broken anything.
   - the CSE value have been rounded (to 64-bit precision) before use     in all 3 points in the code.
     This would have resulted in result->day = 9 + 1 = 10  and the     resuilt->time would have been correctly less
than1 day.  The     interval_justify_hours() would make not adjustments and result     would be identical - as
expected.

As I said, this will take us some time to work up the fix and revalidate the
compiler.  Since you have release coming up, I want to suggest the follow
work-around for a Common Subexpression Elimination (CSE) bug in "some"
compiler.......


For both interval_div() and interval_mul()
        double CSE;
        /* Cascade fractions to lower units */        /* fractional months full days into days */        CSE =
month_remainder* DAYS_PER_MONTH;        result->day += CSE;        /* fractional months partial days into time */
day_remainder += (CSE) - (int)(CSE);
 


Also note that there appears to be a memory leak in the interval_****
routines.  For example interval_div() allocates a "result" Interval.
It eventually passes this result through to interval_justify_hours() which
allocates another Interval "result" and that "result" is what gets passed
back to caller on interval_div().  The 1st Interval allocated appears to be
left around.......
------

I will get a pre-release copy of the compiler to test, but it will take a
while, since they have to revalidate it.

Comments?



-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 3535 Gaspar Drive, Dallas, TX 75220-3611 US



Re: Followup on the UnixWare Optimizer bug.

From
Tom Lane
Date:
"Larry Rosenman" <ler@lerctr.org> forwards:
> Also note that there appears to be a memory leak in the interval_****
> routines.  For example interval_div() allocates a "result" Interval.
> It eventually passes this result through to interval_justify_hours() which
> allocates another Interval "result" and that "result" is what gets passed
> back to caller on interval_div().  The 1st Interval allocated appears to be
> left around.......

Just to clarify my discussion with Bruce: this is a leak from the point
of view of these specific routines, but we do not care, because the
memory is leaked in a short-lived palloc context that will soon be reset
("soon" being "the next tuple cycle" in most cases).  We rely on this
behavior in a lot of places --- for example, detoasting a toasted input
datum leaks memory that in most places isn't explicitly cleaned up.
Cleaning up just a dozen or so bytes is really not worth the cycles
needed to do it.  Despite that, I wouldn't have objected to Bruce's
patch if it hadn't made the code noticeably more obscure.

As a general rule, datatype-specific functions only need to be paranoid
about not leaking memory if they may be invoked by index operations;
btree indexes, at least, call such functions in a query-lifespan memory
context.  (I think GIST was recently fixed to not do this, but btree
still does.)
        regards, tom lane


Re: Followup on the UnixWare Optimizer bug.

From
Tom Lane
Date:
"Larry Rosenman" <ler@lerctr.org> forwards:
> As I said, this will take us some time to work up the fix and revalidate the
> compiler.  Since you have release coming up, I want to suggest the follow
> work-around for a Common Subexpression Elimination (CSE) bug in "some"
> compiler.......

Done.  I think the code is more legible this way anyway.
        regards, tom lane