Re: Have I found an interval arithmetic bug? - Mailing list pgsql-hackers

From Bryn Llewellyn
Subject Re: Have I found an interval arithmetic bug?
Date
Msg-id AC1B8865-F73A-450F-92AF-4B6070D7F671@yugabyte.com
Whole thread Raw
In response to Re: Have I found an interval arithmetic bug?  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Have I found an interval arithmetic bug?
List pgsql-hackers
On 23-Jul-2021, bruce@momjian.us wrote:

On Fri, Jul 23, 2021 at 10:55:11AM -0700, Bryn Llewellyn wrote:
SELECT
 '1.2345 months 1.2345 days 1.2345 seconds'::interval =
 '1 month 1 day 1 second'::interval*1.2345;

In 13.3, the result is TRUE. (I know that this doesn’t guarantee that the
internal representations of the two compared interval values are the same. But
it’s a necessary condition for the outcome that I’m referring to and serves to
indecate the pont I’m making. A more careful test can be made.

So you are saying fractional unit output should match multiplication
output?  It doesn't now for all units:

SELECT interval '1.3443 years';
  interval
---------------
1 year 4 mons

SELECT interval '1 years' * 1.3443;
           ?column?
---------------------------------
1 year 4 mons 3 days 22:45:07.2

It is true this patch is further reducing that matching.  Do people
think I should make them match as part of this patch?

Summary:
--------

It seems to me that the best thing to do is fix the coarse bug about which there is unanimous agreement and to leave everything else (quirks and all) untouched.

Detail:
-------

My previous email (to which Bruce replies here) was muddled. Sorry. The challenge is that are a number of mechanisms at work. Their effects are conflated. And it’s very hard to unscramble them.

The underlying problem is that the internal representation of an interval value is a [mm, dd, ss] tuple. This fact is documented. The representation is key to understanding the definitions of these operations:

— defining an interval value from a text literal that uses real number values for its various fields.

— defining an interval value from make_interval(). (This one is easy because the API requires integral values for all but the seconds argument. It would be interesting to know why this asymmetrical definition was implemented. It seems to imply that somebody though that spilldown was a bad idea and should be prevented before it might happen.)

— creating the text typecast of an extant interval value.

— creating an interval value by adding/subtracting an extant interval value to/from another

— creating an interval value by multiplying or dividing an extant interval value by a (real) number

— creating an interval value by subtracting a pair of moments of the same data type (timestamptz, plain timestamp, or time)

— creating a new moment value by adding or subtracting an extant interval value to an extant moment value of the same data type.

— creating an interval value by applying  justify_hours(i), justify_days(i), and justify_interval(i) to an extant interval value, i.

— creating a double precision value by applying extract(epoch from i) 
to an extant interval value, i.

— evaluating inequality and equality tests to compare two extant interval values.

Notice that, for example, this test:

select
  interval  '1.3443 years' as i1,
  interval '1 years' * 1.3443 as i2;

conflates three things: converting an interval literal to a [mm, dd, ss] tuple; multiplying a [mm, dd, ss] tuple by a real number; and converting a [mm, dd, ss] tuple to a text representation. Similarly, this test:

select
  interval  '1.3443 years' <
  interval '1 years' * 1.3443;

conflates three things: converting an interval literal to a [mm, dd, ss] tuple; multiplying a [mm, dd, ss] tuple by a real number; and inequality comparison of two [mm, dd, ss] two [mm, dd, ss] tuples.

As far as I’ve been able, the PG documentation doesn’t do a good job of defining the semantics of any of these operations. Some (like the “justify” functions” are sketched reasonably well. Others, like interval multiplication, are entirely undefined.

This makes discussion of simple test like the two I showed immediately above hard. It also makes any discussion of correctness, possible bugs, and proposed implementation changes very difficult.

Further, it also makes it hard to see how tests for application code that uses any of these operations can be designed. The normal approach relies on testing that you get what you expect. But here, you don't know what to expect—unless (as I’ve done) you first assert hypotheses for the undefined operations and test them with programmed simulations. Of course, this is, in general, an unreliable approach. The only way to know what code is intended to do is to read the prose specification that informs the implementation.

I had forgotten one piece of the long history of this thread. Soon after I presented the testcase that folks agree shows a clear bug, I asked about the rules for creating the internal [mm, dd, ss] tuple from a text literal that uses real numbers for the fields. My tests showed two things: (1) an intuitively clear model for the spilldown of nonintegral months to days and then, in turn, of nonintegral days to seconds; and (2) a quirky rule for deriving intermediate months from fractional years and fractional months before then using the more obvious rules to spill to days. (This defies terse expression in prose. I copied my PL/pgSQL implementation below.)

There was initially some discussion about changing implementation o the spill-down from [years, months] in the interval literal to the ultimate [mm, dd, ss] representation. This is what's Bruces is asking about. And it's what I was muddled about.

As I’ve said, my conclusion is that the only safe approach is to create and use only “pure” interval values (where just one of the internal fields is non-zero). For this reason (and having seen what I decided was the impossibly unmemorable rules that my modeled implementation uses) I didn’t look at the rules for the other fields that the interval literal allows (weeks, centuries, millennia, and so on).

--------------------------------------------------------------------------------
mm_trunc                constant int              not null := trunc(p.mm);
mm_remainder            constant double precision not null := p.mm - mm_trunc::double precision;

-- This is a quirk.
mm_out                  constant int              not null := trunc(p.yy*mm_per_yy) + mm_trunc;

dd_real_from_mm         constant double precision not null := mm_remainder*dd_per_mm;

dd_int_from_mm          constant int              not null := trunc(dd_real_from_mm);
dd_remainder_from_mm    constant double precision not null := dd_real_from_mm - dd_int_from_mm::double precision;

dd_int_from_user        constant int              not null := trunc(p.dd);
dd_remainder_from_user  constant double precision not null := p.dd - dd_int_from_user::double precision;

dd_out                  constant int              not null := dd_int_from_mm + dd_int_from_user;

d_remainder             constant double precision not null := dd_remainder_from_mm + dd_remainder_from_user;

ss_out                  constant double precision not null := d_remainder*ss_per_dd +
                                                              p.hh*ss_per_hh +
                                                              p.mi*ss_per_mi +
                                                              p.ss;
--------------------------------------------------------------------------------












pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Removing "long int"-related limit on hash table sizes
Next
From: Ranier Vilela
Date:
Subject: Re: Removing "long int"-related limit on hash table sizes