Re: Have I found an interval arithmetic bug? - Mailing list pgsql-hackers

From Bryn Llewellyn
Subject Re: Have I found an interval arithmetic bug?
Date
Msg-id 654FEA5F-79CF-414A-9801-F1482EEBCD88@yugabyte.com
Whole thread Raw
In response to Re: Have I found an interval arithmetic bug?  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Have I found an interval arithmetic bug?  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On 21-Jul-2021, at 17:07, Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Jul 21, 2021 at 01:29:49PM -0400, Tom Lane wrote:
Bryn Llewellyn <bryn@yugabyte.com> writes:
It was me that started the enormous thread with the title “Have I found an interval arithmetic bug?” on 01-Apr-2021. I presented this testcase:

select interval '-1.7 years';                          -- -1 years -8 mons

select interval '29.4 months';                         --  2 years  5 mons 12 days

select interval '-1.7 years 29.4 months';              --           8 mons 12 days << wrong
select interval '29.4 months -1.7 years';              --           9 mons 12 days

select interval '-1.7 years' + interval '29.4 months'; --           9 mons 12 days
select interval '29.4 months' + interval '-1.7 years'; --           9 mons 12 days

The consensus was that the outcome that I flagged with “wrong” does indeed have that status.

Yeah, I think it's self-evident that your last four cases should
produce the same results.  Whether '9 mons 12 days' is the best
possible result is debatable --- in a perfect world, maybe we'd
produce '9 mons' exactly --- but given that the first two cases
produce what they do, that does seem self-consistent.  I think
we should be setting out to fix that outlier without causing
any of the other five results to change.

OK, I decided to reverse some of the changes I was proposing once I
started to think about the inaccuracy of not spilling down from 'weeks'
to seconds when hours also appear.  The fundamental issue is that the
months-to-days conversion is almost always an approximation, while the
days to seconds conversion is almost always accurate.  This means we are
never going to have consistent spill-down that is useful.

Therefore, I went ahead and accepted that years and larger units spill
only to months, months spill only to days, and weeks and lower spill all
the way down to seconds.  I also spelled this out in the docs, and
explained why we have this behavior.

Also, with my patch, the last four queries return the same result
because of the proper rounding also added by the patch, attached.

Your statement

“months-to-days conversion is almost always an approximation, while the days to seconds conversion is almost always accurate.” 

is misleading. Any conversion like these (and also the “spill up” conversions that the justify_hours(), justify_days(), and justify_interval() built-in functions bring) are semantically dangerous because of the different rules for adding a pure months, a pure days, or a pure seconds interval to a timestamptz value.

Unless you avoid mixed interval values, then it’s so hard (even though it is possible) to predict the outcomes of interval arithmetic. Rather, all you get is emergent behavior that I fail to see can be relied upon in deliberately designed application code. Here’s a telling example:

set timezone = 'America/Los_Angeles';
with
  c as (
    select
      '2021-03-13 19:00:00 America/Los_Angeles'::timestamptz as d,
      '25 hours'::interval                                   as i)
select
  d +               i  as "d + i",
  d + justify_hours(i) as "d + justify_hours(i)"
from c;

This is the result:

         d + i          |  d + justify_hours(i)  
------------------------+------------------------
 2021-03-14 21:00:00-07 | 2021-03-14 20:00:00-07

The two results are different, even though the native equality test shows that the two different interval values are the same:

with
  c as (select '25 hours'::interval as i)
select (i = justify_hours(i))::text
from c;

The result is TRUE.

The only route to sanity is to use only pure interval values (i.e. where only one of the fields of the internal [mm, dd, ss] representation is non-zero.

I mentioned that you can use a set of three domain types to enforce your intended practice here.

In other words, by programming application code defensively, it’s possible to insulate oneself entirely from the emergent behavior of the decades old PG code that implements the unconstrained native interval functionality and that brings what can only be considered to be unpredictable results.

Moreover, this defensive approach insulates you from any changes that Bruce’s patch might make.

pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: [PATCH] Use optimized single-datum tuplesort in ExecSort
Next
From: Bruce Momjian
Date:
Subject: Re: Have I found an interval arithmetic bug?