Re: [BUGS] BUG #3431: age() gets the days wrong - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [BUGS] BUG #3431: age() gets the days wrong
Date
Msg-id 17990.1183925224@sss.pgh.pa.us
Whole thread Raw
Responses Re: [BUGS] BUG #3431: age() gets the days wrong  (Pelle Johansson <pelle@morth.org>)
Re: [BUGS] BUG #3431: age() gets the days wrong  (Bruce Momjian <bruce@momjian.us>)
Re: [BUGS] BUG #3431: age() gets the days wrong  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
"Pelle Johansson" <pelle@morth.org> writes:
> The age() function seem to work by first counting months until less than a
> month remains to to the second argument, then counting days left. This
> doesn't give the correct result, as shown by this example:

> # select column1, age(column1, '2006-11-02'), date '2006-11-02' +
> age(column1, '2006-11-02') from (values ('2007-01-31'::date),
> ('2007-02-01')) as alias;
>   column1   |      age       |      ?column?
> ------------+----------------+---------------------
>  2007-01-31 | 2 mons 29 days | 2007-01-31 00:00:00
>  2007-02-01 | 2 mons 29 days | 2007-01-31 00:00:00
> (2 rows)

I took another look at this example.  I believe what is actually going
wrong here is that when timestamp_age converts a month into an
equivalent number of days, it uses the number of days in the first
month of the interval it's dealing with (ie, the month containing
the earlier of the two dates).  This is just wrong, because interval
addition adds months first and then days.  The appropriate conversion
to use is actually the length of the next-to-last month of the interval.

As an example, 8.2 and CVS HEAD produce

regression=# select age('2007-03-14', '2007-02-15');
   age
---------
 27 days
(1 row)

which is reasonable, but

regression=# select age('2007-04-14', '2007-02-15');
      age
---------------
 1 mon 27 days
(1 row)

is not so reasonable, nor is

regression=# select age('2007-03-14', '2007-01-15');
      age
---------------
 1 mon 30 days
(1 row)

If we change the code to use the next-to-last month of the interval
then these two cases produce '1 mon 30 days' and '1 mon 27 days'
respectively.

Another problem is that the code isn't doing the propagate-to-next-field
bit for negative fractional seconds.  Hence it produces

regression=# select age('2007-02-14 01:00:00', '2007-01-15 01:00:00.4');
         age
----------------------
 30 days -00:00:00.40
(1 row)

which is maybe not incorrect, but certainly fairly inconsistent with

regression=# select age('2007-02-14 01:00:00', '2007-01-15 01:00:01');
       age
------------------
 29 days 23:59:59
(1 row)


Hence I propose the attached patch.  This does not change any existing
regression test outputs, but it does change the example given in the
documentation: age(timestamp '2001-04-10', timestamp '1957-06-13')
will now produce '43 years 9 mons 28 days' not 27 days.  Which actually
is correct if you try to add back the result to timestamp '1957-06-13'.
It also appears to fix Palle's example:

regression=# select column1, age(column1, '2006-11-02'), date '2006-11-02' +
age(column1, '2006-11-02') from (values ('2007-01-31'::date),
('2007-02-01')) as alias;
  column1   |      age       |      ?column?
------------+----------------+---------------------
 2007-01-31 | 2 mons 29 days | 2007-01-31 00:00:00
 2007-02-01 | 2 mons 30 days | 2007-02-01 00:00:00
(2 rows)

As I said earlier, I'm worried about changing the behavior of a function
that's been around for so long, so I'm disinclined to back-patch this.
But it seems like a reasonable change to make in 8.3.  Comments?

            regards, tom lane

Index: timestamp.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/timestamp.c,v
retrieving revision 1.179
diff -c -r1.179 timestamp.c
*** timestamp.c    6 Jul 2007 04:15:59 -0000    1.179
--- timestamp.c    8 Jul 2007 19:45:04 -0000
***************
*** 3044,3050 ****
      if (timestamp2tm(dt1, NULL, tm1, &fsec1, NULL, NULL) == 0 &&
          timestamp2tm(dt2, NULL, tm2, &fsec2, NULL, NULL) == 0)
      {
!         fsec = (fsec1 - fsec2);
          tm->tm_sec = tm1->tm_sec - tm2->tm_sec;
          tm->tm_min = tm1->tm_min - tm2->tm_min;
          tm->tm_hour = tm1->tm_hour - tm2->tm_hour;
--- 3044,3051 ----
      if (timestamp2tm(dt1, NULL, tm1, &fsec1, NULL, NULL) == 0 &&
          timestamp2tm(dt2, NULL, tm2, &fsec2, NULL, NULL) == 0)
      {
!         /* form the symbolic difference */
!         fsec = fsec1 - fsec2;
          tm->tm_sec = tm1->tm_sec - tm2->tm_sec;
          tm->tm_min = tm1->tm_min - tm2->tm_min;
          tm->tm_hour = tm1->tm_hour - tm2->tm_hour;
***************
*** 3064,3069 ****
--- 3065,3081 ----
              tm->tm_year = -tm->tm_year;
          }

+         /* propagate any negative fields into the next higher field */
+         while (fsec < 0)
+         {
+ #ifdef HAVE_INT64_TIMESTAMP
+             fsec += USECS_PER_SEC;
+ #else
+             fsec += 1.0;
+ #endif
+             tm->tm_sec--;
+         }
+
          while (tm->tm_sec < 0)
          {
              tm->tm_sec += SECS_PER_MINUTE;
***************
*** 3082,3097 ****
              tm->tm_mday--;
          }

!         while (tm->tm_mday < 0)
          {
              if (dt1 < dt2)
              {
!                 tm->tm_mday += day_tab[isleap(tm1->tm_year)][tm1->tm_mon - 1];
!                 tm->tm_mon--;
              }
              else
              {
!                 tm->tm_mday += day_tab[isleap(tm2->tm_year)][tm2->tm_mon - 1];
                  tm->tm_mon--;
              }
          }
--- 3094,3130 ----
              tm->tm_mday--;
          }

!         /*
!          * day-to-month conversion is tricky because variable.  For each
!          * decrement in tm_mon, we should adjust tm_mday by the length of
!          * the next-to-last month(s) of the original time interval.
!          * This corresponds to the notion that interval addition will add
!          * months first, then days.
!          */
!         if (tm->tm_mday < 0)
          {
+             int        end_year;
+             int        end_mon;
+
              if (dt1 < dt2)
              {
!                 end_year = tm2->tm_year;
!                 end_mon = tm2->tm_mon;
              }
              else
              {
!                 end_year = tm1->tm_year;
!                 end_mon = tm1->tm_mon;
!             }
!
!             while (tm->tm_mday < 0)
!             {
!                 if (--end_mon <= 0)
!                 {
!                     end_mon = MONTHS_PER_YEAR;
!                     end_year--;
!                 }
!                 tm->tm_mday += day_tab[isleap(end_year)][end_mon - 1];
                  tm->tm_mon--;
              }
          }
***************
*** 3158,3163 ****
--- 3191,3197 ----
      if (timestamp2tm(dt1, &tz1, tm1, &fsec1, &tzn, NULL) == 0 &&
          timestamp2tm(dt2, &tz2, tm2, &fsec2, &tzn, NULL) == 0)
      {
+         /* form the symbolic difference */
          fsec = fsec1 - fsec2;
          tm->tm_sec = tm1->tm_sec - tm2->tm_sec;
          tm->tm_min = tm1->tm_min - tm2->tm_min;
***************
*** 3178,3183 ****
--- 3212,3228 ----
              tm->tm_year = -tm->tm_year;
          }

+         /* propagate any negative fields into the next higher field */
+         while (fsec < 0)
+         {
+ #ifdef HAVE_INT64_TIMESTAMP
+             fsec += USECS_PER_SEC;
+ #else
+             fsec += 1.0;
+ #endif
+             tm->tm_sec--;
+         }
+
          while (tm->tm_sec < 0)
          {
              tm->tm_sec += SECS_PER_MINUTE;
***************
*** 3196,3211 ****
              tm->tm_mday--;
          }

!         while (tm->tm_mday < 0)
          {
              if (dt1 < dt2)
              {
!                 tm->tm_mday += day_tab[isleap(tm1->tm_year)][tm1->tm_mon - 1];
!                 tm->tm_mon--;
              }
              else
              {
!                 tm->tm_mday += day_tab[isleap(tm2->tm_year)][tm2->tm_mon - 1];
                  tm->tm_mon--;
              }
          }
--- 3241,3277 ----
              tm->tm_mday--;
          }

!         /*
!          * day-to-month conversion is tricky because variable.  For each
!          * decrement in tm_mon, we should adjust tm_mday by the length of
!          * the next-to-last month(s) of the original time interval.
!          * This corresponds to the notion that interval addition will add
!          * months first, then days.
!          */
!         if (tm->tm_mday < 0)
          {
+             int        end_year;
+             int        end_mon;
+
              if (dt1 < dt2)
              {
!                 end_year = tm2->tm_year;
!                 end_mon = tm2->tm_mon;
              }
              else
              {
!                 end_year = tm1->tm_year;
!                 end_mon = tm1->tm_mon;
!             }
!
!             while (tm->tm_mday < 0)
!             {
!                 if (--end_mon <= 0)
!                 {
!                     end_mon = MONTHS_PER_YEAR;
!                     end_year--;
!                 }
!                 tm->tm_mday += day_tab[isleap(end_year)][end_mon - 1];
                  tm->tm_mon--;
              }
          }

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Should we bump libpq major version for 8.3?
Next
From: Stephen Frost
Date:
Subject: Re: Should we bump libpq major version for 8.3?