Thread: Patch: AdjustIntervalForTypmod shouldn't discard high-order data

Patch: AdjustIntervalForTypmod shouldn't discard high-order data

From
Tom Lane
Date:
As I mentioned a bit ago
http://archives.postgresql.org/pgsql-hackers/2009-05/msg01505.php
there seems to be a definite problem still remaining with our handling
of interval literals.  To wit, this behavior is absolutely not per spec:

regression=# select '999'::interval second;
 interval
----------
 00:00:39
(1 row)

The correct interpretation of the input value is certainly 999 seconds.
The spec would allow us to throw error if it exceeds the available range
of the field, but a silent modulo operation is not per spec and seems
against our general design principle of not silently discarding data.
I propose the attached patch to make the code not throw away high-order
values in this fashion.

A somewhat more debatable case is this:

regression=# select '1 day 1 hour'::interval hour;
 interval
----------
 01:00:00
(1 row)

which with the attached patch we would render as

regression=# select '1 day 1 hour'::interval hour;
    interval
----------------
 1 day 01:00:00
(1 row)

There is some case to be made that we should throw error here,
which we could do by putting error tests where the attached patch
has comments suggesting an error test.  However I'm inclined to think
that doing that would expose an implementation dependency rather more
than we should.  It is usually not clear to novices that '1 day 1 hour'
is different from '25 hours', and it would be even less clear why the
latter would be acceptable input for an INTERVAL HOUR field when the
former isn't.  So I'm proposing the patch as-is rather than with the
extra error tests, but am open to being convinced otherwise.

The reason I'm bringing this up now is that we've already changed the
behavior of interval literals quite a bit in 8.4.  I would rather try to
finish getting it right in this release than have the behavior change
twice in successive releases.

Comments?

            regards, tom lane

Index: src/backend/utils/adt/timestamp.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/timestamp.c,v
retrieving revision 1.199
diff -c -r1.199 timestamp.c
*** src/backend/utils/adt/timestamp.c    26 May 2009 02:17:50 -0000    1.199
--- src/backend/utils/adt/timestamp.c    31 May 2009 22:13:35 -0000
***************
*** 962,967 ****
--- 962,979 ----
          int            range = INTERVAL_RANGE(typmod);
          int            precision = INTERVAL_PRECISION(typmod);

+         /*
+          * Our interpretation of intervals with a limited set of fields
+          * is that fields to the right of the last one specified are zeroed
+          * out, but those to the left of it remain valid.  Since we do not
+          * have any equivalent of SQL's <interval leading field precision>,
+          * we can't properly enforce any limit on the leading field.  (Before
+          * PG 8.4 we interpreted a limited set of fields as actually causing
+          * a "modulo" operation on a given value, potentially losing high-order
+          * as well as low-order information; but there is no support for such
+          * behavior in the standard.)
+          */
+
          if (range == INTERVAL_FULL_RANGE)
          {
              /* Do nothing... */
***************
*** 974,1000 ****
          }
          else if (range == INTERVAL_MASK(MONTH))
          {
-             interval->month %= MONTHS_PER_YEAR;
              interval->day = 0;
              interval->time = 0;
          }
          /* YEAR TO MONTH */
          else if (range == (INTERVAL_MASK(YEAR) | INTERVAL_MASK(MONTH)))
          {
-             /* month is already year to month */
              interval->day = 0;
              interval->time = 0;
          }
          else if (range == INTERVAL_MASK(DAY))
          {
!             interval->month = 0;
              interval->time = 0;
          }
          else if (range == INTERVAL_MASK(HOUR))
          {
!             interval->month = 0;
!             interval->day = 0;
!
  #ifdef HAVE_INT64_TIMESTAMP
              interval->time = (interval->time / USECS_PER_HOUR) *
                  USECS_PER_HOUR;
--- 986,1008 ----
          }
          else if (range == INTERVAL_MASK(MONTH))
          {
              interval->day = 0;
              interval->time = 0;
          }
          /* YEAR TO MONTH */
          else if (range == (INTERVAL_MASK(YEAR) | INTERVAL_MASK(MONTH)))
          {
              interval->day = 0;
              interval->time = 0;
          }
          else if (range == INTERVAL_MASK(DAY))
          {
!             /* throw error if month is not 0? */
              interval->time = 0;
          }
          else if (range == INTERVAL_MASK(HOUR))
          {
!             /* throw error if month or day is not 0? */
  #ifdef HAVE_INT64_TIMESTAMP
              interval->time = (interval->time / USECS_PER_HOUR) *
                  USECS_PER_HOUR;
***************
*** 1004,1045 ****
          }
          else if (range == INTERVAL_MASK(MINUTE))
          {
!             TimeOffset    hour;
!
!             interval->month = 0;
!             interval->day = 0;
!
  #ifdef HAVE_INT64_TIMESTAMP
-             hour = interval->time / USECS_PER_HOUR;
-             interval->time -= hour * USECS_PER_HOUR;
              interval->time = (interval->time / USECS_PER_MINUTE) *
                  USECS_PER_MINUTE;
  #else
-             TMODULO(interval->time, hour, (double) SECS_PER_HOUR);
              interval->time = ((int) (interval->time / SECS_PER_MINUTE)) * (double) SECS_PER_MINUTE;
  #endif
          }
          else if (range == INTERVAL_MASK(SECOND))
          {
!             TimeOffset    minute;
!
!             interval->month = 0;
!             interval->day = 0;
!
! #ifdef HAVE_INT64_TIMESTAMP
!             minute = interval->time / USECS_PER_MINUTE;
!             interval->time -= minute * USECS_PER_MINUTE;
! #else
!             TMODULO(interval->time, minute, (double) SECS_PER_MINUTE);
!             /* return subseconds too */
! #endif
          }
          /* DAY TO HOUR */
          else if (range == (INTERVAL_MASK(DAY) |
                             INTERVAL_MASK(HOUR)))
          {
!             interval->month = 0;
!
  #ifdef HAVE_INT64_TIMESTAMP
              interval->time = (interval->time / USECS_PER_HOUR) *
                  USECS_PER_HOUR;
--- 1012,1035 ----
          }
          else if (range == INTERVAL_MASK(MINUTE))
          {
!             /* throw error if month or day is not 0? */
  #ifdef HAVE_INT64_TIMESTAMP
              interval->time = (interval->time / USECS_PER_MINUTE) *
                  USECS_PER_MINUTE;
  #else
              interval->time = ((int) (interval->time / SECS_PER_MINUTE)) * (double) SECS_PER_MINUTE;
  #endif
          }
          else if (range == INTERVAL_MASK(SECOND))
          {
!             /* throw error if month or day is not 0? */
!             /* for the moment, leave fractional seconds as-is */
          }
          /* DAY TO HOUR */
          else if (range == (INTERVAL_MASK(DAY) |
                             INTERVAL_MASK(HOUR)))
          {
!             /* throw error if month is not 0? */
  #ifdef HAVE_INT64_TIMESTAMP
              interval->time = (interval->time / USECS_PER_HOUR) *
                  USECS_PER_HOUR;
***************
*** 1052,1059 ****
                             INTERVAL_MASK(HOUR) |
                             INTERVAL_MASK(MINUTE)))
          {
!             interval->month = 0;
!
  #ifdef HAVE_INT64_TIMESTAMP
              interval->time = (interval->time / USECS_PER_MINUTE) *
                  USECS_PER_MINUTE;
--- 1042,1048 ----
                             INTERVAL_MASK(HOUR) |
                             INTERVAL_MASK(MINUTE)))
          {
!             /* throw error if month is not 0? */
  #ifdef HAVE_INT64_TIMESTAMP
              interval->time = (interval->time / USECS_PER_MINUTE) *
                  USECS_PER_MINUTE;
***************
*** 1066,1080 ****
                             INTERVAL_MASK(HOUR) |
                             INTERVAL_MASK(MINUTE) |
                             INTERVAL_MASK(SECOND)))
!             interval->month = 0;
!
          /* HOUR TO MINUTE */
          else if (range == (INTERVAL_MASK(HOUR) |
                             INTERVAL_MASK(MINUTE)))
          {
!             interval->month = 0;
!             interval->day = 0;
!
  #ifdef HAVE_INT64_TIMESTAMP
              interval->time = (interval->time / USECS_PER_MINUTE) *
                  USECS_PER_MINUTE;
--- 1055,1069 ----
                             INTERVAL_MASK(HOUR) |
                             INTERVAL_MASK(MINUTE) |
                             INTERVAL_MASK(SECOND)))
!         {
!             /* throw error if month is not 0? */
!             /* for the moment, leave fractional seconds as-is */
!         }
          /* HOUR TO MINUTE */
          else if (range == (INTERVAL_MASK(HOUR) |
                             INTERVAL_MASK(MINUTE)))
          {
!             /* throw error if month or day is not 0? */
  #ifdef HAVE_INT64_TIMESTAMP
              interval->time = (interval->time / USECS_PER_MINUTE) *
                  USECS_PER_MINUTE;
***************
*** 1087,1111 ****
                             INTERVAL_MASK(MINUTE) |
                             INTERVAL_MASK(SECOND)))
          {
!             interval->month = 0;
!             interval->day = 0;
!             /* return subseconds too */
          }
          /* MINUTE TO SECOND */
          else if (range == (INTERVAL_MASK(MINUTE) |
                             INTERVAL_MASK(SECOND)))
          {
!             TimeOffset    hour;
!
!             interval->month = 0;
!             interval->day = 0;
!
! #ifdef HAVE_INT64_TIMESTAMP
!             hour = interval->time / USECS_PER_HOUR;
!             interval->time -= hour * USECS_PER_HOUR;
! #else
!             TMODULO(interval->time, hour, (double) SECS_PER_HOUR);
! #endif
          }
          else
              elog(ERROR, "unrecognized interval typmod: %d", typmod);
--- 1076,1090 ----
                             INTERVAL_MASK(MINUTE) |
                             INTERVAL_MASK(SECOND)))
          {
!             /* throw error if month or day is not 0? */
!             /* for the moment, leave fractional seconds as-is */
          }
          /* MINUTE TO SECOND */
          else if (range == (INTERVAL_MASK(MINUTE) |
                             INTERVAL_MASK(SECOND)))
          {
!             /* throw error if month or day is not 0? */
!             /* for the moment, leave fractional seconds as-is */
          }
          else
              elog(ERROR, "unrecognized interval typmod: %d", typmod);
Index: src/test/regress/expected/interval.out
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/expected/interval.out,v
retrieving revision 1.27
diff -c -r1.27 interval.out
*** src/test/regress/expected/interval.out    4 Apr 2009 04:53:25 -0000    1.27
--- src/test/regress/expected/interval.out    31 May 2009 22:13:35 -0000
***************
*** 472,486 ****
  LINE 1: SELECT interval '1 2' hour to minute;
                          ^
  SELECT interval '1 2:03' hour to minute;
!  interval
! ----------
!  02:03:00
  (1 row)

  SELECT interval '1 2:03:04' hour to minute;
!  interval
! ----------
!  02:03:00
  (1 row)

  SELECT interval '1 2' hour to second;
--- 472,486 ----
  LINE 1: SELECT interval '1 2' hour to minute;
                          ^
  SELECT interval '1 2:03' hour to minute;
!     interval
! ----------------
!  1 day 02:03:00
  (1 row)

  SELECT interval '1 2:03:04' hour to minute;
!     interval
! ----------------
!  1 day 02:03:00
  (1 row)

  SELECT interval '1 2' hour to second;
***************
*** 488,502 ****
  LINE 1: SELECT interval '1 2' hour to second;
                          ^
  SELECT interval '1 2:03' hour to second;
!  interval
! ----------
!  02:03:00
  (1 row)

  SELECT interval '1 2:03:04' hour to second;
!  interval
! ----------
!  02:03:04
  (1 row)

  SELECT interval '1 2' minute to second;
--- 488,502 ----
  LINE 1: SELECT interval '1 2' hour to second;
                          ^
  SELECT interval '1 2:03' hour to second;
!     interval
! ----------------
!  1 day 02:03:00
  (1 row)

  SELECT interval '1 2:03:04' hour to second;
!     interval
! ----------------
!  1 day 02:03:04
  (1 row)

  SELECT interval '1 2' minute to second;
***************
*** 504,518 ****
  LINE 1: SELECT interval '1 2' minute to second;
                          ^
  SELECT interval '1 2:03' minute to second;
!  interval
! ----------
!  00:02:03
  (1 row)

  SELECT interval '1 2:03:04' minute to second;
!  interval
! ----------
!  00:03:04
  (1 row)

  -- test syntaxes for restricted precision
--- 504,518 ----
  LINE 1: SELECT interval '1 2' minute to second;
                          ^
  SELECT interval '1 2:03' minute to second;
!     interval
! ----------------
!  1 day 00:02:03
  (1 row)

  SELECT interval '1 2:03:04' minute to second;
!     interval
! ----------------
!  1 day 02:03:04
  (1 row)

  -- test syntaxes for restricted precision
***************
*** 585,599 ****
  LINE 1: SELECT interval '1 2.345' hour to second(2);
                          ^
  SELECT interval '1 2:03.45678' hour to second(2);
!   interval
! -------------
!  00:02:03.46
  (1 row)

  SELECT interval '1 2:03:04.5678' hour to second(2);
!   interval
! -------------
!  02:03:04.57
  (1 row)

  SELECT interval '1 2.3456' minute to second(2);
--- 585,599 ----
  LINE 1: SELECT interval '1 2.345' hour to second(2);
                          ^
  SELECT interval '1 2:03.45678' hour to second(2);
!      interval
! -------------------
!  1 day 00:02:03.46
  (1 row)

  SELECT interval '1 2:03:04.5678' hour to second(2);
!      interval
! -------------------
!  1 day 02:03:04.57
  (1 row)

  SELECT interval '1 2.3456' minute to second(2);
***************
*** 601,615 ****
  LINE 1: SELECT interval '1 2.3456' minute to second(2);
                          ^
  SELECT interval '1 2:03.5678' minute to second(2);
!   interval
! -------------
!  00:02:03.57
  (1 row)

  SELECT interval '1 2:03:04.5678' minute to second(2);
!   interval
! -------------
!  00:03:04.57
  (1 row)

  -- test inputting and outputting SQL standard interval literals
--- 601,615 ----
  LINE 1: SELECT interval '1 2.3456' minute to second(2);
                          ^
  SELECT interval '1 2:03.5678' minute to second(2);
!      interval
! -------------------
!  1 day 00:02:03.57
  (1 row)

  SELECT interval '1 2:03:04.5678' minute to second(2);
!      interval
! -------------------
!  1 day 02:03:04.57
  (1 row)

  -- test inputting and outputting SQL standard interval literals

Re: Patch: AdjustIntervalForTypmod shouldn't discard high-order data

From
Sam Mason
Date:
On Sun, May 31, 2009 at 06:32:53PM -0400, Tom Lane wrote:
> regression=# select '999'::interval second;
> The correct interpretation of the input value is certainly 999 seconds.

Agreed; silent truncation like this is confusing and will lead to
unnecessary bugs in users' code.

> the attached patch we would render as
> regression=# select '1 day 1 hour'::interval hour;
>  1 day 01:00:00
> 
> There is some case to be made that we should throw error here,
> which we could do by putting error tests where the attached patch
> has comments suggesting an error test.

With things as they are I think it would be useful to throw an error
here; if the user means 25 hours they should say 25 hours!

> However I'm inclined to think
> that doing that would expose an implementation dependency rather more
> than we should.  It is usually not clear to novices that '1 day 1 hour'
> is different from '25 hours', and it would be even less clear why the
> latter would be acceptable input for an INTERVAL HOUR field when the
> former isn't.  So I'm proposing the patch as-is rather than with the
> extra error tests, but am open to being convinced otherwise.

It would only be different when the interval is used with values of type
timestamptz, or am I missing something?  How much sense does it make to
have a timezone aware interval where this distinction is true and leave
the current interval as timezone naive.  Not sure if that would help to
clean up the semantics at all or if it's just adding more unnecessary
complexity.  I have a feeling it's probably the latter, but thought it
may help things.

--  Sam  http://samason.me.uk/


Re: Patch: AdjustIntervalForTypmod shouldn't discard high-order data

From
Tom Lane
Date:
Sam Mason <sam@samason.me.uk> writes:
> On Sun, May 31, 2009 at 06:32:53PM -0400, Tom Lane wrote:
>> There is some case to be made that we should throw error here,
>> which we could do by putting error tests where the attached patch
>> has comments suggesting an error test.

> With things as they are I think it would be useful to throw an error
> here; if the user means 25 hours they should say 25 hours!

Well, maybe, but I'm not really convinced.  By the same logic, if the
column is INTERVAL MONTH and the user puts in '1 year 1 month', then
we ought to throw an error for that too.  But AdjustIntervalForTypmod
is incapable of doing that because months and years are folded into
a single field.  There is a logical reason for the difference --- 1 year
is always exactly 12 months whereas 1 day is not always exactly 24 hours
--- but that difference isn't acknowledged by the SQL standard, which
last I checked still pretends daylight savings time doesn't exist.

The real bottom line here is that our underlying implementation and
semantics for INTERVAL are considerably different from what the SQL
standard has in mind.  AFAICS they intend an interval to contain six
separate numeric fields with "what you see is what you get" behavior.
We could argue some other time about whether that's a better design
than what we're using, but it's surely not going to change for 8.4.
My ambitions for the moment are limited to making sure that we accept
all spec-compliant interval literals and interpret them in a fashion
reasonably compatible with what the spec says the value is.  I don't
feel that we need to throw error for stuff we used to accept in order
to meet that goal.

> It would only be different when the interval is used with values of type
> timestamptz, or am I missing something?  How much sense does it make to
> have a timezone aware interval where this distinction is true and leave
> the current interval as timezone naive.

Doesn't seem practical, certainly not for 8.4.  In any case I'm
uncomfortable with the idea that a value would be accepted at entry
and then fail later on depending on how you used it.
        regards, tom lane


Re: Patch: AdjustIntervalForTypmod shouldn't discard high-order data

From
Tom Lane
Date:
I wrote:
> Sam Mason <sam@samason.me.uk> writes:
>> On Sun, May 31, 2009 at 06:32:53PM -0400, Tom Lane wrote:
>>> There is some case to be made that we should throw error here,
>>> which we could do by putting error tests where the attached patch
>>> has comments suggesting an error test.

>> With things as they are I think it would be useful to throw an error
>> here; if the user means 25 hours they should say 25 hours!

> Well, maybe, but I'm not really convinced.

I've gone ahead and committed the patch as-is, without the error tests.
There's still time to change it if anyone has a killer argument, but
I thought of another issue that would have to be dealt with: consider
values such as INTERVAL '13' MONTH.  Since per spec we should not
reduce this to 1 month, what is going to happen barring significant
redesign on the output side is that the value will print out as
'1 year 1 month'.  If we were to consider that as illegal input for
INTERVAL MONTH then we'd be creating a situation where valid data
fails to dump and reload.  This won't happen for all cases (eg 60
days doesn't overflow into months) but it shows the danger of throwing
error for cases that we can't clearly distinguish on both input and
output.  So I think we should be satisfied for now with accepting
inputs that are valid per spec, and not worry too much about whether
we are rejecting all values that are a bit outside spec.
        regards, tom lane


Re: Patch: AdjustIntervalForTypmod shouldn't discard high-order data

From
Robert Haas
Date:
On Mon, Jun 1, 2009 at 8:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Sam Mason <sam@samason.me.uk> writes:
>>> On Sun, May 31, 2009 at 06:32:53PM -0400, Tom Lane wrote:
>>>> There is some case to be made that we should throw error here,
>>>> which we could do by putting error tests where the attached patch
>>>> has comments suggesting an error test.
>
>>> With things as they are I think it would be useful to throw an error
>>> here; if the user means 25 hours they should say 25 hours!
>
>> Well, maybe, but I'm not really convinced.
>
> I've gone ahead and committed the patch as-is, without the error tests.
> There's still time to change it if anyone has a killer argument, but
> I thought of another issue that would have to be dealt with: consider
> values such as INTERVAL '13' MONTH.  Since per spec we should not
> reduce this to 1 month, what is going to happen barring significant
> redesign on the output side is that the value will print out as
> '1 year 1 month'.  If we were to consider that as illegal input for
> INTERVAL MONTH then we'd be creating a situation where valid data
> fails to dump and reload.  This won't happen for all cases (eg 60
> days doesn't overflow into months) but it shows the danger of throwing
> error for cases that we can't clearly distinguish on both input and
> output.  So I think we should be satisfied for now with accepting
> inputs that are valid per spec, and not worry too much about whether
> we are rejecting all values that are a bit outside spec.

Well, there is the possibility that if we implement something fully
spec-compliant in the future, we might run into a situation where
someone puts 13 months in, dumps and reloads, then puts in 13 months
in again, compares the two, and surprisingly they turn out to be
unequal.  But I'm having a hard time caring.  The behavior your patch
implements is clearly a lot more useful than what it replaced, and I
think it's arguably more useful than the spec behavior as well.

More to the point, it's also what 8.3.7 does:

Welcome to psql 8.3.7, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms      \h for help with SQL commands      \? for help with psql commands      \g
orterminate with semicolon to execute query      \q to quit 

rhaas=# select '99 seconds'::interval;interval
----------00:01:39
(1 row)

rhaas=# select '99 minutes'::interval;interval
----------01:39:00
(1 row)

rhaas=# select '99 hours'::interval;interval
----------99:00:00
(1 row)

rhaas=# select '99 days'::interval;interval
----------99 days
(1 row)

rhaas=# select '99 weeks'::interval;interval
----------693 days
(1 row)

rhaas=# select '99 months'::interval;   interval
----------------8 years 3 mons
(1 row)

I haven't checked, but hopefully these all now match the 8.4 behavior?

...Robert


Re: Patch: AdjustIntervalForTypmod shouldn't discard high-order data

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> More to the point, it's also what 8.3.7 does:

Well, no, because the cases at issue are where an <interval qualifier>
is specified.  8.3 did this:

regression=# select '99 seconds'::interval second;interval 
----------00:00:39
(1 row)

and even more amusingly,

regression=# select interval '99' minute;interval 
----------00:01:00
(1 row)

regression=# select interval '99' hour;  interval 
----------00:00:00
(1 row)

It was all pretty broken back then.
        regards, tom lane