ISO 8601 "Time Intervals" of the "format with time-unit deignators" - Mailing list pgsql-patches

Short summary:

   This patch allows ISO 8601 "time intervals" using the "format
   with time-unit designators" to specify postgresql "intervals".

   Below I have (A) What these time intervals are, (B) What I
   modified to support them, (C) Issues with intervals I want
   to bring up, and (D) a patch supporting them.

   It's helpful to me.  Any feedback is appreciated.  If you
   did want to consider including it, let me know what to clean
   up.  If not, I thought I'd just put it here if anyone else finds
   it useful too.

   Thanks for your time,

      Ron Mayer

Longer:

(A) What these intervals are.

   ISO 8601, the standard from which PostgreSQL gets some of it's
   time syntax, also has a specification for "time-intervals".

   In particular, section 5.5.4.2 has a "Representation of
   time-interval by duration only" which I believe maps
   nicely to ISO intervals.

   Compared to the ISO 8601 time interval specification, the
   postgresql interval syntax is quite verbose.  For example:

     Postgresql interval:              ISO8601 Interval
     ---------------------------------------------------
     '1 year 6 months'                'P1Y6M'
     '3 hours 25 minutes 42 seconds'  'PT3H25M42S'

   Yeah, it's uglier, but it sure is short which can make
   for quicker typing and shorter scripts, and if for some
   strange reason you had an application using this format
   it's nice not to have to translate.

   The syntax is as follows:
       Basic extended format:  PnYnMnDTnHnMnS
                               PnW

       Where everything before the "T" is a date-part and everything
       after is a time-part.  W is for weeks.
       In the date-part, Y=Year, M=Month,  D=Day
       In the time-part, H=Hour, M=Minute, S=Second

   Much more info can be found from the draft standard
   ftp://ftp.qsl.net/pub/g1smd/154N362_.PDF
   The final standard's only available for $$$ so I didn't
   look at it.  Some other sites imply that this part didn't
   change from the last draft to the standard.


(B) This change was made by adding two functions to "datetime.c"
    next to where DecodeInterval parses the normal interval syntax.

    A total of 313 lines were added, including comments and sgml docs.
    Of these only 136 are actual code, the rest, comments, whitespace, etc.


    One new function "DecodeISO8601Interval" follows the style of
    "DecodeInterval" below it, and trys to strictly follow the ISO
    syntax.  If it doesn't match, it'll return -1 and the old syntax
    will be checked as before.

    The first test (first character of the first field must be 'P',
    and second character must be 'T' or '\0') should be fast so I don't
    think this will impact performance of existing code.


    The second function ("adjust_fval") is just a small helper-function
    to remove some of the cut&paste style that DecodeInterval used.

    It seems to work.
    =======================================================================
    betadb=# select 'P1M15DT12H30M7S'::interval;
            interval
    ------------------------
     1 mon 15 days 12:30:07
    (1 row)

    betadb=# select '1 month 15 days 12 hours 30 minutes 7 seconds'::interval;
        interval
    ------------------------
    1 mon 15 days 12:30:07
    (1 row)
    =====================================================================



(C) Open issues with intervals, and questions I'd like to ask.

    1.  DecodeInterval seems to have a hardcoded '.' for specifying
        fractional times.  ISO 8601 states that both '.' and ',' are
        ok, but "of these, the comma is the preferred sign".

        In DecodeISO8601Interval I loosened the test to allow
        both but left it as it was in DecodeInterval.  Should
        both be changed to make them more consistant?

    2.  In "DecodeInterval", fractional weeks and fractional months
        can produce seconds; but fractional years can not (rounded
        to months).  I didn't understand the reasoning for this, so
        I left it the same, and followed the same convention for
        ISO intervals.  Should I change this?

    3.  I could save a bunch of copy-paste-lines-of-code from the
        pre-existing DecodeInterval by calling the adjust_fval helper
        function.  The tradeoff is a few extra function-calls when
        decoding an interval.  However I didn't want to risk changes
        to the existing part unless you guys encourage me to do so.


(D) The patch.


Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/datatype.sgml,v
retrieving revision 1.123
diff -u -1 -0 -r1.123 datatype.sgml
--- doc/src/sgml/datatype.sgml    31 Aug 2003 17:32:18 -0000    1.123
+++ doc/src/sgml/datatype.sgml    8 Sep 2003 04:04:58 -0000
@@ -1735,20 +1735,71 @@
       Quantities of days, hours, minutes, and seconds can be specified without
       explicit unit markings.  For example, <literal>'1 12:59:10'</> is read
       the same as <literal>'1 day 12 hours 59 min 10 sec'</>.
      </para>

      <para>
       The optional precision
       <replaceable>p</replaceable> should be between 0 and 6, and
       defaults to the precision of the input literal.
      </para>
+
+
+     <para>
+      Alternatively, <type>interval</type> values can be written as
+      ISO 8601 time intervals, using the "Format with time-unit designators".
+      This format always starts with the character <literal>'P'</>, followed
+      by a string of values followed by single character time-unit designators.
+      A <literal>'T'</> separates the date and time parts of the interval.
+     </para>
+
+     <para>
+       Format:  PnYnMnDTnHnMnS
+     </para>
+     <para>
+       In this format, <literal>'n'</> gets replaced by a number, and
+       <literal>Y</> represents years,
+       <literal>M</> (in the date part) months,
+       <literal>D</> months,
+       <literal>H</> hours,
+       <literal>M</> (in the time part) minutes,
+       and <literal>S</> seconds.
+     </para>
+
+
+     <table id="interval-example-table">
+       <title>Interval Example</title>
+       <tgroup cols="2">
+        <thead>
+         <row>
+          <entry>Traditional</entry>
+          <entry>ISO-8601 time-interval</entry>
+         </row>
+        </thead>
+        <tbody>
+         <row>
+          <entry>1 month</entry>
+          <entry>P1M</entry>
+         </row>
+         <row>
+          <entry>1 hour 30 minutes</entry>
+          <entry>PT1H30M</entry>
+         </row>
+         <row>
+          <entry>2 years 10 months 15 days 10 hours 30 minutes 20 seconds</entry>
+          <entry>P2Y10M15DT10H30M20S</entry>
+         </row>
+        </tbody>
+       </thead>
+      </table>
+
+     </para>
     </sect3>

     <sect3>
      <title>Special Values</title>

      <indexterm>
       <primary>time</primary>
       <secondary>constants</secondary>
      </indexterm>

Index: src/backend/utils/adt/datetime.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/utils/adt/datetime.c,v
retrieving revision 1.116
diff -u -1 -0 -r1.116 datetime.c
--- src/backend/utils/adt/datetime.c    27 Aug 2003 23:29:28 -0000    1.116
+++ src/backend/utils/adt/datetime.c    8 Sep 2003 04:04:59 -0000
@@ -30,20 +30,21 @@
              struct tm * tm, fsec_t *fsec, int *is2digits);
 static int DecodeNumberField(int len, char *str,
                   int fmask, int *tmask,
                   struct tm * tm, fsec_t *fsec, int *is2digits);
 static int DecodeTime(char *str, int fmask, int *tmask,
            struct tm * tm, fsec_t *fsec);
 static int    DecodeTimezone(char *str, int *tzp);
 static datetkn *datebsearch(char *key, datetkn *base, unsigned int nel);
 static int    DecodeDate(char *str, int fmask, int *tmask, struct tm * tm);
 static void TrimTrailingZeros(char *str);
+static int  DecodeISO8601Interval(char **field, int *ftype, int nf, int *dtype, struct tm * tm, fsec_t *fsec);


 int            day_tab[2][13] = {
     {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 0},
 {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 0}};

 char       *months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
 "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", NULL};

 char       *days[] = {"Sunday", "Monday", "Tuesday", "Wednesday",
@@ -2872,30 +2873,271 @@
             default:
                 *val = tp->value;
                 break;
         }
     }

     return type;
 }


+void adjust_fval(double fval,struct tm * tm, fsec_t *fsec, int scale);
+{
+    int    sec;
+    fval       *= scale;
+    sec            = fval;
+    tm->tm_sec += sec;
+#ifdef HAVE_INT64_TIMESTAMP
+    *fsec       += ((fval - sec) * 1000000);
+#else
+    *fsec       += (fval - sec);
+#endif
+}
+
+
+/* DecodeISO8601Interval()
+ *
+ *  Check if it's a ISO 8601 Section 5.5.4.2 "Representation of
+ *  time-interval by duration only."
+ *  Basic extended format:  PnYnMnDTnHnMnS
+ *                          PnW
+ *  For more info.
+ *  http://www.astroclark.freeserve.co.uk/iso8601/index.html
+ *  ftp://ftp.qsl.net/pub/g1smd/154N362_.PDF
+ *
+ *  Examples:  P1D  for 1 day
+ *             PT1H for 1 hour
+ *             P2Y6M7DT1H30M for 2 years, 6 months, 7 days 1 hour 30 min
+ *
+ *  The first field is exactly "p" or "pt" it may be of this type.
+ *
+ *  Returns -1 if the field is not of this type.
+ *
+ *  It pretty strictly checks the spec, with the two exceptions
+ *  that a week field ('W') may coexist with other units, and that
+ *  this function allows decimals in fields other than the least
+ *  significant units.
+ */
+int
+DecodeISO8601Interval(char **field, int *ftype, int nf, int *dtype, struct tm * tm, fsec_t *fsec)
+{
+    char       *cp;
+    int            fmask = 0,
+                tmask;
+    int            val;
+    double        fval;
+    int            arg;
+    int            datepart;
+
+    /*
+     * An ISO 8601 "time-interval by duration only" must start
+     * with a 'P'.  If it contains a date-part, 'p' will be the
+     * only character in the field.  If it contains no date part
+     * it will contain exactly to characters 'PT' indicating a
+     * time part.
+     * Anything else is illegal and will be treated like a
+     * traditional postgresql interval.
+     */
+    if (!(field[0][0] == 'p' &&
+          ((field[0][1] == 0) || (field[0][1] == 't' && field[0][2] == 0))))
+    {
+      return -1;
+    }
+
+
+    /*
+     * If the first field is exactly 1 character ('P'), it starts
+     * with date elements.  Otherwise it's two characters ('PT');
+     * indicating it starts with a time part.
+     */
+    datepart = (field[0][1] == 0);
+
+    /*
+     * Every value must have a unit, so we require an even
+     * number of value/unit pairs. Therefore we require an
+     * odd nubmer of fields, including the prefix 'P'.
+     */
+    if ((nf & 1) == 0)
+        return -1;
+
+    /*
+     * Process pairs of fields at a time.
+     */
+    for (arg = 1 ; arg < nf ; arg+=2)
+    {
+        char * value = field[arg  ];
+        char * units = field[arg+1];
+
+        /*
+         * The value part must be a number.
+         */
+        if (ftype[arg] != DTK_NUMBER)
+            return -1;
+
+        /*
+         * extract the number, almost exactly like the non-ISO interval.
+         */
+        val = strtol(value, &cp, 10);
+
+        /*
+         * One difference from the normal postgresql interval below...
+         * ISO 8601 states that "Of these, the comma is the preferred
+         * sign" so I allow it here for locales that support it.
+         * Note: Perhaps the old-style interval code below should
+         * allow for this too, but I didn't want to risk backward
+         * compatability.
+         */
+        if (*cp == '.' || *cp == ',')
+        {
+            fval = strtod(cp, &cp);
+            if (*cp != '\0')
+                return -1;
+
+            if (val < 0)
+                fval = -(fval);
+        }
+        else if (*cp == '\0')
+            fval = 0;
+        else
+            return -1;
+
+
+        if (datepart)
+        {
+            /*
+             * All the 8601 unit specifiers are 1 character, but may
+             * be followed by a 'T' character if transitioning between
+             * the date part and the time part.  If it's not either
+             * one character or two characters with the second being 't'
+             * it's an error.
+             */
+            if (!(units[1] == 0 || (units[1] == 't' && units[2] == 0)))
+                return -1;
+
+            if (units[1] == 't')
+                datepart = 0;
+
+            switch (units[0]) /* Y M D W */
+            {
+                case 'd':
+                    tm->tm_mday += val;
+                    if (fval != 0)
+                      adjust_fval(fval,tm,fsec, 86400);
+                    tmask = ((fmask & DTK_M(DAY)) ? 0 : DTK_M(DAY));
+                    break;
+
+                case 'w':
+                    tm->tm_mday += val * 7;
+                    if (fval != 0)
+                      adjust_fval(fval,tm,fsec,7 * 86400);
+                    tmask = ((fmask & DTK_M(DAY)) ? 0 : DTK_M(DAY));
+                    break;
+
+                case 'm':
+                    tm->tm_mon += val;
+                    if (fval != 0)
+                      adjust_fval(fval,tm,fsec,30 * 86400);
+                    tmask = DTK_M(MONTH);
+                    break;
+
+                case 'y':
+                    /*
+                     * Why can fractional months produce seconds,
+                     * but fractional years can't?  Well the older
+                     * interval code below has the same property
+                     * so this one follows the other one too.
+                     */
+                    tm->tm_year += val;
+                    if (fval != 0)
+                        tm->tm_mon += (fval * 12);
+                    tmask = ((fmask & DTK_M(YEAR)) ? 0 : DTK_M(YEAR));
+                    break;
+
+                default:
+                    return -1;  /* invald date unit prefix */
+            }
+        }
+        else
+        {
+            /*
+             * ISO 8601 time part.
+             * In the time part, only one-character
+             * unit prefixes are allowed.  If it's more
+             * than one character, it's not a valid ISO 8601
+             * time interval by duration.
+             */
+            if (units[1] != 0)
+                return -1;
+
+            switch (units[0]) /* H M S */
+            {
+                case 's':
+                    tm->tm_sec += val;
+#ifdef HAVE_INT64_TIMESTAMP
+                    *fsec += (fval * 1000000);
+#else
+                    *fsec += fval;
+#endif
+                    tmask = DTK_M(SECOND);
+                    break;
+
+                case 'm':
+                    tm->tm_min += val;
+                    if (fval != 0)
+                      adjust_fval(fval,tm,fsec,60);
+                    tmask = DTK_M(MINUTE);
+                    break;
+
+                case 'h':
+                    tm->tm_hour += val;
+                    if (fval != 0)
+                      adjust_fval(fval,tm,fsec,3600);
+                    tmask = DTK_M(HOUR);
+                    break;
+
+                default:
+                    return -1; /* invald time unit prefix */
+            }
+        }
+        fmask |= tmask;
+    }
+
+    if (*fsec != 0)
+    {
+        int            sec;
+
+#ifdef HAVE_INT64_TIMESTAMP
+        sec = (*fsec / INT64CONST(1000000));
+        *fsec -= (sec * INT64CONST(1000000));
+#else
+        TMODULO(*fsec, sec, 1e0);
+#endif
+        tm->tm_sec += sec;
+    }
+    return (fmask != 0) ? 0 : -1;
+}
+
+
 /* DecodeInterval()
  * Interpret previously parsed fields for general time interval.
  * Returns 0 if successful, DTERR code if bogus input detected.
  *
  * Allow "date" field DTK_DATE since this could be just
  *    an unsigned floating point number. - thomas 1997-11-16
  *
  * Allow ISO-style time span, with implicit units on number of days
  *    preceding an hh:mm:ss field. - thomas 1998-04-30
+ *
+ * Allow ISO-8601 style "Representation of time-interval by duration only"
+ *  of the format 'PnYnMnDTnHnMnS' and 'PnW' - ron 2003-08-30
  */
+
 int
 DecodeInterval(char **field, int *ftype, int nf, int *dtype, struct tm * tm, fsec_t *fsec)
 {
     int            is_before = FALSE;
     char       *cp;
     int            fmask = 0,
                 tmask,
                 type;
     int            i;
     int            dterr;
@@ -2906,20 +3148,37 @@

     type = IGNORE_DTF;
     tm->tm_year = 0;
     tm->tm_mon = 0;
     tm->tm_mday = 0;
     tm->tm_hour = 0;
     tm->tm_min = 0;
     tm->tm_sec = 0;
     *fsec = 0;

+    /*
+     *  Check if it's a ISO 8601 Section 5.5.4.2 "Representation of
+     *  time-interval by duration only."
+     *  Basic extended format:  PnYnMnDTnHnMnS
+     *                          PnW
+     *  http://www.astroclark.freeserve.co.uk/iso8601/index.html
+     *  ftp://ftp.qsl.net/pub/g1smd/154N362_.PDF
+     *  Examples:  P1D  for 1 day
+     *             PT1H for 1 hour
+     *             P2Y6M7DT1H30M for 2 years, 6 months, 7 days 1 hour 30 min
+     *
+     *  The first field is exactly "p" or "pt" it may be of this type.
+     */
+    if (DecodeISO8601Interval(field,ftype,nf,dtype,tm,fsec) == 0) {
+        return 0;
+    }
+
     /* read through list backwards to pick up units before values */
     for (i = nf - 1; i >= 0; i--)
     {
         switch (ftype[i])
         {
             case DTK_TIME:
                 dterr = DecodeTime(field[i], fmask, &tmask, tm, fsec);
                 if (dterr)
                     return dterr;
                 type = DTK_DAY;
@@ -2983,20 +3242,21 @@
                 }
                 /* DROP THROUGH */

             case DTK_DATE:
             case DTK_NUMBER:
                 val = strtol(field[i], &cp, 10);

                 if (type == IGNORE_DTF)
                     type = DTK_SECOND;

+                /* should this allow ',' for locales that use it ? */
                 if (*cp == '.')
                 {
                     fval = strtod(cp, &cp);
                     if (*cp != '\0')
                         return DTERR_BAD_FORMAT;

                     if (val < 0)
                         fval = -(fval);
                 }
                 else if (*cp == '\0')

===================================================================

pgsql-patches by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Minor lmgr code cleanup
Next
From: Tom Lane
Date:
Subject: Re: Minor lmgr code cleanup