Re: ISO 8601 "Time Intervals" of the "format with time-unit - Mailing list pgsql-patches

From Peter Eisentraut
Subject Re: ISO 8601 "Time Intervals" of the "format with time-unit
Date
Msg-id Pine.LNX.4.44.0312010719450.22012-100000@peter.localdomain
Whole thread Raw
In response to Re: ISO 8601 "Time Intervals" of the "format with time-unit  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: ISO 8601 "Time Intervals" of the "format with time-unit
Re: ISO 8601 "Time Intervals" of the "format with time-unit
List pgsql-patches
Bruce Momjian writes:

> Is this ready for application?  It looks good to me.  However, there is
> an "Open issues" section.

It would be more useful to implement the SQL standard for intervals first
instead of inventing more nonstandard formats for it.

>
> ---------------------------------------------------------------------------
>
> Ron Mayer wrote:
> > Short summary:
> >
> >    This patch allows ISO 8601 "time intervals" using the "format
> >    with time-unit designators" to specify postgresql "intervals".
> >
> >    Below I have (A) What these time intervals are, (B) What I
> >    modified to support them, (C) Issues with intervals I want
> >    to bring up, and (D) a patch supporting them.
> >
> >    It's helpful to me.  Any feedback is appreciated.  If you
> >    did want to consider including it, let me know what to clean
> >    up.  If not, I thought I'd just put it here if anyone else finds
> >    it useful too.
> >
> >    Thanks for your time,
> >
> >       Ron Mayer
> >
> > Longer:
> >
> > (A) What these intervals are.
> >
> >    ISO 8601, the standard from which PostgreSQL gets some of it's
> >    time syntax, also has a specification for "time-intervals".
> >
> >    In particular, section 5.5.4.2 has a "Representation of
> >    time-interval by duration only" which I believe maps
> >    nicely to ISO intervals.
> >
> >    Compared to the ISO 8601 time interval specification, the
> >    postgresql interval syntax is quite verbose.  For example:
> >
> >      Postgresql interval:              ISO8601 Interval
> >      ---------------------------------------------------
> >      '1 year 6 months'                'P1Y6M'
> >      '3 hours 25 minutes 42 seconds'  'PT3H25M42S'
> >
> >    Yeah, it's uglier, but it sure is short which can make
> >    for quicker typing and shorter scripts, and if for some
> >    strange reason you had an application using this format
> >    it's nice not to have to translate.
> >
> >    The syntax is as follows:
> >        Basic extended format:  PnYnMnDTnHnMnS
> >                                PnW
> >
> >        Where everything before the "T" is a date-part and everything
> >        after is a time-part.  W is for weeks.
> >        In the date-part, Y=Year, M=Month,  D=Day
> >        In the time-part, H=Hour, M=Minute, S=Second
> >
> >    Much more info can be found from the draft standard
> >    ftp://ftp.qsl.net/pub/g1smd/154N362_.PDF
> >    The final standard's only available for $$$ so I didn't
> >    look at it.  Some other sites imply that this part didn't
> >    change from the last draft to the standard.
> >
> >
> > (B) This change was made by adding two functions to "datetime.c"
> >     next to where DecodeInterval parses the normal interval syntax.
> >
> >     A total of 313 lines were added, including comments and sgml docs.
> >     Of these only 136 are actual code, the rest, comments, whitespace, etc.
> >
> >
> >     One new function "DecodeISO8601Interval" follows the style of
> >     "DecodeInterval" below it, and trys to strictly follow the ISO
> >     syntax.  If it doesn't match, it'll return -1 and the old syntax
> >     will be checked as before.
> >
> >     The first test (first character of the first field must be 'P',
> >     and second character must be 'T' or '\0') should be fast so I don't
> >     think this will impact performance of existing code.
> >
> >
> >     The second function ("adjust_fval") is just a small helper-function
> >     to remove some of the cut&paste style that DecodeInterval used.
> >
> >     It seems to work.
> >     =======================================================================
> >     betadb=# select 'P1M15DT12H30M7S'::interval;
> >             interval
> >     ------------------------
> >      1 mon 15 days 12:30:07
> >     (1 row)
> >
> >     betadb=# select '1 month 15 days 12 hours 30 minutes 7 seconds'::interval;
> >         interval
> >     ------------------------
> >     1 mon 15 days 12:30:07
> >     (1 row)
> >     =====================================================================
> >
> >
> >
> > (C) Open issues with intervals, and questions I'd like to ask.
> >
> >     1.  DecodeInterval seems to have a hardcoded '.' for specifying
> >         fractional times.  ISO 8601 states that both '.' and ',' are
> >         ok, but "of these, the comma is the preferred sign".
> >
> >         In DecodeISO8601Interval I loosened the test to allow
> >         both but left it as it was in DecodeInterval.  Should
> >         both be changed to make them more consistant?
> >
> >     2.  In "DecodeInterval", fractional weeks and fractional months
> >         can produce seconds; but fractional years can not (rounded
> >         to months).  I didn't understand the reasoning for this, so
> >         I left it the same, and followed the same convention for
> >         ISO intervals.  Should I change this?
> >
> >     3.  I could save a bunch of copy-paste-lines-of-code from the
> >         pre-existing DecodeInterval by calling the adjust_fval helper
> >         function.  The tradeoff is a few extra function-calls when
> >         decoding an interval.  However I didn't want to risk changes
> >         to the existing part unless you guys encourage me to do so.
> >
> >
> > (D) The patch.
> >
> >
> > Index: doc/src/sgml/datatype.sgml
> > ===================================================================
> > RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/datatype.sgml,v
> > retrieving revision 1.123
> > diff -u -1 -0 -r1.123 datatype.sgml
> > --- doc/src/sgml/datatype.sgml    31 Aug 2003 17:32:18 -0000    1.123
> > +++ doc/src/sgml/datatype.sgml    8 Sep 2003 04:04:58 -0000
> > @@ -1735,20 +1735,71 @@
> >        Quantities of days, hours, minutes, and seconds can be specified without
> >        explicit unit markings.  For example, <literal>'1 12:59:10'</> is read
> >        the same as <literal>'1 day 12 hours 59 min 10 sec'</>.
> >       </para>
> >
> >       <para>
> >        The optional precision
> >        <replaceable>p</replaceable> should be between 0 and 6, and
> >        defaults to the precision of the input literal.
> >       </para>
> > +
> > +
> > +     <para>
> > +      Alternatively, <type>interval</type> values can be written as
> > +      ISO 8601 time intervals, using the "Format with time-unit designators".
> > +      This format always starts with the character <literal>'P'</>, followed
> > +      by a string of values followed by single character time-unit designators.
> > +      A <literal>'T'</> separates the date and time parts of the interval.
> > +     </para>
> > +
> > +     <para>
> > +       Format:  PnYnMnDTnHnMnS
> > +     </para>
> > +     <para>
> > +       In this format, <literal>'n'</> gets replaced by a number, and
> > +       <literal>Y</> represents years,
> > +       <literal>M</> (in the date part) months,
> > +       <literal>D</> months,
> > +       <literal>H</> hours,
> > +       <literal>M</> (in the time part) minutes,
> > +       and <literal>S</> seconds.
> > +     </para>
> > +
> > +
> > +     <table id="interval-example-table">
> > +       <title>Interval Example</title>
> > +       <tgroup cols="2">
> > +        <thead>
> > +         <row>
> > +          <entry>Traditional</entry>
> > +          <entry>ISO-8601 time-interval</entry>
> > +         </row>
> > +        </thead>
> > +        <tbody>
> > +         <row>
> > +          <entry>1 month</entry>
> > +          <entry>P1M</entry>
> > +         </row>
> > +         <row>
> > +          <entry>1 hour 30 minutes</entry>
> > +          <entry>PT1H30M</entry>
> > +         </row>
> > +         <row>
> > +          <entry>2 years 10 months 15 days 10 hours 30 minutes 20 seconds</entry>
> > +          <entry>P2Y10M15DT10H30M20S</entry>
> > +         </row>
> > +        </tbody>
> > +       </thead>
> > +      </table>
> > +
> > +     </para>
> >      </sect3>
> >
> >      <sect3>
> >       <title>Special Values</title>
> >
> >       <indexterm>
> >        <primary>time</primary>
> >        <secondary>constants</secondary>
> >       </indexterm>
> >
> > Index: src/backend/utils/adt/datetime.c
> > ===================================================================
> > RCS file: /projects/cvsroot/pgsql-server/src/backend/utils/adt/datetime.c,v
> > retrieving revision 1.116
> > diff -u -1 -0 -r1.116 datetime.c
> > --- src/backend/utils/adt/datetime.c    27 Aug 2003 23:29:28 -0000    1.116
> > +++ src/backend/utils/adt/datetime.c    8 Sep 2003 04:04:59 -0000
> > @@ -30,20 +30,21 @@
> >               struct tm * tm, fsec_t *fsec, int *is2digits);
> >  static int DecodeNumberField(int len, char *str,
> >                    int fmask, int *tmask,
> >                    struct tm * tm, fsec_t *fsec, int *is2digits);
> >  static int DecodeTime(char *str, int fmask, int *tmask,
> >             struct tm * tm, fsec_t *fsec);
> >  static int    DecodeTimezone(char *str, int *tzp);
> >  static datetkn *datebsearch(char *key, datetkn *base, unsigned int nel);
> >  static int    DecodeDate(char *str, int fmask, int *tmask, struct tm * tm);
> >  static void TrimTrailingZeros(char *str);
> > +static int  DecodeISO8601Interval(char **field, int *ftype, int nf, int *dtype, struct tm * tm, fsec_t *fsec);
> >
> >
> >  int            day_tab[2][13] = {
> >      {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 0},
> >  {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 0}};
> >
> >  char       *months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
> >  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", NULL};
> >
> >  char       *days[] = {"Sunday", "Monday", "Tuesday", "Wednesday",
> > @@ -2872,30 +2873,271 @@
> >              default:
> >                  *val = tp->value;
> >                  break;
> >          }
> >      }
> >
> >      return type;
> >  }
> >
> >
> > +void adjust_fval(double fval,struct tm * tm, fsec_t *fsec, int scale);
> > +{
> > +    int    sec;
> > +    fval       *= scale;
> > +    sec            = fval;
> > +    tm->tm_sec += sec;
> > +#ifdef HAVE_INT64_TIMESTAMP
> > +    *fsec       += ((fval - sec) * 1000000);
> > +#else
> > +    *fsec       += (fval - sec);
> > +#endif
> > +}
> > +
> > +
> > +/* DecodeISO8601Interval()
> > + *
> > + *  Check if it's a ISO 8601 Section 5.5.4.2 "Representation of
> > + *  time-interval by duration only."
> > + *  Basic extended format:  PnYnMnDTnHnMnS
> > + *                          PnW
> > + *  For more info.
> > + *  http://www.astroclark.freeserve.co.uk/iso8601/index.html
> > + *  ftp://ftp.qsl.net/pub/g1smd/154N362_.PDF
> > + *
> > + *  Examples:  P1D  for 1 day
> > + *             PT1H for 1 hour
> > + *             P2Y6M7DT1H30M for 2 years, 6 months, 7 days 1 hour 30 min
> > + *
> > + *  The first field is exactly "p" or "pt" it may be of this type.
> > + *
> > + *  Returns -1 if the field is not of this type.
> > + *
> > + *  It pretty strictly checks the spec, with the two exceptions
> > + *  that a week field ('W') may coexist with other units, and that
> > + *  this function allows decimals in fields other than the least
> > + *  significant units.
> > + */
> > +int
> > +DecodeISO8601Interval(char **field, int *ftype, int nf, int *dtype, struct tm * tm, fsec_t *fsec)
> > +{
> > +    char       *cp;
> > +    int            fmask = 0,
> > +                tmask;
> > +    int            val;
> > +    double        fval;
> > +    int            arg;
> > +    int            datepart;
> > +
> > +    /*
> > +     * An ISO 8601 "time-interval by duration only" must start
> > +     * with a 'P'.  If it contains a date-part, 'p' will be the
> > +     * only character in the field.  If it contains no date part
> > +     * it will contain exactly to characters 'PT' indicating a
> > +     * time part.
> > +     * Anything else is illegal and will be treated like a
> > +     * traditional postgresql interval.
> > +     */
> > +    if (!(field[0][0] == 'p' &&
> > +          ((field[0][1] == 0) || (field[0][1] == 't' && field[0][2] == 0))))
> > +    {
> > +      return -1;
> > +    }
> > +
> > +
> > +    /*
> > +     * If the first field is exactly 1 character ('P'), it starts
> > +     * with date elements.  Otherwise it's two characters ('PT');
> > +     * indicating it starts with a time part.
> > +     */
> > +    datepart = (field[0][1] == 0);
> > +
> > +    /*
> > +     * Every value must have a unit, so we require an even
> > +     * number of value/unit pairs. Therefore we require an
> > +     * odd nubmer of fields, including the prefix 'P'.
> > +     */
> > +    if ((nf & 1) == 0)
> > +        return -1;
> > +
> > +    /*
> > +     * Process pairs of fields at a time.
> > +     */
> > +    for (arg = 1 ; arg < nf ; arg+=2)
> > +    {
> > +        char * value = field[arg  ];
> > +        char * units = field[arg+1];
> > +
> > +        /*
> > +         * The value part must be a number.
> > +         */
> > +        if (ftype[arg] != DTK_NUMBER)
> > +            return -1;
> > +
> > +        /*
> > +         * extract the number, almost exactly like the non-ISO interval.
> > +         */
> > +        val = strtol(value, &cp, 10);
> > +
> > +        /*
> > +         * One difference from the normal postgresql interval below...
> > +         * ISO 8601 states that "Of these, the comma is the preferred
> > +         * sign" so I allow it here for locales that support it.
> > +         * Note: Perhaps the old-style interval code below should
> > +         * allow for this too, but I didn't want to risk backward
> > +         * compatability.
> > +         */
> > +        if (*cp == '.' || *cp == ',')
> > +        {
> > +            fval = strtod(cp, &cp);
> > +            if (*cp != '\0')
> > +                return -1;
> > +
> > +            if (val < 0)
> > +                fval = -(fval);
> > +        }
> > +        else if (*cp == '\0')
> > +            fval = 0;
> > +        else
> > +            return -1;
> > +
> > +
> > +        if (datepart)
> > +        {
> > +            /*
> > +             * All the 8601 unit specifiers are 1 character, but may
> > +             * be followed by a 'T' character if transitioning between
> > +             * the date part and the time part.  If it's not either
> > +             * one character or two characters with the second being 't'
> > +             * it's an error.
> > +             */
> > +            if (!(units[1] == 0 || (units[1] == 't' && units[2] == 0)))
> > +                return -1;
> > +
> > +            if (units[1] == 't')
> > +                datepart = 0;
> > +
> > +            switch (units[0]) /* Y M D W */
> > +            {
> > +                case 'd':
> > +                    tm->tm_mday += val;
> > +                    if (fval != 0)
> > +                      adjust_fval(fval,tm,fsec, 86400);
> > +                    tmask = ((fmask & DTK_M(DAY)) ? 0 : DTK_M(DAY));
> > +                    break;
> > +
> > +                case 'w':
> > +                    tm->tm_mday += val * 7;
> > +                    if (fval != 0)
> > +                      adjust_fval(fval,tm,fsec,7 * 86400);
> > +                    tmask = ((fmask & DTK_M(DAY)) ? 0 : DTK_M(DAY));
> > +                    break;
> > +
> > +                case 'm':
> > +                    tm->tm_mon += val;
> > +                    if (fval != 0)
> > +                      adjust_fval(fval,tm,fsec,30 * 86400);
> > +                    tmask = DTK_M(MONTH);
> > +                    break;
> > +
> > +                case 'y':
> > +                    /*
> > +                     * Why can fractional months produce seconds,
> > +                     * but fractional years can't?  Well the older
> > +                     * interval code below has the same property
> > +                     * so this one follows the other one too.
> > +                     */
> > +                    tm->tm_year += val;
> > +                    if (fval != 0)
> > +                        tm->tm_mon += (fval * 12);
> > +                    tmask = ((fmask & DTK_M(YEAR)) ? 0 : DTK_M(YEAR));
> > +                    break;
> > +
> > +                default:
> > +                    return -1;  /* invald date unit prefix */
> > +            }
> > +        }
> > +        else
> > +        {
> > +            /*
> > +             * ISO 8601 time part.
> > +             * In the time part, only one-character
> > +             * unit prefixes are allowed.  If it's more
> > +             * than one character, it's not a valid ISO 8601
> > +             * time interval by duration.
> > +             */
> > +            if (units[1] != 0)
> > +                return -1;
> > +
> > +            switch (units[0]) /* H M S */
> > +            {
> > +                case 's':
> > +                    tm->tm_sec += val;
> > +#ifdef HAVE_INT64_TIMESTAMP
> > +                    *fsec += (fval * 1000000);
> > +#else
> > +                    *fsec += fval;
> > +#endif
> > +                    tmask = DTK_M(SECOND);
> > +                    break;
> > +
> > +                case 'm':
> > +                    tm->tm_min += val;
> > +                    if (fval != 0)
> > +                      adjust_fval(fval,tm,fsec,60);
> > +                    tmask = DTK_M(MINUTE);
> > +                    break;
> > +
> > +                case 'h':
> > +                    tm->tm_hour += val;
> > +                    if (fval != 0)
> > +                      adjust_fval(fval,tm,fsec,3600);
> > +                    tmask = DTK_M(HOUR);
> > +                    break;
> > +
> > +                default:
> > +                    return -1; /* invald time unit prefix */
> > +            }
> > +        }
> > +        fmask |= tmask;
> > +    }
> > +
> > +    if (*fsec != 0)
> > +    {
> > +        int            sec;
> > +
> > +#ifdef HAVE_INT64_TIMESTAMP
> > +        sec = (*fsec / INT64CONST(1000000));
> > +        *fsec -= (sec * INT64CONST(1000000));
> > +#else
> > +        TMODULO(*fsec, sec, 1e0);
> > +#endif
> > +        tm->tm_sec += sec;
> > +    }
> > +    return (fmask != 0) ? 0 : -1;
> > +}
> > +
> > +
> >  /* DecodeInterval()
> >   * Interpret previously parsed fields for general time interval.
> >   * Returns 0 if successful, DTERR code if bogus input detected.
> >   *
> >   * Allow "date" field DTK_DATE since this could be just
> >   *    an unsigned floating point number. - thomas 1997-11-16
> >   *
> >   * Allow ISO-style time span, with implicit units on number of days
> >   *    preceding an hh:mm:ss field. - thomas 1998-04-30
> > + *
> > + * Allow ISO-8601 style "Representation of time-interval by duration only"
> > + *  of the format 'PnYnMnDTnHnMnS' and 'PnW' - ron 2003-08-30
> >   */
> > +
> >  int
> >  DecodeInterval(char **field, int *ftype, int nf, int *dtype, struct tm * tm, fsec_t *fsec)
> >  {
> >      int            is_before = FALSE;
> >      char       *cp;
> >      int            fmask = 0,
> >                  tmask,
> >                  type;
> >      int            i;
> >      int            dterr;
> > @@ -2906,20 +3148,37 @@
> >
> >      type = IGNORE_DTF;
> >      tm->tm_year = 0;
> >      tm->tm_mon = 0;
> >      tm->tm_mday = 0;
> >      tm->tm_hour = 0;
> >      tm->tm_min = 0;
> >      tm->tm_sec = 0;
> >      *fsec = 0;
> >
> > +    /*
> > +     *  Check if it's a ISO 8601 Section 5.5.4.2 "Representation of
> > +     *  time-interval by duration only."
> > +     *  Basic extended format:  PnYnMnDTnHnMnS
> > +     *                          PnW
> > +     *  http://www.astroclark.freeserve.co.uk/iso8601/index.html
> > +     *  ftp://ftp.qsl.net/pub/g1smd/154N362_.PDF
> > +     *  Examples:  P1D  for 1 day
> > +     *             PT1H for 1 hour
> > +     *             P2Y6M7DT1H30M for 2 years, 6 months, 7 days 1 hour 30 min
> > +     *
> > +     *  The first field is exactly "p" or "pt" it may be of this type.
> > +     */
> > +    if (DecodeISO8601Interval(field,ftype,nf,dtype,tm,fsec) == 0) {
> > +        return 0;
> > +    }
> > +
> >      /* read through list backwards to pick up units before values */
> >      for (i = nf - 1; i >= 0; i--)
> >      {
> >          switch (ftype[i])
> >          {
> >              case DTK_TIME:
> >                  dterr = DecodeTime(field[i], fmask, &tmask, tm, fsec);
> >                  if (dterr)
> >                      return dterr;
> >                  type = DTK_DAY;
> > @@ -2983,20 +3242,21 @@
> >                  }
> >                  /* DROP THROUGH */
> >
> >              case DTK_DATE:
> >              case DTK_NUMBER:
> >                  val = strtol(field[i], &cp, 10);
> >
> >                  if (type == IGNORE_DTF)
> >                      type = DTK_SECOND;
> >
> > +                /* should this allow ',' for locales that use it ? */
> >                  if (*cp == '.')
> >                  {
> >                      fval = strtod(cp, &cp);
> >                      if (*cp != '\0')
> >                          return DTERR_BAD_FORMAT;
> >
> >                      if (val < 0)
> >                          fval = -(fval);
> >                  }
> >                  else if (*cp == '\0')
> >
> > ===================================================================
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: Have you searched our list archives?
> >
> >                http://archives.postgresql.org
> >
>
>

--
Peter Eisentraut   peter_e@gmx.net


pgsql-patches by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: introduce "default_use_oids"
Next
From: Joe Conway
Date:
Subject: Re: export FUNC_MAX_ARGS as a read-only GUC variable