ISO 8601 "Time Intervals" of the "format with time-unit deignators" - Mailing list pgsql-patches
From | Ron Mayer |
---|---|
Subject | ISO 8601 "Time Intervals" of the "format with time-unit deignators" |
Date | |
Msg-id | POEDIPIPKGJJLDNIEMBEGEPADIAA.ron@intervideo.com Whole thread Raw |
In response to | Re: [HACKERS] Are we losing momentum? (Jan Wieck <JanWieck@Yahoo.com>) |
Responses |
Re: ISO 8601 "Time Intervals" of the "format with time-unit deignators"
Re: ISO 8601 'Time Intervals' of the 'format with time-unit deignators' Re: ISO 8601 "Time Intervals" of the "format with time-unit Re: ISO 8601 "Time Intervals" of the "format with time-unit |
List | pgsql-patches |
Short summary: This patch allows ISO 8601 "time intervals" using the "format with time-unit designators" to specify postgresql "intervals". Below I have (A) What these time intervals are, (B) What I modified to support them, (C) Issues with intervals I want to bring up, and (D) a patch supporting them. It's helpful to me. Any feedback is appreciated. If you did want to consider including it, let me know what to clean up. If not, I thought I'd just put it here if anyone else finds it useful too. Thanks for your time, Ron Mayer Longer: (A) What these intervals are. ISO 8601, the standard from which PostgreSQL gets some of it's time syntax, also has a specification for "time-intervals". In particular, section 5.5.4.2 has a "Representation of time-interval by duration only" which I believe maps nicely to ISO intervals. Compared to the ISO 8601 time interval specification, the postgresql interval syntax is quite verbose. For example: Postgresql interval: ISO8601 Interval --------------------------------------------------- '1 year 6 months' 'P1Y6M' '3 hours 25 minutes 42 seconds' 'PT3H25M42S' Yeah, it's uglier, but it sure is short which can make for quicker typing and shorter scripts, and if for some strange reason you had an application using this format it's nice not to have to translate. The syntax is as follows: Basic extended format: PnYnMnDTnHnMnS PnW Where everything before the "T" is a date-part and everything after is a time-part. W is for weeks. In the date-part, Y=Year, M=Month, D=Day In the time-part, H=Hour, M=Minute, S=Second Much more info can be found from the draft standard ftp://ftp.qsl.net/pub/g1smd/154N362_.PDF The final standard's only available for $$$ so I didn't look at it. Some other sites imply that this part didn't change from the last draft to the standard. (B) This change was made by adding two functions to "datetime.c" next to where DecodeInterval parses the normal interval syntax. A total of 313 lines were added, including comments and sgml docs. Of these only 136 are actual code, the rest, comments, whitespace, etc. One new function "DecodeISO8601Interval" follows the style of "DecodeInterval" below it, and trys to strictly follow the ISO syntax. If it doesn't match, it'll return -1 and the old syntax will be checked as before. The first test (first character of the first field must be 'P', and second character must be 'T' or '\0') should be fast so I don't think this will impact performance of existing code. The second function ("adjust_fval") is just a small helper-function to remove some of the cut&paste style that DecodeInterval used. It seems to work. ======================================================================= betadb=# select 'P1M15DT12H30M7S'::interval; interval ------------------------ 1 mon 15 days 12:30:07 (1 row) betadb=# select '1 month 15 days 12 hours 30 minutes 7 seconds'::interval; interval ------------------------ 1 mon 15 days 12:30:07 (1 row) ===================================================================== (C) Open issues with intervals, and questions I'd like to ask. 1. DecodeInterval seems to have a hardcoded '.' for specifying fractional times. ISO 8601 states that both '.' and ',' are ok, but "of these, the comma is the preferred sign". In DecodeISO8601Interval I loosened the test to allow both but left it as it was in DecodeInterval. Should both be changed to make them more consistant? 2. In "DecodeInterval", fractional weeks and fractional months can produce seconds; but fractional years can not (rounded to months). I didn't understand the reasoning for this, so I left it the same, and followed the same convention for ISO intervals. Should I change this? 3. I could save a bunch of copy-paste-lines-of-code from the pre-existing DecodeInterval by calling the adjust_fval helper function. The tradeoff is a few extra function-calls when decoding an interval. However I didn't want to risk changes to the existing part unless you guys encourage me to do so. (D) The patch. Index: doc/src/sgml/datatype.sgml =================================================================== RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/datatype.sgml,v retrieving revision 1.123 diff -u -1 -0 -r1.123 datatype.sgml --- doc/src/sgml/datatype.sgml 31 Aug 2003 17:32:18 -0000 1.123 +++ doc/src/sgml/datatype.sgml 8 Sep 2003 04:04:58 -0000 @@ -1735,20 +1735,71 @@ Quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For example, <literal>'1 12:59:10'</> is read the same as <literal>'1 day 12 hours 59 min 10 sec'</>. </para> <para> The optional precision <replaceable>p</replaceable> should be between 0 and 6, and defaults to the precision of the input literal. </para> + + + <para> + Alternatively, <type>interval</type> values can be written as + ISO 8601 time intervals, using the "Format with time-unit designators". + This format always starts with the character <literal>'P'</>, followed + by a string of values followed by single character time-unit designators. + A <literal>'T'</> separates the date and time parts of the interval. + </para> + + <para> + Format: PnYnMnDTnHnMnS + </para> + <para> + In this format, <literal>'n'</> gets replaced by a number, and + <literal>Y</> represents years, + <literal>M</> (in the date part) months, + <literal>D</> months, + <literal>H</> hours, + <literal>M</> (in the time part) minutes, + and <literal>S</> seconds. + </para> + + + <table id="interval-example-table"> + <title>Interval Example</title> + <tgroup cols="2"> + <thead> + <row> + <entry>Traditional</entry> + <entry>ISO-8601 time-interval</entry> + </row> + </thead> + <tbody> + <row> + <entry>1 month</entry> + <entry>P1M</entry> + </row> + <row> + <entry>1 hour 30 minutes</entry> + <entry>PT1H30M</entry> + </row> + <row> + <entry>2 years 10 months 15 days 10 hours 30 minutes 20 seconds</entry> + <entry>P2Y10M15DT10H30M20S</entry> + </row> + </tbody> + </thead> + </table> + + </para> </sect3> <sect3> <title>Special Values</title> <indexterm> <primary>time</primary> <secondary>constants</secondary> </indexterm> Index: src/backend/utils/adt/datetime.c =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/backend/utils/adt/datetime.c,v retrieving revision 1.116 diff -u -1 -0 -r1.116 datetime.c --- src/backend/utils/adt/datetime.c 27 Aug 2003 23:29:28 -0000 1.116 +++ src/backend/utils/adt/datetime.c 8 Sep 2003 04:04:59 -0000 @@ -30,20 +30,21 @@ struct tm * tm, fsec_t *fsec, int *is2digits); static int DecodeNumberField(int len, char *str, int fmask, int *tmask, struct tm * tm, fsec_t *fsec, int *is2digits); static int DecodeTime(char *str, int fmask, int *tmask, struct tm * tm, fsec_t *fsec); static int DecodeTimezone(char *str, int *tzp); static datetkn *datebsearch(char *key, datetkn *base, unsigned int nel); static int DecodeDate(char *str, int fmask, int *tmask, struct tm * tm); static void TrimTrailingZeros(char *str); +static int DecodeISO8601Interval(char **field, int *ftype, int nf, int *dtype, struct tm * tm, fsec_t *fsec); int day_tab[2][13] = { {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 0}, {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 0}}; char *months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", NULL}; char *days[] = {"Sunday", "Monday", "Tuesday", "Wednesday", @@ -2872,30 +2873,271 @@ default: *val = tp->value; break; } } return type; } +void adjust_fval(double fval,struct tm * tm, fsec_t *fsec, int scale); +{ + int sec; + fval *= scale; + sec = fval; + tm->tm_sec += sec; +#ifdef HAVE_INT64_TIMESTAMP + *fsec += ((fval - sec) * 1000000); +#else + *fsec += (fval - sec); +#endif +} + + +/* DecodeISO8601Interval() + * + * Check if it's a ISO 8601 Section 5.5.4.2 "Representation of + * time-interval by duration only." + * Basic extended format: PnYnMnDTnHnMnS + * PnW + * For more info. + * http://www.astroclark.freeserve.co.uk/iso8601/index.html + * ftp://ftp.qsl.net/pub/g1smd/154N362_.PDF + * + * Examples: P1D for 1 day + * PT1H for 1 hour + * P2Y6M7DT1H30M for 2 years, 6 months, 7 days 1 hour 30 min + * + * The first field is exactly "p" or "pt" it may be of this type. + * + * Returns -1 if the field is not of this type. + * + * It pretty strictly checks the spec, with the two exceptions + * that a week field ('W') may coexist with other units, and that + * this function allows decimals in fields other than the least + * significant units. + */ +int +DecodeISO8601Interval(char **field, int *ftype, int nf, int *dtype, struct tm * tm, fsec_t *fsec) +{ + char *cp; + int fmask = 0, + tmask; + int val; + double fval; + int arg; + int datepart; + + /* + * An ISO 8601 "time-interval by duration only" must start + * with a 'P'. If it contains a date-part, 'p' will be the + * only character in the field. If it contains no date part + * it will contain exactly to characters 'PT' indicating a + * time part. + * Anything else is illegal and will be treated like a + * traditional postgresql interval. + */ + if (!(field[0][0] == 'p' && + ((field[0][1] == 0) || (field[0][1] == 't' && field[0][2] == 0)))) + { + return -1; + } + + + /* + * If the first field is exactly 1 character ('P'), it starts + * with date elements. Otherwise it's two characters ('PT'); + * indicating it starts with a time part. + */ + datepart = (field[0][1] == 0); + + /* + * Every value must have a unit, so we require an even + * number of value/unit pairs. Therefore we require an + * odd nubmer of fields, including the prefix 'P'. + */ + if ((nf & 1) == 0) + return -1; + + /* + * Process pairs of fields at a time. + */ + for (arg = 1 ; arg < nf ; arg+=2) + { + char * value = field[arg ]; + char * units = field[arg+1]; + + /* + * The value part must be a number. + */ + if (ftype[arg] != DTK_NUMBER) + return -1; + + /* + * extract the number, almost exactly like the non-ISO interval. + */ + val = strtol(value, &cp, 10); + + /* + * One difference from the normal postgresql interval below... + * ISO 8601 states that "Of these, the comma is the preferred + * sign" so I allow it here for locales that support it. + * Note: Perhaps the old-style interval code below should + * allow for this too, but I didn't want to risk backward + * compatability. + */ + if (*cp == '.' || *cp == ',') + { + fval = strtod(cp, &cp); + if (*cp != '\0') + return -1; + + if (val < 0) + fval = -(fval); + } + else if (*cp == '\0') + fval = 0; + else + return -1; + + + if (datepart) + { + /* + * All the 8601 unit specifiers are 1 character, but may + * be followed by a 'T' character if transitioning between + * the date part and the time part. If it's not either + * one character or two characters with the second being 't' + * it's an error. + */ + if (!(units[1] == 0 || (units[1] == 't' && units[2] == 0))) + return -1; + + if (units[1] == 't') + datepart = 0; + + switch (units[0]) /* Y M D W */ + { + case 'd': + tm->tm_mday += val; + if (fval != 0) + adjust_fval(fval,tm,fsec, 86400); + tmask = ((fmask & DTK_M(DAY)) ? 0 : DTK_M(DAY)); + break; + + case 'w': + tm->tm_mday += val * 7; + if (fval != 0) + adjust_fval(fval,tm,fsec,7 * 86400); + tmask = ((fmask & DTK_M(DAY)) ? 0 : DTK_M(DAY)); + break; + + case 'm': + tm->tm_mon += val; + if (fval != 0) + adjust_fval(fval,tm,fsec,30 * 86400); + tmask = DTK_M(MONTH); + break; + + case 'y': + /* + * Why can fractional months produce seconds, + * but fractional years can't? Well the older + * interval code below has the same property + * so this one follows the other one too. + */ + tm->tm_year += val; + if (fval != 0) + tm->tm_mon += (fval * 12); + tmask = ((fmask & DTK_M(YEAR)) ? 0 : DTK_M(YEAR)); + break; + + default: + return -1; /* invald date unit prefix */ + } + } + else + { + /* + * ISO 8601 time part. + * In the time part, only one-character + * unit prefixes are allowed. If it's more + * than one character, it's not a valid ISO 8601 + * time interval by duration. + */ + if (units[1] != 0) + return -1; + + switch (units[0]) /* H M S */ + { + case 's': + tm->tm_sec += val; +#ifdef HAVE_INT64_TIMESTAMP + *fsec += (fval * 1000000); +#else + *fsec += fval; +#endif + tmask = DTK_M(SECOND); + break; + + case 'm': + tm->tm_min += val; + if (fval != 0) + adjust_fval(fval,tm,fsec,60); + tmask = DTK_M(MINUTE); + break; + + case 'h': + tm->tm_hour += val; + if (fval != 0) + adjust_fval(fval,tm,fsec,3600); + tmask = DTK_M(HOUR); + break; + + default: + return -1; /* invald time unit prefix */ + } + } + fmask |= tmask; + } + + if (*fsec != 0) + { + int sec; + +#ifdef HAVE_INT64_TIMESTAMP + sec = (*fsec / INT64CONST(1000000)); + *fsec -= (sec * INT64CONST(1000000)); +#else + TMODULO(*fsec, sec, 1e0); +#endif + tm->tm_sec += sec; + } + return (fmask != 0) ? 0 : -1; +} + + /* DecodeInterval() * Interpret previously parsed fields for general time interval. * Returns 0 if successful, DTERR code if bogus input detected. * * Allow "date" field DTK_DATE since this could be just * an unsigned floating point number. - thomas 1997-11-16 * * Allow ISO-style time span, with implicit units on number of days * preceding an hh:mm:ss field. - thomas 1998-04-30 + * + * Allow ISO-8601 style "Representation of time-interval by duration only" + * of the format 'PnYnMnDTnHnMnS' and 'PnW' - ron 2003-08-30 */ + int DecodeInterval(char **field, int *ftype, int nf, int *dtype, struct tm * tm, fsec_t *fsec) { int is_before = FALSE; char *cp; int fmask = 0, tmask, type; int i; int dterr; @@ -2906,20 +3148,37 @@ type = IGNORE_DTF; tm->tm_year = 0; tm->tm_mon = 0; tm->tm_mday = 0; tm->tm_hour = 0; tm->tm_min = 0; tm->tm_sec = 0; *fsec = 0; + /* + * Check if it's a ISO 8601 Section 5.5.4.2 "Representation of + * time-interval by duration only." + * Basic extended format: PnYnMnDTnHnMnS + * PnW + * http://www.astroclark.freeserve.co.uk/iso8601/index.html + * ftp://ftp.qsl.net/pub/g1smd/154N362_.PDF + * Examples: P1D for 1 day + * PT1H for 1 hour + * P2Y6M7DT1H30M for 2 years, 6 months, 7 days 1 hour 30 min + * + * The first field is exactly "p" or "pt" it may be of this type. + */ + if (DecodeISO8601Interval(field,ftype,nf,dtype,tm,fsec) == 0) { + return 0; + } + /* read through list backwards to pick up units before values */ for (i = nf - 1; i >= 0; i--) { switch (ftype[i]) { case DTK_TIME: dterr = DecodeTime(field[i], fmask, &tmask, tm, fsec); if (dterr) return dterr; type = DTK_DAY; @@ -2983,20 +3242,21 @@ } /* DROP THROUGH */ case DTK_DATE: case DTK_NUMBER: val = strtol(field[i], &cp, 10); if (type == IGNORE_DTF) type = DTK_SECOND; + /* should this allow ',' for locales that use it ? */ if (*cp == '.') { fval = strtod(cp, &cp); if (*cp != '\0') return DTERR_BAD_FORMAT; if (val < 0) fval = -(fval); } else if (*cp == '\0') ===================================================================
pgsql-patches by date: