Thread: Support for jsonpath .datetime() method
Hi! Attached patchset implements jsonpath .datetime() method. * 0001-datetime-in-JsonbValue-1.patch This patch allows JsonbValue struct to hold datetime values. It appears to be convenient since jsonpath execution engine uses JsonbValue to store intermediate calculation results. On serialization datetime values are converted into strings. * 0002-datetime-conversion-for-jsonpath-1.patch This patch adds some datetime conversion infrastructure missing according to SQL/JSON standard. It includes FF1-FF6 format patterns, runtime identification of datetime type, strict parsing mode. * 0003-error-suppression-for-datetime-1.path As jsonpath supports error suppression in general, it's required for datetime functions too. This commit implements it in the same manner as we did for numerics before. * 0004-implement-jsonpath-datetime-1.path .datetime() method itself and additionally comparison of datetime values. Here goes a trick. Out exising jsonb_path_*() functions are immutable, while comparison of timezoned and non-timezoned type is obviously not. This patch makes existing immutable jsonb_path_*() functions throw error on non-immutable comparison. Additionally it implements stable jsonb_path_*_tz() functions, which support full set of features. I was going to discuss this patchset among the other SQL/JSON problems on PGCon unconference, but I didn't make it there. I found most questionable point in this patchset to be two sets of functions: immutable and stable. However, I don't see better solution here: we need immutable functions for expression indexes, and also we need function with full set of jsonpath features, which are not all immutable. Sometimes immutability of jsonpath expression could be determined runtime. When .datetime() method is used with template string argument we may know result type in advance. Thus, in some times we may know in advance that given jsonpath is immutable. So, we could hack contain_mutable_functions_checker() or something to make an exclusive heuristics for jsonb_path_*() functions. But I think it's better to go with jsonb_path_*() and jsonb_path_*_tz() variants for now. We could come back to idea of heuristics during consideration of standard SQL/JSON clauses. Any thoughts? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Tue, May 28, 2019 at 8:55 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Attached patchset implements jsonpath .datetime() method. Revised patchset is attached. Some inconsistencies around parse_datetime() function are fixed. Rebased to current master as well. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Mon, Jul 1, 2019 at 7:28 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Tue, May 28, 2019 at 8:55 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > Attached patchset implements jsonpath .datetime() method. > > Revised patchset is attached. Some inconsistencies around > parse_datetime() function are fixed. Rebased to current master as > well. I found commitfest.cputube.org is unhappy with this patchset because of gcc warning. Fixed in attached patchset. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested Hi, In general, the feature looks good. It is consistent with the standard and the code around. It definitely needs more documentation - datetime() and new jsonb_path_*_tz() functions are not documented. Here are also minor questions on implementation and code style: 1) + case jbvDatetime: elog(ERROR, "unexpected jbvBinary value"); We should use separate error message for jvbDatetime here. 2) + *jentry = JENTRY_ISSTRING | len; Here we can avoid using JENTRY_ISSTRING since it defined to 0x0. I propose to do so to be consistent with jbvString case. 3) + * Default time-zone for tz types is specified with 'tzname'. If 'tzname' is + * NULL and the input string does not contain zone components then "missing tz" + * error is thrown. + */ +Datum +parse_datetime(text *date_txt, text *fmt, bool strict, Oid *typid, + int32 *typmod, int *tz) The comment about 'tzname' is outdated. 4) Some typos: + * Convinience macros for error handling > * Convenience macros for error handling + * Two macros below helps handling errors in functions, which takes > * Two macros below help to handle errors in functions, which take 5) + * RETURN_ERROR() macro intended to wrap ereport() calls. When have_error + * argument is provided, then instead of ereport'ing we set *have_error flag have_error is not a macro argument, so I suggest to rephrase this comment. Shouldn't we pass have_error explicitly? In case someone will change the name of the variable, this macro will work incorrectly. 6) * When no argument is supplied, first fitting ISO format is selected. + /* Try to recognize one of ISO formats. */ + static const char *fmt_str[] = + { + "yyyy-mm-dd HH24:MI:SS TZH:TZM", + "yyyy-mm-dd HH24:MI:SS TZH", + "yyyy-mm-dd HH24:MI:SS", + "yyyy-mm-dd", + "HH24:MI:SS TZH:TZM", + "HH24:MI:SS TZH", + "HH24:MI:SS" + }; How do we choose the order of formats to check? Is it in standard? Anyway, I think this struct needs a comment that explains that changing of order can affect end-user. 7) + if (res == jperNotFound) + RETURN_ERROR(ereport(ERROR, + (errcode(ERRCODE_INVALID_ARGUMENT_FOR_JSON_DATETIME_FUNCTION), + errmsg("invalid argument for SQL/JSON datetime function"), + errdetail("unrecognized datetime format"), + errhint("use datetime template argument for explicit format specification")))); The hint is confusing. If I understand correctly, no-arg datetime function supports all formats, so if parsing failed, it must be an invalid argument and providing format explicitly won't help. The new status of this patch is: Waiting on Author
Hi! Thank you for the review! Revised version of patch is attached. On Mon, Jul 15, 2019 at 3:57 PM Anastasia Lubennikova <lubennikovaav@gmail.com> wrote: > In general, the feature looks good. It is consistent with the standard and the code around. > It definitely needs more documentation - datetime() and new jsonb_path_*_tz() functions are not documented. Documentation is added for both jsonpath .datetime() method and SQL jsonb_path_*_tz() functions. > Here are also minor questions on implementation and code style: > > 1) + case jbvDatetime: > elog(ERROR, "unexpected jbvBinary value"); > We should use separate error message for jvbDatetime here. Fixed. > 2) + *jentry = JENTRY_ISSTRING | len; > Here we can avoid using JENTRY_ISSTRING since it defined to 0x0. > I propose to do so to be consistent with jbvString case. Fixed. > 3) > + * Default time-zone for tz types is specified with 'tzname'. If 'tzname' is > + * NULL and the input string does not contain zone components then "missing tz" > + * error is thrown. > + */ > +Datum > +parse_datetime(text *date_txt, text *fmt, bool strict, Oid *typid, > + int32 *typmod, int *tz) > > The comment about 'tzname' is outdated. Fixed. > 4) Some typos: > > + * Convinience macros for error handling > > * Convenience macros for error handling > > + * Two macros below helps handling errors in functions, which takes > > * Two macros below help to handle errors in functions, which take Fixed. > 5) + * RETURN_ERROR() macro intended to wrap ereport() calls. When have_error > + * argument is provided, then instead of ereport'ing we set *have_error flag > > have_error is not a macro argument, so I suggest to rephrase this comment. > > Shouldn't we pass have_error explicitly? > In case someone will change the name of the variable, this macro will work incorrectly. Comment about RETURN_ERROR() is updated. RETURN_ERROR() is just file-wide macro. So I think in this case it's ok to pass *have_error flag implicitly for the sake of brevity. > 6) * When no argument is supplied, first fitting ISO format is selected. > + /* Try to recognize one of ISO formats. */ > + static const char *fmt_str[] = > + { > + "yyyy-mm-dd HH24:MI:SS TZH:TZM", > + "yyyy-mm-dd HH24:MI:SS TZH", > + "yyyy-mm-dd HH24:MI:SS", > + "yyyy-mm-dd", > + "HH24:MI:SS TZH:TZM", > + "HH24:MI:SS TZH", > + "HH24:MI:SS" > + }; > > How do we choose the order of formats to check? Is it in standard? > Anyway, I think this struct needs a comment that explains that changing of order can affect end-user. Yes, standard defines which order we should try datetime types (and corresponding ISO formats). I've updated respectively array, its comment and docs. > 7) + if (res == jperNotFound) > + RETURN_ERROR(ereport(ERROR, > + (errcode(ERRCODE_INVALID_ARGUMENT_FOR_JSON_DATETIME_FUNCTION), > + errmsg("invalid argument for SQL/JSON datetime function"), > + errdetail("unrecognized datetime format"), > + errhint("use datetime template argument for explicitformat specification")))); > > The hint is confusing. If I understand correctly, no-arg datetime function supports all formats, > so if parsing failed, it must be an invalid argument and providing format explicitly won't help. Custom format string may define format not enumerated in fmt_str[]. For instance, imagine "dd.mm.yyyy". In some cases custom format string can fix the error. So, ISTM hint is OK. I'm setting this back to "Needs review" waiting for either you or Peter Eisentraut provide additional review. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 7/16/19 6:41 AM, Alexander Korotkov wrote: > Hi! > > Thank you for the review! > > Revised version of patch is attached. > > On Mon, Jul 15, 2019 at 3:57 PM Anastasia Lubennikova > <lubennikovaav@gmail.com> wrote: >> In general, the feature looks good. It is consistent with the standard and the code around. >> It definitely needs more documentation - datetime() and new jsonb_path_*_tz() functions are not documented. > Documentation is added for both jsonpath .datetime() method and SQL > jsonb_path_*_tz() functions. > >> Here are also minor questions on implementation and code style: >> >> 1) + case jbvDatetime: >> elog(ERROR, "unexpected jbvBinary value"); >> We should use separate error message for jvbDatetime here. > Fixed. > >> 2) + *jentry = JENTRY_ISSTRING | len; >> Here we can avoid using JENTRY_ISSTRING since it defined to 0x0. >> I propose to do so to be consistent with jbvString case. > Fixed. > >> 3) >> + * Default time-zone for tz types is specified with 'tzname'. If 'tzname' is >> + * NULL and the input string does not contain zone components then "missing tz" >> + * error is thrown. >> + */ >> +Datum >> +parse_datetime(text *date_txt, text *fmt, bool strict, Oid *typid, >> + int32 *typmod, int *tz) >> >> The comment about 'tzname' is outdated. > Fixed. > >> 4) Some typos: >> >> + * Convinience macros for error handling >>> * Convenience macros for error handling >> + * Two macros below helps handling errors in functions, which takes >>> * Two macros below help to handle errors in functions, which take > Fixed. > >> 5) + * RETURN_ERROR() macro intended to wrap ereport() calls. When have_error >> + * argument is provided, then instead of ereport'ing we set *have_error flag >> >> have_error is not a macro argument, so I suggest to rephrase this comment. >> >> Shouldn't we pass have_error explicitly? >> In case someone will change the name of the variable, this macro will work incorrectly. > Comment about RETURN_ERROR() is updated. RETURN_ERROR() is just > file-wide macro. So I think in this case it's ok to pass *have_error > flag implicitly for the sake of brevity. > >> 6) * When no argument is supplied, first fitting ISO format is selected. >> + /* Try to recognize one of ISO formats. */ >> + static const char *fmt_str[] = >> + { >> + "yyyy-mm-dd HH24:MI:SS TZH:TZM", >> + "yyyy-mm-dd HH24:MI:SS TZH", >> + "yyyy-mm-dd HH24:MI:SS", >> + "yyyy-mm-dd", >> + "HH24:MI:SS TZH:TZM", >> + "HH24:MI:SS TZH", >> + "HH24:MI:SS" >> + }; >> >> How do we choose the order of formats to check? Is it in standard? >> Anyway, I think this struct needs a comment that explains that changing of order can affect end-user. > Yes, standard defines which order we should try datetime types (and > corresponding ISO formats). I've updated respectively array, its > comment and docs. > >> 7) + if (res == jperNotFound) >> + RETURN_ERROR(ereport(ERROR, >> + (errcode(ERRCODE_INVALID_ARGUMENT_FOR_JSON_DATETIME_FUNCTION), >> + errmsg("invalid argument for SQL/JSON datetime function"), >> + errdetail("unrecognized datetime format"), >> + errhint("use datetime template argument for explicitformat specification")))); >> >> The hint is confusing. If I understand correctly, no-arg datetime function supports all formats, >> so if parsing failed, it must be an invalid argument and providing format explicitly won't help. > Custom format string may define format not enumerated in fmt_str[]. > For instance, imagine "dd.mm.yyyy". In some cases custom format > string can fix the error. So, ISTM hint is OK. > > I'm setting this back to "Needs review" waiting for either you or > Peter Eisentraut provide additional review. > > ------ > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company Hi Alexander, I had look at the added docs and would like to suggest a couple of changes. Please see the attached patches with my my edits for func.sgml and some of the comments. Looks like we also need to change the following entry in features-unsupported.sgml, and probably move it to features-supported.sgml? <row> <entry>T832</entry> <entry></entry> <entry>SQL/JSON path language: item method</entry> <entry>datetime() not yet implemented</entry> </row> -- Liudmila Mantrova Technical writer at Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi, Liudmila! On Fri, Jul 19, 2019 at 5:30 PM Liudmila Mantrova <l.mantrova@postgrespro.ru> wrote: > I had look at the added docs and would like to suggest a couple of > changes. Please see the attached patches with my my edits for func.sgml > and some of the comments. Thank you for your edits, they look good to me. Attached patchset contains your edits as well as revised commit messages. > Looks like we also need to change the following entry in > features-unsupported.sgml, and probably move it to features-supported.sgml? > > <row> > <entry>T832</entry> > <entry></entry> > <entry>SQL/JSON path language: item method</entry> > <entry>datetime() not yet implemented</entry> > </row> Yes, that's it. Attached patch updates sql_features.txt, which is a source for generation of both features-unsupported.sgml and features-supported.sgml. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
I think the best way forward here is to focus first on patch 0002 and get the additional format templates in, independent of any surrounding JSON functionality. In particular, remove parse_datetime() and all the related API changes, then it becomes much simpler. The codes FF1..FF6 that you added appear to be correct, but reading the spec I find there is more missing, specifically - RRRR and RR - SSSSS (currently only SSSS is supported, but that's not standard) Also in some cases we allow timestamps with seven digits of fractional precision, so perhaps FF7 should be supported as well. I'm not quite sure about the details here. You tests only cover 6 and 9 digits. It would be good to cover 7 and perhaps 8 as well, since those are the boundary cases. Some concrete pieces of review: + <row> + <entry><literal>FF1</literal></entry> + <entry>decisecond (0-9)</entry> + </row> Let's not use such weird terms as "deciseconds". We could say "fractional seconds, 1 digit" etc. or something like that. +/* Return flags for DCH_from_char() */ +#define DCH_DATED 0x01 +#define DCH_TIMED 0x02 +#define DCH_ZONED 0x04 I think you mean do_to_timestamp() here. These terms "dated" etc. are from the SQL standard text, but they should be explained somewhere for the readers of the code. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 23.07.2019 16:44, Peter Eisentraut wrote:
I think the best way forward here is to focus first on patch 0002 and get the additional format templates in, independent of any surrounding JSON functionality. In particular, remove parse_datetime() and all the related API changes, then it becomes much simpler. The codes FF1..FF6 that you added appear to be correct, but reading the spec I find there is more missing, specifically - RRRR and RR
It seems that our YY works like RR should:
SELECT to_date('69', 'YY'); to_date ------------2069-01-01 (1 row) SELECT to_date('70', 'YY'); to_date ------------1970-01-01 (1 row) But by the standard first two digits of current year should be used in YY. Oracle follows the standard but its implementation has the different rounding algorithm: SELECT TO_CHAR(TO_DATE('99', 'YY'), 'YYYY') from dual; 2099 SELECT TO_CHAR(TO_DATE('49', 'RR'), 'YYYY') from dual; 2049 SELECT TO_CHAR(TO_DATE('50', 'RR'), 'YYYY') from dual; 1950 So it's unclear what we should do: - implement YY and RR strictly following the standard only in .datetime()- fix YY implementation in to_date()/to_timestamp() and implement RR- use our non-standard templates in .datetime()
- SSSSS (currently only SSSS is supported, but that's not standard)
SSSSS template can be easily added as alias to SSSS.
Also in some cases we allow timestamps with seven digits of fractional precision, so perhaps FF7 should be supported as well. I'm not quite sure about the details here. You tests only cover 6 and 9 digits. It would be good to cover 7 and perhaps 8 as well, since those are the boundary cases.
FF7-FF9 weer present in earlier versions of the jsonpath patches, but they had been removed (see [1]) because they were not completely supported due to the limited precision of timestamp.
And what about "tenths of seconds", "hundredths of seconds"?Some concrete pieces of review: + <row> + <entry><literal>FF1</literal></entry> + <entry>decisecond (0-9)</entry> + </row> Let's not use such weird terms as "deciseconds". We could say "fractional seconds, 1 digit" etc. or something like that.
+/* Return flags for DCH_from_char() */ +#define DCH_DATED 0x01 +#define DCH_TIMED 0x02 +#define DCH_ZONED 0x04 I think you mean do_to_timestamp() here. These terms "dated" etc. are from the SQL standard text, but they should be explained somewhere for the readers of the code.
On Wed, Jul 24, 2019 at 1:50 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote: > So it's unclear what we should do: > - implement YY and RR strictly following the standard only in .datetime() > - fix YY implementation in to_date()/to_timestamp() and implement RR > - use our non-standard templates in .datetime() Also it appears that according to standard .datetime() should treat spaces and delimiters differently than our to_date()/to_timestamp(). It requires strict matching of spaces and delimiters in input and format strings. We don't have such behavior in both non-FX and FX modes. Also, standard doesn't define FX mode at all. This rules cover jsonpath .datetime() method and CAST(... FORMAT ...) – new cast clause defined by standard. So, I think due to reasons of compatibility it doesn't worth trying to make behavior of our to_date()/to_timestamp() to fit requirements for jsonpath .datetime() and CAST(... FORMAT ...). I propose to leave this functions as is (maybe add new patterns), but introduce another datetime parsing mode, which would fit to the standard. Opinions? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 2019-07-24 00:48, Nikita Glukhov wrote: > It seems that our YY works like RR should: > > SELECT to_date('69', 'YY'); > to_date > ------------ > 2069-01-01 > (1 row) > > SELECT to_date('70', 'YY'); > to_date > ------------ > 1970-01-01 > (1 row) > > But by the standard first two digits of current year should be used in YY. Is this behavior even documented anywhere in our documentation? I couldn't find it. What's the exact specification of what it does in these cases? > So it's unclear what we should do: > - implement YY and RR strictly following the standard only in .datetime() > - fix YY implementation in to_date()/to_timestamp() and implement RR > - use our non-standard templates in .datetime() I think we definitely should try to use the same template system in both the general functions and in .datetime(). This might involve some compromises between existing behavior, Oracle behavior, SQL standard. So far I'm not worried: If you're using two-digit years like above, you're playing with fire anyway. Also some of the other cases like dealing with trailing spaces are probably acceptable as slight incompatibilities or extensions. We should collect a list of test cases that illustrate the differences and then work out how to deal with them. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 7/24/19 4:25 PM, Peter Eisentraut wrote: > On 2019-07-24 00:48, Nikita Glukhov wrote: >> It seems that our YY works like RR should: >> >> SELECT to_date('69', 'YY'); >> to_date >> ------------ >> 2069-01-01 >> (1 row) >> >> SELECT to_date('70', 'YY'); >> to_date >> ------------ >> 1970-01-01 >> (1 row) >> >> But by the standard first two digits of current year should be used in YY. > Is this behavior even documented anywhere in our documentation? I > couldn't find it. What's the exact specification of what it does in > these cases? > >> So it's unclear what we should do: >> - implement YY and RR strictly following the standard only in .datetime() >> - fix YY implementation in to_date()/to_timestamp() and implement RR >> - use our non-standard templates in .datetime() > I think we definitely should try to use the same template system in both > the general functions and in .datetime(). Agreed. It's too hard to maintain otherwise. > This might involve some > compromises between existing behavior, Oracle behavior, SQL standard. > So far I'm not worried: If you're using two-digit years like above, > you're playing with fire anyway. Also some of the other cases like > dealing with trailing spaces are probably acceptable as slight > incompatibilities or extensions. My instict wouyld be to move as close as possible to the standard, especially if the current behaviour isn't documented. > > We should collect a list of test cases that illustrate the differences > and then work out how to deal with them. > Agreed. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 7/23/19 6:48 PM, Nikita Glukhov wrote: > Some concrete pieces of review: >> + <row> >> + <entry><literal>FF1</literal></entry> >> + <entry>decisecond (0-9)</entry> >> + </row> >> >> Let's not use such weird terms as "deciseconds". We could say >> "fractional seconds, 1 digit" etc. or something like that. > And what about "tenths of seconds", "hundredths of seconds"? Yes, those are much better. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jul 27, 2019 at 2:43 AM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote: > On 7/23/19 6:48 PM, Nikita Glukhov wrote: > > Some concrete pieces of review: > >> + <row> > >> + <entry><literal>FF1</literal></entry> > >> + <entry>decisecond (0-9)</entry> > >> + </row> > >> > >> Let's not use such weird terms as "deciseconds". We could say > >> "fractional seconds, 1 digit" etc. or something like that. > > And what about "tenths of seconds", "hundredths of seconds"? > > Yes, those are much better. I've moved this to the September CF, still in "Waiting on Author" state. -- Thomas Munro https://enterprisedb.com
On Thu, Aug 1, 2019 at 1:31 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Sat, Jul 27, 2019 at 2:43 AM Andrew Dunstan > <andrew.dunstan@2ndquadrant.com> wrote: > > On 7/23/19 6:48 PM, Nikita Glukhov wrote: > > > Some concrete pieces of review: > > >> + <row> > > >> + <entry><literal>FF1</literal></entry> > > >> + <entry>decisecond (0-9)</entry> > > >> + </row> > > >> > > >> Let's not use such weird terms as "deciseconds". We could say > > >> "fractional seconds, 1 digit" etc. or something like that. > > > And what about "tenths of seconds", "hundredths of seconds"? > > > > Yes, those are much better. > > I've moved this to the September CF, still in "Waiting on Author" state. I'd like to summarize differences between standard datetime parsing and our to_timestamp()/to_date(). 1) Standard defines much less datetime template parts. Namely it defines: YYYY | YYY | YY | Y RRRR | RR MM DD DDD HH | HH12 HH24 MI SS SSSSS FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9 A.M. | P.M. TZH TZM We support majority of them and much more. Incompatibilities are: * SSSS (our name is SSSSS), * We don't support RRRR | RR, * Our handling of YYYY | YYY | YY | Y is different. What we have here is more like RRRR | RR in standard (Nikita explained that upthread [1]), * We don't support FF[1-9]. FF[1-6] are implemented in patch. We can't support FF[7-9], because our binary representation of timestamp datatype don't have enough of precision. 2) Standard defines only following delimiters: <minus sign>, <period>, <solidus>, <comma>, <apostrophe>, <semicolon>, <colon>, <space>. And it requires strict matching of separators between template and input strings. We don't do so either in FX or non-FX mode. For instance, we allow both to_date('2019/12/31', 'YYYY-MM-DD') and to_date('2019/12/31', 'FXYYYY-MM-DD'). But according to standard this date should be written only as '2019-12-31' to match given template string. 3) Standard prescribes recognition of digits according to \p{Nd} regex. \p{Nd} matches to "a digit zero through nine in any script except ideographic scripts". As far as I remember, we currently do recognize only ASCII digits. 4) For non-delimited template parts standard requires matching to digit sequences of lengths between 1 and maximum number of characters of that template part. We don't always do so. For instance, we allow more than 4 digits to correspond to YYYY, more than 3 digits to correspond to YYY and so on. # select to_date('2019-12-31', 'YYY-MM-DD'); to_date ------------ 2019-12-31 (1 row) Links. 1. https://www.postgresql.org/message-id/d6efab15-f3a4-40d6-8ddb-6fd8f64cbc08%40postgrespro.ru ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Aug 13, 2019 at 12:08 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Thu, Aug 1, 2019 at 1:31 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Sat, Jul 27, 2019 at 2:43 AM Andrew Dunstan > > <andrew.dunstan@2ndquadrant.com> wrote: > > > On 7/23/19 6:48 PM, Nikita Glukhov wrote: > > > > Some concrete pieces of review: > > > >> + <row> > > > >> + <entry><literal>FF1</literal></entry> > > > >> + <entry>decisecond (0-9)</entry> > > > >> + </row> > > > >> > > > >> Let's not use such weird terms as "deciseconds". We could say > > > >> "fractional seconds, 1 digit" etc. or something like that. > > > > And what about "tenths of seconds", "hundredths of seconds"? > > > > > > Yes, those are much better. > > > > I've moved this to the September CF, still in "Waiting on Author" state. > > I'd like to summarize differences between standard datetime parsing > and our to_timestamp()/to_date(). Let me describe my proposal to overcome these differences. > 1) Standard defines much less datetime template parts. Namely it defines: > YYYY | YYY | YY | Y > RRRR | RR > MM > DD > DDD > HH | HH12 > HH24 > MI > SS > SSSSS > FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9 > A.M. | P.M. > TZH > TZM > > We support majority of them and much more. Regarding non-contradicting template parts we can support them in .datetime() method too. That would be our extension to standard. See no problem here. > Incompatibilities are: > * SSSS (our name is SSSSS), Since SSSS is not reserved, I'd propose to make SSSS an alias for SSSSS. > * We don't support RRRR | RR, > * Our handling of YYYY | YYY | YY | Y is different. What we have > here is more like RRRR | RR in standard (Nikita explained that > upthread [1]), I'd like to make YYYY | YYY | YY | Y and RRRR | RR behavior standard conforming in both to_timestamp()/to_date() and .datetime(). Handling these template parts differently in different functions would be confusing for users. > * We don't support FF[1-9]. FF[1-6] are implemented in patch. We > can't support FF[7-9], because our binary representation of timestamp > datatype don't have enough of precision. I propose to postpone implementation of FF[7-9]. We can support them later once we have precise enough datatypes. > 2) Standard defines only following delimiters: <minus sign>, <period>, > <solidus>, <comma>, <apostrophe>, <semicolon>, <colon>, <space>. And > it requires strict matching of separators between template and input > strings. We don't do so either in FX or non-FX mode. > > For instance, we allow both to_date('2019/12/31', 'YYYY-MM-DD') and > to_date('2019/12/31', 'FXYYYY-MM-DD'). But according to standard this > date should be written only as '2019-12-31' to match given template > string. > > 4) For non-delimited template parts standard requires matching to > digit sequences of lengths between 1 and maximum number of characters > of that template part. We don't always do so. For instance, we allow > more than 4 digits to correspond to YYYY, more than 3 digits to > correspond to YYY and so on. > > # select to_date('2019-12-31', 'YYY-MM-DD'); > to_date > ------------ > 2019-12-31 > (1 row) In order to implement these I'd like to propose introduction of special do_to_timestamp() flag, which would define standard conforming parsing. This flag would be used in .datetime() jsonpath method. Later we also should use it for CAST(... FORMAT ...) expression, which should also do standard conforming parsing > 3) Standard prescribes recognition of digits according to \p{Nd} > regex. \p{Nd} matches to "a digit zero through nine in any script > except ideographic scripts". As far as I remember, we currently do > recognize only ASCII digits. Support all unicode digit scripts would be cool for both to_timestamp()/to_date() and standard parsing. However, I think this could be postponed. Personally I didn't meet non-ascii digits in databases yet. If needed one can implement this later, shouldn't be hard. If no objections, Nikita and me will work on revised patchset based on this proposal. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, Aug 19, 2019 at 1:29 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > If no objections, Nikita and me will work on revised patchset based on > this proposal. Revised patchset is attached. It still requires some polishing. But the most doubtful part is handling of RR, YYY, YY and Y. Standard requires us to complete YYY, YY and Y with high digits from current year. So, if YY matches 99, then year should be 2099, not 1999. For RR, standard requirements are relaxed. Implementation may choose matching year from range [current_year - 100; current_year + 100]. It looks reasonable to handle RR in the same way we currently handle YY: select appropriate year in [1970; 2069] range. It seems like we select this range to start in the same point as unix timestamp. But nowadays it still looks reasonable: it's about +- 50 from current year. So, years close to the current one are likely completed correctly. In Oracle RR returns year in [1950; 1949] range. So, it seems to be designed near 2000 :). I don't think we need to copy this behavior. Handling YYY and YY in standard way seems quite easy. We can complete them as 2YYY and 20YY. This should be standard conforming till 2100. But handling Y looks problematic. Immutable way of handling this would work only for decade. Current code completes Y as 200Y and it looks pretty "outdated" now in 2019. Using current real year would make conversion timestamp-dependent. This property doesn't look favor for to_date()/to_timestamp() and unacceptable for immutable jsonpath functions (but we can forbid using Y pattern there). Current patch complete Y as 202Y assuming v13 will be released in 2020. But I'm not sure what is better solution here. The bright side is that I haven't seen anybody use Y patten in real life :) ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
- 0001-ff1-ff6-datetime-patterns-5.patch
- 0003-rr-revise-yy-datetime-patterns-5.patch
- 0002-sssss-datetime-pattern-5.patch
- 0005-parse_datetime-function-5.patch
- 0004-standard-datetime-parsing-5.patch
- 0006-error-suppression-for-datetime-5.patch
- 0007-datetime-in-JsonbValue-5.patch
- 0008-implement-jsonpath-datetime-5.patch
On Tue, Aug 27, 2019 at 5:19 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Revised patchset is attached. It still requires some polishing. But > the most doubtful part is handling of RR, YYY, YY and Y. > > Standard requires us to complete YYY, YY and Y with high digits from > current year. So, if YY matches 99, then year should be 2099, not > 1999. > > For RR, standard requirements are relaxed. Implementation may choose > matching year from range [current_year - 100; current_year + 100]. It > looks reasonable to handle RR in the same way we currently handle YY: > select appropriate year in [1970; 2069] range. It seems like we > select this range to start in the same point as unix timestamp. But > nowadays it still looks reasonable: it's about +- 50 from current > year. So, years close to the current one are likely completed > correctly. In Oracle RR returns year in [1950; 1949] range. So, it > seems to be designed near 2000 :). I don't think we need to copy this > behavior. > > Handling YYY and YY in standard way seems quite easy. We can complete > them as 2YYY and 20YY. This should be standard conforming till 2100. > > But handling Y looks problematic. Immutable way of handling this > would work only for decade. Current code completes Y as 200Y and it > looks pretty "outdated" now in 2019. Using current real year would > make conversion timestamp-dependent. This property doesn't look favor > for to_date()/to_timestamp() and unacceptable for immutable jsonpath > functions (but we can forbid using Y pattern there). Current patch > complete Y as 202Y assuming v13 will be released in 2020. But I'm not > sure what is better solution here. The bright side is that I haven't > seen anybody use Y patten in real life :) Revised patchset is attached. It adds and adjusts commit messages, comments and does other cosmetic improvements. I think 0001 and 0002 are well reviewed already. And these patches are usable not only for jsonpath .datetime(), but contain improvements for existing to_date()/to_timestamp() SQL functions. I'm going to push these two if no objections. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
- 0002-Support-for-SSSSS-datetime-format-pattern-6.patch
- 0001-Support-for-FF1-FF6-datetime-format-patterns-6.patch
- 0004-Implement-standard-datetime-parsing-mode-6.patch
- 0003-Introduce-RRRR-and-RR-revise-YYY-YY-and-Y-datetime-6.patch
- 0005-Implement-parse_datetime-function-6.patch
- 0008-Implement-jsonpath-.datetime-method-6.patch
- 0006-Error-suppression-support-for-upcoming-jsonpath-.d-6.patch
- 0007-Allow-datetime-values-in-JsonbValue-6.patch
On Sat, Sep 14, 2019 at 10:18 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Tue, Aug 27, 2019 at 5:19 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > Revised patchset is attached. It still requires some polishing. But > > the most doubtful part is handling of RR, YYY, YY and Y. > > > > Standard requires us to complete YYY, YY and Y with high digits from > > current year. So, if YY matches 99, then year should be 2099, not > > 1999. > > > > For RR, standard requirements are relaxed. Implementation may choose > > matching year from range [current_year - 100; current_year + 100]. It > > looks reasonable to handle RR in the same way we currently handle YY: > > select appropriate year in [1970; 2069] range. It seems like we > > select this range to start in the same point as unix timestamp. But > > nowadays it still looks reasonable: it's about +- 50 from current > > year. So, years close to the current one are likely completed > > correctly. In Oracle RR returns year in [1950; 1949] range. So, it > > seems to be designed near 2000 :). I don't think we need to copy this > > behavior. > > > > Handling YYY and YY in standard way seems quite easy. We can complete > > them as 2YYY and 20YY. This should be standard conforming till 2100. > > > > But handling Y looks problematic. Immutable way of handling this > > would work only for decade. Current code completes Y as 200Y and it > > looks pretty "outdated" now in 2019. Using current real year would > > make conversion timestamp-dependent. This property doesn't look favor > > for to_date()/to_timestamp() and unacceptable for immutable jsonpath > > functions (but we can forbid using Y pattern there). Current patch > > complete Y as 202Y assuming v13 will be released in 2020. But I'm not > > sure what is better solution here. The bright side is that I haven't > > seen anybody use Y patten in real life :) > > Revised patchset is attached. It adds and adjusts commit messages, > comments and does other cosmetic improvements. > > I think 0001 and 0002 are well reviewed already. And these patches > are usable not only for jsonpath .datetime(), but contain improvements > for existing to_date()/to_timestamp() SQL functions. I'm going to > push these two if no objections. Those two patches are pushed. Just before commit I've renamed deciseconds to "tenths of seconds", sentiseconds to "hundredths of seconds" as discussed before [1]. The rest of patchset is attached. Links 1. https://www.postgresql.org/message-id/0409fb42-18d3-bdb7-37ab-d742d5313a40%402ndQuadrant.com ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
- 0002-Implement-standard-datetime-parsing-mode-7.patch
- 0003-Implement-parse_datetime-function-7.patch
- 0001-Introduce-RRRR-and-RR-revise-YYY-YY-and-Y-datetime-7.patch
- 0004-Error-suppression-support-for-upcoming-jsonpath-.d-7.patch
- 0005-Allow-datetime-values-in-JsonbValue-7.patch
- 0006-Implement-jsonpath-.datetime-method-7.patch
On Mon, Sep 16, 2019 at 10:05 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Sat, Sep 14, 2019 at 10:18 PM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > On Tue, Aug 27, 2019 at 5:19 AM Alexander Korotkov > > <a.korotkov@postgrespro.ru> wrote: > > > Revised patchset is attached. It still requires some polishing. But > > > the most doubtful part is handling of RR, YYY, YY and Y. > > > > > > Standard requires us to complete YYY, YY and Y with high digits from > > > current year. So, if YY matches 99, then year should be 2099, not > > > 1999. > > > > > > For RR, standard requirements are relaxed. Implementation may choose > > > matching year from range [current_year - 100; current_year + 100]. It > > > looks reasonable to handle RR in the same way we currently handle YY: > > > select appropriate year in [1970; 2069] range. It seems like we > > > select this range to start in the same point as unix timestamp. But > > > nowadays it still looks reasonable: it's about +- 50 from current > > > year. So, years close to the current one are likely completed > > > correctly. In Oracle RR returns year in [1950; 1949] range. So, it > > > seems to be designed near 2000 :). I don't think we need to copy this > > > behavior. > > > > > > Handling YYY and YY in standard way seems quite easy. We can complete > > > them as 2YYY and 20YY. This should be standard conforming till 2100. > > > > > > But handling Y looks problematic. Immutable way of handling this > > > would work only for decade. Current code completes Y as 200Y and it > > > looks pretty "outdated" now in 2019. Using current real year would > > > make conversion timestamp-dependent. This property doesn't look favor > > > for to_date()/to_timestamp() and unacceptable for immutable jsonpath > > > functions (but we can forbid using Y pattern there). Current patch > > > complete Y as 202Y assuming v13 will be released in 2020. But I'm not > > > sure what is better solution here. The bright side is that I haven't > > > seen anybody use Y patten in real life :) > > > > Revised patchset is attached. It adds and adjusts commit messages, > > comments and does other cosmetic improvements. > > > > I think 0001 and 0002 are well reviewed already. And these patches > > are usable not only for jsonpath .datetime(), but contain improvements > > for existing to_date()/to_timestamp() SQL functions. I'm going to > > push these two if no objections. > > Those two patches are pushed. Just before commit I've renamed > deciseconds to "tenths of seconds", sentiseconds to "hundredths of > seconds" as discussed before [1]. > > The rest of patchset is attached. I've reordered the patchset. I moved the most debatable patch, which introduces RRRR and RR and changes parsing of YYY, YY and Y to the end. I think we have enough of time in this release cycle to decide whether we want this. Patches 0001-0005 looks quite mature for me. I'm going to push this if no objections. After pushing them, I'm going to start discussion related to RR, YY and friends in separate thread. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
- 0003-Error-suppression-support-for-upcoming-jsonpath-datetime-8.patch
- 0004-Allow-datetime-values-in-JsonbValue-8.patch
- 0001-Implement-standard-datetime-parsing-mode-8.patch
- 0002-Implement-parse_datetime-function-8.patch
- 0005-Implement-jsonpath-.datetime-method-8.patch
- 0006-Introduce-RRRR-and-RR-revise-YYY-YY-and-Y-datetime-8.patch
On Mon, Sep 23, 2019 at 10:05 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > I've reordered the patchset. I moved the most debatable patch, which > introduces RRRR and RR and changes parsing of YYY, YY and Y to the > end. I think we have enough of time in this release cycle to decide > whether we want this. > > Patches 0001-0005 looks quite mature for me. I'm going to push this > if no objections. After pushing them, I'm going to start discussion > related to RR, YY and friends in separate thread. Pushed. Remaining patch is attached. I'm going to start the separate thread with its detailed explanation. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 25.09.2019 22:55, Alexander Korotkov wrote: > On Mon, Sep 23, 2019 at 10:05 PM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: >> I've reordered the patchset. I moved the most debatable patch, which >> introduces RRRR and RR and changes parsing of YYY, YY and Y to the >> end. I think we have enough of time in this release cycle to decide >> whether we want this. >> >> Patches 0001-0005 looks quite mature for me. I'm going to push this >> if no objections. After pushing them, I'm going to start discussion >> related to RR, YY and friends in separate thread. > Pushed. Remaining patch is attached. I'm going to start the separate > thread with its detailed explanation. Attached patch with refactoring of compareDatetime() according to the complaints of Tom Lane in [1]: * extracted four subroutines for type conversions * extracted subroutine for error reporting * added default cases to all switches * have_error flag is expected to be not-NULL always * fixed errhint() message style [1] https://www.postgresql.org/message-id/32308.1569455803%40sss.pgh.pa.us -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company