Thread: Support for jsonpath .datetime() method

Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
Hi!

Attached patchset implements jsonpath .datetime() method.

 * 0001-datetime-in-JsonbValue-1.patch
This patch allows JsonbValue struct to hold datetime values.  It
appears to be convenient since jsonpath execution engine uses
JsonbValue to store intermediate calculation results.  On
serialization datetime values are converted into strings.

 * 0002-datetime-conversion-for-jsonpath-1.patch
This patch adds some datetime conversion infrastructure missing
according to SQL/JSON standard. It includes FF1-FF6 format patterns,
runtime identification of datetime type, strict parsing mode.

 * 0003-error-suppression-for-datetime-1.path
As jsonpath supports error suppression in general, it's required for
datetime functions too.  This commit implements it in the same manner
as we did for numerics before.

 * 0004-implement-jsonpath-datetime-1.path
.datetime() method itself and additionally comparison of datetime
values.  Here goes a trick.  Out exising jsonb_path_*() functions are
immutable, while comparison of timezoned and non-timezoned type is
obviously not.  This patch makes existing immutable jsonb_path_*()
functions throw error on non-immutable comparison.  Additionally it
implements stable jsonb_path_*_tz() functions, which support full set
of features.

I was going to discuss this patchset among the other SQL/JSON problems
on PGCon unconference, but I didn't make it there.  I found most
questionable point in this patchset to be two sets of functions:
immutable and stable.  However, I don't see better solution here: we
need immutable functions for expression indexes, and also we need
function with full set of jsonpath features, which are not all
immutable.

Sometimes immutability of jsonpath expression could be determined
runtime.  When .datetime() method is used with template string
argument we may know result type in advance.  Thus, in some times we
may know in advance that given jsonpath is immutable.  So, we could
hack contain_mutable_functions_checker() or something to make an
exclusive heuristics for jsonb_path_*() functions.  But I think it's
better to go with jsonb_path_*() and jsonb_path_*_tz() variants for
now.  We could come back to idea of heuristics during consideration of
standard SQL/JSON clauses.

Any thoughts?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Tue, May 28, 2019 at 8:55 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Attached patchset implements jsonpath .datetime() method.

Revised patchset is attached.  Some inconsistencies around
parse_datetime() function are fixed.  Rebased to current master as
well.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Mon, Jul 1, 2019 at 7:28 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Tue, May 28, 2019 at 8:55 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Attached patchset implements jsonpath .datetime() method.
>
> Revised patchset is attached.  Some inconsistencies around
> parse_datetime() function are fixed.  Rebased to current master as
> well.

I found commitfest.cputube.org is unhappy with this patchset because
of gcc warning.  Fixed in attached patchset.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Support for jsonpath .datetime() method

From
Anastasia Lubennikova
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            not tested

Hi,

In general, the feature looks good. It is consistent with the standard and the code around.
It definitely needs more documentation - datetime() and new jsonb_path_*_tz() functions are not documented.

Here are also minor questions on implementation and code style:

1) +                    case jbvDatetime:
                         elog(ERROR, "unexpected jbvBinary value");
We should use separate error message for jvbDatetime here.

2)  +                *jentry = JENTRY_ISSTRING | len;
Here we can avoid using JENTRY_ISSTRING since it defined to 0x0. 
I propose to do so to be consistent with jbvString case.

3) 
+ * Default time-zone for tz types is specified with 'tzname'.  If 'tzname' is
+ * NULL and the input string does not contain zone components then "missing tz"
+ * error is thrown.
+ */
+Datum
+parse_datetime(text *date_txt, text *fmt, bool strict, Oid *typid,
+               int32 *typmod, int *tz)

The comment about 'tzname' is outdated.

4) Some typos:

+ * Convinience macros for error handling
>  * Convenience macros for error handling

+ * Two macros below helps handling errors in functions, which takes
> * Two macros below help to handle errors in functions, which take

5) + * RETURN_ERROR() macro intended to wrap ereport() calls.  When have_error
+ * argument is provided, then instead of ereport'ing we set *have_error flag 

have_error is not a macro argument, so I suggest to rephrase this comment.

Shouldn't we pass have_error explicitly?
In case someone will change the name of the variable, this macro will work incorrectly.

6) * When no argument is supplied, first fitting ISO format is selected.
+        /* Try to recognize one of ISO formats. */
+        static const char *fmt_str[] =
+        {
+            "yyyy-mm-dd HH24:MI:SS TZH:TZM",
+            "yyyy-mm-dd HH24:MI:SS TZH",
+            "yyyy-mm-dd HH24:MI:SS",
+            "yyyy-mm-dd",
+            "HH24:MI:SS TZH:TZM",
+            "HH24:MI:SS TZH",
+            "HH24:MI:SS"
+        };

How do we choose the order of formats to check? Is it in standard?
Anyway, I think this struct needs a comment that explains that changing of order can affect end-user.

7) +        if (res == jperNotFound)
+            RETURN_ERROR(ereport(ERROR,
+                                 (errcode(ERRCODE_INVALID_ARGUMENT_FOR_JSON_DATETIME_FUNCTION),
+                                  errmsg("invalid argument for SQL/JSON datetime function"),
+                                  errdetail("unrecognized datetime format"),
+                                  errhint("use datetime template argument for explicit format specification"))));

The hint is confusing. If I understand correctly, no-arg datetime function supports all formats,
so if parsing failed, it must be an invalid argument and providing format explicitly won't help.

The new status of this patch is: Waiting on Author

Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
Hi!

Thank you for the review!

Revised version of patch is attached.

On Mon, Jul 15, 2019 at 3:57 PM Anastasia Lubennikova
<lubennikovaav@gmail.com> wrote:
> In general, the feature looks good. It is consistent with the standard and the code around.
> It definitely needs more documentation - datetime() and new jsonb_path_*_tz() functions are not documented.

Documentation is added for both jsonpath .datetime() method and SQL
jsonb_path_*_tz() functions.

> Here are also minor questions on implementation and code style:
>
> 1) +                    case jbvDatetime:
>                          elog(ERROR, "unexpected jbvBinary value");
> We should use separate error message for jvbDatetime here.

Fixed.

> 2)  +                *jentry = JENTRY_ISSTRING | len;
> Here we can avoid using JENTRY_ISSTRING since it defined to 0x0.
> I propose to do so to be consistent with jbvString case.

Fixed.

> 3)
> + * Default time-zone for tz types is specified with 'tzname'.  If 'tzname' is
> + * NULL and the input string does not contain zone components then "missing tz"
> + * error is thrown.
> + */
> +Datum
> +parse_datetime(text *date_txt, text *fmt, bool strict, Oid *typid,
> +               int32 *typmod, int *tz)
>
> The comment about 'tzname' is outdated.

Fixed.

> 4) Some typos:
>
> + * Convinience macros for error handling
> >  * Convenience macros for error handling
>
> + * Two macros below helps handling errors in functions, which takes
> > * Two macros below help to handle errors in functions, which take

Fixed.

> 5) + * RETURN_ERROR() macro intended to wrap ereport() calls.  When have_error
> + * argument is provided, then instead of ereport'ing we set *have_error flag
>
> have_error is not a macro argument, so I suggest to rephrase this comment.
>
> Shouldn't we pass have_error explicitly?
> In case someone will change the name of the variable, this macro will work incorrectly.

Comment about RETURN_ERROR() is updated.  RETURN_ERROR() is just
file-wide macro.  So I think in this case it's ok to pass *have_error
flag implicitly for the sake of brevity.

> 6) * When no argument is supplied, first fitting ISO format is selected.
> +        /* Try to recognize one of ISO formats. */
> +        static const char *fmt_str[] =
> +        {
> +            "yyyy-mm-dd HH24:MI:SS TZH:TZM",
> +            "yyyy-mm-dd HH24:MI:SS TZH",
> +            "yyyy-mm-dd HH24:MI:SS",
> +            "yyyy-mm-dd",
> +            "HH24:MI:SS TZH:TZM",
> +            "HH24:MI:SS TZH",
> +            "HH24:MI:SS"
> +        };
>
> How do we choose the order of formats to check? Is it in standard?
> Anyway, I think this struct needs a comment that explains that changing of order can affect end-user.

Yes, standard defines which order we should try datetime types (and
corresponding ISO formats).  I've updated respectively array, its
comment and docs.

> 7) +            if (res == jperNotFound)
> +                       RETURN_ERROR(ereport(ERROR,
> +
(errcode(ERRCODE_INVALID_ARGUMENT_FOR_JSON_DATETIME_FUNCTION),
> +                                                                 errmsg("invalid argument for SQL/JSON datetime
function"),
> +                                                                 errdetail("unrecognized datetime format"),
> +                                                                 errhint("use datetime template argument for
explicitformat specification"))));
 
>
> The hint is confusing. If I understand correctly, no-arg datetime function supports all formats,
> so if parsing failed, it must be an invalid argument and providing format explicitly won't help.

Custom format string may define format not enumerated in fmt_str[].
For instance, imagine "dd.mm.yyyy".  In some cases custom format
string can fix the error.  So, ISTM hint is OK.

I'm setting this back to "Needs review" waiting for either you or
Peter Eisentraut provide additional review.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Support for jsonpath .datetime() method

From
Liudmila Mantrova
Date:
On 7/16/19 6:41 AM, Alexander Korotkov wrote:
> Hi!
>
> Thank you for the review!
>
> Revised version of patch is attached.
>
> On Mon, Jul 15, 2019 at 3:57 PM Anastasia Lubennikova
> <lubennikovaav@gmail.com> wrote:
>> In general, the feature looks good. It is consistent with the standard and the code around.
>> It definitely needs more documentation - datetime() and new jsonb_path_*_tz() functions are not documented.
> Documentation is added for both jsonpath .datetime() method and SQL
> jsonb_path_*_tz() functions.
>
>> Here are also minor questions on implementation and code style:
>>
>> 1) +                    case jbvDatetime:
>>                           elog(ERROR, "unexpected jbvBinary value");
>> We should use separate error message for jvbDatetime here.
> Fixed.
>
>> 2)  +                *jentry = JENTRY_ISSTRING | len;
>> Here we can avoid using JENTRY_ISSTRING since it defined to 0x0.
>> I propose to do so to be consistent with jbvString case.
> Fixed.
>
>> 3)
>> + * Default time-zone for tz types is specified with 'tzname'.  If 'tzname' is
>> + * NULL and the input string does not contain zone components then "missing tz"
>> + * error is thrown.
>> + */
>> +Datum
>> +parse_datetime(text *date_txt, text *fmt, bool strict, Oid *typid,
>> +               int32 *typmod, int *tz)
>>
>> The comment about 'tzname' is outdated.
> Fixed.
>
>> 4) Some typos:
>>
>> + * Convinience macros for error handling
>>>   * Convenience macros for error handling
>> + * Two macros below helps handling errors in functions, which takes
>>> * Two macros below help to handle errors in functions, which take
> Fixed.
>
>> 5) + * RETURN_ERROR() macro intended to wrap ereport() calls.  When have_error
>> + * argument is provided, then instead of ereport'ing we set *have_error flag
>>
>> have_error is not a macro argument, so I suggest to rephrase this comment.
>>
>> Shouldn't we pass have_error explicitly?
>> In case someone will change the name of the variable, this macro will work incorrectly.
> Comment about RETURN_ERROR() is updated.  RETURN_ERROR() is just
> file-wide macro.  So I think in this case it's ok to pass *have_error
> flag implicitly for the sake of brevity.
>
>> 6) * When no argument is supplied, first fitting ISO format is selected.
>> +        /* Try to recognize one of ISO formats. */
>> +        static const char *fmt_str[] =
>> +        {
>> +            "yyyy-mm-dd HH24:MI:SS TZH:TZM",
>> +            "yyyy-mm-dd HH24:MI:SS TZH",
>> +            "yyyy-mm-dd HH24:MI:SS",
>> +            "yyyy-mm-dd",
>> +            "HH24:MI:SS TZH:TZM",
>> +            "HH24:MI:SS TZH",
>> +            "HH24:MI:SS"
>> +        };
>>
>> How do we choose the order of formats to check? Is it in standard?
>> Anyway, I think this struct needs a comment that explains that changing of order can affect end-user.
> Yes, standard defines which order we should try datetime types (and
> corresponding ISO formats).  I've updated respectively array, its
> comment and docs.
>
>> 7) +            if (res == jperNotFound)
>> +                       RETURN_ERROR(ereport(ERROR,
>> +
(errcode(ERRCODE_INVALID_ARGUMENT_FOR_JSON_DATETIME_FUNCTION),
>> +                                                                 errmsg("invalid argument for SQL/JSON datetime
function"),
>> +                                                                 errdetail("unrecognized datetime format"),
>> +                                                                 errhint("use datetime template argument for
explicitformat specification"))));
 
>>
>> The hint is confusing. If I understand correctly, no-arg datetime function supports all formats,
>> so if parsing failed, it must be an invalid argument and providing format explicitly won't help.
> Custom format string may define format not enumerated in fmt_str[].
> For instance, imagine "dd.mm.yyyy".  In some cases custom format
> string can fix the error.  So, ISTM hint is OK.
>
> I'm setting this back to "Needs review" waiting for either you or
> Peter Eisentraut provide additional review.
>
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

Hi Alexander,

I had look at the added docs and would like to suggest a couple of 
changes. Please see the attached patches with my my edits for func.sgml 
and some of the comments.

Looks like we also need to change the following entry in 
features-unsupported.sgml, and probably move it to features-supported.sgml?

  <row>
   <entry>T832</entry>
   <entry></entry>
   <entry>SQL/JSON path language: item method</entry>
   <entry>datetime() not yet implemented</entry>
  </row>

-- 
Liudmila Mantrova
Technical writer at Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
Hi, Liudmila!

On Fri, Jul 19, 2019 at 5:30 PM Liudmila Mantrova
<l.mantrova@postgrespro.ru> wrote:
> I had look at the added docs and would like to suggest a couple of
> changes. Please see the attached patches with my my edits for func.sgml
> and some of the comments.

Thank you for your edits, they look good to me.  Attached patchset
contains your edits as well as revised commit messages.

> Looks like we also need to change the following entry in
> features-unsupported.sgml, and probably move it to features-supported.sgml?
>
>   <row>
>    <entry>T832</entry>
>    <entry></entry>
>    <entry>SQL/JSON path language: item method</entry>
>    <entry>datetime() not yet implemented</entry>
>   </row>

Yes, that's it.  Attached patch updates sql_features.txt, which is a
source for generation of both features-unsupported.sgml and
features-supported.sgml.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Support for jsonpath .datetime() method

From
Peter Eisentraut
Date:
I think the best way forward here is to focus first on patch 0002 and
get the additional format templates in, independent of any surrounding
JSON functionality.

In particular, remove parse_datetime() and all the related API changes,
then it becomes much simpler.

The codes FF1..FF6 that you added appear to be correct, but reading the
spec I find there is more missing, specifically

- RRRR and RR
- SSSSS (currently only SSSS is supported, but that's not standard)

Also in some cases we allow timestamps with seven digits of fractional
precision, so perhaps FF7 should be supported as well.  I'm not quite
sure about the details here.  You tests only cover 6 and 9 digits.  It
would be good to cover 7 and perhaps 8 as well, since those are the
boundary cases.

Some concrete pieces of review:

+       <row>
+        <entry><literal>FF1</literal></entry>
+        <entry>decisecond (0-9)</entry>
+       </row>

Let's not use such weird terms as "deciseconds".  We could say
"fractional seconds, 1 digit" etc. or something like that.

+/* Return flags for DCH_from_char() */
+#define DCH_DATED  0x01
+#define DCH_TIMED  0x02
+#define DCH_ZONED  0x04

I think you mean do_to_timestamp() here.  These terms "dated" etc. are
from the SQL standard text, but they should be explained somewhere for
the readers of the code.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Support for jsonpath .datetime() method

From
Nikita Glukhov
Date:

On 23.07.2019 16:44, Peter Eisentraut wrote:

I think the best way forward here is to focus first on patch 0002 and
get the additional format templates in, independent of any surrounding
JSON functionality.

In particular, remove parse_datetime() and all the related API changes,
then it becomes much simpler.

The codes FF1..FF6 that you added appear to be correct, but reading the
spec I find there is more missing, specifically

- RRRR and RR
It seems that our YY works like RR should:
SELECT to_date('69', 'YY'); to_date   
------------2069-01-01
(1 row)

SELECT to_date('70', 'YY'); to_date   
------------1970-01-01
(1 row)

But by the standard first two digits of current year should be used in YY.


Oracle follows the standard but its implementation has the different 
rounding algorithm:

SELECT TO_CHAR(TO_DATE('99', 'YY'), 'YYYY') from dual;
2099

SELECT TO_CHAR(TO_DATE('49', 'RR'), 'YYYY') from dual;
2049

SELECT TO_CHAR(TO_DATE('50', 'RR'), 'YYYY') from dual;
1950


So it's unclear what we should do: - implement YY and RR strictly following the standard only in .datetime()- fix YY implementation in to_date()/to_timestamp() and implement RR- use our non-standard templates in .datetime()

- SSSSS (currently only SSSS is supported, but that's not standard)
SSSSS template can be easily added as alias to SSSS. 

Also in some cases we allow timestamps with seven digits of fractional
precision, so perhaps FF7 should be supported as well.  I'm not quite
sure about the details here.  You tests only cover 6 and 9 digits.  It
would be good to cover 7 and perhaps 8 as well, since those are the
boundary cases.
FF7-FF9 weer present in earlier versions of the jsonpath patches, but they
had been removed (see [1]) because they were not completely supported due
to the limited precision of timestamp.

Some concrete pieces of review:

+       <row>
+        <entry><literal>FF1</literal></entry>
+        <entry>decisecond (0-9)</entry>
+       </row>

Let's not use such weird terms as "deciseconds".  We could say
"fractional seconds, 1 digit" etc. or something like that.
And what about "tenths of seconds", "hundredths of seconds"?
+/* Return flags for DCH_from_char() */
+#define DCH_DATED  0x01
+#define DCH_TIMED  0x02
+#define DCH_ZONED  0x04

I think you mean do_to_timestamp() here.  These terms "dated" etc. are
from the SQL standard text, but they should be explained somewhere for
the readers of the code.


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Wed, Jul 24, 2019 at 1:50 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:
> So it's unclear what we should do:
>  - implement YY and RR strictly following the standard only in .datetime()
>  - fix YY implementation in to_date()/to_timestamp() and implement RR
>  - use our non-standard templates in .datetime()

Also it appears that according to standard .datetime() should treat
spaces and delimiters differently than our to_date()/to_timestamp().
It requires strict matching of spaces and delimiters in input and
format strings.  We don't have such behavior in both non-FX and FX
modes.  Also, standard doesn't define FX mode at all.  This rules
cover jsonpath .datetime() method and CAST(... FORMAT ...) – new cast
clause defined by standard.

So, I think due to reasons of compatibility it doesn't worth trying to
make behavior of our to_date()/to_timestamp() to fit requirements for
jsonpath .datetime() and CAST(... FORMAT ...).  I propose to leave
this functions as is (maybe add new patterns), but introduce another
datetime parsing mode, which would fit to the standard.  Opinions?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Support for jsonpath .datetime() method

From
Peter Eisentraut
Date:
On 2019-07-24 00:48, Nikita Glukhov wrote:
> It seems that our YY works like RR should:
> 
> SELECT to_date('69', 'YY');
>   to_date   
> ------------
>  2069-01-01
> (1 row)
> 
> SELECT to_date('70', 'YY');
>   to_date   
> ------------
>  1970-01-01
> (1 row)
> 
> But by the standard first two digits of current year should be used in YY.

Is this behavior even documented anywhere in our documentation?  I
couldn't find it.  What's the exact specification of what it does in
these cases?

> So it's unclear what we should do: 
>  - implement YY and RR strictly following the standard only in .datetime()
>  - fix YY implementation in to_date()/to_timestamp() and implement RR
>  - use our non-standard templates in .datetime()

I think we definitely should try to use the same template system in both
the general functions and in .datetime().  This might involve some
compromises between existing behavior, Oracle behavior, SQL standard.
So far I'm not worried: If you're using two-digit years like above,
you're playing with fire anyway.  Also some of the other cases like
dealing with trailing spaces are probably acceptable as slight
incompatibilities or extensions.

We should collect a list of test cases that illustrate the differences
and then work out how to deal with them.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Support for jsonpath .datetime() method

From
Andrew Dunstan
Date:
On 7/24/19 4:25 PM, Peter Eisentraut wrote:
> On 2019-07-24 00:48, Nikita Glukhov wrote:
>> It seems that our YY works like RR should:
>>
>> SELECT to_date('69', 'YY');
>>   to_date   
>> ------------
>>  2069-01-01
>> (1 row)
>>
>> SELECT to_date('70', 'YY');
>>   to_date   
>> ------------
>>  1970-01-01
>> (1 row)
>>
>> But by the standard first two digits of current year should be used in YY.
> Is this behavior even documented anywhere in our documentation?  I
> couldn't find it.  What's the exact specification of what it does in
> these cases?
>
>> So it's unclear what we should do: 
>>  - implement YY and RR strictly following the standard only in .datetime()
>>  - fix YY implementation in to_date()/to_timestamp() and implement RR
>>  - use our non-standard templates in .datetime()
> I think we definitely should try to use the same template system in both
> the general functions and in .datetime().



Agreed. It's too hard to maintain otherwise.


>   This might involve some
> compromises between existing behavior, Oracle behavior, SQL standard.
> So far I'm not worried: If you're using two-digit years like above,
> you're playing with fire anyway.  Also some of the other cases like
> dealing with trailing spaces are probably acceptable as slight
> incompatibilities or extensions.


My instict wouyld be to move as close as possible to the standard,
especially if the current behaviour isn't documented.


>
> We should collect a list of test cases that illustrate the differences
> and then work out how to deal with them.
>


Agreed.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Support for jsonpath .datetime() method

From
Andrew Dunstan
Date:
On 7/23/19 6:48 PM, Nikita Glukhov wrote:
> Some concrete pieces of review:
>> +       <row>
>> +        <entry><literal>FF1</literal></entry>
>> +        <entry>decisecond (0-9)</entry>
>> +       </row>
>>
>> Let's not use such weird terms as "deciseconds".  We could say
>> "fractional seconds, 1 digit" etc. or something like that.
> And what about "tenths of seconds", "hundredths of seconds"?



Yes, those are much better.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Support for jsonpath .datetime() method

From
Thomas Munro
Date:
On Sat, Jul 27, 2019 at 2:43 AM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
> On 7/23/19 6:48 PM, Nikita Glukhov wrote:
> > Some concrete pieces of review:
> >> +       <row>
> >> +        <entry><literal>FF1</literal></entry>
> >> +        <entry>decisecond (0-9)</entry>
> >> +       </row>
> >>
> >> Let's not use such weird terms as "deciseconds".  We could say
> >> "fractional seconds, 1 digit" etc. or something like that.
> > And what about "tenths of seconds", "hundredths of seconds"?
>
> Yes, those are much better.

I've moved this to the September CF, still in "Waiting on Author" state.

-- 
Thomas Munro
https://enterprisedb.com



Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Thu, Aug 1, 2019 at 1:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sat, Jul 27, 2019 at 2:43 AM Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com> wrote:
> > On 7/23/19 6:48 PM, Nikita Glukhov wrote:
> > > Some concrete pieces of review:
> > >> +       <row>
> > >> +        <entry><literal>FF1</literal></entry>
> > >> +        <entry>decisecond (0-9)</entry>
> > >> +       </row>
> > >>
> > >> Let's not use such weird terms as "deciseconds".  We could say
> > >> "fractional seconds, 1 digit" etc. or something like that.
> > > And what about "tenths of seconds", "hundredths of seconds"?
> >
> > Yes, those are much better.
>
> I've moved this to the September CF, still in "Waiting on Author" state.

I'd like to summarize differences between standard datetime parsing
and our to_timestamp()/to_date().

1) Standard defines much less datetime template parts.  Namely it defines:
YYYY | YYY | YY | Y
RRRR | RR
MM
DD
DDD
HH | HH12
HH24
MI
SS
SSSSS
FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
A.M. | P.M.
TZH
TZM

We support majority of them and much more.  Incompatibilities are:
 * SSSS (our name is SSSSS),
 * We don't support RRRR | RR,
 * Our handling of YYYY | YYY | YY | Y is different.  What we have
here is more like RRRR | RR in standard (Nikita explained that
upthread [1]),
 * We don't support FF[1-9].  FF[1-6] are implemented in patch.  We
can't support FF[7-9], because our binary representation of timestamp
datatype don't have enough of precision.

2) Standard defines only following delimiters: <minus sign>, <period>,
<solidus>, <comma>, <apostrophe>, <semicolon>, <colon>, <space>.  And
it requires strict matching of separators between template and input
strings.  We don't do so either in FX or non-FX mode.

For instance, we allow both to_date('2019/12/31', 'YYYY-MM-DD') and
to_date('2019/12/31', 'FXYYYY-MM-DD').  But according to standard this
date should be written only as '2019-12-31' to match given template
string.

3) Standard prescribes recognition of digits according to \p{Nd}
regex.  \p{Nd} matches to "a digit zero through nine in any script
except ideographic scripts".  As far as I remember, we currently do
recognize only ASCII digits.

4) For non-delimited template parts standard requires matching to
digit sequences of lengths between 1 and maximum number of characters
of that template part.  We don't always do so.  For instance, we allow
more than 4 digits to correspond to YYYY, more than 3 digits to
correspond to YYY and so on.

# select to_date('2019-12-31', 'YYY-MM-DD');
  to_date
------------
 2019-12-31
(1 row)

Links.

1. https://www.postgresql.org/message-id/d6efab15-f3a4-40d6-8ddb-6fd8f64cbc08%40postgrespro.ru

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Tue, Aug 13, 2019 at 12:08 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Thu, Aug 1, 2019 at 1:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Sat, Jul 27, 2019 at 2:43 AM Andrew Dunstan
> > <andrew.dunstan@2ndquadrant.com> wrote:
> > > On 7/23/19 6:48 PM, Nikita Glukhov wrote:
> > > > Some concrete pieces of review:
> > > >> +       <row>
> > > >> +        <entry><literal>FF1</literal></entry>
> > > >> +        <entry>decisecond (0-9)</entry>
> > > >> +       </row>
> > > >>
> > > >> Let's not use such weird terms as "deciseconds".  We could say
> > > >> "fractional seconds, 1 digit" etc. or something like that.
> > > > And what about "tenths of seconds", "hundredths of seconds"?
> > >
> > > Yes, those are much better.
> >
> > I've moved this to the September CF, still in "Waiting on Author" state.
>
> I'd like to summarize differences between standard datetime parsing
> and our to_timestamp()/to_date().

Let me describe my proposal to overcome these differences.

> 1) Standard defines much less datetime template parts.  Namely it defines:
> YYYY | YYY | YY | Y
> RRRR | RR
> MM
> DD
> DDD
> HH | HH12
> HH24
> MI
> SS
> SSSSS
> FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
> A.M. | P.M.
> TZH
> TZM
>
> We support majority of them and much more.

Regarding non-contradicting template parts we can support them in
.datetime() method too.  That would be our extension to standard.  See
no problem here.

> Incompatibilities are:
>  * SSSS (our name is SSSSS),

Since SSSS is not reserved, I'd propose to make SSSS an alias for SSSSS.

>  * We don't support RRRR | RR,
>  * Our handling of YYYY | YYY | YY | Y is different.  What we have
> here is more like RRRR | RR in standard (Nikita explained that
> upthread [1]),

I'd like to make YYYY | YYY | YY | Y and RRRR | RR behavior standard
conforming in both to_timestamp()/to_date() and .datetime().  Handling
these template parts differently in different functions would be
confusing for users.

>  * We don't support FF[1-9].  FF[1-6] are implemented in patch.  We
> can't support FF[7-9], because our binary representation of timestamp
> datatype don't have enough of precision.

I propose to postpone implementation of FF[7-9].  We can support them
later once we have precise enough datatypes.

> 2) Standard defines only following delimiters: <minus sign>, <period>,
> <solidus>, <comma>, <apostrophe>, <semicolon>, <colon>, <space>.  And
> it requires strict matching of separators between template and input
> strings.  We don't do so either in FX or non-FX mode.
>
> For instance, we allow both to_date('2019/12/31', 'YYYY-MM-DD') and
> to_date('2019/12/31', 'FXYYYY-MM-DD').  But according to standard this
> date should be written only as '2019-12-31' to match given template
> string.
>
> 4) For non-delimited template parts standard requires matching to
> digit sequences of lengths between 1 and maximum number of characters
> of that template part.  We don't always do so.  For instance, we allow
> more than 4 digits to correspond to YYYY, more than 3 digits to
> correspond to YYY and so on.
>
> # select to_date('2019-12-31', 'YYY-MM-DD');
>   to_date
> ------------
>  2019-12-31
> (1 row)

In order to implement these I'd like to propose introduction of
special do_to_timestamp() flag, which would define standard conforming
parsing.  This flag would be used in .datetime() jsonpath method.
Later we also should use it for CAST(... FORMAT ...) expression, which
should also do standard conforming parsing

> 3) Standard prescribes recognition of digits according to \p{Nd}
> regex.  \p{Nd} matches to "a digit zero through nine in any script
> except ideographic scripts".  As far as I remember, we currently do
> recognize only ASCII digits.

Support all unicode digit scripts would be cool for both
to_timestamp()/to_date() and standard parsing.  However, I think this
could be postponed.  Personally I didn't meet non-ascii digits in
databases yet.  If needed one can implement this later, shouldn't be
hard.

If no objections, Nikita and me will work on revised patchset based on
this proposal.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Mon, Aug 19, 2019 at 1:29 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> If no objections, Nikita and me will work on revised patchset based on
> this proposal.

Revised patchset is attached.  It still requires some polishing.  But
the most doubtful part is handling of RR, YYY, YY and Y.

Standard requires us to complete YYY, YY and Y with high digits from
current year.  So, if YY matches 99, then year should be 2099, not
1999.

For RR, standard requirements are relaxed.  Implementation may choose
matching year from range [current_year - 100; current_year + 100].  It
looks reasonable to handle RR in the same way we currently handle YY:
select appropriate year in [1970; 2069] range.  It seems like we
select this range to start in the same point as unix timestamp.  But
nowadays it still looks reasonable: it's about +- 50 from current
year.  So, years close to the current one are likely completed
correctly.  In Oracle RR returns year in [1950; 1949] range.  So, it
seems to be designed near 2000 :). I don't think we need to copy this
behavior.

Handling YYY and YY in standard way seems quite easy.  We can complete
them as 2YYY and 20YY.  This should be standard conforming till 2100.

But handling Y looks problematic.  Immutable way of handling this
would work only for decade.  Current code completes Y as 200Y and it
looks pretty "outdated" now in 2019.  Using current real year would
make conversion timestamp-dependent.  This property doesn't look favor
for to_date()/to_timestamp() and unacceptable for immutable jsonpath
functions (but we can forbid using Y pattern there).  Current patch
complete Y as 202Y assuming v13 will be released in 2020.  But I'm not
sure what is better solution here.  The bright side is that I haven't
seen anybody use Y patten in real life :)





------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Tue, Aug 27, 2019 at 5:19 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Revised patchset is attached.  It still requires some polishing.  But
> the most doubtful part is handling of RR, YYY, YY and Y.
>
> Standard requires us to complete YYY, YY and Y with high digits from
> current year.  So, if YY matches 99, then year should be 2099, not
> 1999.
>
> For RR, standard requirements are relaxed.  Implementation may choose
> matching year from range [current_year - 100; current_year + 100].  It
> looks reasonable to handle RR in the same way we currently handle YY:
> select appropriate year in [1970; 2069] range.  It seems like we
> select this range to start in the same point as unix timestamp.  But
> nowadays it still looks reasonable: it's about +- 50 from current
> year.  So, years close to the current one are likely completed
> correctly.  In Oracle RR returns year in [1950; 1949] range.  So, it
> seems to be designed near 2000 :). I don't think we need to copy this
> behavior.
>
> Handling YYY and YY in standard way seems quite easy.  We can complete
> them as 2YYY and 20YY.  This should be standard conforming till 2100.
>
> But handling Y looks problematic.  Immutable way of handling this
> would work only for decade.  Current code completes Y as 200Y and it
> looks pretty "outdated" now in 2019.  Using current real year would
> make conversion timestamp-dependent.  This property doesn't look favor
> for to_date()/to_timestamp() and unacceptable for immutable jsonpath
> functions (but we can forbid using Y pattern there).  Current patch
> complete Y as 202Y assuming v13 will be released in 2020.  But I'm not
> sure what is better solution here.  The bright side is that I haven't
> seen anybody use Y patten in real life :)

Revised patchset is attached.  It adds and adjusts commit messages,
comments and does other cosmetic improvements.

I think 0001 and 0002 are well reviewed already.  And these patches
are usable not only for jsonpath .datetime(), but contain improvements
for existing to_date()/to_timestamp() SQL functions.  I'm going to
push these two if no objections.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Sat, Sep 14, 2019 at 10:18 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Tue, Aug 27, 2019 at 5:19 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Revised patchset is attached.  It still requires some polishing.  But
> > the most doubtful part is handling of RR, YYY, YY and Y.
> >
> > Standard requires us to complete YYY, YY and Y with high digits from
> > current year.  So, if YY matches 99, then year should be 2099, not
> > 1999.
> >
> > For RR, standard requirements are relaxed.  Implementation may choose
> > matching year from range [current_year - 100; current_year + 100].  It
> > looks reasonable to handle RR in the same way we currently handle YY:
> > select appropriate year in [1970; 2069] range.  It seems like we
> > select this range to start in the same point as unix timestamp.  But
> > nowadays it still looks reasonable: it's about +- 50 from current
> > year.  So, years close to the current one are likely completed
> > correctly.  In Oracle RR returns year in [1950; 1949] range.  So, it
> > seems to be designed near 2000 :). I don't think we need to copy this
> > behavior.
> >
> > Handling YYY and YY in standard way seems quite easy.  We can complete
> > them as 2YYY and 20YY.  This should be standard conforming till 2100.
> >
> > But handling Y looks problematic.  Immutable way of handling this
> > would work only for decade.  Current code completes Y as 200Y and it
> > looks pretty "outdated" now in 2019.  Using current real year would
> > make conversion timestamp-dependent.  This property doesn't look favor
> > for to_date()/to_timestamp() and unacceptable for immutable jsonpath
> > functions (but we can forbid using Y pattern there).  Current patch
> > complete Y as 202Y assuming v13 will be released in 2020.  But I'm not
> > sure what is better solution here.  The bright side is that I haven't
> > seen anybody use Y patten in real life :)
>
> Revised patchset is attached.  It adds and adjusts commit messages,
> comments and does other cosmetic improvements.
>
> I think 0001 and 0002 are well reviewed already.  And these patches
> are usable not only for jsonpath .datetime(), but contain improvements
> for existing to_date()/to_timestamp() SQL functions.  I'm going to
> push these two if no objections.

Those two patches are pushed.  Just before commit I've renamed
deciseconds to "tenths of seconds", sentiseconds to "hundredths of
seconds" as discussed before [1].

The rest of patchset is attached.

Links
1. https://www.postgresql.org/message-id/0409fb42-18d3-bdb7-37ab-d742d5313a40%402ndQuadrant.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Mon, Sep 16, 2019 at 10:05 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Sat, Sep 14, 2019 at 10:18 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > On Tue, Aug 27, 2019 at 5:19 AM Alexander Korotkov
> > <a.korotkov@postgrespro.ru> wrote:
> > > Revised patchset is attached.  It still requires some polishing.  But
> > > the most doubtful part is handling of RR, YYY, YY and Y.
> > >
> > > Standard requires us to complete YYY, YY and Y with high digits from
> > > current year.  So, if YY matches 99, then year should be 2099, not
> > > 1999.
> > >
> > > For RR, standard requirements are relaxed.  Implementation may choose
> > > matching year from range [current_year - 100; current_year + 100].  It
> > > looks reasonable to handle RR in the same way we currently handle YY:
> > > select appropriate year in [1970; 2069] range.  It seems like we
> > > select this range to start in the same point as unix timestamp.  But
> > > nowadays it still looks reasonable: it's about +- 50 from current
> > > year.  So, years close to the current one are likely completed
> > > correctly.  In Oracle RR returns year in [1950; 1949] range.  So, it
> > > seems to be designed near 2000 :). I don't think we need to copy this
> > > behavior.
> > >
> > > Handling YYY and YY in standard way seems quite easy.  We can complete
> > > them as 2YYY and 20YY.  This should be standard conforming till 2100.
> > >
> > > But handling Y looks problematic.  Immutable way of handling this
> > > would work only for decade.  Current code completes Y as 200Y and it
> > > looks pretty "outdated" now in 2019.  Using current real year would
> > > make conversion timestamp-dependent.  This property doesn't look favor
> > > for to_date()/to_timestamp() and unacceptable for immutable jsonpath
> > > functions (but we can forbid using Y pattern there).  Current patch
> > > complete Y as 202Y assuming v13 will be released in 2020.  But I'm not
> > > sure what is better solution here.  The bright side is that I haven't
> > > seen anybody use Y patten in real life :)
> >
> > Revised patchset is attached.  It adds and adjusts commit messages,
> > comments and does other cosmetic improvements.
> >
> > I think 0001 and 0002 are well reviewed already.  And these patches
> > are usable not only for jsonpath .datetime(), but contain improvements
> > for existing to_date()/to_timestamp() SQL functions.  I'm going to
> > push these two if no objections.
>
> Those two patches are pushed.  Just before commit I've renamed
> deciseconds to "tenths of seconds", sentiseconds to "hundredths of
> seconds" as discussed before [1].
>
> The rest of patchset is attached.

I've reordered the patchset.  I moved the most debatable patch, which
introduces RRRR and RR and changes parsing of YYY, YY and Y to the
end.  I think we have enough of time in this release cycle to decide
whether we want this.

Patches 0001-0005 looks quite mature for me.  I'm going to push this
if no objections.  After pushing them, I'm going to start discussion
related to RR, YY and friends in separate thread.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Support for jsonpath .datetime() method

From
Alexander Korotkov
Date:
On Mon, Sep 23, 2019 at 10:05 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> I've reordered the patchset.  I moved the most debatable patch, which
> introduces RRRR and RR and changes parsing of YYY, YY and Y to the
> end.  I think we have enough of time in this release cycle to decide
> whether we want this.
>
> Patches 0001-0005 looks quite mature for me.  I'm going to push this
> if no objections.  After pushing them, I'm going to start discussion
> related to RR, YY and friends in separate thread.

Pushed.  Remaining patch is attached.  I'm going to start the separate
thread with its detailed explanation.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Support for jsonpath .datetime() method

From
Nikita Glukhov
Date:
On 25.09.2019 22:55, Alexander Korotkov wrote:

> On Mon, Sep 23, 2019 at 10:05 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>> I've reordered the patchset.  I moved the most debatable patch, which
>> introduces RRRR and RR and changes parsing of YYY, YY and Y to the
>> end.  I think we have enough of time in this release cycle to decide
>> whether we want this.
>>
>> Patches 0001-0005 looks quite mature for me.  I'm going to push this
>> if no objections.  After pushing them, I'm going to start discussion
>> related to RR, YY and friends in separate thread.
> Pushed.  Remaining patch is attached.  I'm going to start the separate
> thread with its detailed explanation.

Attached patch with refactoring of compareDatetime() according
to the complaints of Tom Lane in [1]:
  * extracted four subroutines for type conversions
  * extracted subroutine for error reporting
  * added default cases to all switches
  * have_error flag is expected to be not-NULL always
  * fixed errhint() message style

[1] https://www.postgresql.org/message-id/32308.1569455803%40sss.pgh.pa.us

-- 
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment