Re: Support for jsonpath .datetime() method - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Support for jsonpath .datetime() method
Date
Msg-id CAPpHfduqKUgvTX7g89bzcUcBMH5ivbWiYZjdepbadVyvrdkmvg@mail.gmail.com
Whole thread Raw
In response to Re: Support for jsonpath .datetime() method  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: Support for jsonpath .datetime() method
List pgsql-hackers
On Tue, Aug 13, 2019 at 12:08 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Thu, Aug 1, 2019 at 1:31 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Sat, Jul 27, 2019 at 2:43 AM Andrew Dunstan
> > <andrew.dunstan@2ndquadrant.com> wrote:
> > > On 7/23/19 6:48 PM, Nikita Glukhov wrote:
> > > > Some concrete pieces of review:
> > > >> +       <row>
> > > >> +        <entry><literal>FF1</literal></entry>
> > > >> +        <entry>decisecond (0-9)</entry>
> > > >> +       </row>
> > > >>
> > > >> Let's not use such weird terms as "deciseconds".  We could say
> > > >> "fractional seconds, 1 digit" etc. or something like that.
> > > > And what about "tenths of seconds", "hundredths of seconds"?
> > >
> > > Yes, those are much better.
> >
> > I've moved this to the September CF, still in "Waiting on Author" state.
>
> I'd like to summarize differences between standard datetime parsing
> and our to_timestamp()/to_date().

Let me describe my proposal to overcome these differences.

> 1) Standard defines much less datetime template parts.  Namely it defines:
> YYYY | YYY | YY | Y
> RRRR | RR
> MM
> DD
> DDD
> HH | HH12
> HH24
> MI
> SS
> SSSSS
> FF1 | FF2 | FF3 | FF4 | FF5 | FF6 | FF7 | FF8 | FF9
> A.M. | P.M.
> TZH
> TZM
>
> We support majority of them and much more.

Regarding non-contradicting template parts we can support them in
.datetime() method too.  That would be our extension to standard.  See
no problem here.

> Incompatibilities are:
>  * SSSS (our name is SSSSS),

Since SSSS is not reserved, I'd propose to make SSSS an alias for SSSSS.

>  * We don't support RRRR | RR,
>  * Our handling of YYYY | YYY | YY | Y is different.  What we have
> here is more like RRRR | RR in standard (Nikita explained that
> upthread [1]),

I'd like to make YYYY | YYY | YY | Y and RRRR | RR behavior standard
conforming in both to_timestamp()/to_date() and .datetime().  Handling
these template parts differently in different functions would be
confusing for users.

>  * We don't support FF[1-9].  FF[1-6] are implemented in patch.  We
> can't support FF[7-9], because our binary representation of timestamp
> datatype don't have enough of precision.

I propose to postpone implementation of FF[7-9].  We can support them
later once we have precise enough datatypes.

> 2) Standard defines only following delimiters: <minus sign>, <period>,
> <solidus>, <comma>, <apostrophe>, <semicolon>, <colon>, <space>.  And
> it requires strict matching of separators between template and input
> strings.  We don't do so either in FX or non-FX mode.
>
> For instance, we allow both to_date('2019/12/31', 'YYYY-MM-DD') and
> to_date('2019/12/31', 'FXYYYY-MM-DD').  But according to standard this
> date should be written only as '2019-12-31' to match given template
> string.
>
> 4) For non-delimited template parts standard requires matching to
> digit sequences of lengths between 1 and maximum number of characters
> of that template part.  We don't always do so.  For instance, we allow
> more than 4 digits to correspond to YYYY, more than 3 digits to
> correspond to YYY and so on.
>
> # select to_date('2019-12-31', 'YYY-MM-DD');
>   to_date
> ------------
>  2019-12-31
> (1 row)

In order to implement these I'd like to propose introduction of
special do_to_timestamp() flag, which would define standard conforming
parsing.  This flag would be used in .datetime() jsonpath method.
Later we also should use it for CAST(... FORMAT ...) expression, which
should also do standard conforming parsing

> 3) Standard prescribes recognition of digits according to \p{Nd}
> regex.  \p{Nd} matches to "a digit zero through nine in any script
> except ideographic scripts".  As far as I remember, we currently do
> recognize only ASCII digits.

Support all unicode digit scripts would be cool for both
to_timestamp()/to_date() and standard parsing.  However, I think this
could be postponed.  Personally I didn't meet non-ascii digits in
databases yet.  If needed one can implement this later, shouldn't be
hard.

If no objections, Nikita and me will work on revised patchset based on
this proposal.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: Improve search for missing parent downlinks in amcheck
Next
From: Ryan Lambert
Date:
Subject: Re: FETCH FIRST clause PERCENT option