Thread: to_timestamp docs

to_timestamp docs

From
Bruce Momjian
Date:
I don't think the changes made in PG 12 are documented accurately.  It
currently says:

    to_timestamp and to_date matches any single separator in the input
    string or is skipped

However, I think it is more accurate to say _multiple_ whitespace can
also be matched by a single separator:

    SELECT to_timestamp('%1976','_YYYY');
          to_timestamp
    ------------------------
     1976-01-01 00:00:00-05
    
    SELECT to_timestamp('%%1976','_YYYY');
    ERROR:  invalid value "%197" for "YYYY"
    DETAIL:  Value must be an integer.
    
    -- two spaces
-->    SELECT to_timestamp('  1976','_YYYY');
          to_timestamp
    ------------------------
     1976-01-01 00:00:00-05

-->    SELECT to_timestamp(E'\t\t\t1976','_YYYY');
          to_timestamp
    ------------------------
     1976-01-01 00:00:00-05

Proposed patch attached.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Attachment

Re: to_timestamp docs

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> I don't think the changes made in PG 12 are documented accurately.

That code is swapped out of my head at the moment, but it looks
to me like the para before the one you changed is where we discuss
the behavior for whitespace.  I'm not sure that this change is
right, or an improvement, in the context of both paras.

            regards, tom lane



Re: to_timestamp docs

From
Bruce Momjian
Date:
On Wed, May  1, 2019 at 10:01:50AM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I don't think the changes made in PG 12 are documented accurately.
> 
> That code is swapped out of my head at the moment, but it looks
> to me like the para before the one you changed is where we discuss
> the behavior for whitespace.  I'm not sure that this change is
> right, or an improvement, in the context of both paras.

Thanks.  I think I see the sentence you are thinking of:

       <function>to_timestamp</function> and <function>to_date</function>
       skip multiple blank spaces at the beginning of the input string
       and around date and time values unless the <literal>FX</literal>
       option is used.

However, first, it is unclear what 'skip' means here, i.e., does it mean
multiple blank spaces become a single space, or they are ignored.  That
should be clarified, though I am unclear if that matters based on how
separators are handled.  Also, I think "blank spaces" should be
"whitespace".

Second, I see inconsistent behaviour around the use of FX for various
patterns, e.g.:

    SELECT to_timestamp('5   1976','FXDD_FXYYYY');
          to_timestamp
    ------------------------
     1976-01-05 00:00:00-05
    
    SELECT to_timestamp('JUL JUL','FXMON_FXMON');
              to_timestamp
    ---------------------------------
     0001-07-01 00:00:00-04:56:02 BC

    SELECT to_timestamp('JUL    JUL','FXMON_FXMON');
    ERROR:  invalid value "   " for "MON"
    DETAIL:  The given value did not match any of the allowed values for this field.

It seems DD and YYYY (as numerics?) in FX mode eat trailing whitespace,
while MON does not?  Also, I used these queries to determine it is
"trailing" whitespace that "FXMON" controls:

    SELECT to_timestamp('JUL   JUL JUL','MON_FXMON_MON');
              to_timestamp
    ---------------------------------
     0001-07-01 00:00:00-04:56:02 BC
    
    SELECT to_timestamp('JUL JUL   JUL','MON_FXMON_MON');
    ERROR:  invalid value "  J" for "MON"
    DETAIL:  The given value did not match any of the allowed values for this field.

Once we figure out how it is behaving I think we can pull together the
FX text above to reference the separator text below.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: to_timestamp docs

From
Arthur Zakirov
Date:
Hello,

On Wed, May 1, 2019 at 6:05 PM Bruce Momjian <bruce@momjian.us> wrote:
> Thanks.  I think I see the sentence you are thinking of:
>
>        <function>to_timestamp</function> and <function>to_date</function>
>        skip multiple blank spaces at the beginning of the input string
>        and around date and time values unless the <literal>FX</literal>
>        option is used.
>
> However, first, it is unclear what 'skip' means here, i.e., does it mean
> multiple blank spaces become a single space, or they are ignored.

I worked at to_timestamp some time ago. In this case multiple bank spaces at
the beginning should be ignored.

> Second, I see inconsistent behaviour around the use of FX for various
> patterns, e.g.:
>
>         SELECT to_timestamp('5   1976','FXDD_FXYYYY');
>               to_timestamp
>         ------------------------
>          1976-01-05 00:00:00-05

Hm, I think strspace_len() is partly to blame here, which is called by
from_char_parse_int_len():

/*
 * Skip any whitespace before parsing the integer.
 */
*src += strspace_len(*src);

But even if you remove this line of code then strtol() will eat
survived whitespaces:

result = strtol(init, src, 10);

Not sure if we need some additional checks here if FX is set.

> It seems DD and YYYY (as numerics?) in FX mode eat trailing whitespace,
> while MON does not?  Also, I used these queries to determine it is
> "trailing" whitespace that "FXMON" controls:
>
>         SELECT to_timestamp('JUL   JUL JUL','MON_FXMON_MON');
>                   to_timestamp
>         ---------------------------------
>          0001-07-01 00:00:00-04:56:02 BC
>
>         SELECT to_timestamp('JUL JUL   JUL','MON_FXMON_MON');
>         ERROR:  invalid value "  J" for "MON"
>         DETAIL:  The given value did not match any of the allowed values for this field.

The problem here is that you need to specify FX only once and at beginning of
the format string. It is stated in the documentation:

"FX must be specified as the first item in the template."

It works globally (but only for remaining string if you don't put it
at the beginning)
and you can set it only once. For example:

=# SELECT to_timestamp('JUL   JUL JUL','FXMON_MON_MON');
ERROR:  invalid value "  J" for "MON"
DETAIL:  The given value did not match any of the allowed values for this field.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: to_timestamp docs

From
Bruce Momjian
Date:
On Wed, May  1, 2019 at 11:20:05PM +0300, Arthur Zakirov wrote:
> Hello,
> 
> On Wed, May 1, 2019 at 6:05 PM Bruce Momjian <bruce@momjian.us> wrote:
> > Thanks.  I think I see the sentence you are thinking of:
> >
> >        <function>to_timestamp</function> and <function>to_date</function>
> >        skip multiple blank spaces at the beginning of the input string
> >        and around date and time values unless the <literal>FX</literal>
> >        option is used.
> >
> > However, first, it is unclear what 'skip' means here, i.e., does it mean
> > multiple blank spaces become a single space, or they are ignored.
> 
> I worked at to_timestamp some time ago. In this case multiple bank spaces at
> the beginning should be ignored.

OK.

> > Second, I see inconsistent behaviour around the use of FX for various
> > patterns, e.g.:
> >
> >         SELECT to_timestamp('5   1976','FXDD_FXYYYY');
> >               to_timestamp
> >         ------------------------
> >          1976-01-05 00:00:00-05
> 
> Hm, I think strspace_len() is partly to blame here, which is called by
> from_char_parse_int_len():
> 
> /*
>  * Skip any whitespace before parsing the integer.
>  */
> *src += strspace_len(*src);
> 
> But even if you remove this line of code then strtol() will eat
> survived whitespaces:
> 
> result = strtol(init, src, 10);
> 
> Not sure if we need some additional checks here if FX is set.

Yes, I suspected it was part of the input function, but it seems it is
done in two places.  It seems we need the opposite of strspace_len() in
that place to throw an error if we are in FX mode.

> The problem here is that you need to specify FX only once and at beginning of
> the format string. It is stated in the documentation:
> 
> "FX must be specified as the first item in the template."

Uh, FX certainly changes behavior if it isn't the first thing in the
format string.

> It works globally (but only for remaining string if you don't put it
> at the beginning)

Uh, then the documentation is wrong?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: to_timestamp docs

From
Alexander Korotkov
Date:
On Wed, May 1, 2019 at 11:20 PM Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
> Hello,
>
> On Wed, May 1, 2019 at 6:05 PM Bruce Momjian <bruce@momjian.us> wrote:
> > Thanks.  I think I see the sentence you are thinking of:
> >
> >        <function>to_timestamp</function> and <function>to_date</function>
> >        skip multiple blank spaces at the beginning of the input string
> >        and around date and time values unless the <literal>FX</literal>
> >        option is used.
> >
> > However, first, it is unclear what 'skip' means here, i.e., does it mean
> > multiple blank spaces become a single space, or they are ignored.
>
> I worked at to_timestamp some time ago. In this case multiple bank spaces at
> the beginning should be ignored.
>
> > Second, I see inconsistent behaviour around the use of FX for various
> > patterns, e.g.:
> >
> >         SELECT to_timestamp('5   1976','FXDD_FXYYYY');
> >               to_timestamp
> >         ------------------------
> >          1976-01-05 00:00:00-05
>
> Hm, I think strspace_len() is partly to blame here, which is called by
> from_char_parse_int_len():
>
> /*
>  * Skip any whitespace before parsing the integer.
>  */
> *src += strspace_len(*src);
>
> But even if you remove this line of code then strtol() will eat
> survived whitespaces:
>
> result = strtol(init, src, 10);
>
> Not sure if we need some additional checks here if FX is set.

I'd like to add that this behavior is not new in 12.  It was the same before.

> > It seems DD and YYYY (as numerics?) in FX mode eat trailing whitespace,
> > while MON does not?  Also, I used these queries to determine it is
> > "trailing" whitespace that "FXMON" controls:
> >
> >         SELECT to_timestamp('JUL   JUL JUL','MON_FXMON_MON');
> >                   to_timestamp
> >         ---------------------------------
> >          0001-07-01 00:00:00-04:56:02 BC
> >
> >         SELECT to_timestamp('JUL JUL   JUL','MON_FXMON_MON');
> >         ERROR:  invalid value "  J" for "MON"
> >         DETAIL:  The given value did not match any of the allowed values for this field.
>
> The problem here is that you need to specify FX only once and at beginning of
> the format string. It is stated in the documentation:
>
> "FX must be specified as the first item in the template."
>
> It works globally (but only for remaining string if you don't put it
> at the beginning)
> and you can set it only once. For example:
>
> =# SELECT to_timestamp('JUL   JUL JUL','FXMON_MON_MON');
> ERROR:  invalid value "  J" for "MON"
> DETAIL:  The given value did not match any of the allowed values for this field.

Actually, FX takes effect on subsequent format patterns.  This is not
documented, but it copycats Oracle behavior.  Sure, normally FX should
be specified as the first item.  We could document current behavior or
restrict specifying FX not as first item.  This is also not new in 12,
so documenting current behavior is better for compatibility.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: to_timestamp docs

From
Bruce Momjian
Date:
On Thu, May  2, 2019 at 12:49:23AM +0300, Alexander Korotkov wrote:
> On Wed, May 1, 2019 at 11:20 PM Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
> > Hello,
> > Not sure if we need some additional checks here if FX is set.
> 
> I'd like to add that this behavior is not new in 12.  It was the same before.

Agreed, but since we are looking at it, let's document it.

> > > It seems DD and YYYY (as numerics?) in FX mode eat trailing whitespace,
> > > while MON does not?  Also, I used these queries to determine it is
> > > "trailing" whitespace that "FXMON" controls:
> > >
> > >         SELECT to_timestamp('JUL   JUL JUL','MON_FXMON_MON');
> > >                   to_timestamp
> > >         ---------------------------------
> > >          0001-07-01 00:00:00-04:56:02 BC
> > >
> > >         SELECT to_timestamp('JUL JUL   JUL','MON_FXMON_MON');
> > >         ERROR:  invalid value "  J" for "MON"
> > >         DETAIL:  The given value did not match any of the allowed values for this field.
> >
> > The problem here is that you need to specify FX only once and at beginning of
> > the format string. It is stated in the documentation:
> >
> > "FX must be specified as the first item in the template."
> >
> > It works globally (but only for remaining string if you don't put it
> > at the beginning)
> > and you can set it only once. For example:
> >
> > =# SELECT to_timestamp('JUL   JUL JUL','FXMON_MON_MON');
> > ERROR:  invalid value "  J" for "MON"
> > DETAIL:  The given value did not match any of the allowed values for this field.
> 
> Actually, FX takes effect on subsequent format patterns.  This is not
> documented, but it copycats Oracle behavior.  Sure, normally FX should
> be specified as the first item.  We could document current behavior or
> restrict specifying FX not as first item.  This is also not new in 12,
> so documenting current behavior is better for compatibility.

Agreed.  Since is it pre-12 behavior, I suggest we just document it and
not change it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: to_timestamp docs

From
Arthur Zakirov
Date:
On Thu, May 2, 2019 at 12:49 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> > It works globally (but only for remaining string if you don't put it
> > at the beginning)
> > and you can set it only once. For example:
> >
> > =# SELECT to_timestamp('JUL   JUL JUL','FXMON_MON_MON');
> > ERROR:  invalid value "  J" for "MON"
> > DETAIL:  The given value did not match any of the allowed values for this field.
>
> Actually, FX takes effect on subsequent format patterns.  This is not
> documented, but it copycats Oracle behavior.  Sure, normally FX should
> be specified as the first item.  We could document current behavior or
> restrict specifying FX not as first item.  This is also not new in 12,
> so documenting current behavior is better for compatibility.

I went to Oracle's documentation. It seems that the behavior is
slightly different.
Their documentation says:

"A modifier can appear in a format model more than once. In such a case,
each subsequent occurrence toggles the effects of the modifier. Its effects are
enabled for the portion of the model following its first occurrence, and then
disabled for the portion following its second, and then reenabled for
the portion
following its third, and so on."

In PostgreSQL one cannot disable exact mode using second FX. I think we
shouldn't add some restriction for FX. Instead PostgreSQL's documentation
can be fixed. And current explanation in the documentation might be wrong as
Bruce pointed.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: to_timestamp docs

From
Arthur Zakirov
Date:
On Thu, May 2, 2019 at 1:03 AM Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
> On Thu, May 2, 2019 at 12:49 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Actually, FX takes effect on subsequent format patterns.  This is not
> > documented, but it copycats Oracle behavior.  Sure, normally FX should
> > be specified as the first item.  We could document current behavior or
> > restrict specifying FX not as first item.  This is also not new in 12,
> > so documenting current behavior is better for compatibility.
>
> I went to Oracle's documentation. It seems that the behavior is
> slightly different.
> Their documentation says:
>
> "A modifier can appear in a format model more than once. In such a case,
> each subsequent occurrence toggles the effects of the modifier. Its effects are
> enabled for the portion of the model following its first occurrence, and then
> disabled for the portion following its second, and then reenabled for
> the portion
> following its third, and so on."

What about the patch I attached? It fixes the explanation of FX option a little.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment