Thread: [PATH] Correct negative/zero year in to_date/to_timestamp

[PATH] Correct negative/zero year in to_date/to_timestamp

From
Vitaly Burovoy
Date:
Hello, Hackers!

I'm writing another patch and while I was trying to cover corner cases
I found that to_date and to_timestamp work wrong if year in input
value is zero or negative:

postgres=# SELECT
postgres-#      y || '-06-01' as src
postgres-#     ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN
('00'||(-y)||'-06-01 BC') END::date
postgres-#     ,to_date(y || '-06-01', 'YYYY-MM-DD')
postgres-#     ,to_timestamp(y || '-06-01', 'YYYY-MM-DD')
postgres-# FROM (VALUES(2),(1),(0),(-1),(-2))t(y);
   src    |     date      |    to_date    |       to_timestamp
----------+---------------+---------------+---------------------------
 2-06-01  | 0002-06-01    | 0002-06-01    | 0002-06-01 00:00:00+00
 1-06-01  | 0001-06-01    | 0001-06-01    | 0001-06-01 00:00:00+00
 0-06-01  |               | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC
 -1-06-01 | 0001-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC
 -2-06-01 | 0002-06-01 BC | 0003-06-01 BC | 0003-06-01 00:00:00+00 BC
(5 rows)

Zero year (and century) is accepted and negative years differs by 1
from what they should be.


I've written a patch fixes that. With it results are correct:
postgres=# SELECT
postgres-#      y || '-06-01' as src
postgres-#     ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN
('00'||(-y)||'-06-01 BC') END::date
postgres-#     ,to_date(y || '-06-01', 'YYYY-MM-DD')
postgres-#     ,to_timestamp(y || '-06-01', 'YYYY-MM-DD')
postgres-# FROM (VALUES(2),(1),(-1),(-2))t(y);
   src    |     date      |    to_date    |       to_timestamp
----------+---------------+---------------+---------------------------
 2-06-01  | 0002-06-01    | 0002-06-01    | 0002-06-01 00:00:00+00
 1-06-01  | 0001-06-01    | 0001-06-01    | 0001-06-01 00:00:00+00
 -1-06-01 | 0001-06-01 BC | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC
 -2-06-01 | 0002-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC
(4 rows)


When year "0" is given, it raises an ERROR:
postgres=# SELECT to_timestamp('0000*01*01', 'YYYY*MM*DD');
ERROR:  invalid input string for "YYYY"
DETAIL:  Year cannot be 0.


Also I change behavior for era indicator when negatives century or
year are given. In such case era indicator is ignored (for me it is
obvious signs should be OR-ed):
postgres=# SELECT to_timestamp('-0010*01*01 BC', 'YYYY*MM*DD BC')
postgres-#       ,to_timestamp(' 0010*01*01 BC', 'YYYY*MM*DD BC');
       to_timestamp        |       to_timestamp
---------------------------+---------------------------
 0010-01-01 00:00:00+00 BC | 0010-01-01 00:00:00+00 BC
(1 row)


Testings, complains, advice, comment improvements are very appreciated.

--
Best regards,
Vitaly Burovoy

Attachment

Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Vitaly Burovoy
Date:
On 2/22/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
> Testings, complains, advice, comment improvements are very appreciated.

The patch seems simple, but it can lead to a discussion, so I've added it to CF.

[CF]https://commitfest.postgresql.org/9/533/

-- 
Best regards,
Vitaly Burovoy



Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Thomas Munro
Date:
On Tue, Feb 23, 2016 at 11:58 AM, Vitaly Burovoy
<vitaly.burovoy@gmail.com> wrote:
> Hello, Hackers!
>
> I'm writing another patch and while I was trying to cover corner cases
> I found that to_date and to_timestamp work wrong if year in input
> value is zero or negative:
>
> postgres=# SELECT
> postgres-#      y || '-06-01' as src
> postgres-#     ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN
> ('00'||(-y)||'-06-01 BC') END::date
> postgres-#     ,to_date(y || '-06-01', 'YYYY-MM-DD')
> postgres-#     ,to_timestamp(y || '-06-01', 'YYYY-MM-DD')
> postgres-# FROM (VALUES(2),(1),(0),(-1),(-2))t(y);
>    src    |     date      |    to_date    |       to_timestamp
> ----------+---------------+---------------+---------------------------
>  2-06-01  | 0002-06-01    | 0002-06-01    | 0002-06-01 00:00:00+00
>  1-06-01  | 0001-06-01    | 0001-06-01    | 0001-06-01 00:00:00+00
>  0-06-01  |               | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC
>  -1-06-01 | 0001-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC
>  -2-06-01 | 0002-06-01 BC | 0003-06-01 BC | 0003-06-01 00:00:00+00 BC
> (5 rows)
>
> Zero year (and century) is accepted and negative years differs by 1
> from what they should be.
>
>
> I've written a patch fixes that. With it results are correct:
> postgres=# SELECT
> postgres-#      y || '-06-01' as src
> postgres-#     ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN
> ('00'||(-y)||'-06-01 BC') END::date
> postgres-#     ,to_date(y || '-06-01', 'YYYY-MM-DD')
> postgres-#     ,to_timestamp(y || '-06-01', 'YYYY-MM-DD')
> postgres-# FROM (VALUES(2),(1),(-1),(-2))t(y);
>    src    |     date      |    to_date    |       to_timestamp
> ----------+---------------+---------------+---------------------------
>  2-06-01  | 0002-06-01    | 0002-06-01    | 0002-06-01 00:00:00+00
>  1-06-01  | 0001-06-01    | 0001-06-01    | 0001-06-01 00:00:00+00
>  -1-06-01 | 0001-06-01 BC | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC
>  -2-06-01 | 0002-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC
> (4 rows)
>
>
> When year "0" is given, it raises an ERROR:
> postgres=# SELECT to_timestamp('0000*01*01', 'YYYY*MM*DD');
> ERROR:  invalid input string for "YYYY"
> DETAIL:  Year cannot be 0.
>
>
> Also I change behavior for era indicator when negatives century or
> year are given. In such case era indicator is ignored (for me it is
> obvious signs should be OR-ed):
> postgres=# SELECT to_timestamp('-0010*01*01 BC', 'YYYY*MM*DD BC')
> postgres-#       ,to_timestamp(' 0010*01*01 BC', 'YYYY*MM*DD BC');
>        to_timestamp        |       to_timestamp
> ---------------------------+---------------------------
>  0010-01-01 00:00:00+00 BC | 0010-01-01 00:00:00+00 BC
> (1 row)
>
>
> Testings, complains, advice, comment improvements are very appreciated.

This seems to be a messy topic.  The usage of "AD" and "BC" imply that
TO_DATE is using the anno domini system which doesn't have a year 0,
but in the DATE type perhaps we are using the ISO 8601 model[2] where
1 BC is represented as 0000, leading to the difference of one in all
years before 1 AD?

[1] https://en.wikipedia.org/wiki/Anno_Domini
[2] https://en.wikipedia.org/wiki/ISO_8601#Years

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Vitaly Burovoy
Date:
On 2/22/16, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Tue, Feb 23, 2016 at 11:58 AM, Vitaly Burovoy
> <vitaly.burovoy@gmail.com> wrote:
>> Hello, Hackers!
>>
>> I'm writing another patch and while I was trying to cover corner cases
>> I found that to_date and to_timestamp work wrong if year in input
>> value is zero or negative:
>>
>> postgres=# SELECT
>> postgres-#      y || '-06-01' as src
>> postgres-#     ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN
>> ('00'||(-y)||'-06-01 BC') END::date
>> postgres-#     ,to_date(y || '-06-01', 'YYYY-MM-DD')
>> postgres-#     ,to_timestamp(y || '-06-01', 'YYYY-MM-DD')
>> postgres-# FROM (VALUES(2),(1),(0),(-1),(-2))t(y);
>>    src    |     date      |    to_date    |       to_timestamp
>> ----------+---------------+---------------+---------------------------
>>  2-06-01  | 0002-06-01    | 0002-06-01    | 0002-06-01 00:00:00+00
>>  1-06-01  | 0001-06-01    | 0001-06-01    | 0001-06-01 00:00:00+00
>>  0-06-01  |               | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC
>>  -1-06-01 | 0001-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC
>>  -2-06-01 | 0002-06-01 BC | 0003-06-01 BC | 0003-06-01 00:00:00+00 BC
>> (5 rows)
>>
>> Zero year (and century) is accepted and negative years differs by 1
>> from what they should be.
>>
>>
>> I've written a patch fixes that. With it results are correct:
>> postgres=# SELECT
>> postgres-#      y || '-06-01' as src
>> postgres-#     ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN
>> ('00'||(-y)||'-06-01 BC') END::date
>> postgres-#     ,to_date(y || '-06-01', 'YYYY-MM-DD')
>> postgres-#     ,to_timestamp(y || '-06-01', 'YYYY-MM-DD')
>> postgres-# FROM (VALUES(2),(1),(-1),(-2))t(y);
>>    src    |     date      |    to_date    |       to_timestamp
>> ----------+---------------+---------------+---------------------------
>>  2-06-01  | 0002-06-01    | 0002-06-01    | 0002-06-01 00:00:00+00
>>  1-06-01  | 0001-06-01    | 0001-06-01    | 0001-06-01 00:00:00+00
>>  -1-06-01 | 0001-06-01 BC | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC
>>  -2-06-01 | 0002-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC
>> (4 rows)
>>
>>
>> When year "0" is given, it raises an ERROR:
>> postgres=# SELECT to_timestamp('0000*01*01', 'YYYY*MM*DD');
>> ERROR:  invalid input string for "YYYY"
>> DETAIL:  Year cannot be 0.
>>
>>
>> Also I change behavior for era indicator when negatives century or
>> year are given. In such case era indicator is ignored (for me it is
>> obvious signs should be OR-ed):
>> postgres=# SELECT to_timestamp('-0010*01*01 BC', 'YYYY*MM*DD BC')
>> postgres-#       ,to_timestamp(' 0010*01*01 BC', 'YYYY*MM*DD BC');
>>        to_timestamp        |       to_timestamp
>> ---------------------------+---------------------------
>>  0010-01-01 00:00:00+00 BC | 0010-01-01 00:00:00+00 BC
>> (1 row)
>>
>>
>> Testings, complains, advice, comment improvements are very appreciated.
>
> This seems to be a messy topic.  The usage of "AD" and "BC" imply that
> TO_DATE is using the anno domini system which doesn't have a year 0,
> but in the DATE type perhaps we are using the ISO 8601 model[2] where
> 1 BC is represented as 0000, leading to the difference of one in all
> years before 1 AD?
>
> [1] https://en.wikipedia.org/wiki/Anno_Domini
> [2] https://en.wikipedia.org/wiki/ISO_8601#Years
>
> --
> Thomas Munro
> http://www.enterprisedb.com

Thank you for fast reply and for the link[2]. Be honest I didn't know
ISO8601 specifies 1 BC as +0000.

But the documentation[3] doesn't points that ISO8601 is used for
"YYYY", but it is mentioned for "IYYY", and it is slightly deceiving.
Also I remember that the other part of the documentation says[4] that
"Keep in mind there is no 0 AD" that's why I decided it is impossible
to pass 0000 for YYYY.

Currently behavior with YYYY=0 is still surprising in some cases:
postgres=# SELECT
postgres-# to_date('20 0000-01-01', 'CC YYYY-MM-DD'),
postgres-# to_date('20 0001-01-01', 'CC YYYY-MM-DD'); to_date   |  to_date
------------+------------1901-01-01 | 0001-01-01
(1 row)

but the documentation[3] says "In conversions from string to timestamp
or date, the CC (century) field is ignored if there is a YYY, YYYY or
Y,YYY field."

So is it shared opinion that ISO8601 is used for "YYYY"?

[3]http://www.postgresql.org/docs/devel/static/functions-formatting.html
[4]http://www.postgresql.org/docs/devel/static/functions-datetime.html
-- 
Best regards,
Vitaly Burovoy



Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Ivan Kartyshov
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, failed
Spec compliant:           tested, failed
Documentation:            tested, failed

Applied this patch, it works well, make what it expected correctly, code style is maintained. Regression tests passed
OK.No documentation or comments. 

Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
"Shulgin, Oleksandr"
Date:
On Fri, Feb 26, 2016 at 3:24 PM, Ivan Kartyshov <i.kartyshov@postgrespro.ru> wrote:
The following review has been posted through the commitfest application:
 
make installcheck-world:  tested, failed
Implements feature:       tested, failed
Spec compliant:           tested, failed
Documentation:            tested, failed

Applied this patch, it works well, make what it expected correctly, code style is maintained. Regression tests passed OK. No documentation or comments.

Why does it say "tested, failed" for all points above there? ;-)

--
Alex

Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Ivan Kartyshov
Date:
<p dir="ltr">> Why does it say "tested, failed" for all points above there? ;-)<p dir="ltr">Hi,  I just used Web
reviewerform on <a href="https://commitfest.postgresql.org">https://commitfest.postgresql.org</a> to make review on
patch,but form doesn't work properly unlike the patch.))   

Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Vitaly Burovoy
Date:
On 2/26/16, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
> On Fri, Feb 26, 2016 at 3:24 PM, Ivan Kartyshov
> <i.kartyshov@postgrespro.ru>
> wrote:
>
>> The following review has been posted through the commitfest application:
>
>> make installcheck-world:  tested, failed
>> Implements feature:       tested, failed
>> Spec compliant:           tested, failed
>> Documentation:            tested, failed
>>
>> Applied this patch, it works well, make what it expected correctly, code
>> style is maintained. Regression tests passed OK. No documentation or
>> comments.
>>
>
> Why does it say "tested, failed" for all points above there? ;-)

I hope Ivan misunderstood that "tested" and "passed" are two different
options, not a single "tested, passed" ;-)

Unfortunately, it seems there should be a discussion about meaning of
"YYYY": it seems authors made it as ISO8601-compliant (but there are
still several bugs), but there is no proofs in the documentation[1].

On the one hand there is only "year" mentioned for "YYYY", "YYY",
etc., and "ISO 8601 ... year" is "week-numbering", i.e. "IYYY", "IYY",
etc. only for using with "IDDD", "ID" and "IW".

On the other hand "extract" has two different options: "year" and
"isoyear" and only the second one is ISO8601-compliant (moreover it is
week-numbering at the same time):
postgres=# SELECT y src, EXTRACT(year FROM y) AS year, EXTRACT(isoyear
FROM y) AS isoyear
postgres-# FROM unnest(ARRAY[
postgres(# '4713-01-01 BC','4714-12-31 BC','4714-12-29 BC','4714-12-28
BC']::date[]) y;     src      | year  | isoyear
---------------+-------+---------4713-01-01 BC | -4713 |   -47124714-12-31 BC | -4714 |   -47124714-12-29 BC | -4714 |
-47124714-12-28 BC | -4714 |   -4713
 
(4 rows)

So there is lack of consistency: something should be changed: either
"extract(year..." (and the documentation[2], but there is "Keep in
mind there is no 0 AD, ..." for a long time, so it is a change which
breaks compatibility; and the patch will be completely different), or
the patch (it has an influence on "IYYY", so it is obviously wrong).

Now (after the letter[3] from Thomas Munro) I know the patch is not
ready for committing even with minimal changes. But I'm waiting for a
discussion: what part should be changed?

I would change behavior of "to_date" and "to_timestamp" to match with
extract options "year"/"isoyear", but I think getting decision of the
community before modifying the patch is a better idea.

[1]http://www.postgresql.org/docs/devel/static/functions-formatting.html
[2]http://www.postgresql.org/docs/devel/static/functions-datetime.html
[3]http://www.postgresql.org/message-id/CAEepm=0AzNVy+frtYni04iMdW4TLGZAeLLJqMHMcJBrE+LnWNQ@mail.gmail.com
--
Best regards,
Vitaly Burovoy



Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Robert Haas
Date:
On Tue, Feb 23, 2016 at 6:23 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> This seems to be a messy topic.  The usage of "AD" and "BC" imply that
> TO_DATE is using the anno domini system which doesn't have a year 0,
> but in the DATE type perhaps we are using the ISO 8601 model[2] where
> 1 BC is represented as 0000, leading to the difference of one in all
> years before 1 AD?

Well, the way to figure that out, I think, is to look at the
documentation.  I had a look at...

http://www.postgresql.org/docs/9.5/static/functions-formatting.html

...which says...

YYYY year (4 or more digits)
IYYY ISO 8601 week-numbering year (4 or more digits)

I don't really understand ISO 8601, but if IYYY is giving us an ISO
8601 thing, then presumably YYYY is not supposed to be giving us that. The same page elsewhere refers to Gregorian
dates,and other parts
 
of the documentation seem to agree that's what we use.

But having said that, this is kind of a weird situation.  We're
talking about this:

rhaas=# SELECT y || '-06-01', to_date (y || '-06-01', 'YYYY-MM-DD')
FROM (VALUES (2), (1), (0), (-1), (-2)) t(y);?column? |    to_date
----------+---------------2-06-01  | 0002-06-011-06-01  | 0001-06-010-06-01  | 0001-06-01 BC-1-06-01 | 0002-06-01
BC-2-06-01| 0003-06-01 BC
 
(5 rows)

Now, I would be tempted to argue that passing to_date('-1-06-01',
'YYYY-MM-DD') ought to do the same thing as to_date('pickle',
'YYYY-MM-DD') i.e. throw an error.  There's all kinds of what seems to
me to be shoddy error checking in this area:

rhaas=# select to_date('-3', 'YYYY:MM&DD');   to_date
---------------0004-01-01 BC
(1 row)

It's pretty hard for me to swallow the argument that the input matches
the provided format.

However, I'm not sure we ought to tinker with the behavior in this
area.  If YYYY-MM-DD is going to accept things that are not of the
format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format,
then I think it should probably keep doing the same things it's always
done.  If you want to supply a BC date, why not do this:

rhaas=# select to_date('0001-06-01 BC', 'YYYY-MM-DD BC');   to_date
---------------0001-06-01 BC
(1 row)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATH] Correct negative/zero year into_date/to_timestamp

From
Yury Zhuravlev
Date:
>But I'm waiting for a discussion: what part should be changed?

I for compliance with the standard (all ISO). In addition Oracle uses
"IYYY" format.
Standards allow to reduce liability. But I think someone like Tom Lane can
have the final say because we break backward compatibility.
Options "year"/"isoyear" may confuse the users.

Thanks.

--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Vitaly Burovoy
Date:
On 2/27/16, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Feb 23, 2016 at 6:23 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> This seems to be a messy topic.  The usage of "AD" and "BC" imply that
>> TO_DATE is using the anno domini system which doesn't have a year 0,
>> but in the DATE type perhaps we are using the ISO 8601 model[2] where
>> 1 BC is represented as 0000, leading to the difference of one in all
>> years before 1 AD?
>
> Well, the way to figure that out, I think, is to look at the
> documentation.  I had a look at...
>
> http://www.postgresql.org/docs/9.5/static/functions-formatting.html
>
> ...which says...
>
> YYYY year (4 or more digits)
> IYYY ISO 8601 week-numbering year (4 or more digits)
>
> I don't really understand ISO 8601, but if IYYY is giving us an ISO
> 8601 thing, then presumably YYYY is not supposed to be giving us that.
>   The same page elsewhere refers to Gregorian dates, and other parts
> of the documentation seem to agree that's what we use.
>
> But having said that, this is kind of a weird situation.  We're
> talking about this:
>
> rhaas=# SELECT y || '-06-01', to_date (y || '-06-01', 'YYYY-MM-DD')
> FROM (VALUES (2), (1), (0), (-1), (-2)) t(y);
>  ?column? |    to_date
> ----------+---------------
>  2-06-01  | 0002-06-01
>  1-06-01  | 0001-06-01
>  0-06-01  | 0001-06-01 BC
>  -1-06-01 | 0002-06-01 BC
>  -2-06-01 | 0003-06-01 BC
> (5 rows)
>
> Now, I would be tempted to argue that passing to_date('-1-06-01',
> 'YYYY-MM-DD') ought to do the same thing as to_date('pickle',
> 'YYYY-MM-DD') i.e. throw an error.  There's all kinds of what seems to
> me to be shoddy error checking in this area:
>
> rhaas=# select to_date('-3', 'YYYY:MM&DD');
>     to_date
> ---------------
>  0004-01-01 BC
> (1 row)
>
> It's pretty hard for me to swallow the argument that the input matches
> the provided format.
>
> However, I'm not sure we ought to tinker with the behavior in this
> area.  If YYYY-MM-DD is going to accept things that are not of the
> format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format,

It is not about format, it is about values.

> then I think it should probably keep doing the same things it's always
> done.  If you want to supply a BC date, why not do

Because it is inconvenient a little. If one value ("-2345") is passed,
another one ("2346 BC") is got. In the other case a programmer must
check for negative value, and if so change a sign and add "BC" to the
format. Moreover the programmer must keep in mind that it is not
enough to have usual date format "DD/MM/YYYY", because sometimes there
can be "BC" part.

> this:
>
> rhaas=# select to_date('0001-06-01 BC', 'YYYY-MM-DD BC');
>     to_date
> ---------------
>  0001-06-01 BC
> (1 row)

Also because of:

postgres=# SELECT EXTRACT(year FROM to_date('-3', 'YYYY'));date_part
-----------       -4
(1 row)

Note that the documentation[1] says "Keep in mind there is no 0 AD".
More examples by the link[2].

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

[1]http://www.postgresql.org/docs/devel/static/functions-datetime.html
[2]http://www.postgresql.org/message-id/CAKOSWNn6KpfAbmrsHzyN8+2NHpyfVBdMPHYa5pQgUNy8LP2L2Q@mail.gmail.com
-- 
Best regards,
Vitaly Burovoy



Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Robert Haas
Date:
On Sun, Feb 28, 2016 at 9:38 PM, Vitaly Burovoy
<vitaly.burovoy@gmail.com> wrote:
>> However, I'm not sure we ought to tinker with the behavior in this
>> area.  If YYYY-MM-DD is going to accept things that are not of the
>> format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format,
>
> It is not about format, it is about values.

I disagree.  In a format like "-1-06-01", you want the first minus to
indicate negation and the other two to be a separator.  That's not
very far away from wanting the database to read your mind.

> Because it is inconvenient a little. If one value ("-2345") is passed,
> another one ("2346 BC") is got. In the other case a programmer must
> check for negative value, and if so change a sign and add "BC" to the
> format. Moreover the programmer must keep in mind that it is not
> enough to have usual date format "DD/MM/YYYY", because sometimes there
> can be "BC" part.

Yeah, well, that's life.  You can write an alternative function to
construct dates that works the way you like, and that may well be a
good idea.  But I think *this* change is not a good idea, and
accordingly I vote we reject this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Vitaly Burovoy
Date:
On 3/11/16, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 28, 2016 at 9:38 PM, Vitaly Burovoy
> <vitaly.burovoy@gmail.com> wrote:
>>> However, I'm not sure we ought to tinker with the behavior in this
>>> area.  If YYYY-MM-DD is going to accept things that are not of the
>>> format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format,
>>
>> It is not about format, it is about values.
>
> I disagree.  In a format like "-1-06-01", you want the first minus to
> indicate negation and the other two to be a separator.  That's not
> very far away from wanting the database to read your mind.

It is not my wish. The database does it just now:
postgres=# SELECT to_date('-1-06-01', 'YYYY');   to_date
---------------0002-01-01 BC
(1 row)

>> Because it is inconvenient a little. If one value ("-2345") is passed,
>> another one ("2346 BC") is got. In the other case a programmer must
>> check for negative value, and if so change a sign and add "BC" to the
>> format. Moreover the programmer must keep in mind that it is not
>> enough to have usual date format "DD/MM/YYYY", because sometimes there
>> can be "BC" part.
>
> Yeah, well, that's life.  You can write an alternative function to
> construct dates that works the way you like, and that may well be a
> good idea.  But I think *this* change is not a good idea, and
> accordingly I vote we reject this patch.

My wish is to make the behavior be consistent.
Since there are two reverse functions ("extract" and "to_date"
["to_timestamp" in fact is the same]), I expect that is described as
"year" ("year"-"YYYY") means the same thing in both of them, the same
with pairs "isoyear"-"IYYY", "dow"-"DDD", "isodow"-"IDDD", etc.

Now "year" is _not_ the same as "YYYY" (but it cat be so according to
the documentation: there is no mentioning of any ISO standard),
whereas "isoyear" _is_ the same:
postgres=# SELECT y, to_date(y, 'YYYY')YYYY,to_date(y, 'IYYY')IYYY
postgres-# FROM(VALUES('-1-06-01'))t(y);   y     |     yyyy      |     iyyy
----------+---------------+----------------1-06-01 | 0002-01-01 BC | 0002-01-01 BC
(1 row)

and
postgres=# SELECT y, date_part('year', y)YYYY,date_part('isoyear', y)IYYY
postgres-# FROM(VALUES('0002-06-01 BC'::date))t(y);      y       | yyyy | iyyy
---------------+------+------0002-06-01 BC |   -2 |   -1
(1 row)

P.S.: proposed patch changes IYYY as well, but it is easy to fix it
and I'm ready to do it after finding a consensus.
-- 
Best regards,
Vitaly Burovoy



Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Feb 28, 2016 at 9:38 PM, Vitaly Burovoy
> <vitaly.burovoy@gmail.com> wrote:
>>> However, I'm not sure we ought to tinker with the behavior in this
>>> area.  If YYYY-MM-DD is going to accept things that are not of the
>>> format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format,

>> It is not about format, it is about values.

> I disagree.  In a format like "-1-06-01", you want the first minus to
> indicate negation and the other two to be a separator.  That's not
> very far away from wanting the database to read your mind.

[ catches up with thread... ]

Yes.  It would be more reasonable IMO for to_date to throw an error
because this is bad input.  On the other hand, to_date mostly doesn't
throw an error no matter how bad the input is.  I think that may have
been intentional, although its habit of producing garbage output
instead (and not, say, NULL) is certainly not very happy-making.

It's a bit schizophrenic for this patch to be both adding ereport's
for year zero (thereby breaking the no-failure-on-bad-input policy)
*and* trying to produce sane output for arguably-insane input.

I don't really see an argument why '0001-00-00' should be accepted
but '0000-01-01' should throw an error, but that would be the result
if we take this patch.  And I quite agree with Robert that it's insane
to consider '-2-06-01' as satisfying the format 'YYYY-MM-DD'.  The
fact that it even appears to do something related to a BC year is
an implementation artifact, and not a very nice one.

I would be in favor of a ground-up rewrite of to_date and friends, with
some better-stated principles (in particular, a rationale why they even
exist when date_in and friends usually do it better) and crisper error
detection.  But I'm not seeing the argument that hacking at the margins
like this moves us forward on either point; what it does do is create
another backward-compatibility hazard for any such rewrite.

In short, I vote with Robert to reject this patch.

BTW, the context for the original report wasn't clear, but I wonder how
much of the actual problem could be addressed by teaching make_date()
and friends to accept negative year values as meaning BC.
        regards, tom lane



Re: [PATH] Correct negative/zero year in to_date/to_timestamp

From
Vitaly Burovoy
Date:
On 3/11/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ catches up with thread... ]
>
> Yes.  It would be more reasonable IMO for to_date to throw an error
> because this is bad input.  On the other hand, to_date mostly doesn't
> throw an error no matter how bad the input is.  I think that may have
> been intentional, although its habit of producing garbage output
> instead (and not, say, NULL) is certainly not very happy-making.
>
> It's a bit schizophrenic for this patch to be both adding ereport's
> for year zero (thereby breaking the no-failure-on-bad-input policy)
> *and* trying to produce sane output for arguably-insane input.
>
> I don't really see an argument why '0001-00-00' should be accepted
> but '0000-01-01' should throw an error, but that would be the result
> if we take this patch.

Well. In case of zero year it could return the first year instead of
an exception by the same way as "MM" and "DD" do it...

> And I quite agree with Robert that it's insane
> to consider '-2-06-01' as satisfying the format 'YYYY-MM-DD'.  The
> fact that it even appears to do something related to a BC year is
> an implementation artifact, and not a very nice one.
>
> I would be in favor of a ground-up rewrite of to_date and friends, with
> some better-stated principles (in particular, a rationale why they even
> exist when date_in and friends usually do it better)

I think they exist because date_in can't convert something like
"IYYY-IDDD" (I wonder if date_in can do so) or to parse dates/stamps
independent from the "DateStyle" parameter.

> and crisper error
> detection.  But I'm not seeing the argument that hacking at the margins
> like this moves us forward on either point; what it does do is create
> another backward-compatibility hazard for any such rewrite.
>
> In short, I vote with Robert to reject this patch.

Accepted. Let's agree it is a case "garbage in, garbage out" and "an
implementation artifact".

> BTW, the context for the original report wasn't clear,

The context was to make "extract" and "to_date"/"to_timestamp" be
consistently reversible for "year"/"YYYY" for negative values (since
both of them support ones).

> but I wonder how
> much of the actual problem could be addressed by teaching make_date()
> and friends to accept negative year values as meaning BC.
>
>             regards, tom lane

Thank Thomas, Robert and Tom very much for an interesting (but short)
discussion.
-- 
Best regards,
Vitaly Burovoy