Thread: [PATH] Correct negative/zero year in to_date/to_timestamp
Hello, Hackers! I'm writing another patch and while I was trying to cover corner cases I found that to_date and to_timestamp work wrong if year in input value is zero or negative: postgres=# SELECT postgres-# y || '-06-01' as src postgres-# ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN ('00'||(-y)||'-06-01 BC') END::date postgres-# ,to_date(y || '-06-01', 'YYYY-MM-DD') postgres-# ,to_timestamp(y || '-06-01', 'YYYY-MM-DD') postgres-# FROM (VALUES(2),(1),(0),(-1),(-2))t(y); src | date | to_date | to_timestamp ----------+---------------+---------------+--------------------------- 2-06-01 | 0002-06-01 | 0002-06-01 | 0002-06-01 00:00:00+00 1-06-01 | 0001-06-01 | 0001-06-01 | 0001-06-01 00:00:00+00 0-06-01 | | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC -1-06-01 | 0001-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC -2-06-01 | 0002-06-01 BC | 0003-06-01 BC | 0003-06-01 00:00:00+00 BC (5 rows) Zero year (and century) is accepted and negative years differs by 1 from what they should be. I've written a patch fixes that. With it results are correct: postgres=# SELECT postgres-# y || '-06-01' as src postgres-# ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN ('00'||(-y)||'-06-01 BC') END::date postgres-# ,to_date(y || '-06-01', 'YYYY-MM-DD') postgres-# ,to_timestamp(y || '-06-01', 'YYYY-MM-DD') postgres-# FROM (VALUES(2),(1),(-1),(-2))t(y); src | date | to_date | to_timestamp ----------+---------------+---------------+--------------------------- 2-06-01 | 0002-06-01 | 0002-06-01 | 0002-06-01 00:00:00+00 1-06-01 | 0001-06-01 | 0001-06-01 | 0001-06-01 00:00:00+00 -1-06-01 | 0001-06-01 BC | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC -2-06-01 | 0002-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC (4 rows) When year "0" is given, it raises an ERROR: postgres=# SELECT to_timestamp('0000*01*01', 'YYYY*MM*DD'); ERROR: invalid input string for "YYYY" DETAIL: Year cannot be 0. Also I change behavior for era indicator when negatives century or year are given. In such case era indicator is ignored (for me it is obvious signs should be OR-ed): postgres=# SELECT to_timestamp('-0010*01*01 BC', 'YYYY*MM*DD BC') postgres-# ,to_timestamp(' 0010*01*01 BC', 'YYYY*MM*DD BC'); to_timestamp | to_timestamp ---------------------------+--------------------------- 0010-01-01 00:00:00+00 BC | 0010-01-01 00:00:00+00 BC (1 row) Testings, complains, advice, comment improvements are very appreciated. -- Best regards, Vitaly Burovoy
Attachment
On 2/22/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > Testings, complains, advice, comment improvements are very appreciated. The patch seems simple, but it can lead to a discussion, so I've added it to CF. [CF]https://commitfest.postgresql.org/9/533/ -- Best regards, Vitaly Burovoy
On Tue, Feb 23, 2016 at 11:58 AM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: > Hello, Hackers! > > I'm writing another patch and while I was trying to cover corner cases > I found that to_date and to_timestamp work wrong if year in input > value is zero or negative: > > postgres=# SELECT > postgres-# y || '-06-01' as src > postgres-# ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN > ('00'||(-y)||'-06-01 BC') END::date > postgres-# ,to_date(y || '-06-01', 'YYYY-MM-DD') > postgres-# ,to_timestamp(y || '-06-01', 'YYYY-MM-DD') > postgres-# FROM (VALUES(2),(1),(0),(-1),(-2))t(y); > src | date | to_date | to_timestamp > ----------+---------------+---------------+--------------------------- > 2-06-01 | 0002-06-01 | 0002-06-01 | 0002-06-01 00:00:00+00 > 1-06-01 | 0001-06-01 | 0001-06-01 | 0001-06-01 00:00:00+00 > 0-06-01 | | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC > -1-06-01 | 0001-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC > -2-06-01 | 0002-06-01 BC | 0003-06-01 BC | 0003-06-01 00:00:00+00 BC > (5 rows) > > Zero year (and century) is accepted and negative years differs by 1 > from what they should be. > > > I've written a patch fixes that. With it results are correct: > postgres=# SELECT > postgres-# y || '-06-01' as src > postgres-# ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN > ('00'||(-y)||'-06-01 BC') END::date > postgres-# ,to_date(y || '-06-01', 'YYYY-MM-DD') > postgres-# ,to_timestamp(y || '-06-01', 'YYYY-MM-DD') > postgres-# FROM (VALUES(2),(1),(-1),(-2))t(y); > src | date | to_date | to_timestamp > ----------+---------------+---------------+--------------------------- > 2-06-01 | 0002-06-01 | 0002-06-01 | 0002-06-01 00:00:00+00 > 1-06-01 | 0001-06-01 | 0001-06-01 | 0001-06-01 00:00:00+00 > -1-06-01 | 0001-06-01 BC | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC > -2-06-01 | 0002-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC > (4 rows) > > > When year "0" is given, it raises an ERROR: > postgres=# SELECT to_timestamp('0000*01*01', 'YYYY*MM*DD'); > ERROR: invalid input string for "YYYY" > DETAIL: Year cannot be 0. > > > Also I change behavior for era indicator when negatives century or > year are given. In such case era indicator is ignored (for me it is > obvious signs should be OR-ed): > postgres=# SELECT to_timestamp('-0010*01*01 BC', 'YYYY*MM*DD BC') > postgres-# ,to_timestamp(' 0010*01*01 BC', 'YYYY*MM*DD BC'); > to_timestamp | to_timestamp > ---------------------------+--------------------------- > 0010-01-01 00:00:00+00 BC | 0010-01-01 00:00:00+00 BC > (1 row) > > > Testings, complains, advice, comment improvements are very appreciated. This seems to be a messy topic. The usage of "AD" and "BC" imply that TO_DATE is using the anno domini system which doesn't have a year 0, but in the DATE type perhaps we are using the ISO 8601 model[2] where 1 BC is represented as 0000, leading to the difference of one in all years before 1 AD? [1] https://en.wikipedia.org/wiki/Anno_Domini [2] https://en.wikipedia.org/wiki/ISO_8601#Years -- Thomas Munro http://www.enterprisedb.com
On 2/22/16, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Tue, Feb 23, 2016 at 11:58 AM, Vitaly Burovoy > <vitaly.burovoy@gmail.com> wrote: >> Hello, Hackers! >> >> I'm writing another patch and while I was trying to cover corner cases >> I found that to_date and to_timestamp work wrong if year in input >> value is zero or negative: >> >> postgres=# SELECT >> postgres-# y || '-06-01' as src >> postgres-# ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN >> ('00'||(-y)||'-06-01 BC') END::date >> postgres-# ,to_date(y || '-06-01', 'YYYY-MM-DD') >> postgres-# ,to_timestamp(y || '-06-01', 'YYYY-MM-DD') >> postgres-# FROM (VALUES(2),(1),(0),(-1),(-2))t(y); >> src | date | to_date | to_timestamp >> ----------+---------------+---------------+--------------------------- >> 2-06-01 | 0002-06-01 | 0002-06-01 | 0002-06-01 00:00:00+00 >> 1-06-01 | 0001-06-01 | 0001-06-01 | 0001-06-01 00:00:00+00 >> 0-06-01 | | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC >> -1-06-01 | 0001-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC >> -2-06-01 | 0002-06-01 BC | 0003-06-01 BC | 0003-06-01 00:00:00+00 BC >> (5 rows) >> >> Zero year (and century) is accepted and negative years differs by 1 >> from what they should be. >> >> >> I've written a patch fixes that. With it results are correct: >> postgres=# SELECT >> postgres-# y || '-06-01' as src >> postgres-# ,CASE WHEN y>0 THEN ('00'||y||'-06-01') WHEN y<0 THEN >> ('00'||(-y)||'-06-01 BC') END::date >> postgres-# ,to_date(y || '-06-01', 'YYYY-MM-DD') >> postgres-# ,to_timestamp(y || '-06-01', 'YYYY-MM-DD') >> postgres-# FROM (VALUES(2),(1),(-1),(-2))t(y); >> src | date | to_date | to_timestamp >> ----------+---------------+---------------+--------------------------- >> 2-06-01 | 0002-06-01 | 0002-06-01 | 0002-06-01 00:00:00+00 >> 1-06-01 | 0001-06-01 | 0001-06-01 | 0001-06-01 00:00:00+00 >> -1-06-01 | 0001-06-01 BC | 0001-06-01 BC | 0001-06-01 00:00:00+00 BC >> -2-06-01 | 0002-06-01 BC | 0002-06-01 BC | 0002-06-01 00:00:00+00 BC >> (4 rows) >> >> >> When year "0" is given, it raises an ERROR: >> postgres=# SELECT to_timestamp('0000*01*01', 'YYYY*MM*DD'); >> ERROR: invalid input string for "YYYY" >> DETAIL: Year cannot be 0. >> >> >> Also I change behavior for era indicator when negatives century or >> year are given. In such case era indicator is ignored (for me it is >> obvious signs should be OR-ed): >> postgres=# SELECT to_timestamp('-0010*01*01 BC', 'YYYY*MM*DD BC') >> postgres-# ,to_timestamp(' 0010*01*01 BC', 'YYYY*MM*DD BC'); >> to_timestamp | to_timestamp >> ---------------------------+--------------------------- >> 0010-01-01 00:00:00+00 BC | 0010-01-01 00:00:00+00 BC >> (1 row) >> >> >> Testings, complains, advice, comment improvements are very appreciated. > > This seems to be a messy topic. The usage of "AD" and "BC" imply that > TO_DATE is using the anno domini system which doesn't have a year 0, > but in the DATE type perhaps we are using the ISO 8601 model[2] where > 1 BC is represented as 0000, leading to the difference of one in all > years before 1 AD? > > [1] https://en.wikipedia.org/wiki/Anno_Domini > [2] https://en.wikipedia.org/wiki/ISO_8601#Years > > -- > Thomas Munro > http://www.enterprisedb.com Thank you for fast reply and for the link[2]. Be honest I didn't know ISO8601 specifies 1 BC as +0000. But the documentation[3] doesn't points that ISO8601 is used for "YYYY", but it is mentioned for "IYYY", and it is slightly deceiving. Also I remember that the other part of the documentation says[4] that "Keep in mind there is no 0 AD" that's why I decided it is impossible to pass 0000 for YYYY. Currently behavior with YYYY=0 is still surprising in some cases: postgres=# SELECT postgres-# to_date('20 0000-01-01', 'CC YYYY-MM-DD'), postgres-# to_date('20 0001-01-01', 'CC YYYY-MM-DD'); to_date | to_date ------------+------------1901-01-01 | 0001-01-01 (1 row) but the documentation[3] says "In conversions from string to timestamp or date, the CC (century) field is ignored if there is a YYY, YYYY or Y,YYY field." So is it shared opinion that ISO8601 is used for "YYYY"? [3]http://www.postgresql.org/docs/devel/static/functions-formatting.html [4]http://www.postgresql.org/docs/devel/static/functions-datetime.html -- Best regards, Vitaly Burovoy
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: tested, failed Documentation: tested, failed Applied this patch, it works well, make what it expected correctly, code style is maintained. Regression tests passed OK.No documentation or comments.
On Fri, Feb 26, 2016 at 3:24 PM, Ivan Kartyshov <i.kartyshov@postgrespro.ru> wrote:
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation: tested, failed
Applied this patch, it works well, make what it expected correctly, code style is maintained. Regression tests passed OK. No documentation or comments.
Why does it say "tested, failed" for all points above there? ;-)
--
Alex
<p dir="ltr">> Why does it say "tested, failed" for all points above there? ;-)<p dir="ltr">Hi, I just used Web reviewerform on <a href="https://commitfest.postgresql.org">https://commitfest.postgresql.org</a> to make review on patch,but form doesn't work properly unlike the patch.))
On 2/26/16, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote: > On Fri, Feb 26, 2016 at 3:24 PM, Ivan Kartyshov > <i.kartyshov@postgrespro.ru> > wrote: > >> The following review has been posted through the commitfest application: > >> make installcheck-world: tested, failed >> Implements feature: tested, failed >> Spec compliant: tested, failed >> Documentation: tested, failed >> >> Applied this patch, it works well, make what it expected correctly, code >> style is maintained. Regression tests passed OK. No documentation or >> comments. >> > > Why does it say "tested, failed" for all points above there? ;-) I hope Ivan misunderstood that "tested" and "passed" are two different options, not a single "tested, passed" ;-) Unfortunately, it seems there should be a discussion about meaning of "YYYY": it seems authors made it as ISO8601-compliant (but there are still several bugs), but there is no proofs in the documentation[1]. On the one hand there is only "year" mentioned for "YYYY", "YYY", etc., and "ISO 8601 ... year" is "week-numbering", i.e. "IYYY", "IYY", etc. only for using with "IDDD", "ID" and "IW". On the other hand "extract" has two different options: "year" and "isoyear" and only the second one is ISO8601-compliant (moreover it is week-numbering at the same time): postgres=# SELECT y src, EXTRACT(year FROM y) AS year, EXTRACT(isoyear FROM y) AS isoyear postgres-# FROM unnest(ARRAY[ postgres(# '4713-01-01 BC','4714-12-31 BC','4714-12-29 BC','4714-12-28 BC']::date[]) y; src | year | isoyear ---------------+-------+---------4713-01-01 BC | -4713 | -47124714-12-31 BC | -4714 | -47124714-12-29 BC | -4714 | -47124714-12-28 BC | -4714 | -4713 (4 rows) So there is lack of consistency: something should be changed: either "extract(year..." (and the documentation[2], but there is "Keep in mind there is no 0 AD, ..." for a long time, so it is a change which breaks compatibility; and the patch will be completely different), or the patch (it has an influence on "IYYY", so it is obviously wrong). Now (after the letter[3] from Thomas Munro) I know the patch is not ready for committing even with minimal changes. But I'm waiting for a discussion: what part should be changed? I would change behavior of "to_date" and "to_timestamp" to match with extract options "year"/"isoyear", but I think getting decision of the community before modifying the patch is a better idea. [1]http://www.postgresql.org/docs/devel/static/functions-formatting.html [2]http://www.postgresql.org/docs/devel/static/functions-datetime.html [3]http://www.postgresql.org/message-id/CAEepm=0AzNVy+frtYni04iMdW4TLGZAeLLJqMHMcJBrE+LnWNQ@mail.gmail.com -- Best regards, Vitaly Burovoy
On Tue, Feb 23, 2016 at 6:23 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > This seems to be a messy topic. The usage of "AD" and "BC" imply that > TO_DATE is using the anno domini system which doesn't have a year 0, > but in the DATE type perhaps we are using the ISO 8601 model[2] where > 1 BC is represented as 0000, leading to the difference of one in all > years before 1 AD? Well, the way to figure that out, I think, is to look at the documentation. I had a look at... http://www.postgresql.org/docs/9.5/static/functions-formatting.html ...which says... YYYY year (4 or more digits) IYYY ISO 8601 week-numbering year (4 or more digits) I don't really understand ISO 8601, but if IYYY is giving us an ISO 8601 thing, then presumably YYYY is not supposed to be giving us that. The same page elsewhere refers to Gregorian dates,and other parts of the documentation seem to agree that's what we use. But having said that, this is kind of a weird situation. We're talking about this: rhaas=# SELECT y || '-06-01', to_date (y || '-06-01', 'YYYY-MM-DD') FROM (VALUES (2), (1), (0), (-1), (-2)) t(y);?column? | to_date ----------+---------------2-06-01 | 0002-06-011-06-01 | 0001-06-010-06-01 | 0001-06-01 BC-1-06-01 | 0002-06-01 BC-2-06-01| 0003-06-01 BC (5 rows) Now, I would be tempted to argue that passing to_date('-1-06-01', 'YYYY-MM-DD') ought to do the same thing as to_date('pickle', 'YYYY-MM-DD') i.e. throw an error. There's all kinds of what seems to me to be shoddy error checking in this area: rhaas=# select to_date('-3', 'YYYY:MM&DD'); to_date ---------------0004-01-01 BC (1 row) It's pretty hard for me to swallow the argument that the input matches the provided format. However, I'm not sure we ought to tinker with the behavior in this area. If YYYY-MM-DD is going to accept things that are not of the format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format, then I think it should probably keep doing the same things it's always done. If you want to supply a BC date, why not do this: rhaas=# select to_date('0001-06-01 BC', 'YYYY-MM-DD BC'); to_date ---------------0001-06-01 BC (1 row) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>But I'm waiting for a discussion: what part should be changed? I for compliance with the standard (all ISO). In addition Oracle uses "IYYY" format. Standards allow to reduce liability. But I think someone like Tom Lane can have the final say because we break backward compatibility. Options "year"/"isoyear" may confuse the users. Thanks. -- Yury Zhuravlev Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 2/27/16, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Feb 23, 2016 at 6:23 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> This seems to be a messy topic. The usage of "AD" and "BC" imply that >> TO_DATE is using the anno domini system which doesn't have a year 0, >> but in the DATE type perhaps we are using the ISO 8601 model[2] where >> 1 BC is represented as 0000, leading to the difference of one in all >> years before 1 AD? > > Well, the way to figure that out, I think, is to look at the > documentation. I had a look at... > > http://www.postgresql.org/docs/9.5/static/functions-formatting.html > > ...which says... > > YYYY year (4 or more digits) > IYYY ISO 8601 week-numbering year (4 or more digits) > > I don't really understand ISO 8601, but if IYYY is giving us an ISO > 8601 thing, then presumably YYYY is not supposed to be giving us that. > The same page elsewhere refers to Gregorian dates, and other parts > of the documentation seem to agree that's what we use. > > But having said that, this is kind of a weird situation. We're > talking about this: > > rhaas=# SELECT y || '-06-01', to_date (y || '-06-01', 'YYYY-MM-DD') > FROM (VALUES (2), (1), (0), (-1), (-2)) t(y); > ?column? | to_date > ----------+--------------- > 2-06-01 | 0002-06-01 > 1-06-01 | 0001-06-01 > 0-06-01 | 0001-06-01 BC > -1-06-01 | 0002-06-01 BC > -2-06-01 | 0003-06-01 BC > (5 rows) > > Now, I would be tempted to argue that passing to_date('-1-06-01', > 'YYYY-MM-DD') ought to do the same thing as to_date('pickle', > 'YYYY-MM-DD') i.e. throw an error. There's all kinds of what seems to > me to be shoddy error checking in this area: > > rhaas=# select to_date('-3', 'YYYY:MM&DD'); > to_date > --------------- > 0004-01-01 BC > (1 row) > > It's pretty hard for me to swallow the argument that the input matches > the provided format. > > However, I'm not sure we ought to tinker with the behavior in this > area. If YYYY-MM-DD is going to accept things that are not of the > format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format, It is not about format, it is about values. > then I think it should probably keep doing the same things it's always > done. If you want to supply a BC date, why not do Because it is inconvenient a little. If one value ("-2345") is passed, another one ("2346 BC") is got. In the other case a programmer must check for negative value, and if so change a sign and add "BC" to the format. Moreover the programmer must keep in mind that it is not enough to have usual date format "DD/MM/YYYY", because sometimes there can be "BC" part. > this: > > rhaas=# select to_date('0001-06-01 BC', 'YYYY-MM-DD BC'); > to_date > --------------- > 0001-06-01 BC > (1 row) Also because of: postgres=# SELECT EXTRACT(year FROM to_date('-3', 'YYYY'));date_part ----------- -4 (1 row) Note that the documentation[1] says "Keep in mind there is no 0 AD". More examples by the link[2]. > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company [1]http://www.postgresql.org/docs/devel/static/functions-datetime.html [2]http://www.postgresql.org/message-id/CAKOSWNn6KpfAbmrsHzyN8+2NHpyfVBdMPHYa5pQgUNy8LP2L2Q@mail.gmail.com -- Best regards, Vitaly Burovoy
On Sun, Feb 28, 2016 at 9:38 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote: >> However, I'm not sure we ought to tinker with the behavior in this >> area. If YYYY-MM-DD is going to accept things that are not of the >> format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format, > > It is not about format, it is about values. I disagree. In a format like "-1-06-01", you want the first minus to indicate negation and the other two to be a separator. That's not very far away from wanting the database to read your mind. > Because it is inconvenient a little. If one value ("-2345") is passed, > another one ("2346 BC") is got. In the other case a programmer must > check for negative value, and if so change a sign and add "BC" to the > format. Moreover the programmer must keep in mind that it is not > enough to have usual date format "DD/MM/YYYY", because sometimes there > can be "BC" part. Yeah, well, that's life. You can write an alternative function to construct dates that works the way you like, and that may well be a good idea. But I think *this* change is not a good idea, and accordingly I vote we reject this patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/11/16, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Feb 28, 2016 at 9:38 PM, Vitaly Burovoy > <vitaly.burovoy@gmail.com> wrote: >>> However, I'm not sure we ought to tinker with the behavior in this >>> area. If YYYY-MM-DD is going to accept things that are not of the >>> format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format, >> >> It is not about format, it is about values. > > I disagree. In a format like "-1-06-01", you want the first minus to > indicate negation and the other two to be a separator. That's not > very far away from wanting the database to read your mind. It is not my wish. The database does it just now: postgres=# SELECT to_date('-1-06-01', 'YYYY'); to_date ---------------0002-01-01 BC (1 row) >> Because it is inconvenient a little. If one value ("-2345") is passed, >> another one ("2346 BC") is got. In the other case a programmer must >> check for negative value, and if so change a sign and add "BC" to the >> format. Moreover the programmer must keep in mind that it is not >> enough to have usual date format "DD/MM/YYYY", because sometimes there >> can be "BC" part. > > Yeah, well, that's life. You can write an alternative function to > construct dates that works the way you like, and that may well be a > good idea. But I think *this* change is not a good idea, and > accordingly I vote we reject this patch. My wish is to make the behavior be consistent. Since there are two reverse functions ("extract" and "to_date" ["to_timestamp" in fact is the same]), I expect that is described as "year" ("year"-"YYYY") means the same thing in both of them, the same with pairs "isoyear"-"IYYY", "dow"-"DDD", "isodow"-"IDDD", etc. Now "year" is _not_ the same as "YYYY" (but it cat be so according to the documentation: there is no mentioning of any ISO standard), whereas "isoyear" _is_ the same: postgres=# SELECT y, to_date(y, 'YYYY')YYYY,to_date(y, 'IYYY')IYYY postgres-# FROM(VALUES('-1-06-01'))t(y); y | yyyy | iyyy ----------+---------------+----------------1-06-01 | 0002-01-01 BC | 0002-01-01 BC (1 row) and postgres=# SELECT y, date_part('year', y)YYYY,date_part('isoyear', y)IYYY postgres-# FROM(VALUES('0002-06-01 BC'::date))t(y); y | yyyy | iyyy ---------------+------+------0002-06-01 BC | -2 | -1 (1 row) P.S.: proposed patch changes IYYY as well, but it is easy to fix it and I'm ready to do it after finding a consensus. -- Best regards, Vitaly Burovoy
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Feb 28, 2016 at 9:38 PM, Vitaly Burovoy > <vitaly.burovoy@gmail.com> wrote: >>> However, I'm not sure we ought to tinker with the behavior in this >>> area. If YYYY-MM-DD is going to accept things that are not of the >>> format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format, >> It is not about format, it is about values. > I disagree. In a format like "-1-06-01", you want the first minus to > indicate negation and the other two to be a separator. That's not > very far away from wanting the database to read your mind. [ catches up with thread... ] Yes. It would be more reasonable IMO for to_date to throw an error because this is bad input. On the other hand, to_date mostly doesn't throw an error no matter how bad the input is. I think that may have been intentional, although its habit of producing garbage output instead (and not, say, NULL) is certainly not very happy-making. It's a bit schizophrenic for this patch to be both adding ereport's for year zero (thereby breaking the no-failure-on-bad-input policy) *and* trying to produce sane output for arguably-insane input. I don't really see an argument why '0001-00-00' should be accepted but '0000-01-01' should throw an error, but that would be the result if we take this patch. And I quite agree with Robert that it's insane to consider '-2-06-01' as satisfying the format 'YYYY-MM-DD'. The fact that it even appears to do something related to a BC year is an implementation artifact, and not a very nice one. I would be in favor of a ground-up rewrite of to_date and friends, with some better-stated principles (in particular, a rationale why they even exist when date_in and friends usually do it better) and crisper error detection. But I'm not seeing the argument that hacking at the margins like this moves us forward on either point; what it does do is create another backward-compatibility hazard for any such rewrite. In short, I vote with Robert to reject this patch. BTW, the context for the original report wasn't clear, but I wonder how much of the actual problem could be addressed by teaching make_date() and friends to accept negative year values as meaning BC. regards, tom lane
On 3/11/16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > [ catches up with thread... ] > > Yes. It would be more reasonable IMO for to_date to throw an error > because this is bad input. On the other hand, to_date mostly doesn't > throw an error no matter how bad the input is. I think that may have > been intentional, although its habit of producing garbage output > instead (and not, say, NULL) is certainly not very happy-making. > > It's a bit schizophrenic for this patch to be both adding ereport's > for year zero (thereby breaking the no-failure-on-bad-input policy) > *and* trying to produce sane output for arguably-insane input. > > I don't really see an argument why '0001-00-00' should be accepted > but '0000-01-01' should throw an error, but that would be the result > if we take this patch. Well. In case of zero year it could return the first year instead of an exception by the same way as "MM" and "DD" do it... > And I quite agree with Robert that it's insane > to consider '-2-06-01' as satisfying the format 'YYYY-MM-DD'. The > fact that it even appears to do something related to a BC year is > an implementation artifact, and not a very nice one. > > I would be in favor of a ground-up rewrite of to_date and friends, with > some better-stated principles (in particular, a rationale why they even > exist when date_in and friends usually do it better) I think they exist because date_in can't convert something like "IYYY-IDDD" (I wonder if date_in can do so) or to parse dates/stamps independent from the "DateStyle" parameter. > and crisper error > detection. But I'm not seeing the argument that hacking at the margins > like this moves us forward on either point; what it does do is create > another backward-compatibility hazard for any such rewrite. > > In short, I vote with Robert to reject this patch. Accepted. Let's agree it is a case "garbage in, garbage out" and "an implementation artifact". > BTW, the context for the original report wasn't clear, The context was to make "extract" and "to_date"/"to_timestamp" be consistently reversible for "year"/"YYYY" for negative values (since both of them support ones). > but I wonder how > much of the actual problem could be addressed by teaching make_date() > and friends to accept negative year values as meaning BC. > > regards, tom lane Thank Thomas, Robert and Tom very much for an interesting (but short) discussion. -- Best regards, Vitaly Burovoy