Thread: to_date() and to_timestamp() with negative years
I'm quoting here from "Usage notes for date/time formatting" just below "Table 9.25. Template Pattern Modifiers for Date/Time Formatting" here:
on the page "9.8. Data Type Formatting Functions". Find this:
«In to_timestamp and to_date, negative years are treated as signifying BC. If you write both a negative year and an explicit BC field, you get AD again. An input of year zero is treated as 1 BC.»
This seems to be a suspect solution in search of a problem, for these reasons:
1. Nobody ever talks about dates by saying "the year minus 42". It's always "42 BC". Nor do they talk about "year zero" in the Gregorian calendar 'cos there's no such year.
2. If you invoke "make_timestamp()" or "make_timestamptz()" with a negative (or zero) argument for "year", then you get the "22008" error.
3. The text-to-date typecast « '-2021-06-15'::date » gets the "22009" error, as it does with "0000".
4. The unary "minus" operator is shorthand for "subtract the operand from zero" — i.e. "-x" means "0 - x". But there is no year zero. And anyway, the difference between two "date" values is an "integer" value; and the difference between two "timestamp[tz]" values is an "interval" value.
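For concreteness, the quoted rule can be modeled as a tiny pure function. This is a sketch of the rule exactly as the doc words it, not of PostgreSQL's actual C implementation (the function name is mine, not any PG API):

```python
def resolve_year(year: int, explicit_bc: bool) -> tuple[int, str]:
    """Model of the documented rule: a negative year signifies BC;
    a negative year combined with an explicit BC field yields AD
    again; an input of year zero is treated as 1 BC."""
    if year == 0:
        return (1, "BC")
    era = "BC" if year < 0 else "AD"
    if explicit_bc:
        # the explicit BC field flips whatever the sign implied
        era = "AD" if era == "BC" else "BC"
    return (abs(year), era)

print(resolve_year(-2021, False))  # (2021, 'BC')
print(resolve_year(-2021, True))   # (2021, 'AD') -- "you get AD again"
print(resolve_year(0, False))      # (1, 'BC')
```

Writing the rule out like this makes the oddity plain: the sign and the era field act as two independent negations that can cancel.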
Finally, the implementation seems to be buggy (as I observed it in PG 13.4 on my MacBook with up-to-date macOS). Try this:
\x on
select
to_date( '15/06/-2021', 'DD/MM/YYYY' ) as a1,
to_date( '15/06/-2021 BC', 'DD/MM/YYYY/AD' ) as a2,
'' as "-",
to_date( '15 06 -2021', 'DD MM YYYY' ) as b1,
to_date( '15 06 -2021 BC', 'DD MM YYYY AD' ) as b2;
\x off
Notice that the difference between the first two expressions (that produce the values "a1" and "a2") and the second two expressions (that produce the values "b1" and "b2") is how the to-be-converted substrings for "DD", "MM", and "YYYY" are separated. Otherwise, they express the same intention. So "b1" should be the same as "a1" and "b2" should be the same as "a2".
This is the result:
a1 | 2021-06-15 BC
a2 | 2021-06-15
-  |
b1 | 2021-06-15
b2 | 2021-06-15 BC
The "a" values are right (by the rule in the PG doc)—and the "b" values are wrong.
If you express the same intention without the complication of using "-2021", then you get the right results throughout (and for any other way that you care to separate the to-be-interpreted substrings).
Moreover, I have an old PG 11.9 in an Ubuntu VM. The same test gets these results:
a1 | 2022-06-15 BC
a2 | 2022-06-15 BC
- |
b1 | 2022-06-15 BC
b2 | 2022-06-15 BC
Also buggy. But in a different way. And moreover (again) — I tried the test in our current latest YugabyteDB. This uses the PG 11.2 C code as is. (You'll have to trust me when I say that the test executes entirely here and goes nowhere near our own code.) This is the result (again on my MacBook).
a1 | 2022-06-15 BC
a2 | 2022-06-15 BC
- |
b1 | 2021-06-15
b2 | 2021-06-15 BC
Also buggy. But in YET a different way.
It looks like you have a tranche of baroque C code that wrongly implements a misguided idea. And that it's been tinkered around with between at least a couple of release boundaries without proper testing.
Is there any chance that you might be bold and simply make negative "year" values illegal in "to_date()" and "to_timestamp()" — just as they already are in "make_timestamp()", "make_timestamptz()", and the "from text" typecasts to date-time moment values?
Bryn Llewellyn <bryn@yugabyte.com> writes:
> Is there any chance that you might be bold and simply make negative
> "year" values illegal in "to_date()" and "to_timestamp()" — just as
> they already are in "make_timestamp()", "make_timestamptz()", and the
> "from text" typecasts to date-time moment values?

Uh, what?

regression=# select make_timestamp(-44,3,15,0,0,0);
     make_timestamp
------------------------
 0044-03-15 00:00:00 BC
(1 row)

The other stuff you are talking about looks like confusion around which
characters are minus signs in the data and which ones are field
separators. Given the very squishy definitions of to_date/to_timestamp,
I'm not surprised if that works only with carefully chosen field
layouts --- but it does work for me with all of these cases:

regression=# select to_date('-0044-03-15', 'YYYY-MM-DD');
    to_date
---------------
 0044-03-15 BC
(1 row)

regression=# select to_date('03-15--0044', 'MM-DD-YYYY');
    to_date
---------------
 0044-03-15 BC
(1 row)

regression=# select to_date('03/15/-0044', 'MM/DD/YYYY');
    to_date
---------------
 0044-03-15 BC
(1 row)

I'd be the first to agree that that code is a mess and could stand to
be rewritten --- but I seriously doubt that we'd take a patch that
intentionally breaks cases that work fine today. There's also the
angle that these are supposed to be Oracle-compatible, so I wonder
what Oracle does with such input.

			regards, tom lane
> tgl@sss.pgh.pa.us wrote:
>
> Uh, what?
>
> regression=# select make_timestamp(-44,3,15,0,0,0);
>      make_timestamp
> ------------------------
>  0044-03-15 00:00:00 BC
> (1 row)

I just ran your test:

select make_timestamp(-44,3,15,0,0,0);

in each of the three environments that I mentioned—but especially, therefore, in PG 13.4 (on macOS).

It caused the error that I mentioned:

ERROR:  22008: date field value out of range: -44-03-15

It's the same with "make_timestamptz()".

In what PG version did you run your test?

I'm not surprised that the grandfathered-in "solution" that I described will never go away—despite its questionable conceptual basis.
On 11/3/21 11:18, Bryn Llewellyn wrote:
> I just ran your test:
>
> select make_timestamp(-44,3,15,0,0,0);
>
> in each of the three environments that I mentioned—but especially, therefore, in PG 13.4 (on macOS).
>
> It caused the error that I mentioned:
>
> ERROR:  22008: date field value out of range: -44-03-15

https://www.postgresql.org/docs/14/release-14.html#id-1.11.6.5.3

"
Allow make_timestamp()/make_timestamptz() to accept negative years (Peter Eisentraut)

Negative values are interpreted as BC years.
"

> In what PG version did you run your test?
>
> I'm not surprised that the grandfathered-in "solution" that I described will never go away—despite its questionable conceptual basis.

--
Adrian Klaver
adrian.klaver@aklaver.com
adrian.klaver@aklaver.com wrote:
Thanks, Adrian. What results do these queries give in Version 14? (Forgive me, I'll make time to install v14 very soon.)
select make_timestamp(0,1,1,0,0,0);
select make_timestamp(-0,1,1,0,0,0);
select '-0001-01-01'::date; -- and similar for typecasts to timestamp and timestamptz
select '0000-01-01'::date;
select '-0000-01-01'::date;
select
to_date( '15/06/-2021', 'DD/MM/YYYY' ) as a1,
to_date( '15/06/-2021 BC', 'DD/MM/YYYY/AD' ) as a2,
'' as "-",
to_date( '15 06 -2021', 'DD MM YYYY' ) as b1,
to_date( '15 06 -2021 BC', 'DD MM YYYY AD' ) as b2;
On 11/3/21 1:39 PM, Bryn Llewellyn wrote:
> Thanks, Adrian. What results do these queries give in Version 14?
> (Forgive me, I'll make time to install v14 very soon.)

select version();
                                       version
------------------------------------------------------------------------------------
 PostgreSQL 14.0 on x86_64-pc-linux-gnu, compiled by gcc (SUSE Linux) 7.5.0, 64-bit

> select make_timestamp(0,1,1,0,0,0);
ERROR:  date field value out of range: 0-01-01

> select make_timestamp(-0,1,1,0,0,0);
ERROR:  date field value out of range: 0-01-01

> select '-0001-01-01'::date; -- and similar for typecasts to timestamp and timestamptz
ERROR:  invalid input syntax for type date: "-0001-01-01"
LINE 1: select '-0001-01-01'::date;

> select '0000-01-01'::date;
ERROR:  date/time field value out of range: "0000-01-01"
LINE 1: select '0000-01-01'::date;

> select '-0000-01-01'::date;
ERROR:  invalid input syntax for type date: "-0000-01-01"
LINE 1: select '-0000-01-01'::date;

> select
>   to_date( '15/06/-2021', 'DD/MM/YYYY' ) as a1,
>   to_date( '15/06/-2021 BC', 'DD/MM/YYYY/AD' ) as a2,
>   '' as "-",
>   to_date( '15 06 -2021', 'DD MM YYYY' ) as b1,
>   to_date( '15 06 -2021 BC', 'DD MM YYYY AD' ) as b2;

      a1       |     a2     | - |     b1     |      b2
---------------+------------+---+------------+---------------
 2021-06-15 BC | 2021-06-15 |   | 2021-06-15 | 2021-06-15 BC

Haven't had time to work through what the above is actually doing.

--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> Haven't had time to work through what the above is actually doing.

I think the first two are explained by 489c9c340:

    Also, arrange for the combination of a negative year and an
    explicit "BC" marker to cancel out and produce AD.  This is how
    the negative-century case works, so it seems sane to do likewise.

The last two look like a parsing issue: with no field separator
(nope, the space doesn't count), the code is taking the dash as a
field separator.

			regards, tom lane
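[Editor's note: Tom's diagnosis, that the code takes the dash as a field separator, can be illustrated with a toy scanner. This is purely illustrative and is not the actual logic of PostgreSQL's formatting.c; the function name and separator set are invented for the sketch.]

```python
SEPARATORS = set(" /-.,")

def read_signed_number(data: str, dash_is_separator: bool) -> int:
    """Toy scanner: skip leading separator characters, then read digits.
    If '-' is classed as just another separator, the minus sign is
    swallowed before the digit reader sees it, and the year comes out
    positive (i.e. AD rather than BC)."""
    pos, sign = 0, 1
    while pos < len(data) and data[pos] in SEPARATORS:
        if data[pos] == "-" and not dash_is_separator:
            sign = -1
        pos += 1
    digits = ""
    while pos < len(data) and data[pos].isdigit():
        digits += data[pos]
        pos += 1
    return sign * int(digits)

print(read_signed_number(" -2021", dash_is_separator=True))   # 2021
print(read_signed_number(" -2021", dash_is_separator=False))  # -2021
```

The same input text produces AD or BC depending only on how the scanner classifies the dash, which matches the a/b divergence seen earlier in the thread.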
> tgl@sss.pgh.pa.us wrote:
>
> Adrian Klaver <adrian.klaver@aklaver.com> writes:
>> Haven't had time to work through what the above is actually doing.
>
> I think the first two are explained by 489c9c340:
>
>     Also, arrange for the combination of a negative year and an
>     explicit "BC" marker to cancel out and produce AD.  This is how
>     the negative-century case works, so it seems sane to do likewise.
>
> The last two look like a parsing issue: with no field separator (nope, the space doesn't count), the code is taking the dash as a field separator.

Thanks, Adrian, for trying those tests. So if PG is aiming for one consistent story for years that are less than one, in all three APIs (to_date() and its cousins, make_timestamp[tz](), and text-to-date-time typecasts), then work still remains.

The parsing problem seems to be a separable annoyance. I assume that Tom's "nope, the space doesn't count" is a narrow comment on this corner of the implementation. It definitely counts here:

select to_date('12345 01 01', 'YYYY MM DD');

And you have to have some kind of separator between the years substring and the adjacent one(s) even to succeed with years that have more than four digits. Another usage note stresses that while this is OK:

select to_date('12340101', 'YYYYMMDD');

this isn't:

select to_date('123450101', 'YYYYMMDD');

and nor is anything else that you might dream up that does not have a separator as mentioned above.

Tom asked about Oracle Database. I have a still-working Version 18.0 in a VM on my laptop. The query can't be so terse there because there's no implicit typecast from date to text. And there's the famous annoyance of "dual". I tried this first:

select to_char(to_date('1234/01/01', 'YYYY/MM/DD'), 'YYYY/MM/DD') from dual;

It gets the same output back as the input you gave. So far so good. Then I changed the input to '-1234/01/01'. It caused this error:

ORA-01841: (full) year must be between -4713 and +9999, and not be 0

So it seems that Oracle Database has its own problems. But at least the wording "must… not be 0" is clear—and not what PG wants to support.
On 11/3/21 15:56, Bryn Llewellyn wrote:
> The parsing problem seems to be a separable annoyance. I assume that Tom's "nope, the space doesn't count" is a narrow comment on this corner of the implementation. It definitely counts here:
>
> select to_date('12345 01 01', 'YYYY MM DD');
>
> And you have to have some kind of separator between the years substring and the adjacent one(s) even to succeed with years that have more than four digits. Another usage note stresses that while this is OK:
>
> select to_date('12340101', 'YYYYMMDD');
>
> this isn't:
>
> select to_date('123450101', 'YYYYMMDD');

It does with a tweak:

select to_date('12345 0101', 'FMYYYYMMDD');
   to_date
-------------
 12345-01-01

> Tom asked about Oracle Database. I have a still-working Version 18.0 in a VM on my laptop. The query can't be so terse there because there's no implicit typecast from date to text. And there's the famous annoyance of "dual".

If you go here:

https://www.postgresql.org/docs/current/functions-formatting.html

and search on Oracle you will see that there are other exceptions. Like most things there is not complete agreement on how closely to follow someone else's code.

> I tried this first:
>
> select to_char(to_date('1234/01/01', 'YYYY/MM/DD'), 'YYYY/MM/DD') from dual;
>
> It gets the same output back as the input you gave. So far so good. Then I changed the input to '-1234/01/01'. It caused this error:
>
> ORA-01841: (full) year must be between -4713 and +9999, and not be 0
>
> So it seems that Oracle Database has its own problems. But at least the wording "must… not be 0" is clear—and not what PG wants to support.

--
Adrian Klaver
adrian.klaver@aklaver.com
On 11/3/21 17:00, Adrian Klaver wrote:
> On 11/3/21 15:56, Bryn Llewellyn wrote:
>> select to_date('123450101', 'YYYYMMDD');
>
> It does with a tweak:
>
> select to_date('12345 0101', 'FMYYYYMMDD');
>    to_date
> -------------
>  12345-01-01

Well that was just plain wrong. I was not paying attention.

--
Adrian Klaver
adrian.klaver@aklaver.com
adrian.klaver@aklaver.com wrote:
> Well that was just plain wrong. I was not paying attention.
I'm missing your point, Adrian. But I must confess that I'm guilty of not expressing myself clearly with my reference to years with more than four digits. I meant that reference only to make the point that, in that use-case, a space CAN act as a separator.
By the way, I'd been studying the "Usage notes for date/time formatting" section in the Version 11 PG docs because that's the PG version that YugabyteDB uses. It has eleven bullets. The corresponding section in the "Current" PG docs has just eight bullets and the overall wording of the section is quite different. However, the bullet, in the Version 11 docs, that started me on the testing that spurred my first email in this thread is identically worded in both versions. It says this:
«
In to_timestamp and to_date, negative years are treated as signifying BC. If you write both a negative year and an explicit BC field, you get AD again. An input of year zero is treated as 1 BC.
»
Back to the point about separators, the "Current" doc has this bullet:
«
A separator (a space or non-letter/non-digit character) in the template string of to_timestamp and to_date matches any single separator in the input string or is skipped, unless the FX option is used...
»
(There's no such explanation in the Version 11 doc—but never mind that.) I read this to mean that a space IS a viable separator. And yet Tom's "nope, the space doesn't count [as a separator]" claims the opposite. The bullet's wording, by the way, says that the minus sign is a separator. But yet it can also be taken to confer the meaning "BC" to a year. No wonder I'm confused.
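[Editor's note: read literally, that bullet describes behavior like the following toy matcher. This is an illustration of the documented wording only, not of the real formatting.c code, and the function name is invented.]

```python
def match_template_separator(data: str, pos: int) -> int:
    """The quoted bullet, read literally: a separator in the template
    matches any single separator (a space or non-letter/non-digit
    character) in the input string, or is skipped if none is present.
    Returns the new position in the input."""
    if pos < len(data) and not data[pos].isalnum():
        return pos + 1  # matched one separator: ' ', '/', '.', '-', ...
    return pos          # nothing to match: the template separator is skipped

# The template's space before YYYY can legitimately consume the '-' that
# the user meant as a minus sign, leaving an unsigned '2021' behind:
rest = "-2021"[match_template_separator("-2021", 0):]
print(rest)  # 2021
```

Under this reading, the documented separator rule and the documented negative-year rule compete for the same dash, which is exactly the confusion described above.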
Elsewhere the "Current" doc says that runs of two or more spaces have the same effect as a single space (in the absence of FX or FM complexity).
No wonder that examples like I showed sometimes produce the wrong results, even after more than one round of tinkering with the C implementation. Try this test (the to-be-converted text has runs of five spaces, and the template has runs of ten spaces):
select to_date('     1950     02     14', 'YYYY          MM          DD');
select to_date('     -1950     02     14', 'YYYY          MM          DD');
select to_date('     14     02     1950', 'DD          MM          YYYY');
select to_date('     14     02     -1950', 'DD          MM          YYYY');
"-1950" is taken as "BC" in the second to_date() but it is not so taken in the fourth to_date().
On 11/3/21 19:12, Bryn Llewellyn wrote:
> Back to the point about separators, the "Current" doc has this bullet:
>
> «
> A separator (a space or non-letter/non-digit character) in the template
> string of to_timestamp and to_date matches any single separator in
> the input string or is skipped, unless the FX option is used...
> »
>
> (There's no such explanation in the Version 11 doc—but never mind that.)
> I read this to mean that a space IS a viable separator. And yet Tom's
> "nope, the space doesn't count [as a separator]" claims the opposite.
> The bullet's wording, by the way, says that the minus sign is a
> separator. But yet it can also be taken to confer the meaning "BC" to a
> year. No wonder I'm confused.
>
> Elsewhere the "Current" doc says that runs of two or more spaces have
> the same effect as a single space (in the absence of FX or FM complexity).
>
> No wonder that examples like I showed sometimes produce the wrong
> results, even after more than one round of tinkering with the C
> implementation. Try this test (the to-be-converted text has runs of five
> spaces, and the template has runs of ten spaces):

It comes down to determining what is a separator and what is the negative sign.

> select to_date('     1950     02     14', 'YYYY          MM          DD');
> select to_date('     -1950     02     14', 'YYYY          MM          DD');

The row above returns:

select to_date('     -1950     02     14', 'YYYY          MM          DD');
    to_date
---------------
 1950-02-14 BC

As you state below. Change it to (a leading separator added to the template):

select to_date('     -1950     02     14', '          YYYY          MM          DD');
  to_date
------------
 1950-02-14

and you don't get the BC as -1950 is taken as separator(-)1950 not negative(-)1950.

> select to_date('     14     02     1950', 'DD          MM          YYYY');
> select to_date('     14     02     -1950', 'DD          MM          YYYY');

The above returns:

select to_date('     14     02     -1950', 'DD          MM          YYYY');
  to_date
------------
 1950-02-14

Change it to (the separator before YYYY removed from the template):

select to_date('     14     02     -1950', 'DD          MMYYYY');
    to_date
---------------
 1950-02-14 BC

and you get BC as -1950 is taken as negative(-)1950 not separator(-)1950.

Not sure how this can be handled in a totally predictable way given the unpredictable ways in which datetime strings are formatted? The only thing I can say is that it points out that when working with datetimes, settling on a standard format is your best defense against unpredictable results.

> "-1950" is taken as "BC" in the second to_date() but it is not so taken
> in the fourth to_date().

--
Adrian Klaver
adrian.klaver@aklaver.com
adrian.klaver@aklaver.com wrote:
Not sure how this can be handled in a totally predictable way given the unpredictable ways in which datetime strings are formatted?
The only thing I can say is that it points out that when working with datetimes, settling on a standard format is your best defense against unpredictable results.
Thank you very much, again, for your help with my seemingly endless nitpicking questions on this matter, Adrian. Here's the most favorable conclusion that I can draw:
1. A precedent has been set by The SQL Standard folks together with the PostgreSQL implementation and other implementations like Oracle Database. All this stretches back a long time—to more than four decades ago.
2. This has brought us in PG to the rules that "Table 9.26. Template Patterns for Date/Time Formatting", "Table 9.27. Template Pattern Modifiers for Date/Time Formatting", and "Usage notes for date/time formatting" set out and interpret.
3. The rules are hard to understand and the PG doc gives insufficient detail to allow the outcomes in corner cases like you just showed us to be predicted confidently. Some users seek to understand the rules by reading PG's source code.
4. Certainly, the rules set a significant parsing challenge. You hint that they might even prevent a complete practical solution from being implemented.
5. None of this matters when the app designer has the freedom to define how date-time values will be provided, as text values, by user interfaces or external systems. In these cases, the complexity can be controlled by edict and correct solutions can be painlessly implemented and tested. Not a day goes by that I don't have to enter a date value at a UI. And in every case, a UI gadget constrains my input and makes its individual fields available to the programmer without the need for parsing—so there's reason to be optimistic. The programmer can easily build the text representation of the date-time value to conform to the simple rules that the overall application design specified.
6. In some cases text data that's to be converted arrives in a format that cannot be influenced. And it might contain locutions like we've been discussing ("zero" meaning "1 BC", unfavorable paradigms for separating tokens, and the like). In these cases, the diligent programmer might, just, be able to use the full arsenal of available tools to implement a scheme that faultlessly parses the input. But the likelihood of bugs seems to be pretty big.
I'll say "case closed, now" — from my side, at least.
On 11/4/21 10:29 AM, Bryn Llewellyn wrote:
> Thank you very much, again, for your help with my seemingly endless
> nitpicking questions on this matter, Adrian. Here's the most favorable
> conclusion that I can draw:
>
> 3. The rules are hard to understand and the PG doc gives insufficient
> detail to allow the outcomes in corner cases like you just showed us to
> be predicted confidently. Some users seek to understand the rules by
> reading PG's source code.

I would say that is because datetimes in string formats are often hard to understand as a result of folks inventing their own formats. As an example, a commit message of mine from some years ago:

"
Fix issue with date parsing of Javascript dates coming from browser on Windows. This occurred in both Firefox and Chrome. The issue being that the date had a timezone of Pacific Standard Time instead of PST like it does from a Linux machine. dateutils choked on that timezone and therefore the date was not parsed. The fix was to add fuzzy=True to the dateutils.parse(). This allows dateutils to skip over anything it does not understand and parse the rest. This works as the date does include the correct tz offset. For the record the date format of concern is: Tue Dec 08 2015 00:00:00 GMT-0800 (Pacific Standard Time)
"

Can't remember what version of Windows this was. The dates ended up in a Postgres database via Python code in a Django app. This fix is one of the reasons I really like the Python dateutil library.

The solution being:

from dateutil.parser import parse

parse('Tue Dec 08 2015 00:00:00 GMT-0800 (Pacific Standard Time)', fuzzy=True)
datetime.datetime(2015, 12, 8, 0, 0, tzinfo=tzoffset(None, 28800))

To see what it is doing:

parse('Tue Dec 08 2015 00:00:00 GMT-0800 (Pacific Standard Time)', fuzzy_with_tokens=True)
(datetime.datetime(2015, 12, 8, 0, 0, tzinfo=tzoffset(None, 28800)), (' ', ' ', ' ', ' (Pacific Standard Time)'))

where everything after the datetime are tokens it ignored. I include this mainly as illustration that data clean-up may be more practical before it ever hits the database, and in a library that is specialized for the task at hand.

> 4. Certainly, the rules set a significant parsing challenge. You hint
> that they might even prevent a complete practical solution from being
> implemented.

Yes, that would depend on someone coding an AI that can fully understand people and what their intentions were from a string value. Given the answers I get when asking people directly what they intended, I'm not holding my breath.

> 5. None of this matters when the app designer has the freedom to define
> how date-time values will be provided, as text values, by user
> interfaces or external systems. In these cases, the complexity can be
> controlled by edict and correct solutions can be painlessly implemented
> and tested.

Yep, constraining the imagination of the end user solves a lot of problems.

> 6. In some cases text data that's to be converted arrives in a format
> that cannot be influenced. And it might contain locutions like we've
> been discussing ("zero" meaning "1 BC", unfavorable paradigms for
> separating tokens, and the like). In these cases, the diligent
> programmer might, just, be able to use the full arsenal of available
> tools to implement a scheme that faultlessly parses the input. But the
> likelihood of bugs seems to be pretty big.

From my experience that pretty much defines all aspects of programming.

> I'll say "case closed, now" — from my side, at least.

--
Adrian Klaver
adrian.klaver@aklaver.com