Thread: BUG #17390: Function, to_date() -- unexpected values and a request

BUG #17390: Function, to_date() -- unexpected values and a request

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17390
Logged by:          Murthy Kambhampaty
Email address:      smk_va@yahoo.com
PostgreSQL version: 10.7
Operating system:   Windows
Description:

The to_date() function seems not to fully test out-of-range values. An
example query and output are listed below:

postgres=> select to_date('1999-00-01', 'YYYY"-"MM"-"DD')
        , to_date('1999.M00', 'YYYY".M"MM')
postgres-> ;
  to_date   |  to_date
------------+------------
 1999-01-01 | 1999-01-01
(1 row)
(also tested under with 13 and 14)

On the other hand, a too-high value for the month, such as 25, does result
in the expected out-of-range error. The above seems a bug.

The request has to do with processing of quarterly dates using the
'YYYY":Q"q' format string, and the like. The documentation presently notes
that, "In to_timestamp and to_date, weekday names or numbers (DAY, D, and
related field types) are accepted but are ignored for purposes of computing
the result. The same is true for quarter (Q) fields." First, it is easy to
miss this in the documentation. So a "CAUTION" notice, above the "Tip" would
be helpful; something to the effect that, "to_timestamp and to_date exist to
handle input formats that cannot be converted by simple casting. However,
certain input format specifiers are ignored, and thus the result may be
unexpected."

I submit, however, that returning '1999-01-01' in response to select
to_date('1999:Q3', 'YYYY":Q"q') is buggy -- the input is unambiguous in the
same sense that to_date('1999.M7', 'FXyyyy".M"mm') is, but the output is not
the same in both instances! Thus, the preferred approach here, IMHO, is to
return an error when the format string is ambiguous, and to return
consistent values when the format is unambiguous, with "consistent values"
observing the same rules as in make_date() and make_timestamp(). Thus, the
'DDD'-specifier in 'YYYY"-"DDD', would *not* be ignored, and the
'Q'-specifier in 'YYYY":Q"Q' would also not be ignored, whereas 'YYYY"-"DD'
and 'YYYY"-"D", and 'YYYY"-"WW' would return errors flagging the formats as
ambiguous or invalid. (It would also help to have clarity whether any format
specifiers other than 'yyyy' are case-insensitive.)

In my experience, monthly, quarterly, and weekly dates appear in government
statistics. (I have deployed PostgreSQL as part of a data-analysis stack
since version 7.2 or so, because I find it superior to many alternatives
because PostgreSQL returns errors when fed ambiguous or out-of-range data in
an ETL pipeline, rather than storing unexpected values; the above
inconsistencies in to_date() processing came as a mild shock.)


BUG #17390: Function, to_date() -- unexpected values and a request

From
"David G. Johnston"
Date:
On Monday, January 31, 2022, PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      17390
Logged by:          Murthy Kambhampaty
Email address:      smk_va@yahoo.com
PostgreSQL version: 10.7
Operating system:   Windows
Description:       

The to_date() function seems not to fully test out-of-range values. An
example query and output are listed below:

postgres=> select to_date('1999-00-01', 'YYYY"-"MM"-"DD')
        , to_date('1999.M00', 'YYYY".M"MM')
postgres-> ;
  to_date   |  to_date
------------+------------
 1999-01-01 | 1999-01-01
(1 row)
(also tested under with 13 and 14)

On the other hand, a too-high value for the month, such as 25, does result
in the expected out-of-range error. The above seems a bug.


IIRC, you are correct but we choose not to break existing code in this case.

However,
certain input format specifiers are ignored, and thus the result may be
unexpected.

I would suggest we update the table to note which fields are to be used in to_char() only.


I submit, however, that returning '1999-01-01' in response to select
to_date('1999:Q3', 'YYYY":Q"q') is buggy


Well, we did say we ignore “quarter” so we have to output a valid date given only a year, so 1/1 of that year it is, 

In my experience, monthly, quarterly, and weekly dates appear in government
statistics. (I have deployed PostgreSQL as part of a data-analysis stack
since version 7.2 or so, because I find it superior to many alternatives
because PostgreSQL returns errors when fed ambiguous or out-of-range data in
an ETL pipeline, rather than storing unexpected values; the above
inconsistencies in to_date() processing came as a mild shock.)

Pretend to_date doesn’t exist and just write a function that checks for valid inputs via RegEx and then parse it.  Maybe some day someone will develop a new conversion function that has considerably stricter, and self-defined, behavior (but given that to_date is basically “close enough to get the job done” I’m not optimistic).  Until then, protect yourself.

David J.

Re: BUG #17390: Function, to_date() -- unexpected values and a request

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Pretend to_date doesn’t exist and just write a function that checks for
> valid inputs via RegEx and then parse it.  Maybe some day someone will
> develop a new conversion function that has considerably stricter, and
> self-defined, behavior (but given that to_date is basically “close enough
> to get the job done” I’m not optimistic).  Until then, protect yourself.

There are various things going on here:

* To the extent that to_date() has an accepted charter, it's "work
like the Oracle function of the same name".  Thus, arguments like
"it'd be better for my use-case if it did X" are basically not going
to find any traction.  Arguments like "Oracle's to_date() does X,
so we should too" have a better chance.  Unfortunately, most of us
lack access to an Oracle instance, making it hard to investigate
such details.

* The formatting.c code has recently been hacked up to also support
jsonpath datetime conversions, meaning that there's a second source
of truth involved; any proposed change would need to also be measured
against whether it moves us closer to or further away from the relevant
jsonpath standard.  That's another hard-to-get-hold-of reference, and
I'm not sure how detailed its answers would be anyway.

* There's also a fairly strong argument for maintaining backwards
compatibility, even if the existing behavior is arguably wrong per
the above points.

* formatting.c is an ill-documented, unmaintainable mess that nobody
is eager to touch.  (The original author is long gone.)

These things combine to encourage leaving it at the status quo.
Somebody who was really motivated and willing to get their hands
dirty could perhaps get something done, but I don't think any of
the current crop of hackers are very interested.

            regards, tom lane



Re: BUG #17390: Function, to_date() -- unexpected values and a request

From
"smk_va@yahoo.com"
Date:
David and Tom, thanks for your thoughtful responses. I accept that with resource constraints, backward compatibility/status quo reigns. Hopefully, this discussion will trigger someone's interest, and the team will give the go ahead for moving toward a more consistent implementation of to_date() and its ilk. Accordingly, I propose some guideposts.

As a basic premise, it seems fair to say that if the documentation specifies a range for a given field, that is a contract that the stated range is enforced in the implementation. I recognize that refining the documentation may be easier than modifying the implementation, so perhaps (<= 12) rather than (01--12) as the range for 'MM', in Table 9.26 of https://www.postgresql.org/docs/current/functions-formatting.html.

Searching the web for "Oracle to_date()" yields a link to the reference documentation from Oracle, which then includes a link to a detailed reference on "Datetime Format Models". I'll skip the link, as the link may change before this gets a look ;-) (It appears that JSON datetime conversions follow the ISO 8601 standards by convention. See, e.g., https://www.w3docs.com/snippets/javascript/the-right-json-date-format.html. So no direct impact of "jsonpath date conversions" on this discussion other than that the specified ranges be enforced on corresponding fields, near as I can tell.)

There seems to be a mismatch between the PostgreSQL implementation here, which ignores specified values for certain fields, versus the Oracle implementation, which specifies that certain fields at the end of input strings can be omitted. If the 'Q' field is omitted, and the implementation stores the date corresponding to the first quarter, that is one thing; but if the 'Q' field is specified as '3' and the implementation supplies '1', because it is "ignoring" the 'Q' field, that is an altogether different matter, and the fact that the latter produces inconsistencies argues against the "ignoring" approach. The Oracle documentation does detail which formatting fields can be specified in the "to_* datetime functions", and which cannot, which is an approach David suggests, and which works well, provided that the implementation flags the invalid format specification as an error rather than storing an inconsistent/unexpected value.

I understand that changes in implementation have to go through a deprecation cycle, and appreciate that someone will have to step up to get it done (I'm afraid my hacking is limited to high-level languages). PostgreSQL has a well-deserved reputation for standards compliance and code quality, and it would be great if the formatting functions could be brought up to snuff.

Thanks, 
    Murthy

On Tuesday, February 1, 2022, 10:47:19 AM EST, Tom Lane <tgl@sss.pgh.pa.us> wrote:


"David G. Johnston" <david.g.johnston@gmail.com> writes:

> Pretend to_date doesn’t exist and just write a function that checks for
> valid inputs via RegEx and then parse it.  Maybe some day someone will
> develop a new conversion function that has considerably stricter, and
> self-defined, behavior (but given that to_date is basically “close enough
> to get the job done” I’m not optimistic).  Until then, protect yourself.


There are various things going on here:

* To the extent that to_date() has an accepted charter, it's "work
like the Oracle function of the same name".  Thus, arguments like
"it'd be better for my use-case if it did X" are basically not going
to find any traction.  Arguments like "Oracle's to_date() does X,
so we should too" have a better chance.  Unfortunately, most of us
lack access to an Oracle instance, making it hard to investigate
such details.

* The formatting.c code has recently been hacked up to also support
jsonpath datetime conversions, meaning that there's a second source
of truth involved; any proposed change would need to also be measured
against whether it moves us closer to or further away from the relevant
jsonpath standard.  That's another hard-to-get-hold-of reference, and
I'm not sure how detailed its answers would be anyway.

* There's also a fairly strong argument for maintaining backwards
compatibility, even if the existing behavior is arguably wrong per
the above points.

* formatting.c is an ill-documented, unmaintainable mess that nobody
is eager to touch.  (The original author is long gone.)

These things combine to encourage leaving it at the status quo.
Somebody who was really motivated and willing to get their hands
dirty could perhaps get something done, but I don't think any of
the current crop of hackers are very interested.

            regards, tom lane