Re: BUG #17390: Function, to_date() -- unexpected values and a request - Mailing list pgsql-bugs
From | smk_va@yahoo.com |
---|---|
Subject | Re: BUG #17390: Function, to_date() -- unexpected values and a request |
Date | |
Msg-id | 248542100.2597403.1643747736697@mail.yahoo.com Whole thread Raw |
In response to | Re: BUG #17390: Function, to_date() -- unexpected values and a request (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
David and Tom, thanks for your thoughtful responses. I accept that with resource constraints, backward compatibility/status quo reigns. Hopefully, this discussion will trigger someone's interest, and the team will give the go ahead for moving toward a more consistent implementation of to_date() and its ilk. Accordingly, I propose some guideposts.
As a basic premise, it seems fair to say that if the documentation specifies a range for a given field, that is a contract that the stated range is enforced in the implementation. I recognize that refining the documentation may be easier than modifying the implementation, so perhaps (<= 12) rather than (01--12) as the range for 'MM', in Table 9.26 of https://www.postgresql.org/docs/current/functions-formatting.html.
Searching the web for "Oracle to_date()" yields a link to the reference documentation from Oracle, which then includes a link to a detailed reference on "Datetime Format Models". I'll skip the link, as the link may change before this gets a look ;-) (It appears that JSON datetime conversions follow the ISO 8601 standards by convention. See, e.g., https://www.w3docs.com/snippets/javascript/the-right-json-date-format.html. So no direct impact of "jsonpath date conversions" on this discussion other than that the specified ranges be enforced on corresponding fields, near as I can tell.)
There seems to be a mismatch between the PostgreSQL implementation here, which ignores specified values for certain fields, versus the Oracle implementation, which specifies that certain fields at the end of input strings can be omitted. If the 'Q' field is omitted, and the implementation stores the date corresponding to the first quarter, that is one thing; but if the 'Q' field is specified as '3' and the implementation supplies '1', because it is "ignoring" the 'Q' field, that is an altogether different matter, and the fact that the latter produces inconsistencies argues against the "ignoring" approach. The Oracle documentation does detail which formatting fields can be specified in the "to_* datetime functions", and which cannot, which is an approach David suggests, and which works well, provided that the implementation flags the invalid format specification as an error rather than storing an inconsistent/unexpected value.
I understand that changes in implementation have to go through a deprecation cycle, and appreciate that someone will have to step up to get it done (I'm afraid my hacking is limited to high-level languages). PostgreSQL has a well-deserved reputation for standards compliance and code quality, and it would be great if the formatting functions could be brought up to snuff.
Thanks,
Murthy
On Tuesday, February 1, 2022, 10:47:19 AM EST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Pretend to_date doesn’t exist and just write a function that checks for
> valid inputs via RegEx and then parse it. Maybe some day someone will
> develop a new conversion function that has considerably stricter, and
> self-defined, behavior (but given that to_date is basically “close enough
> to get the job done” I’m not optimistic). Until then, protect yourself.
There are various things going on here:
* To the extent that to_date() has an accepted charter, it's "work
like the Oracle function of the same name". Thus, arguments like
"it'd be better for my use-case if it did X" are basically not going
to find any traction. Arguments like "Oracle's to_date() does X,
so we should too" have a better chance. Unfortunately, most of us
lack access to an Oracle instance, making it hard to investigate
such details.
* The formatting.c code has recently been hacked up to also support
jsonpath datetime conversions, meaning that there's a second source
of truth involved; any proposed change would need to also be measured
against whether it moves us closer to or further away from the relevant
jsonpath standard. That's another hard-to-get-hold-of reference, and
I'm not sure how detailed its answers would be anyway.
* There's also a fairly strong argument for maintaining backwards
compatibility, even if the existing behavior is arguably wrong per
the above points.
* formatting.c is an ill-documented, unmaintainable mess that nobody
is eager to touch. (The original author is long gone.)
These things combine to encourage leaving it at the status quo.
Somebody who was really motivated and willing to get their hands
dirty could perhaps get something done, but I don't think any of
the current crop of hackers are very interested.
regards, tom lane
> Pretend to_date doesn’t exist and just write a function that checks for
> valid inputs via RegEx and then parse it. Maybe some day someone will
> develop a new conversion function that has considerably stricter, and
> self-defined, behavior (but given that to_date is basically “close enough
> to get the job done” I’m not optimistic). Until then, protect yourself.
There are various things going on here:
* To the extent that to_date() has an accepted charter, it's "work
like the Oracle function of the same name". Thus, arguments like
"it'd be better for my use-case if it did X" are basically not going
to find any traction. Arguments like "Oracle's to_date() does X,
so we should too" have a better chance. Unfortunately, most of us
lack access to an Oracle instance, making it hard to investigate
such details.
* The formatting.c code has recently been hacked up to also support
jsonpath datetime conversions, meaning that there's a second source
of truth involved; any proposed change would need to also be measured
against whether it moves us closer to or further away from the relevant
jsonpath standard. That's another hard-to-get-hold-of reference, and
I'm not sure how detailed its answers would be anyway.
* There's also a fairly strong argument for maintaining backwards
compatibility, even if the existing behavior is arguably wrong per
the above points.
* formatting.c is an ill-documented, unmaintainable mess that nobody
is eager to touch. (The original author is long gone.)
These things combine to encourage leaving it at the status quo.
Somebody who was really motivated and willing to get their hands
dirty could perhaps get something done, but I don't think any of
the current crop of hackers are very interested.
regards, tom lane
pgsql-bugs by date: