BUG #17390: Function, to_date() -- unexpected values and a request - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17390: Function, to_date() -- unexpected values and a request
Date
Msg-id 17390-3aea53f5c03c0e4a@postgresql.org
Whole thread Raw
Responses BUG #17390: Function, to_date() -- unexpected values and a request  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17390
Logged by:          Murthy Kambhampaty
Email address:      smk_va@yahoo.com
PostgreSQL version: 10.7
Operating system:   Windows
Description:

The to_date() function seems not to fully test out-of-range values. An
example query and output are listed below:

postgres=> select to_date('1999-00-01', 'YYYY"-"MM"-"DD')
        , to_date('1999.M00', 'YYYY".M"MM')
postgres-> ;
  to_date   |  to_date
------------+------------
 1999-01-01 | 1999-01-01
(1 row)
(also tested under with 13 and 14)

On the other hand, a too-high value for the month, such as 25, does result
in the expected out-of-range error. The above seems a bug.

The request has to do with processing of quarterly dates using the
'YYYY":Q"q' format string, and the like. The documentation presently notes
that, "In to_timestamp and to_date, weekday names or numbers (DAY, D, and
related field types) are accepted but are ignored for purposes of computing
the result. The same is true for quarter (Q) fields." First, it is easy to
miss this in the documentation. So a "CAUTION" notice, above the "Tip" would
be helpful; something to the effect that, "to_timestamp and to_date exist to
handle input formats that cannot be converted by simple casting. However,
certain input format specifiers are ignored, and thus the result may be
unexpected."

I submit, however, that returning '1999-01-01' in response to select
to_date('1999:Q3', 'YYYY":Q"q') is buggy -- the input is unambiguous in the
same sense that to_date('1999.M7', 'FXyyyy".M"mm') is, but the output is not
the same in both instances! Thus, the preferred approach here, IMHO, is to
return an error when the format string is ambiguous, and to return
consistent values when the format is unambiguous, with "consistent values"
observing the same rules as in make_date() and make_timestamp(). Thus, the
'DDD'-specifier in 'YYYY"-"DDD', would *not* be ignored, and the
'Q'-specifier in 'YYYY":Q"Q' would also not be ignored, whereas 'YYYY"-"DD'
and 'YYYY"-"D", and 'YYYY"-"WW' would return errors flagging the formats as
ambiguous or invalid. (It would also help to have clarity whether any format
specifiers other than 'yyyy' are case-insensitive.)

In my experience, monthly, quarterly, and weekly dates appear in government
statistics. (I have deployed PostgreSQL as part of a data-analysis stack
since version 7.2 or so, because I find it superior to many alternatives
because PostgreSQL returns errors when fed ambiguous or out-of-range data in
an ETL pipeline, rather than storing unexpected values; the above
inconsistencies in to_date() processing came as a mild shock.)


pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: BUG #17389: pg_repack creates race conditions on streaming replicas
Next
From: Devrim Gündüz
Date:
Subject: Re: BUG #17388: postgis cant be installed on postgres 11 and centos 7