Thread: BUG #15141: Faulty ISO 8601 parsing
The following bug has been logged on the website: Bug reference: 15141 Logged by: defanor Email address: defanor@uberspace.net PostgreSQL version: 10.0 Operating system: Any, apparently Description: The time parsing fails on some valid ISO times, with some locales, e.g.: # select to_timestamp('2018-04-03T01:45:00,728456785+0000')::timestamp with time zone; ERROR: invalid input syntax for type double precision: "2018-04-03T01:45:00,728456785+0000" LINE 1: select to_timestamp('2018-04-03T01:45:00,728456785+0000')::t... ^ Apparently the parsing is locale-dependent (using the locale-dependent strtod function), while ISO 8601 permits both comma and full stop, with a preference for comma (and without mentioning locales). Would be nice to handle both, so that any valid ISO times would get parsed.
The following bug has been logged on the website:
Bug reference: 15141
Logged by: defanor
Email address: defanor@uberspace.net
PostgreSQL version: 10.0
Operating system: Any, apparently
Description:
The time parsing fails on some valid ISO times, with some locales, e.g.:
# select to_timestamp('2018-04-03T01:45:00,728456785+0000'):: timestamp with
time zone;
ERROR: invalid input syntax for type double precision:
"2018-04-03T01:45:00,728456785+0000"
LINE 1: select to_timestamp('2018-04-03T01:45:00,728456785+0000')::t...
^
Apparently the parsing is locale-dependent (using the locale-dependent
strtod function), while ISO 8601 permits both comma and full stop, with a
preference for comma (and without mentioning locales). Would be nice to
handle both, so that any valid ISO times would get parsed.
The observed problem here is that you've called the single-argument version of to_timestamp, which takes a double, and the literal that you've supplied doesn't look like a double (i.e., it contains hyphens, the letter T, colons, a comma, and a plus sign). IOW, you've implicitly asked PostgreSQL to do: "SELECT ' 45:00,728456785+0000 '::double" and it rightly complains that it cannot.
2018-04-03T01:If you want to covert a string literal to a timestamp you need to use the two-argument version of the to_timestamp function and pass a format string that looks like the ISO 8601 standard.
David J.
=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes: > The time parsing fails on some valid ISO times, with some locales, e.g.: > # select to_timestamp('2018-04-03T01:45:00,728456785+0000')::timestamp with > time zone; > ERROR: invalid input syntax for type double precision: > "2018-04-03T01:45:00,728456785+0000" This is confused: the single-argument form of to_timestamp() takes a float8 argument, not a timestamp, which is why the error message is phrased the way it is. I think you meant that this fails: # select '2018-04-03T01:45:00,728456785+0000'::timestamp with time zone; ERROR: invalid input syntax for type timestamp with time zone: "2018-04-03T01:45:00,728456785+0000" which it does, but I don't think we should do anything about it. There is not and never has been any dependency on LC_TIME properties in PG's timestamp I/O. Considering that we also have DateStyle to cope with, as well as a lot more flexibility in the input parser than ISO 8601 contemplates, I think allowing a comma instead of decimal point here would probably create more confusion than benefit. (Also, this is not the only aspect of 8601 that we don't support; for instance the option to specify fractional minutes or hours instead of fractional seconds. There again, I think the potential for confusion and error outweighs any benefit from being nominally more standards-compliant.) You probably could read data formatted this way using the two-argument form of to_timestamp() with a suitable format string. regards, tom lane
PG Bug reportingform <noreply@postgresql.org> writes:
> The time parsing fails on some valid ISO times, with some locales, e.g.:
> # select to_timestamp('2018-04-03T01:45:00,728456785+0000'):: timestamp with
> time zone;
> ERROR: invalid input syntax for type double precision:
> "2018-04-03T01:45:00,728456785+0000"
This is confused: the single-argument form of to_timestamp() takes a
float8 argument, not a timestamp, which is why the error message is
phrased the way it is.
I think you meant that this fails:
# select '2018-04-03T01:45:00,728456785+0000'::timestamp with time zone;
ERROR: invalid input syntax for type timestamp with time zone: "2018-04-03T01:45:00,728456785+0000"
which it does, but I don't think we should do anything about it.
There is not and never has been any dependency on LC_TIME properties in
PG's timestamp I/O. Considering that we also have DateStyle to cope with,
as well as a lot more flexibility in the input parser than ISO 8601
contemplates, I think allowing a comma instead of decimal point here
would probably create more confusion than benefit.
Ideally it would be as simple as:
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 8375b93c39..4a3c7382f2 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -653,8 +653,8 @@ ParseDateTime(const char *timestr, char *workbuf, size_t buflen,
else
ftype[nf] = DTK_NUMBER;
}
- /* Leading decimal point? Then fractional seconds... */
- else if (*cp == '.')
+ /* Leading decimal point or comma? Then fractional seconds... */
+ else if (*cp == '.' || *cp == ',')
{
APPEND_CHAR(bufp, bufend, *cp++);
while (isdigit((unsigned char) *cp))
But then one needs to contemplate the impact that has on:
/* ignore other punctuation but use as delimiter */
else if (ispunct((unsigned char) *cp))
{
cp++;
continue;
}
Its not LC_TIME dependent but a bit more complex implementation wise to detect commas used as separators in custom formats and a comma used as a fractional separator.
I don't foresee much end-user confusion involved here - especially if we only allow for fractional seconds at a relatively fixed location.
David J.
> I think you meant that this fails: > > # select '2018-04-03T01:45:00,728456785+0000'::timestamp with time zone; > ERROR: invalid input syntax for type timestamp with time zone: "2018-04-03T01:45:00,728456785+0000" Indeed, sorry – got that mixed up, copied a wrong part.
> (Also, this is not the only aspect of 8601 that we don't support; > for instance the option to specify fractional minutes or hours > instead of fractional seconds. There again, I think the potential > for confusion and error outweighs any benefit from being nominally > more standards-compliant.) > > You probably could read data formatted this way using the two-argument > form of to_timestamp() with a suitable format string. FWIW, the use case on which I've stumbled onto that was use of COPY: as I understand it, to_timestamp() won't help there, and it's not just about standards compliance for the sake of it; when reading (particularly test data) from stdin or a program, it is tempting to use date(1) from a shell script, or arbitrary ISO 8601 formatting functions. It's not hard to get around that either, but still would be nice to be able to rely on standard formats being handled (or perhaps to see it mentioned in the docs that the support is incomplete).
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Mon, Apr 2, 2018 at 5:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think you meant that this fails: >> # select '2018-04-03T01:45:00,728456785+0000'::timestamp with time zone; >> ERROR: invalid input syntax for type timestamp with time zone: >> "2018-04-03T01:45:00,728456785+0000" >> which it does, but I don't think we should do anything about it. >> There is not and never has been any dependency on LC_TIME properties in >> PG's timestamp I/O. Considering that we also have DateStyle to cope with, >> as well as a lot more flexibility in the input parser than ISO 8601 >> contemplates, I think allowing a comma instead of decimal point here >> would probably create more confusion than benefit. > Ideally it would be as simple as: > - /* Leading decimal point? Then fractional seconds... */ > - else if (*cp == '.') > + /* Leading decimal point or comma? Then fractional seconds... */ > + else if (*cp == '.' || *cp == ',') Well, it's not. The core problem here is that the datetime input parser treats commas as ignorable punctuation, equivalent to whitespace. For instance, we accept # select 'sunday, april 1, 2018'::timestamp; timestamp --------------------- 2018-04-01 00:00:00 (1 row) but not # select 'sunday. april 1, 2018'::timestamp; ERROR: invalid input syntax for type timestamp: "sunday. april 1, 2018" But surely it would not be a good idea to consider "T01:45:00,728456785" the same as "T01:45:00 728456785". Changing this would entail all sorts of side-effects on non-ISO data formats. Considering that this code is about old enough to vote, and this is the first complaint on this specific issue that I can recall hearing, I don't even think it's worth the time to investigate just what the side-effects would be ... there's no way that any proposed change would get past the backwards-compatibility hurdles. regards, tom lane