Thread: BUG #17795: Erroneous parsing of floating-poing components in DecodeISO8601Interval()

BUG #17795: Erroneous parsing of floating-poing components in DecodeISO8601Interval()

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17795
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 15.2
Operating system:   Ubuntu 22.04
Description:

I've encountered a pair of relatively new anomalies when using ISO-8601
intervals.
Firstly, a float value passed as a component of the interval
can produce an overflow (I performed the following exercises with clang 14
and gcc 11.3).
Let's begin with an integer component:
select interval 'P178956970Y';
178956970 years -- OK (178956970 * 12 == 2 ^ 31 - 8)
select interval 'P178956971Y';
interval out of range -- OK (178956971 * 12 == 2 ^ 31 + 4)

Compare with a float value:
select interval 'P.178956970e9Y';
178956970 years -- OK
And:
select interval 'P.178956970e9Y6M';
178956970 years 6 mons -- OK
But:
select interval 'P.178956971e9Y';
-178956970 years -8 mons -- not OK, previously: interval out of range

Though:
select interval 'P.178956970e9Y8M';
interval field value out of range -- OK
But:
select interval 'P.178956971e9Y8M';
-178956970 years  -- not OK, previously: interval out of range

As I can see, this is explained by the following code:
int         extra_months = (int) rint(frac * scale * MONTHS_PER_YEAR);
return !pg_add_s32_overflow(itm_in->tm_mon, extra_months,
&itm_in->tm_mon);

Here, when calculating extra_months with an out-of-range float,
we get the sentinel value: -2147483648 == -0x80000000.
And then pg_add_s32_overflow(0, -0x80000000, ...) (internally
__builtin_add_overflow() for me)
happily returns "no overflow".
For the case 'P.178956970e9Y8M' an overflow detected because extra_months
is
near the upper limit, but is not the sentinel value.
(BTW, clang' ubsan doesn't like conversion
"(int)out_of_int_range_number".)

And for the sake of completeness, the upper limit of the double type:
select interval 'P.17976931348623158e309Y';
-178956969 years -8 mons -- not OK, previously: interval field value out of
range
select interval 'P.17976931348623159e309Y';
invalid input syntax for type interval -- OK (out of the double range)

Here "previously" is the behavior observed on e39f99046~1.

The second anomaly is with parsing float values with an integer part:
select interval 'P.0e100Y';
00:00:00 -- OK
But:
select interval 'P1.0e100Y';
1 year -- previously: invalid input syntax for type interval
select interval 'P1.0e-100Y';
1 year -- previously: invalid input syntax for type interval

IIUC, the integer part is just added to the result of parsing a rest with
the scientific notation.

Similar anomaly can be seen with the D part:
select interval 'P.1e9D';
2400000000:00:00 -- OK, previously: 100000000 days
But
select interval 'P.1e10D';
-2562047788:00:54.775807 -- not OK, previously: 1000000000 days

select interval 'P1.1e9D';
1 day 2400000000:00:00 -- not OK, previously: 1100000000 days

Probably all these anomalies can be eliminated by disabling
the scientific notation (though it more or less worked previously),
just as it is not supported for ordinary (not ISO-8601) intervals.


PG Bug reporting form <noreply@postgresql.org> writes:
> I've encountered a pair of relatively new anomalies when using ISO-8601
> intervals.

Thanks for the report!  All of these seem to trace to the fact that
ParseISO8601Number isn't considering the possibility that the input
string ends with "eNN".  This allows the reported "fraction" output
to have values greater than 1, which breaks some calling code as you
show, and it also fails to apply that scale factor to the integer
part of the output, which accounts for some of the other cases.

On the whole I think the right fix is to go back to using strtod()
to parse the whole input string, as we did before e39f99046.  This
results in some inaccuracy if there's more than 15 digits in the
input, but I don't care enough about that scenario to work harder.

There are a couple of other places where datetime.c is using strtod()
to parse what it expects to be a fractional value.  I'm inclined to
just barf if the result is out of range.  Maybe we shouldn't back-patch
that aspect into v15, not sure.

Anyway I propose the attached.

            regards, tom lane

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index b74889039d..245651a778 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -698,6 +698,9 @@ ParseFraction(char *cp, double *frac)
         /* check for parse failure */
         if (*cp != '\0' || errno != 0)
             return DTERR_BAD_FORMAT;
+        /* check for bogus input such as .1e10 */
+        if (*frac >= 1.0)
+            return DTERR_BAD_FORMAT;
     }
     return 0;
 }
@@ -3080,6 +3083,9 @@ DecodeNumberField(int len, char *str, int fmask,
             frac = strtod(cp, NULL);
             if (errno != 0)
                 return DTERR_BAD_FORMAT;
+            /* check for bogus input such as .1e10 */
+            if (frac >= 1.0)
+                return DTERR_BAD_FORMAT;
             *fsec = rint(frac * 1000000);
         }
         /* Now truncate off the fraction for further processing */
@@ -3702,46 +3708,38 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 static int
 ParseISO8601Number(char *str, char **endptr, int64 *ipart, double *fpart)
 {
-    int            sign = 1;
-
-    *ipart = 0;
-    *fpart = 0.0;
-
-    /* Parse sign if there is any */
-    if (*str == '-')
-    {
-        sign = -1;
-        str++;
-    }
+    double        val;

-    *endptr = str;
+    /*
+     * Historically this has accepted anything that strtod() would take,
+     * notably including "e" notation, so continue doing that.  This is
+     * slightly annoying because the precision of double is less than that of
+     * int64, so we would lose accuracy for inputs larger than 2^53 or so.
+     * However, historically we rejected inputs outside the int32 range,
+     * making that concern moot.  What we do now is reject abs(val) above
+     * 1.0e15 (a round number a bit less than 2^50), so that any accepted
+     * value will have an exact integer part, and thereby a fraction part with
+     * abs(*fpart) less than 1.  In the absence of field complaints it doesn't
+     * seem worth working harder.
+     */
+    if (!(isdigit((unsigned char) *str) || *str == '-' || *str == '.'))
+        return DTERR_BAD_FORMAT;
     errno = 0;
-
-    /* Parse int64 part if there is any */
-    if (isdigit((unsigned char) **endptr))
-        *ipart = strtoi64(*endptr, endptr, 10) * sign;
-
-    /* Parse fractional part if there is any */
-    if (**endptr == '.')
-    {
-        /*
-         * As in ParseFraction, some versions of strtod insist on seeing some
-         * digits after '.', but some don't.  We want to allow zero digits
-         * after '.' as long as there were some before it.
-         */
-        if (isdigit((unsigned char) *(*endptr + 1)))
-            *fpart = strtod(*endptr, endptr) * sign;
-        else
-        {
-            (*endptr)++;        /* advance over '.' */
-            str++;                /* so next test will fail if no digits */
-        }
-    }
-
-    /* did we not see anything that looks like a number? */
+    val = strtod(str, endptr);
+    /* did we not see anything that looks like a double? */
     if (*endptr == str || errno != 0)
         return DTERR_BAD_FORMAT;
-
+    /* watch out for overflow, including infinities; reject NaN too */
+    if (isnan(val) || val < -1.0e15 || val > 1.0e15)
+        return DTERR_FIELD_OVERFLOW;
+    /* be very sure we truncate towards zero (cf dtrunc()) */
+    if (val >= 0)
+        *ipart = (int64) floor(val);
+    else
+        *ipart = (int64) -floor(-val);
+    *fpart = val - *ipart;
+    /* Callers expect this to hold */
+    Assert(*fpart > -1.0 && *fpart < 1.0);
     return 0;
 }

diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index c7ac408bec..a154840c85 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -975,6 +975,12 @@ select interval 'P.1Y0M3DT4H5M6S';
  1 mon 3 days 04:05:06
 (1 row)

+select interval 'P10.5e4Y';  -- not per spec, but we've historically taken it
+   interval
+--------------
+ 105000 years
+(1 row)
+
 select interval 'P.Y0M3DT4H5M6S';  -- error
 ERROR:  invalid input syntax for type interval: "P.Y0M3DT4H5M6S"
 LINE 1: select interval 'P.Y0M3DT4H5M6S';
@@ -1081,13 +1087,13 @@ select interval 'PT2562047788:00:54.775807';
 select interval 'PT2562047788.0152155019444';
              interval
 -----------------------------------
- @ 2562047788 hours 54.775807 secs
+ @ 2562047788 hours 54.775429 secs
 (1 row)

 select interval 'PT-2562047788.0152155022222';
                interval
 ---------------------------------------
- @ 2562047788 hours 54.775808 secs ago
+ @ 2562047788 hours 54.775429 secs ago
 (1 row)

 -- overflow each date/time field
diff --git a/src/test/regress/sql/interval.sql b/src/test/regress/sql/interval.sql
index 54745c40d6..af8e1ca0f4 100644
--- a/src/test/regress/sql/interval.sql
+++ b/src/test/regress/sql/interval.sql
@@ -328,6 +328,7 @@ select interval 'P1.0Y0M3DT4H5M6S';
 select interval 'P1.1Y0M3DT4H5M6S';
 select interval 'P1.Y0M3DT4H5M6S';
 select interval 'P.1Y0M3DT4H5M6S';
+select interval 'P10.5e4Y';  -- not per spec, but we've historically taken it
 select interval 'P.Y0M3DT4H5M6S';  -- error

 -- test a couple rounding cases that changed since 8.3 w/ HAVE_INT64_TIMESTAMP.

20.02.2023 02:40, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> I've encountered a pair of relatively new anomalies when using ISO-8601
>> intervals.
>>
>>
>> On the whole I think the right fix is to go back to using strtod()
>> to parse the whole input string, as we did before e39f99046.  This
>> results in some inaccuracy if there's more than 15 digits in the
>> input, but I don't care enough about that scenario to work harder.
This approach seems less fragile to me. All my examples shown before
work as expected now.
Thanks!
>> There are a couple of other places where datetime.c is using strtod()
>> to parse what it expects to be a fractional value.  I'm inclined to
>> just barf if the result is out of range.  Maybe we shouldn't back-patch
>> that aspect into v15, not sure.
I could not find a way to pass the e-notation there.
Looking at the callers of the ParseFraction() and DecodeNumberField()
I see only DecodeDate(), DecodeDateTime(), DecodeTimeOnly(),
and DecodeInterval(). All these functions pass to ParseFraction()
and DecodeNumberField() not full "fractional" strings, but fields, that
were extracted before. For example, with "SELECT time 'J0.5e1';"
there are three fields  "j", "0.5", "e1" passed to DecodeTimeOnly(),
or with "SELECT interval '1.5e1';" I see fields "1.5" and "e1".
(Maybe I miss something.)

Also, the comment added makes me wonder, whether ".1e-10"
(e.g., in "1.1e-10") can be considered bogus too. I would say so
(if we are going to just add a fraction produced by ParseFraction()
to some integer part later, we still get the wrong result
(for the scientific notation)), and if we want to have
a consistent syntax, maybe it's worth to check an input string for 'e'/'E',
but if not, then maybe leave it alone. I would prefer the latter for now.

Best regards,
Alexander



Alexander Lakhin <exclusion@gmail.com> writes:
> 20.02.2023 02:40, Tom Lane wrote:
>> There are a couple of other places where datetime.c is using strtod()
>> to parse what it expects to be a fractional value.  I'm inclined to
>> just barf if the result is out of range.  Maybe we shouldn't back-patch
>> that aspect into v15, not sure.

> I could not find a way to pass the e-notation there.

True, I was thinking of this as future-proofing not as closing any
live loophole.

> Also, the comment added makes me wonder, whether ".1e-10"
> (e.g., in "1.1e-10") can be considered bogus too. I would say so
> (if we are going to just add a fraction produced by ParseFraction()
> to some integer part later, we still get the wrong result
> (for the scientific notation)), and if we want to have
> a consistent syntax, maybe it's worth to check an input string for 'e'/'E',
> but if not, then maybe leave it alone. I would prefer the latter for now.

Fair enough.  I pushed the ParseISO8601Number fix without
touching the other two places.

            regards, tom lane