Re: Infinite Interval - Mailing list pgsql-hackers
From | Joseph Koshakow |
---|---|
Subject | Re: Infinite Interval |
Date | |
Msg-id | CAAvxfHdC06PmLPDdHGBPTZ52VbO4ZQ+2s1Do08sw9TuV0o=F9A@mail.gmail.com Whole thread Raw |
In response to | Re: Infinite Interval (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: Infinite Interval
Re: Infinite Interval |
List | pgsql-hackers |
> > The problem is that `secs = rint(secs)` rounds the seconds too early
> > and loses any fractional seconds. Do we have an overflow detecting
> > multiplication function for floats?
>
> We have float8_mul() which checks for overflow. typedef double float8;
I've updated patch 2 to use this. I also realized that the implicit
cast from double to int64 can also result in an overflow. For example,
even after adding float8_mul() we can see this:
SELECT make_interval(0, 0, 0, 0, 0, 0,17976931348623157000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000);
make_interval
--------------------------
-2562047788:00:54.775808
(1 row)
So I added a check for FLOAT8_FITS_IN_INT64() and a test with this
scenario.
> Without this patch
> postgres@64807=#SELECT INTERVAL '2147483646 months' + INTERVAL '1 month';
> ?column?
> ------------------------
> 178956970 years 7 mons
> (1 row)
>
> That result looks correct
>
> postgres@64807=#select 178956970 * 12 + 7;
> ?column?
> ------------
> 2147483647
>
> (1 row)
>
> So some backward compatibility break. I don't think we can avoid the
> backward compatibility break without expanding interval structure and
> thus causing on-disk breakage. But we can reduce the chances of
> breaking, if we change INTERVAL_NOT_FINITE to check all the three
> fields, instead of just month.
For what it's worth I think that 2147483647 months only became a valid
interval in v15 as part of this commit [0]. It's also outside of the
documented valid range [1], which is
[-178000000 years, 178000000 years] or
[-14833333 months, 14833333 months].
The rationale for only checking the month's field is that it's faster
than checking all three fields, though I'm not entirely sure if it's
the right trade-off. Any thoughts on this?
> >
> >
> > > + else
> > > + {
> > > + result->time = -interval->time;
> > > + result->day = -interval->day;
> > > + result->month = -interval->month;
> > > +
> > > + if (INTERVAL_NOT_FINITE(result))
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> > > + errmsg("interval out of range")));
> > >
> > > If this error ever gets to the user, it could be confusing. Can we elaborate by
> > > adding context e.g. errcontext("while negating an interval") or some such?
> >
> > Done.
>
> Thanks. Can we add relevant contexts at similar other places?
I've added an errcontext to all the errors of the form "X out of
range". My one concern is that some of the messages can be slightly
confusing. For example date arithmetic is converted to timestamp
arithmetic, so the errcontext talks about timestamps even though the
actual operation used dates. For example,
SELECT date 'infinity' + interval '-infinity';
ERROR: interval out of range
CONTEXT: while adding an interval and timestamp
> Also if we use all the three fields, we will need to add such checks
> in interval_justify_hours()
I added these for now because even if we stick to just using the month
field, it will be good future proofing.
> @@ -3047,7 +3180,30 @@ timestamptz_pl_interval_internal(TimestampTz timestamp,
> TimestampTz result;
> int tz;
>
> - if (TIMESTAMP_NOT_FINITE(timestamp))
> + /*
> + * Adding two infinites with the same sign results in an infinite
> + * timestamp with the same sign. Adding two infintes with different signs
> + * results in an error.
> + */
> + if (INTERVAL_IS_NOBEGIN(span))
> + {
> + if TIMESTAMP_IS_NOEND(timestamp)
> + ereport(ERROR,
> + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> + errmsg("interval out of range")));
> + else
> + TIMESTAMP_NOBEGIN(result);
> + }
> + else if (INTERVAL_IS_NOEND(span))
> + {
> + if TIMESTAMP_IS_NOBEGIN(timestamp)
> + ereport(ERROR,
> + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> + errmsg("interval out of range")));
> + else
> + TIMESTAMP_NOEND(result);
> + }
> + else if (TIMESTAMP_NOT_FINITE(timestamp))
>
> This code is duplicated in timestamp_pl_interval(). We could create > a function
> to encode the infinity handling rules and then call it in these two > places. The
> argument types are different, Timestamp and TimestampTz viz. which map to in64,
> so shouldn't be a problem. But it will be slightly unreadable. Or use macros
> but then it will be difficult to debug.
>
> What do you think?
I was hoping that I could come up with a macro that we could re-use for
all the similar logic. If that doesn't work then I'll try the helper
> time_mi_time - do we want to add an Assert to make sure that this
> function does not produce an Interval structure which looks like
> non-finite interval?
Since the month and day field of the interval result is hard-coded as
0, it's not possible to produce a non-finite interval result, but I
don't think it would hurt. I've added an assert to the end.
> There's some way to avoid separate checks for infinite-ness of
> interval and factor and use a single block using some integer
> arithmetic. But I think this is more readable. So I avoided doing
> that. Let me know if this works for you.
I think the patch looks good, I've combined it with the existing patch.
> Also added some test cases.
I didn't see any tests in the patch, did you forget to include it?
I've attached the updated patches. I've also rebased them against main.
[0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e39f9904671082c5ad3a2c5acbdbd028fa93bf35
[1] https://www.postgresql.org/docs/15/datatype-datetime.html
> > and loses any fractional seconds. Do we have an overflow detecting
> > multiplication function for floats?
>
> We have float8_mul() which checks for overflow. typedef double float8;
I've updated patch 2 to use this. I also realized that the implicit
cast from double to int64 can also result in an overflow. For example,
even after adding float8_mul() we can see this:
SELECT make_interval(0, 0, 0, 0, 0, 0,17976931348623157000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000);
make_interval
--------------------------
-2562047788:00:54.775808
(1 row)
So I added a check for FLOAT8_FITS_IN_INT64() and a test with this
scenario.
> Without this patch
> postgres@64807=#SELECT INTERVAL '2147483646 months' + INTERVAL '1 month';
> ?column?
> ------------------------
> 178956970 years 7 mons
> (1 row)
>
> That result looks correct
>
> postgres@64807=#select 178956970 * 12 + 7;
> ?column?
> ------------
> 2147483647
>
> (1 row)
>
> So some backward compatibility break. I don't think we can avoid the
> backward compatibility break without expanding interval structure and
> thus causing on-disk breakage. But we can reduce the chances of
> breaking, if we change INTERVAL_NOT_FINITE to check all the three
> fields, instead of just month.
For what it's worth I think that 2147483647 months only became a valid
interval in v15 as part of this commit [0]. It's also outside of the
documented valid range [1], which is
[-178000000 years, 178000000 years] or
[-14833333 months, 14833333 months].
The rationale for only checking the month's field is that it's faster
than checking all three fields, though I'm not entirely sure if it's
the right trade-off. Any thoughts on this?
> >
> >
> > > + else
> > > + {
> > > + result->time = -interval->time;
> > > + result->day = -interval->day;
> > > + result->month = -interval->month;
> > > +
> > > + if (INTERVAL_NOT_FINITE(result))
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> > > + errmsg("interval out of range")));
> > >
> > > If this error ever gets to the user, it could be confusing. Can we elaborate by
> > > adding context e.g. errcontext("while negating an interval") or some such?
> >
> > Done.
>
> Thanks. Can we add relevant contexts at similar other places?
I've added an errcontext to all the errors of the form "X out of
range". My one concern is that some of the messages can be slightly
confusing. For example date arithmetic is converted to timestamp
arithmetic, so the errcontext talks about timestamps even though the
actual operation used dates. For example,
SELECT date 'infinity' + interval '-infinity';
ERROR: interval out of range
CONTEXT: while adding an interval and timestamp
> Also if we use all the three fields, we will need to add such checks
> in interval_justify_hours()
I added these for now because even if we stick to just using the month
field, it will be good future proofing.
> @@ -3047,7 +3180,30 @@ timestamptz_pl_interval_internal(TimestampTz timestamp,
> TimestampTz result;
> int tz;
>
> - if (TIMESTAMP_NOT_FINITE(timestamp))
> + /*
> + * Adding two infinites with the same sign results in an infinite
> + * timestamp with the same sign. Adding two infintes with different signs
> + * results in an error.
> + */
> + if (INTERVAL_IS_NOBEGIN(span))
> + {
> + if TIMESTAMP_IS_NOEND(timestamp)
> + ereport(ERROR,
> + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> + errmsg("interval out of range")));
> + else
> + TIMESTAMP_NOBEGIN(result);
> + }
> + else if (INTERVAL_IS_NOEND(span))
> + {
> + if TIMESTAMP_IS_NOBEGIN(timestamp)
> + ereport(ERROR,
> + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> + errmsg("interval out of range")));
> + else
> + TIMESTAMP_NOEND(result);
> + }
> + else if (TIMESTAMP_NOT_FINITE(timestamp))
>
> This code is duplicated in timestamp_pl_interval(). We could create > a function
> to encode the infinity handling rules and then call it in these two > places. The
> argument types are different, Timestamp and TimestampTz viz. which map to in64,
> so shouldn't be a problem. But it will be slightly unreadable. Or use macros
> but then it will be difficult to debug.
>
> What do you think?
I was hoping that I could come up with a macro that we could re-use for
all the similar logic. If that doesn't work then I'll try the helper
functions. I'll update the patch in a follow-up email to give myself some
time to think about this.
> time_mi_time - do we want to add an Assert to make sure that this
> function does not produce an Interval structure which looks like
> non-finite interval?
Since the month and day field of the interval result is hard-coded as
0, it's not possible to produce a non-finite interval result, but I
don't think it would hurt. I've added an assert to the end.
> There's some way to avoid separate checks for infinite-ness of
> interval and factor and use a single block using some integer
> arithmetic. But I think this is more readable. So I avoided doing
> that. Let me know if this works for you.
I think the patch looks good, I've combined it with the existing patch.
> Also added some test cases.
I didn't see any tests in the patch, did you forget to include it?
I've attached the updated patches. I've also rebased them against main.
Thanks,
Joe Koshakow
[0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e39f9904671082c5ad3a2c5acbdbd028fa93bf35
[1] https://www.postgresql.org/docs/15/datatype-datetime.html
On Fri, Mar 31, 2023 at 8:46 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
I hurried too much on the previous patch. It introduced other
problems. Attached is a better patch and also fixes problem below
#select 'infinity'::interval * 0;
?column?
----------
infinity
(1 row)
with the patch we see
#select 'infinity'::interval * 0;
2023-03-31 18:00:43.131 IST [240892] ERROR: interval out of range
2023-03-31 18:00:43.131 IST [240892] STATEMENT: select
'infinity'::interval * 0;
ERROR: interval out of range
which looks more appropriate given 0 * inf = Nan for float.
There's some way to avoid separate checks for infinite-ness of
interval and factor and use a single block using some integer
arithmetic. But I think this is more readable. So I avoided doing
that. Let me know if this works for you.
Also added some test cases.
--
Best Wishes,
Ashutosh Bapat
On Fri, Mar 31, 2023 at 3:46 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Tue, Mar 28, 2023 at 7:17 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > make sure that every
> > operator that interval as one of its operands or the result has been
> > covered in the code.
>
> time_mi_time - do we want to add an Assert to make sure that this
> function does not produce an Interval structure which looks like
> non-finite interval?
>
> multiplying an interval by infinity throws an error
> #select '5 days'::interval * 'infinity'::float8;
> 2023-03-29 19:40:15.797 IST [136240] ERROR: interval out of range
> 2023-03-29 19:40:15.797 IST [136240] STATEMENT: select '5
> days'::interval * 'infinity'::float8;
> ERROR: interval out of range
>
> I think this should produce an infinite interval now. Attached patch
> to fix this, to be applied on top of your patch. With the patch
> #select '5 days'::interval * 'infinity'::float8;
> ?column?
> ----------
> infinity
> (1 row)
>
> Going through the tests now.
>
> --
> Best Wishes,
> Ashutosh Bapat
Attachment
pgsql-hackers by date: