Thread: Infinite loop for generate_series with timestamp arguments

Infinite loop for generate_series with timestamp arguments

From

Jakob Teuber

Date:

03 March, 19:05:10

Hi all,

I noticed, that the following query will send Postgres into an infinite 
loop:

(a)  select generate_series(timestamp '2025-01-30', timestamp 
'2025-02-01', interval '1 month -29 days');

Adding one month to 2023-01-30 will result in 2023-02-28 due to February 
having no 31st day, and then subtracting 29 days will result in 
2025-01-30 again, resulting in a repetition from there. Postgres does 
not detect this, and gets caught in an infinite loop. This query is not 
the only way this can happen, one can imagine many different setups, 
that, at some point, will lead to this problem.

One could certainly argue that “go and do an infinite loop” is plainly 
the intended semantics of this query, but in other cases, Postgres tries 
to guard against these types of looping generate_series statements:

(b)  select generate_series(timestamp '2025-03-31', timestamp 
'2025-04-01', interval '1 month -30 days');
         → ERROR:  step size cannot equal zero

(c)  select generate_series(timestamp '2024-12-31', timestamp 
'2025-01-01', interval '1 month -31 days');
         → (0 rows)

It is a bit surprising to me, that all these three cases are handled 
differently, as they seem exactly analogous. The issue with all the 
expression is that the specified interval, when added to the start date, 
results in the start date again.

     select (timestamp '2025-01-30' + interval '1 month -29 days') as A,
       (timestamp '2025-03-31' + interval '1 month -30 days') as B,
       (timestamp '2024-12-31' + interval '1 month -31 days') as C;
         →
                       a          |          b |          c
---------------------+---------------------+---------------------
              2025-01-31 00:00:00 | 2025-03-31 00:00:00 | 2024-12-31 
00:00:00

Query (b) seems to be the most straight forward: Postgres seems to flag 
the interval '1 month -30 days' as “empty” no matter the context.

     select generate_series(timestamp '2024-12-31', timestamp 
'2025-01-01', interval '1 month -30 days');
         → psql:commands.sql:1: ERROR:  step size cannot equal zero
     select timestamp '2024-12-31' + interval '1 month -30 days';
         → 2025-01-01 00:00:00

This should theoretically be an unproblematic query resulting in 
[2024-12-31 00:00:00, 2025-01-01 00:00:00], but it seems like Postgres 
treats '1 month -30 days' as an “zero-size interval”, despite there 
being contexts, where it is not of size zero, and contexts, where other 
intervals that aren’t flagged, are of size zero.

For query (c), the empty result makes it seem like Postgres interprets 
it as having a negative step size for a date in the future; the same 
result would happen with e. g. the interval '1 month -40 days' or 
generate_series(10, 5, +1). This would fit with the above, where 
Postgres assumes that a month always has 30 days for the purpose of 
determining how generate_series should run.

Is this behavior intentional or should it be considered to be erroneous? 
Naturally, working with human time representation can introduce many 
subtleties and maybe it is beyond the scope of Postgres to correctly 
handle them (whatever that means), but at least failing in a consistent 
way shouldn’t be to much to ask for? If this is intentional, is there a 
system to this that I have failed to see?

Best,
Jakob

(PS: as this is not really a bug report, but rather an inquiry to what 
extent this behavior is intended, I hope `postgres-general` is the 
correct choice of lists. If not, feel free to redirect me to a more 
appropriate one.)


---
Jakob Teuber
jakob.teuber@tum.de

Re: Infinite loop for generate_series with timestamp arguments

From

Tom Lane

Date:

03 March, 22:10:46

Jakob Teuber <jakob.teuber@tum.de> writes:
> I noticed, that the following query will send Postgres into an infinite 
> loop:
> (a)  select generate_series(timestamp '2025-01-30', timestamp 
> '2025-02-01', interval '1 month -29 days');
> One could certainly argue that “go and do an infinite loop” is plainly 
> the intended semantics of this query, but in other cases, Postgres tries 
> to guard against these types of looping generate_series statements:
> (b)  select generate_series(timestamp '2025-03-31', timestamp 
> '2025-04-01', interval '1 month -30 days');
>          → ERROR:  step size cannot equal zero

It does what it can, but there's no chance of being entirely perfect.
The core problem here is to figure out which direction the series
is intended to advance, so as to know whether the termination test
should be "less than" or "greater than" the end timestamp.  The
way generate_series_timestamp does that is to see whether
interval comparison thinks the interval value is less than or
greater than a zero interval --- and if it happens to be exactly
equal, you get the whine about zero step size.  But interval
comparison is far from bright:

 * Interval comparison is based on converting interval values to a linear
 * representation expressed in the units of the time field (microseconds,
 * in the case of integer timestamps) with days assumed to be always 24 hours
 * and months assumed to be always 30 days.

We could perhaps do better by doing the initial addition of the
interval and seeing if that produces a value greater than, less
than, or equal to the start timestamp.  But I'm afraid that
doesn't move the goalposts very far, because as this example
shows, we might get different results in different months.

Another idea is to check, after doing each addition, to make
sure that the timestamp actually advanced in the expected
direction.  But should we error out if not, or just stop?

            regards, tom lane

Re: Infinite loop for generate_series with timestamp arguments

From

Tom Lane

Date:

04 March, 00:30:21

I wrote:
> We could perhaps do better by doing the initial addition of the
> interval and seeing if that produces a value greater than, less
> than, or equal to the start timestamp.  But I'm afraid that
> doesn't move the goalposts very far, because as this example
> shows, we might get different results in different months.

> Another idea is to check, after doing each addition, to make
> sure that the timestamp actually advanced in the expected
> direction.  But should we error out if not, or just stop?

Here's a very POC-y patch using both of these ideas (and
choosing to error out if the interval changes sign).
If we go this way, generate_series_timestamptz would need
similar changes, and some regression test cases would be
appropriate.  I'm not sure if the documentation needs
adjustment; it doesn't talk about the difficulty of
identifying the sign of an interval.

            regards, tom lane

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 9682f9dbdca..202bbd1edcd 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -6622,6 +6622,7 @@ generate_series_timestamp(PG_FUNCTION_ARGS)
     FuncCallContext *funcctx;
     generate_series_timestamp_fctx *fctx;
     Timestamp    result;
+    Timestamp    nextval = 0;

     /* stuff done only on the first call of the function */
     if (SRF_IS_FIRSTCALL())
@@ -6651,18 +6652,25 @@ generate_series_timestamp(PG_FUNCTION_ARGS)
         fctx->finish = finish;
         fctx->step = *step;

-        /* Determine sign of the interval */
-        fctx->step_sign = interval_sign(&fctx->step);
-
-        if (fctx->step_sign == 0)
+        if (INTERVAL_NOT_FINITE((&fctx->step)))
             ereport(ERROR,
                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("step size cannot equal zero")));
+                     errmsg("step size cannot be infinite")));

-        if (INTERVAL_NOT_FINITE((&fctx->step)))
+        /*
+         * Compute the next series value so that we can identify the step
+         * direction.  This seems more reliable than trusting interval_sign().
+         */
+        nextval =
+            DatumGetTimestamp(DirectFunctionCall2(timestamp_pl_interval,
+                                                  TimestampGetDatum(start),
+                                                  PointerGetDatum(&fctx->step)));
+        fctx->step_sign = timestamp_cmp_internal(nextval, start);
+
+        if (fctx->step_sign == 0)
             ereport(ERROR,
                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("step size cannot be infinite")));
+                     errmsg("step size cannot equal zero")));

         funcctx->user_fctx = fctx;
         MemoryContextSwitchTo(oldcontext);
@@ -6682,9 +6690,18 @@ generate_series_timestamp(PG_FUNCTION_ARGS)
         timestamp_cmp_internal(result, fctx->finish) >= 0)
     {
         /* increment current in preparation for next iteration */
-        fctx->current = DatumGetTimestamp(DirectFunctionCall2(timestamp_pl_interval,
-                                                              TimestampGetDatum(fctx->current),
-                                                              PointerGetDatum(&fctx->step)));
+        if (nextval)
+            fctx->current = nextval;    /* already calculated it */
+        else
+            fctx->current =
+                DatumGetTimestamp(DirectFunctionCall2(timestamp_pl_interval,
+                                                      TimestampGetDatum(result),
+                                                      PointerGetDatum(&fctx->step)));
+        /* check for directional instability */
+        if (fctx->step_sign != timestamp_cmp_internal(fctx->current, result))
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("step size changed sign")));

         /* do when there is more left to send */
         SRF_RETURN_NEXT(funcctx, TimestampGetDatum(result));