Thread: Re: Query generates infinite loop

Re: Query generates infinite loop

From
Jeff Janes
Date:
On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> it's true that infinities as generate_series endpoints are going
> to work pretty oddly, so I agree with the idea of forbidding 'em.

> Numeric has infinity as of late, so the numeric variant would
> need to do this too.

Oh --- looks like numeric generate_series() already throws error for
this, so we should just make the timestamp variants do the same.

The regression test you added for this change causes an infinite loop when run against an unpatched server with --install-check.  That is a bit unpleasant.  Is there something we can and should do about that?  I was expecting regression test failures of course but not an infinite loop leading towards disk exhaustion.

Cheers,

Jeff

Re: Query generates infinite loop

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> The regression test you added for this change causes an infinite loop when
> run against an unpatched server with --install-check.  That is a bit
> unpleasant.  Is there something we can and should do about that?  I was
> expecting regression test failures of course but not an infinite loop
> leading towards disk exhaustion.

We very often add regression test cases that will cause unpleasant
failures on unpatched code.  I categorically reject the idea that
that's not a good thing, and question why you think that running
known-broken code against a regression suite is an important use case.

            regards, tom lane



Re: Query generates infinite loop

From
Corey Huinker
Date:
On Wed, May 4, 2022 at 3:01 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> it's true that infinities as generate_series endpoints are going
> to work pretty oddly, so I agree with the idea of forbidding 'em.

> Numeric has infinity as of late, so the numeric variant would
> need to do this too.

Oh --- looks like numeric generate_series() already throws error for
this, so we should just make the timestamp variants do the same.

The regression test you added for this change causes an infinite loop when run against an unpatched server with --install-check.  That is a bit unpleasant.  Is there something we can and should do about that?  I was expecting regression test failures of course but not an infinite loop leading towards disk exhaustion.

Cheers,

Jeff

Re: Query generates infinite loop

From
Tom Lane
Date:
Corey Huinker <corey.huinker@gmail.com> writes:
> On Wed, May 4, 2022 at 3:01 PM Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Oh --- looks like numeric generate_series() already throws error for
>>> this, so we should just make the timestamp variants do the same.

> This came up once before
> https://www.postgresql.org/message-id/CAB7nPqQUuUh_W3s55eSiMnt901Ud3meF7f_96yPkKcqfd1ZaMg%40mail.gmail.com

Oh!  I'd totally forgotten that thread, but given that discussion,
and particularly the counterexample at

https://www.postgresql.org/message-id/16807.1456091547%40sss.pgh.pa.us

it now feels to me like maybe this change was a mistake.  Perhaps
instead of the committed change, we ought to go the other way and
rip out the infinity checks in numeric generate_series().

In view of tomorrow's minor-release wrap, there is not time for
the sort of more leisured discussion that I now think this topic
needs.  I propose to revert eafdf9de0 et al before the wrap,
and think about this at more length before doing anything.

            regards, tom lane



Re: Query generates infinite loop

From
Corey Huinker
Date:
On Mon, May 9, 2022 at 12:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Corey Huinker <corey.huinker@gmail.com> writes:
> On Wed, May 4, 2022 at 3:01 PM Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Oh --- looks like numeric generate_series() already throws error for
>>> this, so we should just make the timestamp variants do the same.

> This came up once before
> https://www.postgresql.org/message-id/CAB7nPqQUuUh_W3s55eSiMnt901Ud3meF7f_96yPkKcqfd1ZaMg%40mail.gmail.com

Oh!  I'd totally forgotten that thread, but given that discussion,
and particularly the counterexample at

https://www.postgresql.org/message-id/16807.1456091547%40sss.pgh.pa.us

it now feels to me like maybe this change was a mistake.  Perhaps
instead of the committed change, we ought to go the other way and
rip out the infinity checks in numeric generate_series().

The infinite-upper-bound-withlimit-pushdown counterexample makes sense, but seems like we're using generate_series() only because we lack a function that generates a series of N elements, without a specified upper bound, something like

     generate_finite_series( start, step, num_elements )

And if we did that, I'd lobby that we have one that takes dates as well as one that takes timestamps, because that was my reason for starting the thread above.

Re: Query generates infinite loop

From
Tom Lane
Date:
Corey Huinker <corey.huinker@gmail.com> writes:
> The infinite-upper-bound-withlimit-pushdown counterexample makes sense, but
> seems like we're using generate_series() only because we lack a function
> that generates a series of N elements, without a specified upper bound,
> something like

>      generate_finite_series( start, step, num_elements )

Yeah, that could be a reasonable thing to add.

> And if we did that, I'd lobby that we have one that takes dates as well as
> one that takes timestamps, because that was my reason for starting the
> thread above.

Less sure about that.  ISTM the reason that the previous proposal failed
was that it introduced too much ambiguity about how to resolve
unknown-type arguments.  Wouldn't the same problems arise here?

            regards, tom lane



Re: Query generates infinite loop

From
Corey Huinker
Date:
Less sure about that.  ISTM the reason that the previous proposal failed
was that it introduced too much ambiguity about how to resolve
unknown-type arguments.  Wouldn't the same problems arise here?

If I recall, the problem was that the lack of a date-specific generate_series function would result in a date value being coerced to timestamp, and thus adding generate_series(date, date, step) would change behavior of existing code, and that was a POLA violation (among other bad things).

By adding a different function, there is no prior behavior to worry about. So we should be safe with the following signatures doing the right thing, yes?:
    generate_finite_series(start timestamp, step interval, num_elements integer)
    generate_finite_series(start date, step integer, num_elements integer)
    generate_finite_series(start date, step interval year to month, num_elements integer) 

Re: Query generates infinite loop

From
Tom Lane
Date:
Corey Huinker <corey.huinker@gmail.com> writes:
>> Less sure about that.  ISTM the reason that the previous proposal failed
>> was that it introduced too much ambiguity about how to resolve
>> unknown-type arguments.  Wouldn't the same problems arise here?

> By adding a different function, there is no prior behavior to worry about.

True, that's one less thing to worry about.

> So we should be safe with the following signatures doing the right thing,
> yes?:
>     generate_finite_series(start timestamp, step interval, num_elements
> integer)
>     generate_finite_series(start date, step integer, num_elements integer)
>     generate_finite_series(start date, step interval year to month,
> num_elements integer)

No.  You can experiment with it easily enough using stub functions:

regression=# create function generate_finite_series(start timestamp, step interval, num_elements
regression(# integer) returns timestamp as 'select $1' language sql;
CREATE FUNCTION
regression=# create function generate_finite_series(start date, step integer, num_elements integer) returns timestamp
as'select $1' language sql; 
CREATE FUNCTION
regression=# create function generate_finite_series(start date, step interval year to month,
regression(# num_elements integer) returns timestamp as 'select $1' language sql;;
CREATE FUNCTION

regression=# select generate_finite_series(current_date, '1 day', 10);
ERROR:  function generate_finite_series(date, unknown, integer) is not unique
LINE 1: select generate_finite_series(current_date, '1 day', 10);
               ^
HINT:  Could not choose a best candidate function. You might need to add explicit type casts.

It's even worse if the first argument is also an unknown-type literal.
Sure, you could add explicit casts to force the choice of variant,
but then ease of use went out the window somewhere --- and IMO this
proposal is mostly about ease of use, since there's no fundamentally
new functionality.

It looks like you could make it work with just these three variants:

regression=# \df generate_finite_series
                                                               List of functions
 Schema |          Name          |      Result data type       |                          Argument data types
               | Type  

--------+------------------------+-----------------------------+------------------------------------------------------------------------+------
 public | generate_finite_series | timestamp without time zone | start date, step interval, num_elements integer
               | func 
 public | generate_finite_series | timestamp with time zone    | start timestamp with time zone, step interval,
num_elementsinteger    | func 
 public | generate_finite_series | timestamp without time zone | start timestamp without time zone, step interval,
num_elementsinteger | func 
(3 rows)

I get non-error results with these:

regression=# select generate_finite_series(current_date, '1 day', 10);
 generate_finite_series
------------------------
 2022-05-10 00:00:00
(1 row)

regression=# select generate_finite_series('now', '1 day', 10);
    generate_finite_series
-------------------------------
 2022-05-10 19:35:33.773738-04
(1 row)

That shows that an unknown-type literal in the first argument will default
to timestamptz given these choices, which seems like a sane default.

BTW, you don't get to say "interval year to month" as a function argument,
or at least it won't do anything useful.  If you want to restrict the
contents of the interval it'll have to be a runtime check inside the
function.

            regards, tom lane