Thread: Re: Query generates infinite loop
On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> it's true that infinities as generate_series endpoints are going
> to work pretty oddly, so I agree with the idea of forbidding 'em.
> Numeric has infinity as of late, so the numeric variant would
> need to do this too.
Oh --- looks like numeric generate_series() already throws error for
this, so we should just make the timestamp variants do the same.
The regression test you added for this change causes an infinite loop when run against an unpatched server with --install-check. That is a bit unpleasant. Is there something we can and should do about that? I was expecting regression test failures of course but not an infinite loop leading towards disk exhaustion.
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > The regression test you added for this change causes an infinite loop when > run against an unpatched server with --install-check. That is a bit > unpleasant. Is there something we can and should do about that? I was > expecting regression test failures of course but not an infinite loop > leading towards disk exhaustion. We very often add regression test cases that will cause unpleasant failures on unpatched code. I categorically reject the idea that that's not a good thing, and question why you think that running known-broken code against a regression suite is an important use case. regards, tom lane
On Wed, May 4, 2022 at 3:01 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:I wrote:
> it's true that infinities as generate_series endpoints are going
> to work pretty oddly, so I agree with the idea of forbidding 'em.
> Numeric has infinity as of late, so the numeric variant would
> need to do this too.
Oh --- looks like numeric generate_series() already throws error for
this, so we should just make the timestamp variants do the same.The regression test you added for this change causes an infinite loop when run against an unpatched server with --install-check. That is a bit unpleasant. Is there something we can and should do about that? I was expecting regression test failures of course but not an infinite loop leading towards disk exhaustion.Cheers,Jeff
Corey Huinker <corey.huinker@gmail.com> writes: > On Wed, May 4, 2022 at 3:01 PM Jeff Janes <jeff.janes@gmail.com> wrote: >> On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Oh --- looks like numeric generate_series() already throws error for >>> this, so we should just make the timestamp variants do the same. > This came up once before > https://www.postgresql.org/message-id/CAB7nPqQUuUh_W3s55eSiMnt901Ud3meF7f_96yPkKcqfd1ZaMg%40mail.gmail.com Oh! I'd totally forgotten that thread, but given that discussion, and particularly the counterexample at https://www.postgresql.org/message-id/16807.1456091547%40sss.pgh.pa.us it now feels to me like maybe this change was a mistake. Perhaps instead of the committed change, we ought to go the other way and rip out the infinity checks in numeric generate_series(). In view of tomorrow's minor-release wrap, there is not time for the sort of more leisured discussion that I now think this topic needs. I propose to revert eafdf9de0 et al before the wrap, and think about this at more length before doing anything. regards, tom lane
On Mon, May 9, 2022 at 12:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Corey Huinker <corey.huinker@gmail.com> writes:
> On Wed, May 4, 2022 at 3:01 PM Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Oh --- looks like numeric generate_series() already throws error for
>>> this, so we should just make the timestamp variants do the same.
> This came up once before
> https://www.postgresql.org/message-id/CAB7nPqQUuUh_W3s55eSiMnt901Ud3meF7f_96yPkKcqfd1ZaMg%40mail.gmail.com
Oh! I'd totally forgotten that thread, but given that discussion,
and particularly the counterexample at
https://www.postgresql.org/message-id/16807.1456091547%40sss.pgh.pa.us
it now feels to me like maybe this change was a mistake. Perhaps
instead of the committed change, we ought to go the other way and
rip out the infinity checks in numeric generate_series().
The infinite-upper-bound-withlimit-pushdown counterexample makes sense, but seems like we're using generate_series() only because we lack a function that generates a series of N elements, without a specified upper bound, something like
generate_finite_series( start, step, num_elements )
And if we did that, I'd lobby that we have one that takes dates as well as one that takes timestamps, because that was my reason for starting the thread above.
generate_finite_series( start, step, num_elements )
And if we did that, I'd lobby that we have one that takes dates as well as one that takes timestamps, because that was my reason for starting the thread above.
Corey Huinker <corey.huinker@gmail.com> writes: > The infinite-upper-bound-withlimit-pushdown counterexample makes sense, but > seems like we're using generate_series() only because we lack a function > that generates a series of N elements, without a specified upper bound, > something like > generate_finite_series( start, step, num_elements ) Yeah, that could be a reasonable thing to add. > And if we did that, I'd lobby that we have one that takes dates as well as > one that takes timestamps, because that was my reason for starting the > thread above. Less sure about that. ISTM the reason that the previous proposal failed was that it introduced too much ambiguity about how to resolve unknown-type arguments. Wouldn't the same problems arise here? regards, tom lane
Less sure about that. ISTM the reason that the previous proposal failed
was that it introduced too much ambiguity about how to resolve
unknown-type arguments. Wouldn't the same problems arise here?
If I recall, the problem was that the lack of a date-specific generate_series function would result in a date value being coerced to timestamp, and thus adding generate_series(date, date, step) would change behavior of existing code, and that was a POLA violation (among other bad things).
By adding a different function, there is no prior behavior to worry about. So we should be safe with the following signatures doing the right thing, yes?:
generate_finite_series(start timestamp, step interval, num_elements integer)
generate_finite_series(start date, step integer, num_elements integer)
By adding a different function, there is no prior behavior to worry about. So we should be safe with the following signatures doing the right thing, yes?:
generate_finite_series(start timestamp, step interval, num_elements integer)
generate_finite_series(start date, step integer, num_elements integer)
generate_finite_series(start date, step interval year to month, num_elements integer)
Corey Huinker <corey.huinker@gmail.com> writes: >> Less sure about that. ISTM the reason that the previous proposal failed >> was that it introduced too much ambiguity about how to resolve >> unknown-type arguments. Wouldn't the same problems arise here? > By adding a different function, there is no prior behavior to worry about. True, that's one less thing to worry about. > So we should be safe with the following signatures doing the right thing, > yes?: > generate_finite_series(start timestamp, step interval, num_elements > integer) > generate_finite_series(start date, step integer, num_elements integer) > generate_finite_series(start date, step interval year to month, > num_elements integer) No. You can experiment with it easily enough using stub functions: regression=# create function generate_finite_series(start timestamp, step interval, num_elements regression(# integer) returns timestamp as 'select $1' language sql; CREATE FUNCTION regression=# create function generate_finite_series(start date, step integer, num_elements integer) returns timestamp as'select $1' language sql; CREATE FUNCTION regression=# create function generate_finite_series(start date, step interval year to month, regression(# num_elements integer) returns timestamp as 'select $1' language sql;; CREATE FUNCTION regression=# select generate_finite_series(current_date, '1 day', 10); ERROR: function generate_finite_series(date, unknown, integer) is not unique LINE 1: select generate_finite_series(current_date, '1 day', 10); ^ HINT: Could not choose a best candidate function. You might need to add explicit type casts. It's even worse if the first argument is also an unknown-type literal. Sure, you could add explicit casts to force the choice of variant, but then ease of use went out the window somewhere --- and IMO this proposal is mostly about ease of use, since there's no fundamentally new functionality. It looks like you could make it work with just these three variants: regression=# \df generate_finite_series List of functions Schema | Name | Result data type | Argument data types | Type --------+------------------------+-----------------------------+------------------------------------------------------------------------+------ public | generate_finite_series | timestamp without time zone | start date, step interval, num_elements integer | func public | generate_finite_series | timestamp with time zone | start timestamp with time zone, step interval, num_elementsinteger | func public | generate_finite_series | timestamp without time zone | start timestamp without time zone, step interval, num_elementsinteger | func (3 rows) I get non-error results with these: regression=# select generate_finite_series(current_date, '1 day', 10); generate_finite_series ------------------------ 2022-05-10 00:00:00 (1 row) regression=# select generate_finite_series('now', '1 day', 10); generate_finite_series ------------------------------- 2022-05-10 19:35:33.773738-04 (1 row) That shows that an unknown-type literal in the first argument will default to timestamptz given these choices, which seems like a sane default. BTW, you don't get to say "interval year to month" as a function argument, or at least it won't do anything useful. If you want to restrict the contents of the interval it'll have to be a runtime check inside the function. regards, tom lane