On Thu, 3 Feb 2022 at 06:25, Michael Harris <harmic@gmail.com> wrote:
>
> Hi again
>
> Some good news. After some more debugging & reflection, I realized
> that the likely cause is one of our own libraries that gets loaded as
> part of some custom functions we are using.
>
> Some of these functions trigger fetching of remote resources, for
> which a timeout is set using `alarm`. The function unfortunately does
> not re-establish any pre-existing interval timers after it is done,
> which leads to postgresql missing its own expected alarm signal.
>
> The reason that this was not affecting us on previous postgres
> versions was this commit:
>
> https://github.com/postgres/postgres/commit/09cf1d52267644cdbdb734294012cf1228745aaa#diff-b12a7ca3bf9c6a56745844c2670b0b28d2a4237741c395dda318c6cc3664ad4a
>
> After this commit, once an alarm is missed, that backend never sets
> one again, so no timeouts of any kind will work. Therefore, the
> deadlock detector was never being run. Prior to that, the next time
> any timeout was set by the backend it would re-establish its timer.
>
> We will of course fix our own code to prevent this issue, but I am a
> little concerned at the above commit as it reduces the robustness of
> postgres in this situation. Perhaps I will raise it on the
> pgsql-hackers list.

Hmm, so you turned off Postgres' alarms so they stopped working, and
you're saying that is a robustness issue of Postgres?

Yes, something broke and it would be nice to avoid that, but the
responsibility for that lies in the user code that was called.
Postgres can't know what kernel calls are made during a custom
function.

Perhaps you could contribute a test case for this situation and a new
call to check/reset any missing alarms? Or alternatively, contribute
the function library that fetches remote resources, so that can become
an optional part of Postgres, in contrib.
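
For illustration only, user code that needs its own timeout could save
and restore the process's real-time timer and SIGALRM handler around
the call, roughly as below. This is a minimal sketch, assuming the
library relies on SIGALRM via ITIMER_REAL (which alarm() shares on
Linux); run_with_timeout and count_call are hypothetical names, not
part of Postgres or of the library discussed above.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction() and setitimer() under strict -std modes */
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>

/* Set by the handler when our own timeout expires. */
static volatile sig_atomic_t timeout_fired = 0;

static void on_timeout(int signo)
{
    (void) signo;
    timeout_fired = 1;
}

/* Hypothetical stand-in for the real work (e.g. the remote fetch). */
void count_call(void *arg)
{
    *(int *) arg += 1;
}

/*
 * Hypothetical wrapper: run work(arg) under a timeout of timeout_secs
 * seconds, then put back the timer and SIGALRM handler found on entry.
 * Returns 0 if work finished in time, 1 if our timeout fired, and -1 if
 * the timer or handler could not be saved.
 */
int run_with_timeout(unsigned timeout_secs, void (*work)(void *), void *arg)
{
    struct itimerval disarm = {{0, 0}, {0, 0}};
    struct itimerval ours = {{0, 0}, {0, 0}};
    struct itimerval saved_timer;
    struct sigaction action, saved_action;

    /* 1. Disarm the host's timer and remember what was left on it. */
    if (setitimer(ITIMER_REAL, &disarm, &saved_timer) != 0)
        return -1;

    /* 2. Install our handler, remembering the host's. */
    action.sa_handler = on_timeout;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;    /* no SA_RESTART: blocking calls fail with EINTR */
    if (sigaction(SIGALRM, &action, &saved_action) != 0) {
        setitimer(ITIMER_REAL, &saved_timer, NULL);
        return -1;
    }

    /* 3. Arm our one-shot timeout and do the work. */
    timeout_fired = 0;
    ours.it_value.tv_sec = timeout_secs;
    setitimer(ITIMER_REAL, &ours, NULL);
    work(arg);

    /* 4. Disarm ours, then restore the host's handler and timer. */
    setitimer(ITIMER_REAL, &disarm, NULL);
    sigaction(SIGALRM, &saved_action, NULL);
    setitimer(ITIMER_REAL, &saved_timer, NULL);

    return timeout_fired ? 1 : 0;
}
```

Note that the restored timer resumes with the time it had left when it
was saved, so the host's alarm may arrive late, but it is not lost.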
--
Simon Riggs http://www.EnterpriseDB.com/