Hi,
On Mon, Jun 27, 2022 at 02:38:21PM -0600, David Johansen wrote:
> We're running into an issue where the database can't be connected to. It
> appears that the auto-vacuum is timing out and then that prevents new
> connections from happening. This assumption is based on these logs showing
> up in the logs:
> WARNING: worker took too long to start; canceled
I don't think that autovacuum is the cause of the problem. It looks like just
another victim of the same problem: the autovacuum launcher is still active and
tries to schedule workers, which can't connect either.
> The log appears about every 5 minutes and eventually nothing can connect to
> it and it has to be rebooted.
Are you saying that you have to reboot every 5 minutes?
Also, do you have to reboot the whole server, or is restarting the postgres
service enough?
> These are the most similarly related previous posts, but the CPU usage
> isn't high when this happens, so I don't believe that's the problem
> https://www.postgresql.org/message-id/20081105185206.GS4114%40alvh.no-ip.org
> https://www.postgresql.org/message-id/AANLkTinsGLeRc26RT5Kb4_HEhow5e97p0ZBveg=p9xqS@mail.gmail.com
>
> What can we do to diagnose this problem and get our database working
> reliably again?
As mentioned in the 2nd link, getting an strace of the postmaster while the
problem is happening may help.
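
In case it helps, here is a rough sketch of one way to build that strace
invocation. It relies on the first line of postmaster.pid in the data directory
being the postmaster's PID. The helper names and the /tmp output path are my
own choices for illustration, and you'll likely need to run it as the postgres
user or root to read the file and attach to the process.

```python
import sys
from pathlib import Path


def postmaster_pid(pid_file_text):
    """Return the postmaster PID, which is the first line of postmaster.pid."""
    return int(pid_file_text.splitlines()[0].strip())


def strace_command(pid, out_path):
    """Build an strace argv that attaches to an already running process."""
    # -f  follow child processes forked after attaching
    # -tt prefix each line with a wall-clock timestamp (microseconds)
    # -T  show the time spent inside each system call
    return ["strace", "-f", "-tt", "-T", "-p", str(pid), "-o", out_path]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the data directory is passed as the first argument.
    data_dir = Path(sys.argv[1])
    pid = postmaster_pid((data_dir / "postmaster.pid").read_text())
    # Output path is only an example; pick somewhere with enough free space.
    print(" ".join(strace_command(pid, "/tmp/postmaster.strace")))
```

This only prints the command, so you can review it before running it once new
connections start failing. A minute or two of output covering a failed
connection attempt should be enough.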