Re: Auto-vacuum timing out and preventing connections - Mailing list pgsql-bugs

From David Johansen
Subject Re: Auto-vacuum timing out and preventing connections
Date
Msg-id CAAcYxUfMtLYdhyyLdoP0LWn5J-dRjvuixQVs_HQi8Nn7cOeeSQ@mail.gmail.com
In response to Re: Auto-vacuum timing out and preventing connections  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Auto-vacuum timing out and preventing connections  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: Auto-vacuum timing out and preventing connections  (David Johansen <davejohansen@gmail.com>)
List pgsql-bugs
On Tue, Jun 28, 2022 at 1:31 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Jun 27, 2022 at 4:38 PM David Johansen <davejohansen@gmail.com> wrote:
We're running into an issue where nothing can connect to the database. It appears that the auto-vacuum is timing out, which then prevents new connections. This assumption is based on the following warning showing up in the logs:
WARNING:  worker took too long to start; canceled
The warning appears about every 5 minutes; eventually nothing can connect to the database and it has to be rebooted.

As Julien suggested, this sounds like another victim, not the cause.  Is there anything else in the log files? 

That's the only thing in the logs for the 12-24 hours before the database becomes inaccessible.
 
What version are you using?

13.6
 
These are the most closely related previous posts, but the CPU usage isn't high when this happens, so I don't believe that's the problem.

But, I don't see high CPU described as a symptom in either of those threads. 

I was referring to the "I've seen this happen under heavy load" statement. I'm not sure that's the cause, or even related, in those posts, but heavy load doesn't appear to be the issue here.
 
If you can't reproduce the problem locally, there probably isn't much we can do.  Maybe ask Amazon to look into it, since they are the only ones with sufficient access to do so.

We've opened a support case, but I was trying to be proactive and see what we could dig into on our end. Is there a way to tell which table the auto-vacuum is trying to run on when it times out?
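
For reference, here is a minimal sketch of one way to see which tables autovacuum workers are currently processing, by joining the pg_stat_progress_vacuum and pg_stat_activity views. It only reports workers that have actually started, so it may show nothing for a worker that was canceled before it began. The helper name running_autovacuums and the constant name are hypothetical, and it assumes a DB-API 2.0 connection (for example from psycopg2) to the database of interest.

```python
# Hypothetical helper: lists tables that autovacuum workers are processing now.
# Assumes PostgreSQL 10 or later (pg_stat_activity.backend_type) and a
# DB-API 2.0 connection object, e.g. one returned by psycopg2.connect().

AUTOVACUUM_PROGRESS_SQL = """
SELECT a.pid,
       p.datname,
       p.relid::regclass::text AS table_name,
       p.phase,
       a.query_start,
       a.query
FROM pg_stat_progress_vacuum AS p
JOIN pg_stat_activity AS a USING (pid)
WHERE a.backend_type = 'autovacuum worker'
  AND p.datname = current_database()
ORDER BY a.query_start
"""


def running_autovacuums(conn):
    """Return one dict per autovacuum worker currently vacuuming a table."""
    cur = conn.cursor()
    try:
        cur.execute(AUTOVACUUM_PROGRESS_SQL)
        # Column names come from the cursor metadata, in SELECT order.
        columns = [d[0] for d in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

The filter on current_database() is there because relid only resolves to a table name inside the database the session is connected to, so the function would need to be run once per database. Polling it periodically and logging the result would leave a record of what was being vacuumed shortly before the instance became unreachable.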
