We're running into an issue where nothing can connect to the database. It appears that autovacuum is timing out, which then prevents new connections. This assumption is based on the following message showing up in the logs:
WARNING: worker took too long to start; canceled
The message appears about every 5 minutes. Eventually nothing can connect to the database and it has to be rebooted.
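To confirm the "about every 5 minutes" pattern, and to see when the warnings start relative to the connection failures, the timestamps can be pulled out of the log with a short script. This is only a rough sketch: it assumes each log line begins with a `YYYY-MM-DD HH:MM:SS` timestamp, which depends on your log_line_prefix setting, and the sample lines below are made up for illustration, not taken from the real log. The names `warning_times` and `gaps_in_seconds` are hypothetical helpers.

```python
import re
from datetime import datetime

# Message text copied from the warning quoted above.
WARNING_TEXT = "worker took too long to start; canceled"

# ASSUMPTION: every log line starts with "YYYY-MM-DD HH:MM:SS".
# Adjust this pattern to match your actual log_line_prefix.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def warning_times(lines):
    """Return the timestamps of all lines that contain the warning."""
    times = []
    for line in lines:
        if WARNING_TEXT not in line:
            continue
        match = TIMESTAMP_RE.match(line)
        if match:  # skip lines whose prefix does not fit the assumption
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Return the number of seconds between consecutive timestamps."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Made-up sample lines, for illustration only.
sample_lines = [
    "2024-01-01 10:00:00 UTC WARNING:  worker took too long to start; canceled",
    "2024-01-01 10:02:10 UTC LOG:  some unrelated message",
    "2024-01-01 10:05:00 UTC WARNING:  worker took too long to start; canceled",
]

print(gaps_in_seconds(warning_times(sample_lines)))  # prints [300.0]
```

To run it against a real log, pass an open file instead of `sample_lines`, for example `warning_times(open("postgresql.log"))`, where the file name is a placeholder for wherever your log is stored.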
As Julien suggested, this warning sounds like another victim of the underlying problem, not the cause. Is there anything else in the log files?
What version are you using?
These are the most closely related previous posts, but the CPU usage isn't high when this happens, so I don't believe that's the problem.
But I don't see high CPU described as a symptom in either of those threads.
If you can't reproduce the problem locally, there probably isn't much we can do. Maybe ask Amazon to look into it, since they are the only ones with sufficient access to do so.