Hi,
This is a bit of an odd one - I'm on PostgreSQL 11.9 with a single streaming replica.
The system handles a lot of connections - we have a max_connections of 600. Most are long-lived JDBC connections, but there are also a lot of ETL / ad-hoc jobs etc.
Connections normally sit at 300ish, with 70 active at the most. The machines have 32 CPU cores. PgBouncer is sadly not an option here, as we are using many long-lived connections which make use of prepared statements.
Sometimes a strange condition occurs. The number of connections is well under 600 (and dropping), but new connections are not being allowed into the database. I can see this message in the logs:
(0:53300)FATAL: remaining connection slots are reserved for non-replication superuser connections
From ps I can see a lot of processes like this:
postgres: accessor monitoring 10.10.7.54[41655] startup
The number of these grows, and no new connections are allowed in. These startup connections do not appear in pg_stat_activity so I can't find what they are waiting on.
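In case it helps anyone reproduce the numbers, here is a rough sketch of how the startup processes could be tallied from ps output, since pg_stat_activity does not show them. It assumes process titles of the form shown above (user, database, host[port], state) and a ps that accepts `-eo args`; the function names are just illustrative.

```python
import re
import subprocess
from collections import Counter

# Matches process titles of the form shown above:
#   postgres: <user> <database> <host>[<port>] startup
STARTUP_RE = re.compile(
    r"postgres: (?P<user>\S+) (?P<db>\S+) (?P<host>\S+)\[\d+\] startup\s*$"
)


def count_startup_backends(ps_lines):
    """Count backends in 'startup' state, grouped by (user, database, host)."""
    counts = Counter()
    for line in ps_lines:
        m = STARTUP_RE.search(line)
        if m:
            counts[(m["user"], m["db"], m["host"])] += 1
    return counts


def read_ps():
    # Assumption: 'ps -eo args' prints only the command line of each process.
    out = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    )
    return out.stdout.splitlines()


if __name__ == "__main__":
    # Print one line per (user, database, host), most frequent first.
    for key, n in count_startup_backends(read_ps()).most_common():
        print(n, *key)
```

Run periodically, this would at least show whether the stuck startup processes come from one client host or user, or are spread evenly.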
Removing some long lived connections sometimes appears to help clear the startup processes and return the system to normal - but I have not been able to find a correlation. It's more blind luck at the moment.
Any pointers on where I could start prodding?
Cheers,
James