Hi,
Yesterday an interesting scenario was diagnosed on IRC. If you're
running a synchronous slave and the connection to the slave is lost
momentarily, your backends naturally start waiting for the slave to
reconnect. If your application then keeps trying to create new
connections, it can use up all non-reserved connections, thus locking
out the synchronous slave even after the connection problem has resolved
itself.
This brings the entire cluster into a state where manual intervention is
necessary.
While you could limit the number of connections for non-replication
roles, that's not always possible or desirable. I would like to
introduce a way to reserve connection slots for replication. However,
it's not clear how this would work. I looked at how
superuser_reserved_connections is implemented, and with small changes I
could see how to implement two ideas:
1) Reserve a portion of superuser_reserved_connections for replication
   connections. For example, with max_connections=10,
   superuser_reserved_connections=2 and
   replication_reserved_connections=1, at 8 connections either a
   replication connection or a superuser connection can be created, and
   at 9 connections only a superuser one would be allowed. This is a bit
   clumsy as there still aren't guaranteed slots for replication.
2) A GUC which says "superuser_reserved_connections can be used up by
   replication connections", and then limiting the number of replication
   connections using per-role limits to make sure superusers aren't
   locked out.
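To make the arithmetic in idea 1 concrete, here is a rough standalone
sketch in C. It is not taken from the PostgreSQL source and is not a
patch; SlotConfig, ConnKind and can_connect() are hypothetical names made
up for illustration, and replication_reserved_connections is the proposed
setting, which does not exist today. It only models which kind of
connection would be admitted at a given number of used slots.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of an incoming connection. */
typedef enum
{
    CONN_REGULAR,
    CONN_REPLICATION,
    CONN_SUPERUSER
} ConnKind;

/* Hypothetical container for the settings discussed in idea 1. */
typedef struct
{
    int max_connections;
    int superuser_reserved_connections;
    int replication_reserved_connections;   /* proposed, not an existing GUC */
} SlotConfig;

/*
 * Return true if a connection of the given kind would be admitted when
 * "used" slots are already taken.  The replication reservation is carved
 * out of the superuser reservation, as described in idea 1.
 */
static bool
can_connect(const SlotConfig *cfg, int used, ConnKind kind)
{
    int free_slots;

    assert(used >= 0);
    assert(cfg->replication_reserved_connections <=
           cfg->superuser_reserved_connections);

    free_slots = cfg->max_connections - used;
    if (free_slots <= 0)
        return false;           /* nothing left for anyone */

    switch (kind)
    {
        case CONN_SUPERUSER:
            /* Superusers may take any remaining slot. */
            return true;
        case CONN_REPLICATION:
            /* Must leave the superuser-only part of the reservation free. */
            return free_slots > cfg->superuser_reserved_connections -
                                cfg->replication_reserved_connections;
        case CONN_REGULAR:
            /* Must leave the whole reservation free. */
            return free_slots > cfg->superuser_reserved_connections;
    }
    return false;
}
```

With the settings from the example (10, 2, 1), can_connect() refuses a
regular connection at 8 used slots but still admits a replication or
superuser connection, and at 9 used slots only a superuser gets in. It
also shows the weakness mentioned above: superusers can still consume the
slot that replication was supposed to be able to use.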
Does anyone see a better way to do this? I'm not too satisfied with
either of these ideas.
Regards,
Marko Tiikkaja