Thread: replication_reserved_connections
Hi, Yesterday an interesting scenario was diagnosed on IRC. If you're running a synchronous slave and the connection to the slave is lost momentarily, your backends start naturally waiting for the slave to reconnect. If then your application keeps trying to create new connections, it can use all non-reserved connections, thus locking out the synchronous slave when the connection problem has resolved itself. This brings the entire cluster into a state where manual intervention is necessary. While you could limit the number of connections for non-replication roles, that's not always possible or desirable. I would like to introduce a way to reserve connection slots for replication. However, it's not clear how this would work. I looked at how superuser_reserved_connections is implemented, and with small changes I could see how to implement two ideas: 1) Reserve a portion of superuser_reserved_connections for replication connections. For example, with max_connections=10, superuser_reserved_connections=2 and replication_reserved_connections=1, at 8 connections either a replication connection or a superuser connection can be created, and at 9 connections only a superuser one would be allowed. This is a bit clumsy as there still aren't guaranteed slots for replication. 2) A GUC which says "superuser_reserved_connections can be used up by replication connections", and then limiting the number of replication connections using per-role limits to make sure superusers aren't locked out. Does anyone see a better way to do this? I'm not too satisfied with either of these ideas. Regards, Marko Tiikkaja
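To make the arithmetic in idea 1 concrete, here is a minimal Python sketch of the admission rule as described in the message above. It is only an illustration under stated assumptions, not PostgreSQL source: the function name `may_connect` and the connection kinds are hypothetical, and `replication_reserved_connections` is the proposed setting, not an existing GUC.

```python
def may_connect(active, kind,
                max_connections=10,
                superuser_reserved_connections=2,
                replication_reserved_connections=1):
    """Return True if a new connection of the given kind would be admitted.

    Hypothetical model of idea 1: replication may use part of the
    superuser-reserved pool. `active` is the number of connections
    already open; `kind` is "normal", "replication" or "superuser".
    """
    free = max_connections - active
    if free <= 0:
        return False  # nothing left for anyone
    if kind == "superuser":
        return True   # superusers may take any remaining slot
    if kind == "replication":
        # Replication must leave the superuser-only part of the pool untouched.
        superuser_only = (superuser_reserved_connections
                          - replication_reserved_connections)
        return free > superuser_only
    # Normal connections must leave the whole reserved pool untouched.
    return free > superuser_reserved_connections
```

With the defaults taken from the example, a normal connection is refused once 8 connections exist, a replication connection is still admitted at 8 but refused at 9, and a superuser is admitted until all 10 slots are in use. The sketch also shows the weakness noted in the message: a superuser can take the slot replication was counting on.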
On 28-Jul-2013, at 5:53, Marko Tiikkaja <marko@joh.to> wrote: > Hi, > > Yesterday an interesting scenario was diagnosed on IRC. If you're running a synchronous slave and the connection to the slave is lost momentarily, your backends start naturally waiting for the slave to reconnect. If then your application keeps trying to create new connections, it can use all non-reserved connections, thus locking out the synchronous slave when the connection problem has resolved itself. This brings the entire cluster into a state where manual intervention is necessary. > Solving that was fun! > While you could limit the number of connections for non-replication roles, that's not always possible or desirable. I would like to introduce a way to reserve connection slots for replication. However, it's not clear how this would work. I looked at how superuser_reserved_connections is implemented, and with small changes I could see how to implement two ideas: > > 1) Reserve a portion of superuser_reserved_connections for replication > connections. For example, with max_connections=10, > superuser_reserved_connections=2 and > replication_reserved_connections=1, at 8 connections either a > replication connection or a superuser connection can be created, > and at 9 connections only a superuser one would be allowed. This > is a bit clumsy as there still aren't guaranteed slots for > replication. > I would generally agree with sharing super user reserved connections with replication. One thing I would like to explore is whether we could add some sort of priority system to avoid contention between super user threads and replication threads competing for the same connection. We could potentially add a GUC for specifying which has the higher priority. I am just musing here, though. Thanks and Regards, Atri
On 28/07/2013 08:51, Atri Sharma wrote: > I would generally agree with sharing super user reserved connections with replication. One thing I would like to explore is whether we could add some sort of priority system to avoid contention between super user threads and replication threads competing for the same connection. > > We could potentially add a GUC for specifying which has the higher priority. This sounds an awful lot like it would have to scan through the list of existing connections, which I wanted to avoid. Or maybe we could maintain a separate list of "reserved" connections, i.e. ones that were created when we were at max_connections - ReservedBackends? We could quickly look through that list to see how many of each kind we have allowed. Not sure if that's practical, though. Regards, Marko Tiikkaja
On Sun, 28 Jul 2013 02:23:47 +0200 Marko Tiikkaja <marko@joh.to> wrote: > Hi, > > Yesterday an interesting scenario was diagnosed on IRC. If you're > running a synchronous slave and the connection to the slave is lost > momentarily, your backends start naturally waiting for the slave to > reconnect. If then your application keeps trying to create new > connections, it can use all non-reserved connections, thus locking > out the synchronous slave when the connection problem has resolved > itself. This brings the entire cluster into a state where manual > intervention is necessary. > > While you could limit the number of connections for non-replication > roles, that's not always possible or desirable. I would like to > introduce a way to reserve connection slots for replication. > However, it's not clear how this would work. I looked at how > superuser_reserved_connections is implemented, and with small changes I > could see how to implement two ideas: > > 1) Reserve a portion of superuser_reserved_connections for > replication connections. For example, with max_connections=10, > superuser_reserved_connections=2 and > replication_reserved_connections=1, at 8 connections either a > replication connection or a superuser connection can be created, > and at 9 connections only a superuser one would be allowed. > This is a bit clumsy as there still aren't guaranteed slots for > replication. > 2) A GUC which says "superuser_reserved_connections can be used up > by replication connections", and then limiting the number of > replication connections using per-role limits to make sure > superusers aren't locked out. > > Does anyone see a better way to do this? I'm not too satisfied with > either of these ideas. > > > Regards, > Marko Tiikkaja > > Hi, I had the same problem and I created a patch to introduce a GUC for reserved_replication_connections as a separate flag. You can find my patch here: https://commitfest.postgresql.org/action/patch_view?id=1180 I am still waiting for feedback, though. regards, Stefan Radomski
On 2013-07-28 19:21, Gibheer wrote: > I had the same problem and I created a patch to introduce a GUC for > reserved_replication_connections as a separate flag. > You can find my patch here > https://commitfest.postgresql.org/action/patch_view?id=1180 Oops. I guess I should've searched through the archives before sending my email. I didn't remember seeing anything about this so I just assumed nobody was working on it. I'll take a look at your patch. Regards, Marko Tiikkaja
On 2013-07-28 02:23:47 +0200, Marko Tiikkaja wrote: > While you could limit the number of connections for non-replication roles, > that's not always possible or desirable. I would like to introduce a way to > reserve connection slots for replication. However, it's not clear how this > would work. I looked at how superuser_reserved_connections is implemented, > and with small changes I could see how to implement two ideas: > > Does anyone see a better way to do this? I'm not too satisfied with either > of these ideas. Personally I think we just shouldn't allow normal connections for the backend slots added by max_wal_senders. They are internally *added* to max_connections, so limiting that seems perfectly fine to me since the system provides max_connections connections externally. Hm... I wonder how that's managed for 9.4's max_worker_processes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
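The proposal in the message above can be sketched as a simple accounting rule. This is a toy Python model, assuming (as the message states) that the max_wal_senders slots sit on top of max_connections; the class `SlotAccounting` and its method are hypothetical names, not anything in PostgreSQL.

```python
from dataclasses import dataclass


@dataclass
class SlotAccounting:
    """Toy model: walsender slots are counted separately from normal ones."""
    max_connections: int
    max_wal_senders: int
    normal: int = 0       # regular client backends currently connected
    walsenders: int = 0   # replication connections currently connected

    def try_connect(self, is_replication: bool) -> bool:
        """Admit the connection only if its own pool still has room."""
        if is_replication:
            if self.walsenders < self.max_wal_senders:
                self.walsenders += 1
                return True
            return False
        if self.normal < self.max_connections:
            self.normal += 1
            return True
        return False
```

Under this rule an application that exhausts max_connections cannot lock out the standby, because normal connections never consume a walsender slot.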
On Sun, Jul 28, 2013 at 2:50 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2013-07-28 02:23:47 +0200, Marko Tiikkaja wrote: >> While you could limit the number of connections for non-replication roles, >> that's not always possible or desirable. I would like to introduce a way to >> reserve connection slots for replication. However, it's not clear how this >> would work. I looked at how superuser_reserved_connections is implemented, >> and with small changes I could see how to implement two ideas: >> >> Does anyone see a better way to do this? I'm not too satisfied with either >> of these ideas. > > Personally I think we just shouldn't allow normal connections for > the backend slots added by max_wal_senders. They are internally *added* > to max_connections, so limiting that seems perfectly fine to me since > the system provides max_connections connections externally. > > Hm... I wonder how that's managed for 9.4's max_worker_processes. See InitProcGlobal(). There are three lists of PGPROC objects. PGPROCs for incoming connections are pulled off of ProcGlobal->freeProcs, the autovacuum and its workers pull from ProcGlobal->autovacFreeProcs, and background workers pull from ProcGlobal->bgworkerFreeProcs. Auxiliary processes have a separate pool of PGPROCs to pull from, but they use linear search rather than a list, for reasons described in the comments in that function. There may be other checks elsewhere that enforce these same limits; not sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
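The layout described in the message above can be pictured with a small Python model. It is a sketch only: the class `ProcPools`, its method names, and the pool labels are hypothetical stand-ins for the three free lists and the auxiliary pool, and none of it mirrors the actual C structures.

```python
class ProcPools:
    """Toy model of per-kind free lists plus a linearly searched aux pool."""

    def __init__(self, n_connections, n_autovacuum, n_bgworkers, n_auxiliary):
        self.free = {}
        next_id = 0
        for kind, count in (("connection", n_connections),
                            ("autovacuum", n_autovacuum),
                            ("bgworker", n_bgworkers)):
            # Each kind gets its own list of free slot ids.
            self.free[kind] = list(range(next_id, next_id + count))
            next_id += count
        # Auxiliary slots are a fixed array scanned linearly, not a list.
        self.aux_in_use = [False] * n_auxiliary

    def acquire(self, kind):
        """Pop a slot from the kind's own free list, or None if exhausted."""
        pool = self.free[kind]
        return pool.pop() if pool else None

    def release(self, kind, slot):
        """Return a slot to the free list it came from."""
        self.free[kind].append(slot)

    def acquire_aux(self):
        """Linear search for the first unused auxiliary slot."""
        for index, used in enumerate(self.aux_in_use):
            if not used:
                self.aux_in_use[index] = True
                return index
        return None
```

Because each kind draws from its own list, exhausting the connection pool leaves the autovacuum and background worker pools untouched, which is the property a dedicated replication pool would need as well.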