Connection limits/permissions, slotsync workers, etc - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Connection limits/permissions, slotsync workers, etc |
Date | |
Msg-id | 1808397.1735156190@sss.pgh.pa.us Whole thread Raw |
List | pgsql-hackers |
In connection with the discussion at [1], I started to look at exactly which server processes ought to be subject to connection limits (datconnlimit, ACL_CONNECT, and related checks). The current situation seems to be an inconsistent mess. Looking at InitPostgres, InitializeSessionUserId, and CheckMyDatabase, we have several different permissions and resource-limiting checks that may be enforced against an incoming process: ReservedConnections/SuperuserReservedConnections rolcanlogin rolconnlimit datallowconn datconnlimit database's ACL_CONNECT privilege I want to argue that ReservedConnections, rolconnlimit, and datconnlimit should only be applied to regular backends. It makes no particular sense to enforce them against autovac workers, background workers, or wal senders, because each of those types of processes has its own resource-limiting PGPROC pool. It's especially bad to enforce them against parallel workers, since that creates edge-case failures that make the use of parallelism not transparent to applications. We hacked around that at 73c9f91a1, but I think that should be reverted in favor of not applying the check at all. I further argue that rolcanlogin, datallowconn, and ACL_CONNECT should not be checked in a parallel worker, again primarily on the grounds that this creates parallelism-specific failures (cf [1]). The two scenarios where this occurs are (1) permissions were revoked since the leader process connected, or (2) leader is currently running as a role that wouldn't have permission to connect on its own. We don't attempt to kick out the leader process when either of those things happen, so why should we prevent it from using parallelism? The current situation for each of these checks is: ReservedConnections is enforced if !am_superuser && !am_walsender, so it is enforced against non-superuser background workers, which is silly because BG workers have their own PGPROC pool; moreover, what's the argument for letting walsenders but not other kinds of background processes escape this? I propose changing it to apply only to regular backends. rolcanlogin is enforced if IsUnderPostmaster and we reach InitializeSessionUserId, which basically reduces to regular backends, parallel workers, logrep workers, and walsenders. Seems reasonable for logrep workers and walsenders which represent fresh logins, but not for parallel workers. I propose fixing this by making ParallelWorkerMain pass BGWORKER_BYPASS_ROLELOGINCHECK. rolconnlimit is enforced if IsUnderPostmaster and we reach InitializeSessionUserId and it's not a superuser. So that applies to non-superuser parallel workers, logrep workers, and walsenders, and I don't think it's reasonable to apply it to any of them since those all come out of other PGPROC pools. I propose switching that to apply only to regular backends. BTW, I kind of wonder why rolconnlimit is ineffectual for superusers, especially when rolcanlogin does apply to them. Not a bug exactly, but it sure seems inconsistent. If you've taken the trouble to set it you'd expect it to work. Shall we take out the !is_superuser check? datallowconn is enforced against all non-standalone, non-AV-worker processes that connect to a specific database, except bgworkers that pass BGWORKER_BYPASS_ALLOWCONN (nothing in core except test module). So again that includes parallel workers, logrep workers, and walsenders. Again this seems reasonable for logrep workers and walsenders but not for parallel workers. I propose fixing this by making ParallelWorkerMain pass BGWORKER_BYPASS_ALLOWCONN. datconnlimit is enforced against all non-superuser processes, including per-DB walsenders and BG workers (see above). This is fairly dubious given that they have their own PGPROC pools. I propose switching that to apply only to regular backends, too. ACL_CONNECT is enforced against all non-superuser processes, including per-DB walsenders and BG workers (includes parallel workers, subscription apply workers, logrep workers). Perhaps that's OK for most, but I argue not for parallel workers; maybe skip it if BGWORKER_BYPASS_ALLOWCONN? Also, the enforcement of datconnlimit and rolconnlimit is inconsistent in another way: our counting of the pre-existing processes is pretty random. CountDBConnections is not consistent with either the current set of processes that datconnlimit is enforced against, or my proposal to enforce it only against regular backends. It counts anything that is not AmBackgroundWorkerProcess, including AV workers and per-DB walsenders. I think it should count only regular backends, because anything else leads to weird inconsistencies in whether a rejection occurs. The same applies to CountUserBackends (used for rolconnlimit check). I argue these two functions should count only regular backends, and the enforcement should likewise be only against regular backends. Another recently-created problem is that the "slotsync worker" process type we added in v17 hasn't been considered in any of this. In particular, unlike every other process type that can obtain a PGPROC, we did not consider it in the MaxBackends calculation: /* the extra unit accounts for the autovacuum launcher */ MaxBackends = MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders; This means AFAICS that it effectively competes for one of the MaxConnections PGPROC slots, meaning that it could fail for lack of a slot or could lock out a client that should have been able to connect. Shouldn't we have added another dedicated PGPROC slot for it? (proc.c would require some additional work to make that happen.) I wonder if the AV launcher and slotsync worker could be reclassified as "auxiliary processes" instead of being their own weird animal. regards, tom lane [1] https://www.postgresql.org/message-id/8befc845430ba1ae3748af900af298788e579c89.camel%40cybertec.at
pgsql-hackers by date: