Re: Missed check for too-many-children in bgworker spawning - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Missed check for too-many-children in bgworker spawning
Date:
Msg-id: 16897.1570660018@sss.pgh.pa.us
In response to: Re: Missed check for too-many-children in bgworker spawning (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Oct 9, 2019 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We could improve on matters so far as the postmaster's child-process
>> arrays are concerned, by defining separate slot "pools" for the different
>> types of child processes.  But I don't see much point if the code is
>> not prepared to recover from a fork() failure --- and if it is, that
>> would a fortiori deal with out-of-child-slots as well.

> I would say rather that if fork() is failing on your system, you have
> a not very stable system. The fact that parallel query is going to
> fail is sad, but not as sad as the fact that connecting to the
> database is also going to fail, and that logging into the system to
> try to fix the problem may well fail as well.

True, it's not a situation you especially want to be in.  However,
I've lost count of the number of times that I've heard someone talk
about how their system was overstressed to the point that everything
else was failing, but Postgres kept chugging along.  That's a good
reputation to have and we shouldn't just walk away from it.

> Code that tries to make
> parallel query cope with this situation without an error wouldn't
> often be tested, so it might be buggy, and it wouldn't necessarily be
> a benefit if it did work. I expect many people would rather have the
> query fail and free up slots in the system process table than consume
> precisely all of them and then try to execute the query at a
> slower-than-expected rate.

I find that argument to be utter bunkum.  The parallel query code is
*already* designed to silently degrade performance when its primary
resource limit (shared bgworker slots) is exhausted.  How can it be
all right to do that but not all right to cope with fork failure
similarly?  If we think running up against the kernel limits is a
case that we can roll over and die on, why don't we rip out the
virtual-FD stuff in fd.c?

As for "might be buggy", if we ripped out every part of Postgres
that's under-tested, I'm afraid there might not be much left.
In any case, a sane design for this would make as much as possible
of the code handle "out of shared bgworker slots" just the same as
resource failures later on, so that there wouldn't be that big a gap
in coverage.

Having said all that, I made a patch that causes the postmaster
to reserve separate child-process-array slots for autovac workers
and bgworkers, as per attached, so that excessive connection
requests can't DOS those subsystems.  But I'm not sure that it's
worth the complication; it wouldn't be necessary if the parallel
query launch code were more robust.

            regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 85f15a5..8c66578 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2441,17 +2441,43 @@ canAcceptConnections(int backend_type)
     /*
      * Don't start too many children.
      *
-     * We allow more connections here than we can have backends because some
-     * might still be authenticating; they might fail auth, or some existing
-     * backend might exit before the auth cycle is completed.  The exact
-     * MaxBackends limit is enforced when a new backend tries to join the
-     * shared-inval backend array.
+     * We allow more connections (including walsenders) here than we can have
+     * backends because some might still be authenticating; they might fail
+     * auth, or some existing backend might exit before the auth cycle is
+     * completed.  The exact MaxBackends limit is enforced when a new backend
+     * tries to join the shared-inval backend array.
      *
-     * The limit here must match the sizes of the per-child-process arrays;
-     * see comments for MaxLivePostmasterChildren().
+     * On the other hand, there's no need to allow extra autovac workers or
+     * bgworkers, because the requesting logic for those should never request
+     * more processes than the corresponding limit.
+     *
+     * This logic must agree with MaxLivePostmasterChildren(), which sets the
+     * total size of the per-child-process arrays.
+     *
+     * Note: currently, we can't see backend_type == BACKEND_TYPE_WALSND here,
+     * but it seems less weird to include it in the switch than not.  We must
+     * treat normal backends and walsenders alike for this purpose because one
+     * type can metamorphose into the other later.
      */
-    if (CountChildren(BACKEND_TYPE_ALL) >= MaxLivePostmasterChildren())
-        result = CAC_TOOMANY;
+    switch (backend_type)
+    {
+        case BACKEND_TYPE_NORMAL:
+        case BACKEND_TYPE_WALSND:
+            if (CountChildren(BACKEND_TYPE_NORMAL | BACKEND_TYPE_WALSND) >=
+                2 * (MaxConnections + max_wal_senders))
+                result = CAC_TOOMANY;
+            break;
+        case BACKEND_TYPE_AUTOVAC:
+            if (CountChildren(BACKEND_TYPE_AUTOVAC) >= autovacuum_max_workers)
+                result = CAC_TOOMANY;
+            break;
+        case BACKEND_TYPE_BGWORKER:
+            if (CountChildren(BACKEND_TYPE_BGWORKER) >= max_worker_processes)
+                result = CAC_TOOMANY;
+            break;
+        default:
+            elog(ERROR, "unexpected backend_type: %d", backend_type);
+    }

     return result;
 }
@@ -5334,8 +5360,18 @@ static int
 CountChildren(int target)
 {
     dlist_iter    iter;
+    bool        check_walsender;
     int            cnt = 0;

+    /*
+     * In common cases we don't need to distinguish normal from walsender
+     * children, so we can save a few cycles and shared-memory touches by not
+     * checking for walsenders if not.
+     */
+    check_walsender =
+        (target & (BACKEND_TYPE_NORMAL | BACKEND_TYPE_WALSND)) != 0 &&
+        (target & (BACKEND_TYPE_NORMAL | BACKEND_TYPE_WALSND)) != (BACKEND_TYPE_NORMAL | BACKEND_TYPE_WALSND);
+
     dlist_foreach(iter, &BackendList)
     {
         Backend    *bp = dlist_container(Backend, elem, iter.cur);
@@ -5343,11 +5379,7 @@ CountChildren(int target)
         if (bp->dead_end)
             continue;

-        /*
-         * Since target == BACKEND_TYPE_ALL is the most common case, we test
-         * it first and avoid touching shared memory for every child.
-         */
-        if (target != BACKEND_TYPE_ALL)
+        if (check_walsender)
         {
             /*
              * Assign bkend_type for any recently announced WAL Sender
@@ -5356,11 +5388,11 @@ CountChildren(int target)
             if (bp->bkend_type == BACKEND_TYPE_NORMAL &&
                 IsPostmasterChildWalSender(bp->child_slot))
                 bp->bkend_type = BACKEND_TYPE_WALSND;
-
-            if (!(target & bp->bkend_type))
-                continue;
         }

+        if (!(target & bp->bkend_type))
+            continue;
+
         cnt++;
     }
     return cnt;
@@ -5626,15 +5658,31 @@ CreateOptsFile(int argc, char *argv[], char *fullprogname)
  * (the PMChildFlags array, and if EXEC_BACKEND the ShmemBackendArray).
  * These arrays include regular backends, autovac workers, walsenders
  * and background workers, but not special children nor dead_end children.
- * This allows the arrays to have a fixed maximum size, to wit the same
- * too-many-children limit enforced by canAcceptConnections().  The exact value
- * isn't too critical as long as it's more than MaxBackends.
+ * This allows the arrays to have a fixed maximum size.  The total amount
+ * of space we reserve here mustn't be less than the number of child processes
+ * that canAcceptConnections() will allow.
  */
 int
 MaxLivePostmasterChildren(void)
 {
-    return 2 * (MaxConnections + autovacuum_max_workers + 1 +
-                max_wal_senders + max_worker_processes);
+    int            nchildren = 0;
+
+    /*
+     * For children that are launched from asynchronous connection requests,
+     * allow some extra space in the arrays so that we don't deny a connection
+     * that would have succeeded by the time it is ready to join the shared-
+     * memory data structures.  The 2X multiplier is arbitrary, and probably
+     * overkill, but the amount of space wasted is minimal.
+     */
+    nchildren += 2 * (MaxConnections + max_wal_senders);
+
+    /* Allow for the autovac launcher and its workers */
+    nchildren += 1 + autovacuum_max_workers;
+
+    /* Allow for bgworkers */
+    nchildren += max_worker_processes;
+
+    return nchildren;
 }

 /*
