Re: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck
Date
Msg-id 573582.1608599459@sss.pgh.pa.us
Whole thread Raw
In response to BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck
List pgsql-bugs
PG Bug reporting form <noreply@postgresql.org> writes:
> We use pg_prewarm via shared_preload_libraries and we've noticed that if we
> start Pg then stop it again before pg_prewarm finishes its initial
> prewarming, Pg gets stuck in shutdown until we kill the 'autoprewarm master'
> process with 'kill -9'.

Yeah.  I assume you've got multiple large databases?  When I try this
here, the current autoprewarm worker process shuts down promptly, but
then the master process tries to launch another worker for the next DB,
and then just sits there because the postmaster is now ignoring worker
start requests.

So this is partly autoprewarm's fault, because it's taking no thought
for the possibility that a shutdown will be requested before it's
managed to preload everything.  But it's *mostly* the fault of the
incredibly brain-dead design for background workers, which has no
way to tell worker requestors that their request is never going to
be granted.  I've ranted about that before, with little effect on
the people responsible, so I guess at some point I'm going to have
to go fix it myself.

In the meantime, attached is a proposed patch that improves
autoprewarm's behavior.  It will only get stuck if you're unlucky
enough to issue "pg_ctl stop" just as it's trying to launch
another worker, which should be a pretty tight window.

I noticed while testing this that when autoprewarm did get out of
the situation promptly, it'd rewrite autoprewarm.blocks with only
as many blocks as it'd managed to prewarm before being canceled.
This doesn't seem like a good thing, because then the next start
might have next door to no prewarm behavior.  So the patch also
hacks it to skip the final autoprewarm.blocks dump if we were
told to terminate before the initial prewarming pass is done.

This is against HEAD but it seems to apply cleanly to v12.
If you're in a position to rebuild pg_prewarm with the patch,
please see if it resolves the problem for you.

            regards, tom lane

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index bfcea9bde3..8d6c81998c 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -153,6 +153,7 @@ void
 autoprewarm_main(Datum main_arg)
 {
     bool        first_time = true;
+    bool        final_dump_allowed = true;
     TimestampTz last_dump_time = 0;

     /* Establish signal handlers; once that's done, unblock signals. */
@@ -193,10 +194,15 @@ autoprewarm_main(Datum main_arg)
      * There's not much point in performing a dump immediately after we finish
      * preloading; so, if we do end up preloading, consider the last dump time
      * to be equal to the current time.
+     *
+     * If apw_load_buffers() is terminated early by a shutdown request,
+     * prevent dumping out our state below the loop, because we'd effectively
+     * just truncate the saved state to however much we'd managed to preload.
      */
     if (first_time)
     {
         apw_load_buffers();
+        final_dump_allowed = !ShutdownRequestPending;
         last_dump_time = GetCurrentTimestamp();
     }

@@ -254,7 +260,8 @@ autoprewarm_main(Datum main_arg)
      * Dump one last time.  We assume this is probably the result of a system
      * shutdown, although it's possible that we've merely been terminated.
      */
-    apw_dump_now(true, true);
+    if (final_dump_allowed)
+        apw_dump_now(true, true);
 }

 /*
@@ -387,6 +394,18 @@ apw_load_buffers(void)
         if (!have_free_buffer())
             break;

+        /*
+         * Likewise, don't launch if we've already been told to shut down.
+         *
+         * There is a race condition here: if the postmaster has received a
+         * fast-shutdown signal, but we've not heard about it yet, then the
+         * postmaster will ignore our worker start request and we'll wait
+         * forever.  However, that's a bug in the general background-worker
+         * logic, not the fault of this module.
+         */
+        if (ShutdownRequestPending)
+            break;
+
         /*
          * Start a per-database worker to load blocks for this database; this
          * function will return once the per-database worker exits.
@@ -404,10 +423,11 @@ apw_load_buffers(void)
     apw_state->pid_using_dumpfile = InvalidPid;
     LWLockRelease(&apw_state->lock);

-    /* Report our success. */
-    ereport(LOG,
-            (errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
-                    apw_state->prewarmed_blocks, num_elements)));
+    /* Report our success, if we were able to finish. */
+    if (!ShutdownRequestPending)
+        ereport(LOG,
+                (errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
+                        apw_state->prewarmed_blocks, num_elements)));
 }

 /*

pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck
Next
From: PG Bug reporting form
Date:
Subject: BUG #16787: ODBC driver (libpq.dll, psqlodbc30a.dll, psqlodbc30w.dll) issues