Re: optimizing pg_upgrade's once-in-each-database steps - Mailing list pgsql-hackers

From Ilya Gladyshev
Subject Re: optimizing pg_upgrade's once-in-each-database steps
Date
Msg-id 0fe740dd-cd23-45a6-b116-c777349acbf1@gmail.com
Whole thread Raw
In response to Re: optimizing pg_upgrade's once-in-each-database steps  (Ilya Gladyshev <ilya.v.gladyshev@gmail.com>)
List pgsql-hackers


On 01.09.2024 22:05, Nathan Bossart wrote:
I think we can actually just use PQstatus() here.  But furthermore, I think
the way I was initiating connections was completely bogus.  IIUC before
calling PQconnectPoll() the first time, we should wait for a write
indicator from select(), and then we should only call PQconnectPoll() after
subsequent indicators from select().  After spending quite a bit of time
staring at the PQconnectPoll() code, I'm quite surprised I haven't seen any
problems thus far.  If I had to guess, I'd say that either PQconnectPoll()
is more resilient than I think it is, or I've just gotten lucky because
pg_upgrade establishes connections quickly.
Good catch, I didn't look so closely at this.
Anyway, to fix this, I've added some more fields to the slot struct to
keep track of the information we need to properly establish connections,
and we now pay careful attention to the return value of select() so that we
know which slots are ready for processing.  This seemed like a nice little
optimization independent of fixing connection establishment.  I was worried
this was going to require a lot more code, but I think it only added ~50
lines or so.
The fix looks right to me, but I got confused by the skip_wait and this `if`:

+            if (PQstatus(slot->conn) != CONNECTION_OK)
+                return;

This branch checks connection status that hasn't been refreshed after the select. When we go back to wait_slots after this, PQconnectPoll will refresh the connection status and run select with skip_wait=true, I believe, we could simplify this by moving the PQconnectPoll back to the process_slots, so that we can process connection right after polling, if it's ready. Something like this:

diff --git a/src/bin/pg_upgrade/task.c b/src/bin/pg_upgrade/task.c
index 3618dc08ff..73e987febf 100644
--- a/src/bin/pg_upgrade/task.c
+++ b/src/bin/pg_upgrade/task.c
@@ -237,6 +237,8 @@ process_query_result(const ClusterInfo *cluster, UpgradeTaskSlot *slot,
 static void
 process_slot(const ClusterInfo *cluster, UpgradeTaskSlot *slot, const UpgradeTask *task)
 {
+    PostgresPollingStatusType status;
+
     if (!slot->ready)
         return;
 
@@ -260,26 +262,26 @@ process_slot(const ClusterInfo *cluster, UpgradeTaskSlot *slot, const UpgradeTas
             start_conn(cluster, slot);
 
             return;
-
         case CONNECTING:
 
-            /* Check for connection failure. */
-            if (PQstatus(slot->conn) == CONNECTION_BAD)
-                pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
-
-            /* Check whether the connection is still establishing. */
-            if (PQstatus(slot->conn) != CONNECTION_OK)
-                return;
-
-            /*
-             * Move on to running/processing the queries in the task.
-             */
-            slot->state = RUNNING_QUERIES;
-            if (!PQsendQuery(slot->conn, task->queries->data))
+            status = PQconnectPoll(slot->conn);
+            if (status == PGRES_POLLING_READING)
+                slot->select_mode = true;
+            else if (status == PGRES_POLLING_WRITING)
+                slot->select_mode = false;
+            else if (status == PGRES_POLLING_FAILED)
                 pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
+            else
+            {
+                /*
+                 * Move on to running/processing the queries in the task.
+                 */
+                slot->state = RUNNING_QUERIES;
+                if (!PQsendQuery(slot->conn, task->queries->data))
+                    pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
 
+            }
             return;
-
         case RUNNING_QUERIES:
 
             /*
@@ -370,8 +372,6 @@ wait_on_slots(UpgradeTaskSlot *slots, int numslots)
 
     for (int i = 0; i < numslots; i++)
     {
-        PostgresPollingStatusType status;
-
         switch (slots[i].state)
         {
             case FREE:
@@ -386,33 +386,7 @@ wait_on_slots(UpgradeTaskSlot *slots, int numslots)
                 continue;
 
             case CONNECTING:
-
-                /*
-                 * Don't call PQconnectPoll() again for this slot until
-                 * select() tells us something is ready.  Be sure to use the
-                 * previous poll mode in this case.
-                 */
-                if (!slots[i].ready)
-                    break;
-
-                /*
-                 * If we are waiting for the connection to establish, choose
-                 * whether to wait for reading or for writing on the socket as
-                 * appropriate.  If neither apply, mark the slot as ready and
-                 * skip waiting so that it is handled ASAP (we assume this
-                 * means the connection is either bad or fully ready).
-                 */
-                status = PQconnectPoll(slots[i].conn);
-                if (status == PGRES_POLLING_READING)
-                    slots[i].select_mode = true;
-                else if (status == PGRES_POLLING_WRITING)
-                    slots[i].select_mode = false;
-                else
-                {
-                    slots[i].ready = true;
-                    skip_wait = true;
-                    continue;
-                }
+                /* All the slot metadata was already setup in process_slots() */
 
                 break;
 
--
2.43.0

skip_wait can be removed in this case as well.

This is up to you, I think the v12 is good and commitable in any case.

pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Next
From: Tatsuo Ishii
Date:
Subject: Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN