Re: optimizing pg_upgrade's once-in-each-database steps - Mailing list pgsql-hackers
From | Ilya Gladyshev |
---|---|
Subject | Re: optimizing pg_upgrade's once-in-each-database steps |
Date | |
Msg-id | 0fe740dd-cd23-45a6-b116-c777349acbf1@gmail.com Whole thread Raw |
In response to | Re: optimizing pg_upgrade's once-in-each-database steps (Ilya Gladyshev <ilya.v.gladyshev@gmail.com>) |
List | pgsql-hackers |
Good catch, I didn't look so closely at this.I think we can actually just use PQstatus() here. But furthermore, I think the way I was initiating connections was completely bogus. IIUC before calling PQconnectPoll() the first time, we should wait for a write indicator from select(), and then we should only call PQconnectPoll() after subsequent indicators from select(). After spending quite a bit of time staring at the PQconnectPoll() code, I'm quite surprised I haven't seen any problems thus far. If I had to guess, I'd say that either PQconnectPoll() is more resilient than I think it is, or I've just gotten lucky because pg_upgrade establishes connections quickly.
The fix looks right to me, but I got confused by the skip_wait and this `if`:Anyway, to fix this, I've added some more fields to the slot struct to keep track of the information we need to properly establish connections, and we now pay careful attention to the return value of select() so that we know which slots are ready for processing. This seemed like a nice little optimization independent of fixing connection establishment. I was worried this was going to require a lot more code, but I think it only added ~50 lines or so.
+ if (PQstatus(slot->conn) != CONNECTION_OK)
+ return;
This branch checks connection status that hasn't been refreshed after the select. When we go back to wait_slots after this, PQconnectPoll will refresh the connection status and run select with skip_wait=true, I believe, we could simplify this by moving the PQconnectPoll back to the process_slots, so that we can process connection right after polling, if it's ready. Something like this:
diff --git a/src/bin/pg_upgrade/task.c b/src/bin/pg_upgrade/task.c
index 3618dc08ff..73e987febf 100644
--- a/src/bin/pg_upgrade/task.c
+++ b/src/bin/pg_upgrade/task.c
@@ -237,6 +237,8 @@ process_query_result(const ClusterInfo *cluster, UpgradeTaskSlot *slot,
static void
process_slot(const ClusterInfo *cluster, UpgradeTaskSlot *slot, const UpgradeTask *task)
{
+ PostgresPollingStatusType status;
+
if (!slot->ready)
return;
@@ -260,26 +262,26 @@ process_slot(const ClusterInfo *cluster, UpgradeTaskSlot *slot, const UpgradeTas
start_conn(cluster, slot);
return;
-
case CONNECTING:
- /* Check for connection failure. */
- if (PQstatus(slot->conn) == CONNECTION_BAD)
- pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
-
- /* Check whether the connection is still establishing. */
- if (PQstatus(slot->conn) != CONNECTION_OK)
- return;
-
- /*
- * Move on to running/processing the queries in the task.
- */
- slot->state = RUNNING_QUERIES;
- if (!PQsendQuery(slot->conn, task->queries->data))
+ status = PQconnectPoll(slot->conn);
+ if (status == PGRES_POLLING_READING)
+ slot->select_mode = true;
+ else if (status == PGRES_POLLING_WRITING)
+ slot->select_mode = false;
+ else if (status == PGRES_POLLING_FAILED)
pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
+ else
+ {
+ /*
+ * Move on to running/processing the queries in the task.
+ */
+ slot->state = RUNNING_QUERIES;
+ if (!PQsendQuery(slot->conn, task->queries->data))
+ pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
+ }
return;
-
case RUNNING_QUERIES:
/*
@@ -370,8 +372,6 @@ wait_on_slots(UpgradeTaskSlot *slots, int numslots)
for (int i = 0; i < numslots; i++)
{
- PostgresPollingStatusType status;
-
switch (slots[i].state)
{
case FREE:
@@ -386,33 +386,7 @@ wait_on_slots(UpgradeTaskSlot *slots, int numslots)
continue;
case CONNECTING:
-
- /*
- * Don't call PQconnectPoll() again for this slot until
- * select() tells us something is ready. Be sure to use the
- * previous poll mode in this case.
- */
- if (!slots[i].ready)
- break;
-
- /*
- * If we are waiting for the connection to establish, choose
- * whether to wait for reading or for writing on the socket as
- * appropriate. If neither apply, mark the slot as ready and
- * skip waiting so that it is handled ASAP (we assume this
- * means the connection is either bad or fully ready).
- */
- status = PQconnectPoll(slots[i].conn);
- if (status == PGRES_POLLING_READING)
- slots[i].select_mode = true;
- else if (status == PGRES_POLLING_WRITING)
- slots[i].select_mode = false;
- else
- {
- slots[i].ready = true;
- skip_wait = true;
- continue;
- }
+ /* All the slot metadata was already setup in process_slots() */
break;
--
2.43.0
skip_wait can be removed in this case as well.
This is up to you, I think the v12 is good and commitable in any case.
pgsql-hackers by date: