Re: optimizing pg_upgrade's once-in-each-database steps - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: optimizing pg_upgrade's once-in-each-database steps
Date
Msg-id ZreEh54Xr6D3Qy8k@nathan
In response to Re: optimizing pg_upgrade's once-in-each-database steps  (Corey Huinker <corey.huinker@gmail.com>)
Responses Re: optimizing pg_upgrade's once-in-each-database steps
List pgsql-hackers
On Fri, Aug 09, 2024 at 04:06:16PM -0400, Corey Huinker wrote:
>> I'll admit I hadn't really considered pipelining, but I'm tempted to say
>> that it's probably not worth the complexity.  Not only do most of the tasks
>> have only one step, but even tasks like the data types check are unlikely
>> to require more than a few queries for upgrades from supported versions.
> 
> Can you point me to a complex multi-step task that you think wouldn't work
> for pipelining? My skimming of the other patches all seemed to be one query
> with one result set to be processed by one callback.

I think it would work fine.  I'm just not sure it's worth it, especially
for tasks that run exactly one query in each connection.

>> Furthermore, most of the callbacks should do almost nothing for a given
>> upgrade, and since pg_upgrade runs on the server, client/server round-trip
>> time should be pretty low.
> 
> To my mind, that makes pipelining make more sense, you throw out N queries,
> most of which are trivial, and by the time you cycle back around and start
> digesting result sets via callbacks, more of the queries have finished
> because they were waiting on the query ahead of them in the pipeline, not
> waiting on a callback to finish consuming its assigned result set and then
> launching the next task query.

My assumption is that the "waiting for a callback before launching the next
query" time will typically be pretty short in practice.  I could try
measuring it...
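
As a rough way to reason about this before measuring, here is a toy timing
model for one connection.  It is only a sketch under stated assumptions, not
code from the patch: the function names, the single fixed round-trip cost, and
the sample numbers are all hypothetical, and real libpq behavior is more
involved.

```python
def sequential_time(query_us, callback_us, round_trip_us):
    """Each query is sent only after the previous callback has finished."""
    return sum(round_trip_us + q + c for q, c in zip(query_us, callback_us))


def pipelined_time(query_us, callback_us, round_trip_us):
    """All queries are sent up front (assumption: one round trip covers the
    whole batch).  The server runs them back to back while the client
    consumes the results in order."""
    server_done = round_trip_us  # time at which the server has finished query i
    client_free = 0              # time at which the client has finished callback i
    for q, c in zip(query_us, callback_us):
        server_done += q
        # A callback can start only when its result exists and the
        # previous callback has returned.
        client_free = max(client_free, server_done) + c
    return client_free


if __name__ == "__main__":
    # Made-up numbers: three 5 ms queries, 1 ms callbacks, 0.1 ms round trip.
    queries = [5000, 5000, 5000]
    callbacks = [1000, 1000, 1000]
    print(sequential_time(queries, callbacks, 100))  # 18300
    print(pipelined_time(queries, callbacks, 100))   # 16100
```

In this model the saving is bounded by the callback and round-trip time that
pipelining hides behind the following query, so it shrinks toward zero as the
callbacks and round trips get cheaper.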

>> Perhaps pipelining would make more sense if we consolidated the tasks a bit
>> better, but when I last looked into that, I didn't see a ton of great
>> opportunities that would help anything except for upgrades from really old
>> versions.  Even then, I'm not sure if pipelining is worth it.
> 
> I think you'd want to do the opposite of consolidating the tasks. If
> anything, you'd want to break them down into known single-query operations,
> and if the callback function for one of them happens to queue up a
> subsequent query (with subsequent callback) then so be it.

By "consolidating," I mean combining tasks into fewer tasks with additional
steps.  This would allow us to reuse connections instead of creating N
connections for every single query.  If we used a task per query, I'd
expect pipelining to provide zero benefit.
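
To make the trade-off concrete, a small counting sketch follows.  It assumes
(my simplification, not something taken from the patch) that every task opens
its own connection to every database, and the helper names are hypothetical.

```python
def connections_needed(num_databases, steps_per_task):
    """steps_per_task holds the number of queries each task runs per database.
    Assumption: each task opens one connection to each database."""
    return num_databases * len(steps_per_task)


def queued_queries_per_database(steps_per_task):
    """Queries that sit behind an earlier query on the same connection.
    These are the only ones pipelining could overlap in this model."""
    return sum(steps - 1 for steps in steps_per_task)


if __name__ == "__main__":
    per_query_tasks = [1, 1, 1, 1, 1, 1]  # six tasks, one query each
    consolidated = [3, 3]                 # same six queries in two tasks
    print(connections_needed(1000, per_query_tasks))     # 6000
    print(queued_queries_per_database(per_query_tasks))  # 0
    print(connections_needed(1000, consolidated))        # 2000
    print(queued_queries_per_database(consolidated))     # 4
```

With one query per task there is never a second query on a connection for
pipelining to overlap, whereas consolidating cuts the connection count and
leaves some queries queued on each connection.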

-- 
nathan


