Re: optimizing pg_upgrade's once-in-each-database steps - Mailing list pgsql-hackers

From Ilya Gladyshev
Subject Re: optimizing pg_upgrade's once-in-each-database steps
Date
Msg-id 10c1c8dd-4685-46d4-80be-56bdfca8659a@gmail.com
In response to Re: optimizing pg_upgrade's once-in-each-database steps  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: optimizing pg_upgrade's once-in-each-database steps
List pgsql-hackers
On 22.07.2024 21:07, Nathan Bossart wrote:
> On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote:
>> However, while looking into this, I noticed that only one get_query
>> callback (get_db_subscription_count()) actually customizes the generated
>> query using information in the provided DbInfo.  AFAICT we can do this
>> particular step without running a query in each database, as I mentioned
>> elsewhere [0].  That should speed things up a bit and allow us to simplify
>> the AsyncTask code.
>>
>> With that, if we are willing to assume that a given get_query callback will
>> generate the same string for all databases (and I think we should), we can
>> run the callback once and save the string in the step for dispatch_query()
>> to use.  This would look more like what you suggested in the quoted text.
> Here is a new patch set.  I've included the latest revision of the patch to
> fix get_db_subscription_count() from the other thread [0] as 0001 since I
> expect that to be committed soon.  I've also moved the patch that moves the
> "live_check" variable to "user_opts" to 0002 since I plan on committing
> that sooner than later, too.  Otherwise, I've tried to address all feedback
> provided thus far.
>
> [0] https://commitfest.postgresql.org/49/5135/
>
Hi,

I like your idea of parallelizing these checks with the async libpq API;
thanks for working on it. The patch no longer applies cleanly on master,
but I rebased it locally and took it for a quick spin against a pg16
instance with 1000 empty databases. I didn't see any regressions with
-j 1, and there is a significant speedup with -j 8 (33 s vs. 8 s for
these checks).
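
For anyone who wants to reproduce the test, here is a minimal,
self-contained sketch of the pattern (not the patch's code: the conninfo
strings and the check query are placeholders I made up, and it uses
blocking connects for brevity where the patch connects asynchronously).
Build with -lpq.

/*
 * Run one check query in NSLOTS databases over NSLOTS concurrent
 * connections, multiplexed with select().
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <libpq-fe.h>

#define NSLOTS 2

int
main(void)
{
    const char *conninfo[NSLOTS] = {"dbname=postgres", "dbname=template1"};
    PGconn     *conn[NSLOTS];
    int         done[NSLOTS] = {0};
    int         remaining = NSLOTS;

    for (int i = 0; i < NSLOTS; i++)
    {
        conn[i] = PQconnectdb(conninfo[i]);     /* blocking for brevity */
        if (PQstatus(conn[i]) != CONNECTION_OK ||
            !PQsendQuery(conn[i], "SELECT count(*) FROM pg_catalog.pg_class"))
        {
            fprintf(stderr, "slot %d: %s", i, PQerrorMessage(conn[i]));
            exit(1);
        }
    }

    while (remaining > 0)
    {
        fd_set      rfds;
        int         maxfd = -1;

        FD_ZERO(&rfds);
        for (int i = 0; i < NSLOTS; i++)
        {
            if (!done[i])
            {
                FD_SET(PQsocket(conn[i]), &rfds);
                if (PQsocket(conn[i]) > maxfd)
                    maxfd = PQsocket(conn[i]);
            }
        }

        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
            continue;           /* retry on EINTR */

        for (int i = 0; i < NSLOTS; i++)
        {
            if (done[i] || !FD_ISSET(PQsocket(conn[i]), &rfds))
                continue;
            if (!PQconsumeInput(conn[i]))
                exit(1);
            while (!PQisBusy(conn[i]))
            {
                PGresult   *res = PQgetResult(conn[i]);

                if (res == NULL)        /* no more results: slot is done */
                {
                    PQfinish(conn[i]);
                    done[i] = 1;
                    remaining--;
                    break;
                }
                PQclear(res);   /* a real check would inspect the result */
            }
        }
    }
    return 0;
}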

One thing I noticed that could be improved: once process_slot has run
all query callbacks for the current connection, we could start the next
connection right away, instead of returning and establishing it only on
the next iteration of the loop in async_task_run, after potentially
sleeping in select().
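
To make that concrete, here is a rough single-slot sketch of the control
flow I have in mind (again, not the patch's actual process_slot /
async_task_run code; the database list, the query, and the blocking
PQconnectdb are simplifications of mine):

/*
 * One slot working through a list of databases: as soon as the current
 * database's results are drained, the next connection is started from
 * within process_slot() itself.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <libpq-fe.h>

static const char *dbs[] = {"dbname=postgres", "dbname=template1"};
#define NDBS ((int) (sizeof(dbs) / sizeof(dbs[0])))

typedef struct
{
    PGconn     *conn;           /* NULL once every database is processed */
    int         db;             /* index of the database in progress */
} Slot;

/* Connect to dbs[slot->db] and dispatch its check query. */
static void
start_db(Slot *slot)
{
    slot->conn = PQconnectdb(dbs[slot->db]);    /* blocking for brevity */
    if (PQstatus(slot->conn) != CONNECTION_OK ||
        !PQsendQuery(slot->conn, "SELECT 1"))
    {
        fprintf(stderr, "%s", PQerrorMessage(slot->conn));
        exit(1);
    }
}

/* Called whenever the slot's socket becomes readable. */
static void
process_slot(Slot *slot)
{
    if (!PQconsumeInput(slot->conn))
    {
        fprintf(stderr, "%s", PQerrorMessage(slot->conn));
        exit(1);
    }

    while (!PQisBusy(slot->conn))
    {
        PGresult   *res = PQgetResult(slot->conn);

        if (res != NULL)
        {
            PQclear(res);       /* a real check would inspect the result */
            continue;
        }

        /* All query callbacks for this database have finished. */
        PQfinish(slot->conn);
        slot->conn = NULL;

        /*
         * Start the next database's connection immediately, rather than
         * returning with an idle slot and letting the outer loop notice
         * that only after its next select().
         */
        if (++slot->db < NDBS)
            start_db(slot);
        return;
    }
}

int
main(void)
{
    Slot        slot = {NULL, 0};

    start_db(&slot);

    while (slot.conn != NULL)
    {
        fd_set      rfds;
        int         fd = PQsocket(slot.conn);

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
            continue;           /* retry on EINTR */
        process_slot(&slot);
    }
    return 0;
}

The important bit is the start_db() call at the end of process_slot():
the slot goes back to select() already working on the next database.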

+1 to Jeff's suggestion that we could reuse connections, though that's
perhaps a separate story.

Regards,

Ilya



