Re: optimizing pg_upgrade's once-in-each-database steps - Mailing list pgsql-hackers

From Ilya Gladyshev
Subject Re: optimizing pg_upgrade's once-in-each-database steps
Date
Msg-id 10c1c8dd-4685-46d4-80be-56bdfca8659a@gmail.com
In response to Re: optimizing pg_upgrade's once-in-each-database steps  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: optimizing pg_upgrade's once-in-each-database steps
List pgsql-hackers
On 22.07.2024 21:07, Nathan Bossart wrote:
> On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote:
>> However, while looking into this, I noticed that only one get_query
>> callback (get_db_subscription_count()) actually customizes the generated
>> query using information in the provided DbInfo.  AFAICT we can do this
>> particular step without running a query in each database, as I mentioned
>> elsewhere [0].  That should speed things up a bit and allow us to simplify
>> the AsyncTask code.
>>
>> With that, if we are willing to assume that a given get_query callback will
>> generate the same string for all databases (and I think we should), we can
>> run the callback once and save the string in the step for dispatch_query()
>> to use.  This would look more like what you suggested in the quoted text.
> Here is a new patch set.  I've included the latest revision of the patch to
> fix get_db_subscription_count() from the other thread [0] as 0001 since I
> expect that to be committed soon.  I've also moved the patch that moves the
> "live_check" variable to "user_opts" to 0002 since I plan on committing
> that sooner than later, too.  Otherwise, I've tried to address all feedback
> provided thus far.
>
> [0] https://commitfest.postgresql.org/49/5135/
>
Hi,

I like your idea of parallelizing these checks with the async libpq API;
thanks for working on it. The patch no longer applies cleanly on master,
but I rebased it locally and took it for a quick spin against a pg16
instance with 1000 empty databases. I didn't see any regressions with
-j 1, and there is a significant speedup with -j 8 (33 s vs. 8 s for
these checks).
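
For anyone who wants to reproduce the test, here is a minimal,
self-contained sketch of the pattern (not the patch's code: the conninfo
strings and the check query are placeholders I made up, and it uses
blocking connects for brevity where the patch connects asynchronously).
Build with -lpq.

/*
 * Run one check query in NSLOTS databases over NSLOTS concurrent
 * connections, multiplexed with select().
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <libpq-fe.h>

#define NSLOTS 2

int
main(void)
{
    const char *conninfo[NSLOTS] = {"dbname=postgres", "dbname=template1"};
    PGconn     *conn[NSLOTS];
    int         done[NSLOTS] = {0};
    int         remaining = NSLOTS;

    for (int i = 0; i < NSLOTS; i++)
    {
        conn[i] = PQconnectdb(conninfo[i]);     /* blocking for brevity */
        if (PQstatus(conn[i]) != CONNECTION_OK ||
            !PQsendQuery(conn[i], "SELECT count(*) FROM pg_catalog.pg_class"))
        {
            fprintf(stderr, "slot %d: %s", i, PQerrorMessage(conn[i]));
            exit(1);
        }
    }

    while (remaining > 0)
    {
        fd_set      rfds;
        int         maxfd = -1;

        FD_ZERO(&rfds);
        for (int i = 0; i < NSLOTS; i++)
        {
            if (!done[i])
            {
                FD_SET(PQsocket(conn[i]), &rfds);
                if (PQsocket(conn[i]) > maxfd)
                    maxfd = PQsocket(conn[i]);
            }
        }

        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
            continue;           /* retry on EINTR */

        for (int i = 0; i < NSLOTS; i++)
        {
            if (done[i] || !FD_ISSET(PQsocket(conn[i]), &rfds))
                continue;
            if (!PQconsumeInput(conn[i]))
                exit(1);
            while (!PQisBusy(conn[i]))
            {
                PGresult   *res = PQgetResult(conn[i]);

                if (res == NULL)        /* no more results: slot is done */
                {
                    PQfinish(conn[i]);
                    done[i] = 1;
                    remaining--;
                    break;
                }
                PQclear(res);   /* a real check would inspect the result */
            }
        }
    }
    return 0;
}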

One thing I noticed that could be improved: once process_slot has run
all query callbacks for the current connection, we could start the next
connection right away, instead of returning and establishing it only on
the next iteration of the loop in async_task_run, after potentially
sleeping in select().
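
To make that concrete, here is a rough single-slot sketch of the control
flow I have in mind (again, not the patch's actual process_slot /
async_task_run code; the database list, the query, and the blocking
PQconnectdb are simplifications of mine):

/*
 * One slot working through a list of databases: as soon as the current
 * database's results are drained, the next connection is started from
 * within process_slot() itself.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <libpq-fe.h>

static const char *dbs[] = {"dbname=postgres", "dbname=template1"};
#define NDBS ((int) (sizeof(dbs) / sizeof(dbs[0])))

typedef struct
{
    PGconn     *conn;           /* NULL once every database is processed */
    int         db;             /* index of the database in progress */
} Slot;

/* Connect to dbs[slot->db] and dispatch its check query. */
static void
start_db(Slot *slot)
{
    slot->conn = PQconnectdb(dbs[slot->db]);    /* blocking for brevity */
    if (PQstatus(slot->conn) != CONNECTION_OK ||
        !PQsendQuery(slot->conn, "SELECT 1"))
    {
        fprintf(stderr, "%s", PQerrorMessage(slot->conn));
        exit(1);
    }
}

/* Called whenever the slot's socket becomes readable. */
static void
process_slot(Slot *slot)
{
    if (!PQconsumeInput(slot->conn))
    {
        fprintf(stderr, "%s", PQerrorMessage(slot->conn));
        exit(1);
    }

    while (!PQisBusy(slot->conn))
    {
        PGresult   *res = PQgetResult(slot->conn);

        if (res != NULL)
        {
            PQclear(res);       /* a real check would inspect the result */
            continue;
        }

        /* All query callbacks for this database have finished. */
        PQfinish(slot->conn);
        slot->conn = NULL;

        /*
         * Start the next database's connection immediately, rather than
         * returning with an idle slot and letting the outer loop notice
         * that only after its next select().
         */
        if (++slot->db < NDBS)
            start_db(slot);
        return;
    }
}

int
main(void)
{
    Slot        slot = {NULL, 0};

    start_db(&slot);

    while (slot.conn != NULL)
    {
        fd_set      rfds;
        int         fd = PQsocket(slot.conn);

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
            continue;           /* retry on EINTR */
        process_slot(&slot);
    }
    return 0;
}

The important bit is the start_db() call at the end of process_slot():
the slot goes back to select() already working on the next database.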

+1 to Jeff's suggestion that we could reuse connections, though that's
perhaps a separate story.

Regards,

Ilya



