On Wed, Jul 31, 2024 at 10:55:33PM +0100, Ilya Gladyshev wrote:
> I like your idea of parallelizing these checks with async libpq API, thanks
> for working on it. The patch doesn't apply cleanly on master anymore, but
> I've rebased locally and taken it for a quick spin with a pg16 instance of
> 1000 empty databases. Didn't see any regressions with -j 1, there's some
> speedup with -j 8 (33 sec vs 8 sec for these checks).

Thanks for taking a look.  I'm hoping to do a round of polishing before
posting a rebased patch set soon.

> One thing that I noticed that could be improved is we could start a new
> connection right away after having run all query callbacks for the current
> connection in process_slot, instead of just returning and establishing the
> new connection only on the next iteration of the loop in async_task_run
> after potentially sleeping on select.

Yeah, we could just call process_slot() recursively right after freeing the
slot.  That'd at least let us avoid the spinning behavior as we run out of
databases to process.

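For illustration, here's a rough standalone sketch of that control flow.
It is not code from the patch: only process_slot() is a real name, while
the Slot and Task structs, their fields, and run_all() are made up for the
example, and the libpq work is replaced with plain counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slot states; the real patch may track more of them. */
typedef enum { SLOT_FREE, SLOT_RUNNING } SlotState;

typedef struct
{
    SlotState   state;
    int         db;             /* database this slot is working on */
    int         steps_left;     /* query callbacks still to run */
} Slot;

typedef struct
{
    int         next_db;        /* next database not yet handed out */
    int         ndbs;           /* total number of databases */
    int         steps_per_db;   /* callbacks to run per database */
    int         connections_started;
} Task;

/*
 * Advance one slot by one step.  Returns true if the slot is still busy.
 * The point of interest is the tail call: once a slot is freed, we pick
 * up the next database right away instead of waiting for the caller's
 * next loop iteration.
 */
static bool
process_slot(Task *task, Slot *slot)
{
    if (slot->state == SLOT_FREE)
    {
        if (task->next_db >= task->ndbs)
            return false;       /* no databases left for this slot */

        /* Stand-in for starting an async connection. */
        slot->db = task->next_db++;
        slot->steps_left = task->steps_per_db;
        slot->state = SLOT_RUNNING;
        task->connections_started++;
        return true;
    }

    assert(slot->state == SLOT_RUNNING);

    /* Stand-in for running one query callback. */
    if (--slot->steps_left > 0)
        return true;

    /* All callbacks done: free the slot and immediately reuse it. */
    slot->state = SLOT_FREE;
    return process_slot(task, slot);
}

/* Stand-in for the main loop; returns the number of passes it made. */
static int
run_all(Task *task, Slot *slots, int nslots)
{
    int         passes = 0;
    bool        busy = true;

    while (busy)
    {
        busy = false;
        for (int i = 0; i < nslots; i++)
        {
            if (process_slot(task, &slots[i]))
                busy = true;
        }
        passes++;
    }
    return passes;
}
```

The recursion is at most one level deep here, since the SLOT_FREE branch
never recurses.  In the real thing the loop would presumably still sleep
on select() between passes, which this toy version leaves out.
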
> +1 to Jeff's suggestion that perhaps we could reuse connections, but perhaps
> that's a separate story.

When I skimmed through the various tasks, I didn't see a ton of
opportunities for further consolidation, or at least not ones that would
help for upgrades from supported versions.  The data type checks are
already consolidated, for example.

--
nathan