Thread: doc: expand note about pg_upgrade's --jobs option

doc: expand note about pg_upgrade's --jobs option

From: Nathan Bossart
Date:
Magnus noted to me off-list that the "et cetera" in the following sentence
in pg_upgrade's docs is doing quite a bit of heavy lifting:

    The --jobs option allows multiple CPU cores to be used for
    copying/linking of files, dumping and restoring database schemas in
    parallel, etc.; a good place to start is the maximum of the number of
    CPU cores and tablespaces.

I added the "et cetera" in commit 40e2e5e92b to cover the many follow-up
commits that parallelized various pg_upgrade tasks.  I was initially
worried that trying to list all the parallelized stuff would be too
verbose, but looking again, I think all the changes can be grouped into
"gathering cluster information" and "performing cluster checks."  The
attached patch replaces the "et cetera" with those two general categories.

-- 
nathan

Attachment

Re: doc: expand note about pg_upgrade's --jobs option

From: Daniel Gustafsson
Date:
> On 4 Mar 2025, at 19:08, Nathan Bossart <nathandbossart@gmail.com> wrote:

> The attached patch replaces the "et cetera" with those two general categories.

LGTM.

--
Daniel Gustafsson

Re: doc: expand note about pg_upgrade's --jobs option

From: Magnus Hagander
Date:
On Wed, Mar 5, 2025 at 11:00 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 4 Mar 2025, at 19:08, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
>> The attached patch replaces the "et cetera" with those two general categories.
>
> LGTM.

Another option that I think would also work is to simply cut the details down to "The <option>--jobs</option> option allows multiple CPU cores to be used".

I think this is also slightly confusing, but maybe that's a non-native-English thing: "a good place to start is the maximum of the number of CPU cores and tablespaces." Am I supposed to set it to max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?

--

Re: doc: expand note about pg_upgrade's --jobs option

From: Nathan Bossart
Date:
On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:
> Another option that I think would also work is to just cut down the details
> to just "The <option>--jobs</option> option allows multiple CPU cores to be
> used".

That's fine with me.  It's probably not particularly actionable
information, anyway.  If anything, IMHO we should make it clear to users
that the parallelization is per-database (except for file transfer, which
is per-tablespace).  If you've just got one big database in the default
tablespace, --jobs won't help.
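A rough model of that point, as a sketch: if each parallel unit of work is one database (or one tablespace, for file transfer), then at most min(jobs, units) workers can be busy at once. The function name `effective_workers` is hypothetical, this is not pg_upgrade's actual scheduling code, and it ignores differences in how long each unit takes.

```python
def effective_workers(jobs: int, units: int) -> int:
    """Upper bound on concurrently busy workers when each unit of work
    (a database, or a tablespace during file transfer) is handled by a
    single worker. Hypothetical model, not pg_upgrade's real scheduler."""
    if jobs < 1 or units < 1:
        raise ValueError("jobs and units must both be at least 1")
    # Jobs beyond the number of units have nothing to work on.
    return min(jobs, units)


# One big database in the default tablespace: --jobs=8 buys nothing.
print(effective_workers(jobs=8, units=1))   # → 1
# Twelve databases: up to 8 of them can be processed at the same time.
print(effective_workers(jobs=8, units=12))  # → 8
```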

> I think this is also slightly confusing, but maybe that's a
> non-native-english thing: "a good place to start is the maximum of the
> number of  CPU cores and tablespaces.". Am I supposed to set it to
> max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?

I've always read it to mean the former.  But I'm not sure that's great
advice.  If you have 8 cores and 100 tablespaces, does it make sense to use
--jobs=100?  Ordinarily, I'd suggest the number of cores as the starting
point.
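To make the two readings concrete with the numbers above (8 cores, 100 tablespaces), a small sketch; the helper names are hypothetical and only illustrate the arithmetic, they are not part of pg_upgrade.

```python
def reading_max(cores: int, tablespaces: int) -> int:
    # "the maximum of the number of CPU cores and tablespaces"
    return max(cores, tablespaces)


def reading_sum(cores: int, tablespaces: int) -> int:
    # The alternative reading: cores plus tablespaces.
    return cores + tablespaces


cores, tablespaces = 8, 100
print(reading_max(cores, tablespaces))  # → 100
print(reading_sum(cores, tablespaces))  # → 108
print(cores)  # → 8, the "number of cores" starting point suggested above
```

Either reading of the current wording suggests far more jobs than cores in this case, which is the concern with the existing advice.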

-- 
nathan

Re: doc: expand note about pg_upgrade's --jobs option

From: Nathan Bossart
Date:
On Wed, Mar 05, 2025 at 09:35:27AM -0600, Nathan Bossart wrote:
> On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:
>> Another option that I think would also work is to just cut down the details
>> to just "The <option>--jobs</option> option allows multiple CPU cores to be
>> used".
> 
> That's fine with me.  It's probably not particularly actionable
> information, anyway.  If anything, IMHO we should make it clear to users
> that the parallelization is per-database (except for file transfer, which
> is per-tablespace).  If you've just got one big database in the default
> tablespace, --jobs won't help.
> 
>> I think this is also slightly confusing, but maybe that's a
>> non-native-english thing: "a good place to start is the maximum of the
>> number of  CPU cores and tablespaces.". Am I supposed to set it to
>> max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?
> 
> I've always read it to mean the former.  But I'm not sure that's great
> advice.  If you have 8 cores and 100 tablespaces, does it make sense to use
> --jobs=100?  Ordinarily, I'd suggest the number of cores as the starting
> point.

Here's another attempt at the patch based on the latest discussion.

-- 
nathan

Attachment