Thread: doc: expand note about pg_upgrade's --jobs option
Magnus noted to me off-list that the "et cetera" in the following sentence in pg_upgrade's docs is doing quite a bit of heavy lifting:

    The --jobs option allows multiple CPU cores to be used for copying/linking of files, dumping and restoring database schemas in parallel, etc.; a good place to start is the maximum of the number of CPU cores and tablespaces.

I added the "et cetera" in commit 40e2e5e92b to cover the many follow-up commits that parallelized various pg_upgrade tasks. I was initially worried that trying to list all the parallelized stuff would be too verbose, but looking again, I think all the changes can be grouped into "gathering cluster information" and "performing cluster checks." The attached patch replaces the "et cetera" with those two general categories.

-- nathan
Attachment
> On 4 Mar 2025, at 19:08, Nathan Bossart <nathandbossart@gmail.com> wrote:
> The attached patch replaces the "et cetera" with those two general categories.

LGTM.

-- Daniel Gustafsson
On Wed, Mar 5, 2025 at 11:00 AM Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 4 Mar 2025, at 19:08, Nathan Bossart <nathandbossart@gmail.com> wrote:
>> The attached patch replaces the "et cetera" with those two general categories.
> LGTM.
Another option that I think would also work is to just cut down the details to just "The <option>--jobs</option> option allows multiple CPU cores to be used".
I think this is also slightly confusing, but maybe that's a non-native-english thing: "a good place to start is the maximum of the number of CPU cores and tablespaces.". Am I supposed to set it to max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?
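To make the two readings of that sentence concrete, here is a small sketch. The function names and the example numbers are made up for illustration and do not come from pg_upgrade or its docs; it only shows how far apart the two interpretations can land.

```python
# Hypothetical helpers: two ways to read "the maximum of the number of
# CPU cores and tablespaces". Neither is taken from pg_upgrade itself.

def jobs_larger_of_the_two(cpu_cores: int, tablespaces: int) -> int:
    """Reading 1: take whichever count is larger."""
    return max(cpu_cores, tablespaces)


def jobs_sum_of_both(cpu_cores: int, tablespaces: int) -> int:
    """Reading 2: add the two counts together."""
    return cpu_cores + tablespaces


if __name__ == "__main__":
    cores, spaces = 8, 3  # arbitrary example values
    print("larger of the two:", jobs_larger_of_the_two(cores, spaces))
    print("sum of both:      ", jobs_sum_of_both(cores, spaces))
```

With only a few tablespaces the two readings are close, but they drift apart as either count grows, which is why the wording matters.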
On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:
> Another option that I think would also work is to just cut down the details
> to just "The <option>--jobs</option> option allows multiple CPU cores to be
> used".

That's fine with me. It's probably not particularly actionable information, anyway. If anything, IMHO we should make it clear to users that the parallelization is per-database (except for file transfer, which is per-tablespace). If you've just got one big database in the default tablespace, --jobs won't help.

> I think this is also slightly confusing, but maybe that's a
> non-native-english thing: "a good place to start is the maximum of the
> number of CPU cores and tablespaces.". Am I supposed to set it to
> max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?

I've always read it to mean the former. But I'm not sure that's great advice. If you have 8 cores and 100 tablespaces, does it make sense to use --jobs=100? Ordinarily, I'd suggest the number of cores as the starting point.

-- nathan
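As a rough illustration of the point above, the sketch below models how much of a given --jobs value could actually be used. It assumes, as a simplification that is not taken from the pg_upgrade source, that per-database tasks can use at most one worker per database and that file transfer can use at most one worker per tablespace. All function names here are hypothetical.

```python
import os
from typing import Optional


def suggested_starting_jobs(cpu_cores: Optional[int] = None) -> int:
    """Starting point suggested in the thread: the number of CPU cores."""
    cores = cpu_cores if cpu_cores is not None else os.cpu_count()
    # os.cpu_count() may return None; fall back to a single job.
    return max(1, cores or 1)


def usable_parallelism(jobs: int, databases: int, tablespaces: int) -> dict:
    """Simplified model (an assumption): work is split per database,
    except file transfer, which is split per tablespace."""
    return {
        "per_database_tasks": min(jobs, databases),
        "file_transfer": min(jobs, tablespaces),
    }


if __name__ == "__main__":
    jobs = suggested_starting_jobs(8)
    # One big database in the default tablespace: extra jobs go unused.
    print(usable_parallelism(jobs, databases=1, tablespaces=1))
    # Many databases: per-database tasks can use all the jobs.
    print(usable_parallelism(jobs, databases=50, tablespaces=1))
```

Under this model, a cluster with a single database in a single tablespace gets no benefit from a larger --jobs value, which matches the caveat above.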
On Wed, Mar 05, 2025 at 09:35:27AM -0600, Nathan Bossart wrote:
> On Wed, Mar 05, 2025 at 01:52:40PM +0100, Magnus Hagander wrote:
>> Another option that I think would also work is to just cut down the details
>> to just "The <option>--jobs</option> option allows multiple CPU cores to be
>> used".
>
> That's fine with me. It's probably not particularly actionable
> information, anyway. If anything, IMHO we should make it clear to users
> that the parallelization is per-database (except for file transfer, which
> is per-tablespace). If you've just got one big database in the default
> tablespace, --jobs won't help.
>
>> I think this is also slightly confusing, but maybe that's a
>> non-native-english thing: "a good place to start is the maximum of the
>> number of CPU cores and tablespaces.". Am I supposed to set it to
>> max(cpucores, ntablespaces) or to max(cpucores+ntablespaces)?
>
> I've always read it to mean the former. But I'm not sure that's great
> advice. If you have 8 cores and 100 tablespaces, does it make sense to use
> --jobs=100? Ordinarily, I'd suggest the number of cores as the starting
> point.

Here's another attempt at the patch based on the latest discussion.

-- nathan