Thread: [WIP] Allow pg_upgrade to copy segments of the same relfilenode in parallel

From
Jaime Casanova
Date:
Hi,

This patch adds a new option (-J num, --jobs-per-disk=num) in
pg_upgrade to speed up copy mode. This generates up to ${num}
processes per tablespace to copy segments of the same relfilenode
in parallel.

This can help when you have many multi-gigabyte tables (each segment
is 1GB by default) in different tablespaces (each tablespace on a
different disk) and multiple processors.
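
The idea described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: the patch spawns processes inside pg_upgrade (which is written in C), while this sketch uses a Python thread pool for brevity. The function names (list_segments, copy_relation) and the relfilenode "16384" in the usage note are hypothetical. It relies on the on-disk naming where the first segment is named after the relfilenode and later segments get ".1", ".2", and so on. It ignores the extra fork files (such as "_fsm" and "_vm"), and it handles a single relation in a single directory, so it does not model the per-tablespace limit.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def list_segments(directory, relfilenode):
    """Return the segment file names of one relation, in segment order.

    The first segment is "<relfilenode>", later ones are
    "<relfilenode>.1", "<relfilenode>.2", and so on.
    """
    segments = []
    segno = 0
    while True:
        name = relfilenode if segno == 0 else f"{relfilenode}.{segno}"
        if not os.path.exists(os.path.join(directory, name)):
            break
        segments.append(name)
        segno += 1
    return segments


def copy_relation(src_dir, dst_dir, relfilenode, jobs):
    """Copy all segments of one relation with at most `jobs` workers.

    Assumption: threads stand in for the worker processes that the
    patch would start.
    """
    segments = list_segments(src_dir, relfilenode)
    os.makedirs(dst_dir, exist_ok=True)

    def copy_one(name):
        shutil.copyfile(os.path.join(src_dir, name),
                        os.path.join(dst_dir, name))
        return name

    with ThreadPoolExecutor(max_workers=max(1, jobs)) as pool:
        # list() waits for every copy and re-raises any worker error
        return list(pool.map(copy_one, segments))
```

Calling copy_relation(old_dir, new_dir, "16384", 4) would copy 16384, 16384.1, 16384.2, ... with at most four copies in flight. A relation with a single segment gains nothing from this, which is why the benefit depends on having many multi-gigabyte tables.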

In a customer's database (~20TB) the upgrade time went down from 6h to 4h 45min.

It lacks documentation, and I need help with the WIN32 part of it. I started
this new thread so the patch can be added to the next commitfest.

Original thread: https://www.postgresql.org/message-id/flat/YZVbtHKYP02AZDIO%40ahch-to

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Hi,

On 2022-02-01 21:57:00 -0500, Jaime Casanova wrote:
> This patch adds a new option (-J num, --jobs-per-disk=num) in 
> pg_upgrade to speed up copy mode. This generates up to ${num}
> processes per tablespace to copy segments of the same relfilenode 
> in parallel.
> 
> This can help when you have many multi gigabyte tables (each segment 
> is 1GB by default) in different tablespaces (each tablespace in a 
> different disk) and multiple processors.
> 
> In a customer's database (~20Tb) it went down from 6h to 4h 45min.
> 
> It lacks documentation and I need help with WIN32 part of it, I created
> this new mail to put the patch on the next commitfest.

The patch currently fails on cfbot due to warnings, likely related to the
win32 issue: https://cirrus-ci.com/task/4566046517493760?logs=mingw_cross_warning#L388

As it's a new patch submitted to the last CF, hasn't gotten any review yet and
misses some platform support, it seems like there's no chance it can make it
into 15?

Greetings,

Andres Freund



Re: [WIP] Allow pg_upgrade to copy segments of the same relfilenode in parallel

From
Jaime Casanova
Date:
On Mon, Mar 21, 2022 at 05:34:31PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-02-01 21:57:00 -0500, Jaime Casanova wrote:
> > This patch adds a new option (-J num, --jobs-per-disk=num) in 
> > pg_upgrade to speed up copy mode. This generates up to ${num}
> > processes per tablespace to copy segments of the same relfilenode 
> > in parallel.
> > 
> > This can help when you have many multi gigabyte tables (each segment 
> > is 1GB by default) in different tablespaces (each tablespace in a 
> > different disk) and multiple processors.
> > 
> > In a customer's database (~20Tb) it went down from 6h to 4h 45min.
> > 
> > It lacks documentation and I need help with WIN32 part of it, I created
> > this new mail to put the patch on the next commitfest.
> 
> The patch currently fails on cfbot due to warnings, likely related to the
> win32 issue: https://cirrus-ci.com/task/4566046517493760?logs=mingw_cross_warning#L388
> 
> As it's a new patch submitted to the last CF, hasn't gotten any review yet and
> misses some platform support, it seems like there's no chance it can make it
> into 15?
> 

Hi,

Because I have zero experience on the Windows side of this, I will take
some time to complete that part.

Should we move this to the next commitfest (and make 16 the target for
this)?

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL



On Sun, Mar 27, 2022 at 11:07:27AM -0500, Jaime Casanova wrote:
> > > It lacks documentation and I need help with WIN32 part of it, I created
> > > this new mail to put the patch on the next commitfest.
> > 
> > The patch currently fails on cfbot due to warnings, likely related to the
> > win32 issue: https://cirrus-ci.com/task/4566046517493760?logs=mingw_cross_warning#L388
> > 
> > As it's a new patch submitted to the last CF, hasn't gotten any review yet and
> > misses some platform support, it seems like there's no chance it can make it
> > into 15?
> 
> Because I have zero experience on the windows side of this, I will take
> some time to complete that part.
> 
> Should we move this to the next commitfest (and make 16 the target for
> this)?

Done.

src/tools/ci/README may help test this under Windows, but that's probably not
enough to allow writing the win-specific parts.

I guess you'll need to write tests for this. Unfortunately, that requires
files >1GB in size, unless you recompile postgres :(

It may be good enough to write an 0002 patch meant for CI only, but not
intended to be merged.  That can create a 2300MB table in src/test/regress, and
change pg_upgrade to run with (or default to) multiple jobs per tablespace.
Make sure it fails if the loop around relfilenodes doesn't work.
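
For reference, the arithmetic behind the 2300MB figure can be sketched as below, assuming the default 1GB segment size mentioned earlier in the thread. The helper names (segment_count, segment_names) and the relfilenode "16384" are hypothetical and exist only for this illustration.

```python
DEFAULT_SEGMENT_BYTES = 1024 ** 3  # 1GB, the default segment size


def segment_count(table_bytes, segment_bytes=DEFAULT_SEGMENT_BYTES):
    """Number of segment files needed to hold a relation of this size.

    An empty relation still has one (zero-length) file.
    """
    if table_bytes < 0:
        raise ValueError("table_bytes must be non-negative")
    # Ceiling division without floating point
    return max(1, -(-table_bytes // segment_bytes))


def segment_names(relfilenode, table_bytes,
                  segment_bytes=DEFAULT_SEGMENT_BYTES):
    """File names of the segments: "<n>", "<n>.1", "<n>.2", ..."""
    count = segment_count(table_bytes, segment_bytes)
    return [relfilenode if i == 0 else f"{relfilenode}.{i}"
            for i in range(count)]
```

With these definitions, segment_count(2300 * 1024**2) gives 3: two full 1GB segments plus a third holding the remaining 252MB. A table of that size therefore gives a test more than one segment of the same relfilenode to copy, and a broken segment loop would leave files missing in the new cluster.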

I can't help with win32, but that would be enough to verify it if someone else
fills in the Windows parts.

-- 
Justin



Re: [WIP] Allow pg_upgrade to copy segments of the same relfilenode in parallel

From
Jacob Champion
Date:
This entry has been waiting on author input for a while (our current
threshold is roughly two weeks), so I've marked it Returned with
Feedback.

Once you think the patchset is ready for review again, you (or any
interested party) can resurrect the patch entry by visiting

    https://commitfest.postgresql.org/38/3525/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob