Thread: pg_dump/pg_restore --jobs practical limit?
I will "soon" migrate some 1+ TB database from Pg 9.6.24 on RHEL 6 VMs to Pg 14.latest on RHEL 8 VMs. The VMs have 10Gbps "NICs", SAN-based LUNs managed by LVM, and are all on ESX blades. nproc count on some is 16 and on others is 32. Does anyone have experience as to the point of diminishing returns? IOW, can I crank them processes up to --jobs=30, will I see no gain -- or even degradation -- after, for example, --jobs=24? This would be for both pg_dump and pg_restore (which would be run on the RHEL 8 VM). -- Born in Arizona, moved to Babylonia.
On Wed, 2023-11-01 at 13:09 -0500, Ron wrote:
> I will "soon" migrate some 1+ TB database from Pg 9.6.24 on RHEL 6 VMs to Pg
> 14.latest on RHEL 8 VMs.  The VMs have 10Gbps "NICs", SAN-based LUNs managed
> by LVM, and are all on ESX blades.  nproc count on some is 16 and on others
> is 32.
>
> Does anyone have experience as to the point of diminishing returns?
>
> IOW, if I crank the processes up to --jobs=30, will I see no gain -- or
> even degradation -- after, for example, --jobs=24?
>
> This would be for both pg_dump and pg_restore (which would be run on the
> RHEL 8 VM).

Test, test, test.  Theoretical considerations are pretty worthless, and it is easy to measure that.

Yours,
Laurenz Albe
On 11/1/23 15:42, Laurenz Albe wrote:
> Test, test, test.  Theoretical considerations are pretty worthless,

Which is why I asked if anyone has experience.

> and it is easy to measure that.

Not necessarily.  Our test systems are way too small (only good enough to validate that the script works correctly), and there's always something (sometimes a lot, sometimes just "some") going on in production, whether it's my customer's work, or the SAN (like snapshotting every VM and then copying the snapshots to the virtual tape device), or something else.
--
Born in Arizona, moved to Babylonia.
From: Ron <ronljohnsonjr@gmail.com>
Sent: Thursday, November 2, 2023 3:01:47 AM
To: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: pg_dump/pg_restore --jobs practical limit?
> Not necessarily.  Our test systems are way too small (only good enough to
> validate that the script works correctly), and there's always something
> going on in production [...]
Sure, but are the new systems busy already?
Ideally you could run tests on them before they are put into production.
On 11/1/23 20:05, Brad White wrote:
> Sure, but are the new systems busy already?
> Ideally you could run tests on them before they are put into production.

Testing pg_restore with different --jobs= values will be easier.  pg_dump is what's going to be reading from a constantly varying system.
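(If it helps, a rough way to measure that would be a loop along these lines; the scratch database name, the dump directory, and the --jobs values are all placeholders:)

    # time a restore of the same directory-format dump at several --jobs settings
    for j in 8 16 24 32; do
        dropdb --if-exists scratch && createdb scratch
        echo "--jobs=$j"
        time pg_restore --jobs=$j -d scratch /backup/mydb.dir
    done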
--
Born in Arizona, moved to Babylonia.
Hi!
When I really want to use all the resources, I set the number of jobs to the number of CPUs plus 1.  There is probably no reason to run more jobs than you have CPUs :-).
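(Roughly like this, with nproc supplying the CPU count and the path and database name as placeholders:)

    pg_dump -Fd --jobs=$(( $(nproc) + 1 )) -f /backup/mydb.dir mydb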
Unfortunately, pg_dump will not allocate more than one worker to a table, even a huge one (unless it is partitioned), so sometimes there is no point in defining many jobs: when you have one or two big tables and the rest are relatively small, it makes no difference whether you define 4 or 10 jobs.
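(One quick way to see whether that applies, as a sketch to run against the source database; it lists the biggest tables so you can tell if one or two of them dominate:)

    psql -d mydb -c "
      SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
      FROM pg_class
      WHERE relkind = 'r'
      ORDER BY pg_total_relation_size(oid) DESC
      LIMIT 10;"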
But yes, testing is the best way to find out :-).
Regards Tomek
Hello,

> Testing pg_restore with different --jobs= values will be easier.  pg_dump
> is what's going to be reading from a constantly varying system.
Each time I do a replatforming of this kind, with DBs up to 2 TB, I create the target instance, any needed users, and then the appropriate databases, and finally run a simple script that pipes pg_dump into psql, one database at a time.

So: one thread.  Each time, it was limited by the network bandwidth.  My last replatforming, on a 10 Gb network with a 1.5 TB DB, showed a transfer rate of 500 MB per second (5 Gb/s), so less than an hour.

Which is just fine: launch it, have lunch and a coffee, and it's done (for a test).  For prod, I usually do it on the quietest night of the weekend, and have a nap (a short one!) :-)
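(A sketch of that kind of script, with placeholder host and database names; each target database is created beforehand, as described above:)

    # single stream: dump from the old server piped straight into the new one
    createdb -h new-pg14-host mydb
    pg_dump -h old-pg96-host mydb | psql -h new-pg14-host -d mydb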
On 11/3/23 05:09, Marc Millas wrote:
> My last replatforming, on a 10 Gb network with a 1.5 TB DB, showed a
> transfer rate of 500 MB per second (5 Gb/s), so less than an hour.
I'm shocked that old-school "SQL" single-threading works that fast.
Thanks for the data point.
--
Born in Arizona, moved to Babylonia.