Thread: pg_dump and pg_restore with multiple streams does Not seem to improve overall times

Running PostgreSQL 9.4 on a RHEL 7 system. I am trying to speed up pg_dump and pg_restore by
using a postgresql.conf with various performance options set, and the --jobs option to force multiple
streams.  But various tests with various "--jobs" numbers achieve at most a 1-minute improvement
in elapsed time versus running pg_dump or pg_restore with no "--jobs" option and no performance
options in postgresql.conf.  Am I missing some key option(s)?
 
The DB in question is ~25GB.  The server has 24 logical CPUs (12 cores).
 
I have tried "--jobs" values of 8, 12, and 20, with little or no discernible improvement.
 
(FWIW, I am still building this box; it is not in production yet, so there is no real user/DB activity.)
 
My postgresql.conf has these options for (hopefully) pg_dump and pg_restore improvements:
 
=================
work_mem = 1GB                          # dump/restore Perf Value
#maintenance_work_mem = 2048MB          # min 1MB
maintenance_work_mem = 1GB                              # dump/restore Perf Value
#autovacuum_work_mem = -1         # min 1MB, or -1 to use maintenance_work_mem
...
fsync = off                             # dump/restore Perf Value
checkpoint_segments = 60                # dump/restore Perf Value
checkpoint_warning = 60s                # dump/restore Perf Value
......
autovacuum = off                        #  dump/restore Perf Value..
=============================
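For reference, the invocations under test would look roughly like this (a sketch; the database name and paths are placeholders). Note that pg_dump's --jobs option only works with the directory output format (-Fd), while pg_restore's --jobs also works on custom-format (-Fc) archives:

```shell
DB=mydb                       # placeholder database name
JOBS=8                        # also tried 12 and 20
DUMPDIR=/backups/${DB}.dir

# Parallel dump: --jobs requires the directory format (-Fd);
# each worker dumps one table at a time, up to $JOBS at once.
pg_dump -Fd --jobs="$JOBS" -f "$DUMPDIR" "$DB"

# Parallel restore from the directory-format dump into an existing database.
pg_restore --jobs="$JOBS" -d "$DB" "$DUMPDIR"
```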
 
Any thoughts on what else I might try to improve things ?
 
Thanks for your time,
 
Susan
 
 
 
 
 
shared_buffers = 4096MB
 
Susan K. McClure
smcclure@rice.edu
713.348.4852
 
 
On May 1, 2015, at 9:16 AM, Susan K. McClure <smcclure@rice.edu> wrote:
>
> The DB in question is ~25GB.  The processor has 24 Cpus, 12 cores
>
> I have tried with "--jobs = 8, 12, and 20"  with little or no discernible improvements.

I would immediately suspect that you might be disk-bound instead of CPU-bound. What’s the setup of your storage system?

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice







A few years back, I wrote something very similar to what ended up being in PG, and I've been using it successfully ever since.

I did find there's a law of diminishing returns after about 6 threads running pg_dump.

In my case, I'm on a 6-drive SSD RAID and getting nowhere near the IOPS of the RAID,
but my suspicion is either that I'm maxing my RAID controller's potential or pg_dump just
won't get any faster.

16 cores 3.4GHz in my case.

Now, the interesting thing is that I tried this on spindle arrays, single drives, multiple machines (quad core,
quad proc, dual core, dual core-dual proc, etc) and there always seemed to be a diminishing return around
6 or 8 jobs running.

----- On May 1, 2015, at 10:22 AM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:

> I would immediately suspect that you might be disk-bound instead of CPU-bound. What’s the setup of your storage system?







--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin



Journyx, Inc.
7600 Burnet Road #300
Austin, TX 78757
www.journyx.com

p 512.834.8888
f 512.834.8858



On Fri, May 1, 2015 at 9:16 AM, Susan K. McClure <smcclure@rice.edu> wrote:
> Running PostgreSQL 9.4 on a RHEL 7 system. I am trying to speed up pg_dump and pg_restore by
> using a postgresql.conf with various performance options set, and the --jobs option to force multiple
> streams.  But various tests with various "--jobs" numbers achieve at most a 1-minute improvement
> in elapsed time versus running pg_dump or pg_restore with no "--jobs" option and no performance
> options in postgresql.conf.  Am I missing some key option(s)?
>
> The DB in question is ~25GB.  The server has 24 logical CPUs (12 cores).
>
> I have tried "--jobs" values of 8, 12, and 20, with little or no discernible improvement.

So have you tried 2 jobs first? I'd see how 1, 2, 3, 4, etc. work: see if 2 is faster than 1, then 3 faster than 2, and so on.

Most of the time, unless you've got a really fast IO subsystem, increasing --jobs doesn't make a big difference, as a lot of the work is sequential. Also, on restores I think the extra jobs only kick in for index builds.
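A quick way to run that experiment (a sketch; the database name and dump path are placeholders) is to time a directory-format dump at increasing job counts and watch where the curve flattens:

```shell
DB=mydb                         # placeholder database name
for J in 1 2 3 4 6 8; do
    rm -rf "/tmp/dump_j$J"
    # GNU time prints elapsed wall-clock seconds for each run.
    /usr/bin/time -f "--jobs=$J: %e s" \
        pg_dump -Fd --jobs="$J" -f "/tmp/dump_j$J" "$DB"
done
```

If 2 jobs is no faster than 1, adding more will not help; the bottleneck is elsewhere (usually the storage or a single huge table).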


On Fri, May 1, 2015 at 11:49 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Fri, May 1, 2015 at 9:16 AM, Susan K. McClure <smcclure@rice.edu> wrote:
>> [...]
>> I have tried "--jobs" values of 8, 12, and 20, with little or no discernible improvement.
>
> So have you tried 2 jobs first? I'd see how 1, 2, 3, 4 etc work. See if 2 is faster than 1, then 3 faster than 2 etc.
>
> Most of the time, unless you've got a really fast IO subsystem increasing the --jobs doesn't make a big difference as a lot of the work is sequential. Also on restores I think the extra jobs part only kicks in for index builds.


Also depends how many tables you have and how big they are. Each job process can only do one table at a time, so if all your data is concentrated in 2-3 tables and the rest are fairly small, you're not going to see much of an improvement on dumps or restores.

Keith
On 2015-05-01 17:49, Scott Marlowe wrote:
> On Fri, May 1, 2015 at 9:16 AM, Susan K. McClure <smcclure@rice.edu> wrote:
>
>> [...]
>> I have tried "--jobs" values of 8, 12, and 20, with little or no
>> discernible improvement.
>
> So have you tried 2 jobs first? I'd see how 1, 2, 3, 4 etc work. See
> if 2 is faster than 1, then 3 faster than 2 etc.
>
> Most of the time, unless you've got a really fast IO subsystem
> increasing the --jobs doesn't make a big difference as a lot of the
> work is sequential. Also on restores I think the extra jobs part only
> kicks in for index builds.

It does parallel COPY, too.

Jan



On 2015-05-01 17:16, Susan K. McClure wrote:

[...]
> My postgresql.conf has these options for (hopefully) pg_dump and
> pg_restore improvements:
>
> =================
> work_mem = 1GB # dump/restore Perf Value
> #maintenance_work_mem = 2048MB # min 1MB
> maintenance_work_mem = 1GB # dump/restore Perf Value
> #autovacuum_work_mem = -1 # min 1MB, or -1 to use
> maintenance_work_mem
>
> ...
> fsync = off # dump/restore Perf Value
> checkpoint_segments = 60 # dump/restore Perf Value
> checkpoint_warning = 60s # dump/restore Perf Value
> ......
> autovacuum = off # dump/restore Perf Value..
> =============================
>
> Any thoughts on what else I might try to improve things ?
[...]

It would be more interesting to know what your database schema looks like.
How many "large" tables do you have? What is, e.g., the size of your 10,
20, or 30 largest tables?

Regards

Jan
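One way to answer that question (a sketch; `mydb` stands in for the actual database) is to ask the catalog for the biggest tables, with pg_total_relation_size including indexes and TOAST data:

```shell
# List the 20 largest tables by total on-disk size (table + indexes + TOAST).
psql -d mydb -c "
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY pg_total_relation_size(oid) DESC
 LIMIT 20;"
```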





My understanding of parallel dump performance is that it only makes a
difference when you have a large number of DBs (thousands if not tens of
thousands). We performed similar testing using 9.3.x and found little
performance gains using -j (with 100+ tables). See Bruce Momjian’s post :
http://momjian.us/main/blogs/pgblog/2012.html

Mel Llaguno • Staff Engineer – Team Lead
Office: +1.403.264.9717 x3310
www.coverity.com <http://www.coverity.com/> • Twitter: @coverity


On 5/4/15, 6:17 AM, "Jan Lentfer" <Jan.Lentfer@web.de> wrote:

>On 2015-05-01 17:16, Susan K. McClure wrote:
>[...]
>
>It would be more interesting to know what your database schema looks like.
>How many "large" tables do you have? What is, e.g., the size of your 10,
>20, or 30 largest tables?
>
>Regards
>
>Jan


On 2015-05-04 18:19, Mel Llaguno wrote:
> My understanding of parallel dump performance is that it only makes a
> difference when you have a large number of DBs (thousands if not tens
> of
> thousands). We performed similar testing using 9.3.x and found little
> performance gains using -j (with 100+ tables). See Bruce Momjian’s
> post :
> http://momjian.us/main/blogs/pgblog/2012.html

I don't know about parallel pg_dump as we use -Fc and pg_dump can't do
that in parallel (afaik). For dumping I have wrapped pg_dump in a shell
script to dump several databases in parallel.
But for pg_restore -j option does make a big difference, at least when
you have a lot of larger tables and indexes.

Regards,

Jan
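A wrapper along those lines might look like this (a sketch, not Jan's actual script; the database names and backup path are made up):

```shell
#!/bin/sh
# Dump several databases at once: one background pg_dump per database.
for DB in sales hr inventory; do            # placeholder database names
    pg_dump -Fc -f "/backups/$DB.dump" "$DB" &
done
wait    # block until every background dump has finished
echo "all dumps done"
```

This parallelizes across databases, whereas --jobs parallelizes across tables within one database; the two approaches are complementary.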




-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jan Lentfer
Sent: Monday, May 04, 2015 12:42 PM
To: Mel Llaguno
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] pg_dump and pg_restore with multiple streams does Not seem to improve overall times

I don't know about parallel pg_dump as we use -Fc and pg_dump can't do that in parallel (afaik). For dumping I have wrapped pg_dump in a shell script to dump several databases in parallel.
 
But for pg_restore -j option does make a big difference, at least when you have a lot of larger tables and indexes.

Regards,

Jan


Combination of -Fc  and -j definitely works in pg_dump.  And there is nothing in the docs that states otherwise.

Regards,
Igor Neyman




On May 4, 2015, at 19:22 CEST, Igor Neyman <ineyman@perceptron.com> wrote:


> Combination of -Fc and -j definitely works in pg_dump. And there is nothing in the docs that states otherwise.
>
> Regards,
> Igor Neyman


Well, the manual says, for the -j option:

"You can only use this option with the directory output format because this is the only output format where multiple processes can write their data at the same time."

Regards

Jan

 

 

From: Jan Lentfer [mailto:Jan.Lentfer@web.de]
Sent: Monday, May 04, 2015 2:03 PM
To: Igor Neyman; Mel Llaguno
Cc: pgsql-admin@postgresql.org
Subject: RE: [ADMIN] pg_dump and pg_restore with multiple streams does Not seem to improve overall times

 

 

On May 4, 2015, at 19:22 CEST, Igor Neyman <ineyman@perceptron.com> wrote:

> Combination of -Fc and -j definitely works in pg_dump. And there is nothing in the docs that states otherwise.

Well, the manual says, for the -j option:

"You can only use this option with the directory output format because this is the only output format where multiple processes can write their data at the same time."

Regards

Jan

 

 

Sorry, was talking (for some reason?) about parallel restore, which was available long before parallel dump.

 

Igor Neyman