Thread: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
"Susan K. McClure"
Date:
Running postgresql 9-4 on REHL 7 system. I am trying to speed up pg_dump and pg_restore by
using a postgresql.conf with various performance options set, and the --jobs option to force multiple
streams. But various tests, with various "--jobs=" numbers only achieve at most a 1 minute improvement
in elapsed time versus doing pg_dump or pg_restore with no "--jobs" option and no postgresql.conf with performance
options. Am I missing some key option(s) to improve things??
The DB in question is ~25GB. The processor has 24 Cpus, 12 cores
I have tried with "--jobs = 8, 12, and 20" with little or no discernible improvements.
(FWIW - I am still building this box, it is Not in production yet, so no real users/DB activity )
My postgresql.conf has these options for (hopefully) pg_dump and pg_restore improvements:
=================
work_mem = 1GB # dump/restore Perf Value
#maintenance_work_mem = 2048MB # min 1MB
maintenance_work_mem = 1GB # dump/restore Perf Value
#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
#maintenance_work_mem = 2048MB # min 1MB
maintenance_work_mem = 1GB # dump/restore Perf Value
#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
...
fsync = off # dump/restore Perf Valuecheckpoint_segments = 60 # dump/restore Perf Value
checkpoint_warning = 60s # dump/restore Perf Value
......
checkpoint_warning = 60s # dump/restore Perf Value
......
autovacuum = off # dump/restore Perf Value..
=============================
Any thoughts on what else I might try to improve things ?
Thanks for your time,
Susan
shared_buffers = 4096MB
Susan K. McClure
smcclure@rice.edu
713.348.4852
Attachment
Re: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
Scott Ribe
Date:
On May 1, 2015, at 9:16 AM, Susan K. McClure <smcclure@rice.edu> wrote: > > The DB in question is ~25GB. The processor has 24 Cpus, 12 cores > > I have tried with "--jobs = 8, 12, and 20" with little or no discernible improvements. I would immediately suspect that you might be disk-bound instead of CPU-bound. What’s the setup of your storage system? -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ https://www.linkedin.com/in/scottribe/ (303) 722-0567 voice
Re: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
Scott Whitney
Date:
A few years back, I wrote something very similar to what ended up being in PG, and I've been using it successfully ever since.
I did find there's a law of diminishing returns after about 6 threads running pg_dump.
In my case, I'm on a 6-drive SSD RAID and getting nowhere near the IOPS of the RAID,
but my suspicion is either that I'm maxing my RAID controller's potential or pg_dump just
won't get any faster.
16 cores 3.4GHz in my case.
Now, the interesting thing is that I tried this on spindle arrays, single drives, multiple machines (quad core,
quad proc, dual core, dual core-dual proc, etc) and there always seemed to be a diminishing return around
6 or 8 jobs running.
On May 1, 2015, at 9:16 AM, Susan K. McClure <smcclure@rice.edu> wrote:
>
> The DB in question is ~25GB. The processor has 24 Cpus, 12 cores
>
> I have tried with "--jobs = 8, 12, and 20" with little or no discernible improvements.
I would immediately suspect that you might be disk-bound instead of CPU-bound. What’s the setup of your storage system?
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Journyx, Inc.
7600 Burnet Road #300
Austin, TX 78757
www.journyx.com
Austin, TX 78757
www.journyx.com
p 512.834.8888
f 512-834-8858
Re: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
Scott Marlowe
Date:
On Fri, May 1, 2015 at 9:16 AM, Susan K. McClure <smcclure@rice.edu> wrote:
Running postgresql 9-4 on REHL 7 system. I am trying to speed up pg_dump and pg_restore byusing a postgresql.conf with various performance options set, and the --jobs option to force multiplestreams. But various tests, with various "--jobs=" numbers only achieve at most a 1 minute improvementin elapsed time versus doing pg_dump or pg_restore with no "--jobs" option and no postgresql.conf with performanceoptions. Am I missing some key option(s) to improve things??The DB in question is ~25GB. The processor has 24 Cpus, 12 coresI have tried with "--jobs = 8, 12, and 20" with little or no discernible improvements.
So have you tried 2 jobs first? I'd see how 1, 2, 3, 4 etc work. See if 2 is faster than 1, then 3 faster than 2 etc.
Most of the time, unless you've got a really fast IO subsystem increasing the --jobs doesn't make a big difference as a lot of the work is sequential. Also on restores I think the extra jobs part only kicks in for index builds.
Re: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
Keith
Date:
On Fri, May 1, 2015 at 11:49 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Fri, May 1, 2015 at 9:16 AM, Susan K. McClure <smcclure@rice.edu> wrote:Running postgresql 9-4 on REHL 7 system. I am trying to speed up pg_dump and pg_restore byusing a postgresql.conf with various performance options set, and the --jobs option to force multiplestreams. But various tests, with various "--jobs=" numbers only achieve at most a 1 minute improvementin elapsed time versus doing pg_dump or pg_restore with no "--jobs" option and no postgresql.conf with performanceoptions. Am I missing some key option(s) to improve things??The DB in question is ~25GB. The processor has 24 Cpus, 12 coresI have tried with "--jobs = 8, 12, and 20" with little or no discernible improvements.So have you tried 2 jobs first? I'd see how 1, 2, 3, 4 etc work. See if 2 is faster than 1, then 3 faster than 2 etc.Most of the time, unless you've got a really fast IO subsystem increasing the --jobs doesn't make a big difference as a lot of the work is sequential. Also on restores I think the extra jobs part only kicks in for index builds.
Also depends how many tables you have and how big they are. Each job process can only do one table at a time, so if all your data is concentrated in 2-3 tables and the rest are fairly small, you're not going to see much of an improvement on dumps or restores.
Keith
Re: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
Jan Lentfer
Date:
Am 2015-05-01 17:49, schrieb Scott Marlowe: > On Fri, May 1, 2015 at 9:16 AM, Susan K. McClure <smcclure@rice.edu > [1]> wrote: > >> Running postgresql 9-4 on REHL 7 system. I am trying to speed up >> pg_dump and pg_restore by >> using a postgresql.conf with various performance options set, and >> the --jobs option to force multiple >> streams. But various tests, with various "--jobs=" numbers only >> achieve at most a 1 minute improvement >> in elapsed time versus doing pg_dump or pg_restore with no "--jobs" >> option and no postgresql.conf with performance >> options. Am I missing some key option(s) to improve things?? >> >> The DB in question is ~25GB. The processor has 24 Cpus, 12 cores >> >> I have tried with "--jobs = 8, 12, and 20" with little or no >> discernible improvements. > > So have you tried 2 jobs first? Id see how 1, 2, 3, 4 etc work. See > if > 2 is faster than 1, then 3 faster than 2 etc. > > Most of the time, unless youve got a really fast IO subsystem > increasing the --jobs doesnt make a big difference as a lot of the > work is sequential. Also on restores I think the extra jobs part only > kicks in for index builds. It does parallel COPY, too. Jan
Re: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
Jan Lentfer
Date:
Am 2015-05-01 17:16, schrieb Susan K. McClure: [...] > My postgresql.conf has these options for (hopefully) pg_dump and > pg_restore improvements: > > ================= > work_mem = 1GB # dump/restore Perf Value > #maintenance_work_mem = 2048MB # min 1MB > maintenance_work_mem = 1GB # dump/restore Perf Value > #autovacuum_work_mem = -1 # min 1MB, or -1 to use > maintenance_work_mem > > ... > fsync = off # dump/restore Perf Valuecheckpoint_segments = 60 # > dump/restore Perf Value > checkpoint_warning = 60s # dump/restore Perf Value > ...... > autovacuum = off # dump/restore Perf Value.. > ============================= > > Any thoughts on what else I might try to improve things ? [...] It ist more interesting to know how your database schema is. How many "large" tables do you have. What is e.g. the size of your 10,20,30 largest tables? Regards Jan
Re: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
Mel Llaguno
Date:
My understanding of parallel dump performance is that it only makes a difference when you have a large number of DBs (thousands if not tens of thousands). We performed similar testing using 9.3.x and found little performance gains using -j (with 100+ tables). See Bruce Momjian’s post : http://momjian.us/main/blogs/pgblog/2012.html Mel Llaguno • Staff Engineer – Team Lead Office: +1.403.264.9717 x3310 www.coverity.com <http://www.coverity.com/> • Twitter: @coverity On 5/4/15, 6:17 AM, "Jan Lentfer" <Jan.Lentfer@web.de> wrote: >Am 2015-05-01 17:16, schrieb Susan K. McClure: > >[...] >> My postgresql.conf has these options for (hopefully) pg_dump and >> pg_restore improvements: >> >> ================= >> work_mem = 1GB # dump/restore Perf Value >> #maintenance_work_mem = 2048MB # min 1MB >> maintenance_work_mem = 1GB # dump/restore Perf Value >> #autovacuum_work_mem = -1 # min 1MB, or -1 to use >> maintenance_work_mem >> >> ... >> fsync = off # dump/restore Perf Valuecheckpoint_segments = 60 # >> dump/restore Perf Value >> checkpoint_warning = 60s # dump/restore Perf Value >> ...... >> autovacuum = off # dump/restore Perf Value.. >> ============================= >> >> Any thoughts on what else I might try to improve things ? >[...] > >It ist more interesting to know how your database schema is. How many >"large" tables do you have. What is e.g. the size of your 10,20,30 >largest tables? > >Regards > >Jan > > > > >-- >Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) >To make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-admin
Re: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
Jan Lentfer
Date:
Am 2015-05-04 18:19, schrieb Mel Llaguno: > My understanding of parallel dump performance is that it only makes a > difference when you have a large number of DBs (thousands if not tens > of > thousands). We performed similar testing using 9.3.x and found little > performance gains using -j (with 100+ tables). See Bruce Momjian’s > post : > http://momjian.us/main/blogs/pgblog/2012.html I don't know about parallel pg_dump as we use -Fc and pg_dump can't do that in parallel (afaik). For dumping I have wrapped pg_dump in a shell script to dump several databases in parallel. But for pg_restore -j option does make a big difference, at least when you have a lot of larger tables and indexes. Regards, Jan
Re: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
Igor Neyman
Date:
-----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jan Lentfer Sent: Monday, May 04, 2015 12:42 PM To: Mel Llaguno Cc: pgsql-admin@postgresql.org Subject: Re: [ADMIN] pg_dump and pg_restore with multiple streams does Not seem to improve overall times I don't know about parallel pg_dump as we use -Fc and pg_dump can't do that in parallel (afaik). For dumping I have wrappedpg_dump in a shell script to dump several databases in parallel. But for pg_restore -j option does make a big difference, at least when you have a lot of larger tables and indexes. Regards, Jan Combination of -Fc and -j definitely works in pg_dump. And there is nothing in the docs that states otherwise. Regards, Igor Neyman
Re: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
Jan Lentfer
Date:
Am 4. Mai 2015 19:22:07 MESZ, schrieb Igor Neyman <ineyman@perceptron.com>:
-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jan Lentfer
Sent: Monday, May 04, 2015 12:42 PM
To: Mel Llaguno
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] pg_dump and pg_restore with multiple streams does Not seem to improve overall times
I don't know about parallel pg_dump as we use -Fc and pg_dump can't do that in parallel (afaik). For dumping I have wrapped pg_dump in a shell script to dump several databases in parallel.
But for pg_restore -j option does make a big difference, at least when you have a lot of larger tables and indexes.
Regards,
Jan
Combination of -Fc and -j definitely works in pg_dump. And there is nothing in the docs that states otherwise.
Regards,
Igor Neyman
Well, the manual says
You can only use this option with the directory output format because this is the only output format where multiple processes can write their data at the same time.
For the -j Option
Regards
Jan
Re: pg_dump and pg_restore with multiple streams does Not seem to improve overall times
From
Igor Neyman
Date:
From: Jan Lentfer [mailto:Jan.Lentfer@web.de]
Sent: Monday, May 04, 2015 2:03 PM
To: Igor Neyman; Mel Llaguno
Cc: pgsql-admin@postgresql.org
Subject: RE: [ADMIN] pg_dump and pg_restore with multiple streams does Not seem to improve overall times
Am 4. Mai 2015 19:22:07 MESZ, schrieb Igor Neyman <ineyman@perceptron.com>:
-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Jan Lentfer
Sent: Monday, May 04, 2015 12:42 PM
To: Mel Llaguno
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] pg_dump and pg_restore with multiple streams does Not seem to improve overall times
I don't know about parallel pg_dump as we use -Fc and pg_dump can't do that in parallel (afaik). For dumping I have wrapped pg_dump in a shell script to dump several databases in parallel.
But for pg_restore -j option does make a big difference, at least when you have a lot of larger tables and indexes.
Regards,
Jan
Combination of -Fc and -j definitely works in pg_dump. And there is nothing in the docs that states otherwise.
Regards,
Igor Neyman
Well, the manual says
You can only use this option with the directory output format because this is the only output format where multiple processes can write their data at the same time.
For the -j Option
Regards
Jan
Sorry, was talking (for some reason?) about parallel restore, which was available long before parallel dump.
Igor Neyman