Thread: Improving pg_dump performance
Hi,

We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that needs to be migrated to a new data center and then restored to v9.6.9.

The database has many large tables full of bytea columns containing PDF images, and so the dump file is going to be more than 2x larger than the existing data/base...

The command is:
$ pg_dump -v -Z0 -Fc $DB --file=${TARGET}/${DATE}_${DB}.dump 2> ${DATE}_${DB}.log

Using -Z0 because the PDF files are already compressed.

Because of an intricate web of FK constraints and partitioned tables, the customer doesn't trust a set of "partitioned" backups using --table= and regular expressions (the names of those big tables all have the year in them), and so I am stuck with a single-threaded backup.

Are there any config file elements that I can tweak (extra points for not having to restart postgres) to make it run faster, or deeper knowledge of how pg_restore works so that I could convince them to let me do the partitioned backups?

Lastly, is there any way to not make the backups so large (maybe by using the --binary-upgrade option, even though the man page says "in-place upgrades only")?

--
Angular momentum makes the world go 'round.
On 23.07.2018 09:23, Ron wrote:
> Hi,
>
> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database
> that needs to be migrated to a new data center and then restored to
> v9.6.9.

You can use the pg_dump from the newer version (9.6) to dump the old database over the net. That way you can also use parallel backups (directory format).

Regards,
Andreas

--
2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com
On 07/23/2018 02:32 AM, Andreas Kretschmer wrote:
> On 23.07.2018 09:23, Ron wrote:
>> Hi,
>>
>> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
>> needs to be migrated to a new data center and then restored to v9.6.9.
>
> you can use the pg_dump from the newer version (9.6) to dump the old
> database, over the net. In this way you can also use parallel backups
> (directory format)

That DC circuit is too slow, and also used by lots of other production data.

--
Angular momentum makes the world go 'round.
On 23.07.2018 15:06, Ron wrote:
> On 07/23/2018 02:32 AM, Andreas Kretschmer wrote:
>> On 23.07.2018 09:23, Ron wrote:
>>> Hi,
>>>
>>> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database
>>> that needs to be migrated to a new data center and then restored to
>>> v9.6.9.
>>
>> you can use the pg_dump from the newer version (9.6) to dump the old
>> database, over the net. In this way you can also use parallel backups
>> (directory format)
>
> That DC circuit is too slow, and also used by lots of other production
> data.

Install 9.6 on the old server as well, or use another server in the same DC.

Regards,
Andreas

--
2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com
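A sketch of what that 9.6-driven parallel dump could look like, run from a second box in the same DC. The install path, host name, and job count below are assumptions, not values from the thread; the command is echoed rather than executed since it needs the live 8.4 server.

```shell
# Sketch: parallel directory-format dump of the 8.4 server, driven by 9.6's
# pg_dump from another machine in the same DC. Path, host, and -j 8 are
# assumptions.
PGDUMP96=/usr/pgsql-9.6/bin/pg_dump
DB=proddb
TARGET=/backups
DATE=20180723
# -Fd (directory format) is required for -j; each worker dumps one table at a time.
CMD="$PGDUMP96 -v -Z0 -Fd -j 8 -h old-db-host $DB --file=${TARGET}/${DATE}_${DB}.dir"
echo "$CMD"   # shown rather than executed
```

The directory format also lets pg_restore later read the same dump in parallel.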
Greetings,

* Ron (ronljohnsonjr@gmail.com) wrote:
> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
> needs to be migrated to a new data center and then restored to v9.6.9.

You should be using 9.6's pg_dump to perform the export. It might be a bit annoying to do, but you should be able to install it on a nearby server, or on the same server 8.4 is running on but in another location.

With 9.6's pg_dump you could use parallel mode, but you have to prevent anything from changing the data between when the first connection from pg_dump is made and when all of the connections have completed and started their transactions (which should be just a few seconds, really). Of course, that export won't include any changes made after the pg_dump starts, so you'll need a way to manage those.

> The database has many large tables full of bytea columns containing pdf
> images, and so the dump file is going to be more than 2x larger than the
> existing data/base...

The dump file isn't going to include any content from indexes. At least looking at some PDFs locally, they can certainly be compressed effectively sometimes, and they might be getting compressed today in your 8.4 instance thanks to TOAST; more to the point, the textual representation of a bytea which will end up in the export would almost certainly be compressible too.

> The command is:
> $ pg_dump -v -Z0 -Fc $DB --file=${TARGET}/${DATE}_${DB}.dump 2>
> ${DATE}_${DB}.log
>
> Using -Z0 because pdf files are already compressed.

They aren't really PDF files in the export, though: they're byteas represented in a textual format. Have you tested the difference between using -Z0 and allowing compression to happen?

> Because of an intricate web of FK constraints and partitioned tables, the
> customer doesn't trust a set of "partitioned" backups using --table= and
> regular expressions (the names of those big tables all have the year in
> them), and so am stuck with a single-threaded backup.

All of the FKs will be re-checked when the data is imported into the new instance.

> Are there any config file elements that I can tweak (extra points for not
> having to restart postgres) to make it run faster, or deeper knowledge of
> how pg_restore works so that I could convince them to let me do the
> partitioned backups?

pg_restore isn't doing much more than restoring what's in the backup into the database using COPY commands. Since it's an export/import, all the FKs and other constraints will be re-checked, and all indexes will be rebuilt during the import.

> Lastly, is there any way to not make the backups so large (maybe by using
> the --binary-upgrade option, even though the man page says, "in-place
> upgrades only")?

You could possibly upgrade the existing system from 8.4 to 9.6 in place (which would require a bit of downtime, but typically on the order of minutes instead of many hours), then take a filesystem-level backup using a tool like pgBackRest, restore that at the new data center as a replica, and use streaming replication until you're ready to cut over to the new data center. That's probably how I'd tackle this anyway, though one nice thing about the dump/restore is that you could have checksums enabled on the new cluster.

Thanks!

Stephen
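As a concrete illustration of the import side: pg_restore's -j option loads table data and rebuilds indexes concurrently on the target. The job count and file name below are assumptions; the command is echoed rather than executed since it needs the 9.6 cluster and the dump file.

```shell
# Sketch: parallel restore on the 9.6 target. -j works with custom (-Fc)
# and directory (-Fd) format dumps; names and -j 8 are assumptions.
DB=proddb
DUMP=/backups/20180723_proddb.dump
CMD="pg_restore -v -j 8 -d $DB $DUMP"
echo "$CMD"   # shown rather than executed
```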
On 07/23/2018 12:23 AM, Ron wrote:
> Hi,
>
> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
> needs to be migrated to a new data center and then restored to v9.6.9.
>
> The database has many large tables full of bytea columns containing pdf
> images, and so the dump file is going to be more than 2x larger than the
> existing data/base...
>
> The command is:
> $ pg_dump -v -Z0 -Fc $DB --file=${TARGET}/${DATE}_${DB}.dump 2>
> ${DATE}_${DB}.log
>
> Using -Z0 because pdf files are already compressed.

This is not consistent with your statement that the dump file will double in size over the database size; that would imply the data is being decompressed on dumping. I would test -Z(>0) on one of the tables with PDFs to determine whether it would help or not.

> Because of an intricate web of FK constraints and partitioned tables,
> the customer doesn't trust a set of "partitioned" backups using --table=
> and regular expressions (the names of those big tables all have the year
> in them), and so am stuck with a single-threaded backup.
>
> Are there any config file elements that I can tweak (extra points for
> not having to restart postgres) to make it run faster, or deeper
> knowledge of how pg_restore works so that I could convince them to let
> me do the partitioned backups?
>
> Lastly, is there any way to not make the backups so large (maybe by
> using the --binary-upgrade option, even though the man page says,
> "in-place upgrades only")?

--
Adrian Klaver
adrian.klaver@aklaver.com
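That single-table compression test might be sketched like this. The table and database names are assumptions; the helper only builds the two commands to run, so the resulting dump sizes can be compared before committing to -Z0 for the full run.

```shell
# Sketch: build single-table dump commands at two compression levels so
# their output sizes can be compared. Table/database names are assumptions.
dump_cmd() {   # $1 = compression level 0-9
    echo "pg_dump -Fc -Z$1 -t pdf_images_2017 proddb --file=/tmp/pdfs_z$1.dump"
}
dump_cmd 0
dump_cmd 5
# After running both commands, compare with: ls -l /tmp/pdfs_z*.dump
```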
On 07/23/2018 08:27 AM, Andreas Kretschmer wrote:
> On 23.07.2018 15:06, Ron wrote:
>> On 07/23/2018 02:32 AM, Andreas Kretschmer wrote:
>>> On 23.07.2018 09:23, Ron wrote:
>>>> Hi,
>>>>
>>>> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database
>>>> that needs to be migrated to a new data center and then restored to
>>>> v9.6.9.
>>>
>>> you can use the pg_dump from the newer version (9.6) to dump the old
>>> database, over the net. In this way you can also use parallel backups
>>> (directory format)
>>
>> That DC circuit is too slow, and also used by lots of other production data.
>
> install the 9.6 also on the old server,

Are there 9.6 packages for RHEL 5.10?

> or use an other server in the same DC.

An interesting idea. To clarify: is it possible to parallel backup a running 8.4 cluster remotely from a 9.6 system?

--
Angular momentum makes the world go 'round.
Greetings,

* Ron (ronljohnsonjr@gmail.com) wrote:
> An interesting idea. To clarify: it's possible to parallel backup a running
> 8.4 cluster remotely from a 9.6 system?

Yes, you can do a parallel backup, but you won't be able to get a consistent snapshot. You'll need to pause all changes to the database while the pg_dump processes connect and start their transactions to have the backup be consistent.

Thanks,

Stephen
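One possible way to arrange that pause (an assumption, not a procedure from the thread) is to temporarily lock out application logins via pg_hba.conf while the workers open their transactions; 8.4 reloads pg_hba.conf without a restart. The data directory path is an assumption.

```shell
# Sketch of one way (an assumption) to quiesce writes while the parallel
# pg_dump workers start their transactions on the 8.4 primary:
#
#   1. cp "$PGDATA/pg_hba.conf" "$PGDATA/pg_hba.conf.orig"
#   2. edit pg_hba.conf so only the dump user may connect
#   3. pg_ctl -D "$PGDATA" reload     # pg_hba changes need no restart
#   4. start the parallel pg_dump; wait until every worker has begun dumping
#   5. cp "$PGDATA/pg_hba.conf.orig" "$PGDATA/pg_hba.conf"; pg_ctl -D "$PGDATA" reload
PGDATA=/var/lib/pgsql/8.4/data   # assumed data directory
echo "quiesce procedure targets $PGDATA"
```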
On 07/23/2018 08:46 AM, Stephen Frost wrote:
> Greetings,
>
> * Ron (ronljohnsonjr@gmail.com) wrote:
>> An interesting idea. To clarify: it's possible to parallel backup a running
>> 8.4 cluster remotely from a 9.6 system?
>
> Yes, you can do a parallel backup, but you won't be able to get a
> consistent snapshot. You'll need to pause all changes to the database
> while the pg_dump processes connect and start their transactions to
> have the backup be consistent.

I can do that!!!

--
Angular momentum makes the world go 'round.
On 07/23/2018 06:47 AM, Ron wrote:
> On 07/23/2018 08:46 AM, Stephen Frost wrote:
>> Greetings,
>>
>> * Ron (ronljohnsonjr@gmail.com) wrote:
>>> An interesting idea. To clarify: it's possible to parallel backup a
>>> running 8.4 cluster remotely from a 9.6 system?
>> Yes, you can do a parallel backup, but you won't be able to get a
>> consistent snapshot. You'll need to pause all changes to the database
>> while the pg_dump processes connect and start their transactions to
>> have the backup be consistent.
>
> I can do that!!!

Assuming you can get this set up, have you tested some subset of your data on 9.6.9?

1) Going from 8.4 --> 9.6 is jumping 7 major versions of Postgres. Do you know that the data/code will work in 9.6.9?

2) Does the transfer have time built in for fixing problems on the 9.6.9 end?

3) If the answer to 2) is no, then is there a plan to deal with changes in the 8.4 database while working on the 9.6.9 database?

--
Adrian Klaver
adrian.klaver@aklaver.com
Hi,

On 2018-07-23 02:23:45 -0500, Ron wrote:
> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
> needs to be migrated to a new data center and then restored to v9.6.9.

Have you considered using pg_upgrade instead?

Greetings,

Andres Freund
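For context, an in-place pg_upgrade invocation might be sketched as below; 9.6's pg_upgrade accepts 8.4 as the old version. All paths are assumptions, and --link hard-links data files instead of copying 2.9TB. The command is echoed rather than executed since it needs both clusters installed on one host.

```shell
# Sketch: in-place upgrade from 8.4 to 9.6 with pg_upgrade. All paths are
# assumptions; --link avoids copying the data files.
CMD="/usr/pgsql-9.6/bin/pg_upgrade --old-bindir=/usr/pgsql-8.4/bin --new-bindir=/usr/pgsql-9.6/bin --old-datadir=/var/lib/pgsql/8.4/data --new-datadir=/var/lib/pgsql/9.6/data --link"
echo "$CMD"   # shown rather than executed
```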
On 07/23/2018 08:56 AM, Adrian Klaver wrote:
> On 07/23/2018 06:47 AM, Ron wrote:
>> On 07/23/2018 08:46 AM, Stephen Frost wrote:
>>> Greetings,
>>>
>>> * Ron (ronljohnsonjr@gmail.com) wrote:
>>>> An interesting idea. To clarify: it's possible to parallel backup a
>>>> running 8.4 cluster remotely from a 9.6 system?
>>> Yes, you can do a parallel backup, but you won't be able to get a
>>> consistent snapshot. You'll need to pause all changes to the database
>>> while the pg_dump processes connect and start their transactions to
>>> have the backup be consistent.
>>
>> I can do that!!!
>
> Assuming you can get this setup, have you tested some subset of your data
> on 9.6.9?:
>
> 1) Going from 8.4 --> 9.6 is jumping 7 major versions of Postgres. Do you
> know that the data/code will work in 9.6.9?
>
> 2) Does the transfer have time built in for fixing problems on the 9.6.9 end?
>
> 3) If the answer 2) is no, then is there a plan to deal with changes in
> the 8.4 database while working on the 9.6.9 database?

Yes, we've migrated CAT and Staging databases, and the application has been tested. And this is a test conversion of the prod databases...

--
Angular momentum makes the world go 'round.
On 07/23/2018 09:11 AM, Andres Freund wrote:
> Hi,
>
> On 2018-07-23 02:23:45 -0500, Ron wrote:
>> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
>> needs to be migrated to a new data center and then restored to v9.6.9.
> Have you considered using pg_upgrade instead?

Yes, but:
1. I can't find 9.6 packages for the RHEL 5.10 server.
2. The window is only 8 hours, and the data also has to be moved to a new DC in that window.

--
Angular momentum makes the world go 'round.
On 2018-07-23 09:17:41 -0500, Ron wrote:
> On 07/23/2018 09:11 AM, Andres Freund wrote:
>> Hi,
>>
>> On 2018-07-23 02:23:45 -0500, Ron wrote:
>>> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
>>> needs to be migrated to a new data center and then restored to v9.6.9.
>> Have you considered using pg_upgrade instead?
>
> Yes, but:
> 1. can't find 9.6 packages on the RHEL 5.10 server.

If necessary you could just build it yourself.

> 2. The window is only 8 hours, and the data also has to be moved to a new DC
> in that window.

You could just use physical drives ;)

Greetings,

Andres Freund
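Building 9.6 from source, as suggested, might be sketched like this. The install prefix and -j4 are assumptions, and the build hosts would first need gcc plus readline-devel and zlib-devel; the steps are echoed rather than executed here.

```shell
# Sketch: building 9.6.9 from source for a platform with no packages.
# Prefix and parallelism are assumptions.
V=9.6.9
PREFIX=/opt/pgsql-$V
echo "curl -O https://ftp.postgresql.org/pub/source/v$V/postgresql-$V.tar.gz"
echo "tar xzf postgresql-$V.tar.gz && cd postgresql-$V"
echo "./configure --prefix=$PREFIX && make -j4 && make install"
```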
-----Original Message-----
From: Adrian Klaver [mailto:adrian.klaver@aklaver.com]
Sent: Monday, July 23, 2018 8:56 AM
To: Ron <ronljohnsonjr@gmail.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Improving pg_dump performance

On 07/23/2018 06:47 AM, Ron wrote:
> On 07/23/2018 08:46 AM, Stephen Frost wrote:
>> Greetings,
>>
>> * Ron (ronljohnsonjr@gmail.com) wrote:
>>> An interesting idea. To clarify: it's possible to parallel backup a
>>> running 8.4 cluster remotely from a 9.6 system?
>> Yes, you can do a parallel backup, but you won't be able to get a
>> consistent snapshot. You'll need to pause all changes to the
>> database while the pg_dump processes connect and start their
>> transactions to have the backup be consistent.
>
> I can do that!!!

Assuming you can get this setup, have you tested some subset of your data on 9.6.9?

-------------------

+1 on that! Case in point...

When we upgraded from 9.5 to 9.6 (only 1 version, so it doesn't sound all that bad, does it?) our application failed in 2 different places, which we traced down to SQL failing. Both instances were something along the lines of:

select fields from table1 join table2 on (key) where conditionA and conditionB;

What happened was that in 9.5, the planner reordered the WHERE and did conditionB first, which always failed (at least when it mattered). In 9.6 the planner did conditionA first. The problem came from conditionA needing a type conversion that didn't automatically exist, hence the failure. A simple cast fixed the issue, and we really should have had that in the original version, but the data we tested with never had the characteristics that would have triggered the problem (in our defense, the data that caused the failure had never shown itself in over 3 years of real usage, so I think I can call that rare).

The mistakes were ours, but the new version "tightened" some things and they caught us. The fixes were quite simple to make, but it was a real surprise to us.
So be aware that while Pg has been very good about being backward compatible (or it has for us), you can get bit in upgrades. Reading the release notes looking for changes is good, but in the end, running your code against the new version is the only way to find out.

HTH,
Kevin