Improving pg_dump performance

From: Ron
Hi,

We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that 
needs to be migrated to a new data center and then restored to v9.6.9.

The database has many large tables full of bytea columns containing pdf 
images, and so the dump file is going to be more than 2x larger than the 
existing data/base...


The command is:
$ pg_dump -v -Z0 -Fc $DB --file=${TARGET}/${DATE}_${DB}.dump 2> ${DATE}_${DB}.log

Using -Z0 because pdf files are already compressed.

Because of an intricate web of FK constraints and partitioned tables, the 
customer doesn't trust a set of "partitioned" backups using --table= and 
regular expressions (the names of those big tables all have the year in 
them), so I am stuck with a single-threaded backup.

Are there any config file elements that I can tweak (extra points for not 
having to restart postgres) to make it run faster, or deeper knowledge of 
how pg_restore works so that I could convince them to let me do the 
partitioned backups?

Lastly, is there any way to not make the backups so large (maybe by using 
the --binary-upgrade option, even though the man page says, "in-place 
upgrades only")?

-- 
Angular momentum makes the world go 'round.


Re: Improving pg_dump performance

From: Andreas Kretschmer

On 23.07.2018 at 09:23, Ron wrote:
> Hi,
>
> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database 
> that needs to be migrated to a new data center and then restored to 
> v9.6.9. 

You can use the pg_dump from the newer version (9.6) to dump the old 
database over the net. That way you can also use parallel backups 
(directory format).
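
A rough sketch of what that might look like, run from wherever the 9.6 
client binaries live (host name, job count and paths are placeholders; 
against a pre-9.2 server such as 8.4, parallel pg_dump also needs 
--no-synchronized-snapshots, which is only safe if nothing writes while 
the worker connections start):

$ /usr/pgsql-9.6/bin/pg_dump -h old-db-host -p 5432 -U postgres \
    -j 8 -Fd -Z0 --no-synchronized-snapshots \
    --file=${TARGET}/${DATE}_${DB}.dir -v $DB 2> ${DATE}_${DB}.log

The directory format writes one data file per table, which also makes it 
easier to copy the result to the new DC piecemeal.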


Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Re: Improving pg_dump performance

From: Ron
On 07/23/2018 02:32 AM, Andreas Kretschmer wrote:
>
>
> On 23.07.2018 at 09:23, Ron wrote:
>> Hi,
>>
>> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that 
>> needs to be migrated to a new data center and then restored to v9.6.9. 
>
> you can use the pg_dump from the newer version (9.6) to dump the old 
> database, over the net. In this way you can also use parallel backups 
> (directory format)

That DC circuit is too slow, and also used by lots of other production data.

-- 
Angular momentum makes the world go 'round.


Re: Improving pg_dump performance

From: Andreas Kretschmer

On 23.07.2018 at 15:06, Ron wrote:
> On 07/23/2018 02:32 AM, Andreas Kretschmer wrote:
>>
>>
>> On 23.07.2018 at 09:23, Ron wrote:
>>> Hi,
>>>
>>> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database 
>>> that needs to be migrated to a new data center and then restored to 
>>> v9.6.9. 
>>
>> you can use the pg_dump from the newer version (9.6) to dump the old 
>> database, over the net. In this way you can also use parallel backups 
>> (directory format)
>
> That DC circuit is too slow, and also used by lots of other production 
> data.
>

Install 9.6 on the old server as well, or use another server in the 
same DC.


Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Re: Improving pg_dump performance

From: Stephen Frost
Greetings,

* Ron (ronljohnsonjr@gmail.com) wrote:
> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
> needs to be migrated to a new data center and then restored to v9.6.9.

You should be using 9.6's pg_dump to perform the export.  It might be a 
bit annoying to do, but you should be able to install it onto a nearby 
server, or onto the same server that 8.4 is running on but in another 
location.

With 9.6's pg_dump, you could use parallel mode, but you have to prevent
anything from changing the data from when the first connection from
pg_dump is made until all of the connections have been made and started
their transactions (should be just a few seconds, really).  Of course,
that export won't include any changes made after the pg_dump starts, so
you'll need a way to manage those.

> The database has many large tables full of bytea columns containing pdf
> images, and so the dump file is going to be more than 2x larger than the
> existing data/base...

The dump file isn't going to include any content from indexes.  At
least looking at some PDFs locally, they can sometimes be compressed
quite effectively, and they may already be getting compressed in your
8.4 instance thanks to TOAST.  More to the point, the textual
representation of a bytea, which is what will end up in the export,
would almost certainly be compressible too.

> The command is:
> $ pg_dump -v -Z0 -Fc $DB --file=${TARGET}/${DATE}_${DB}.dump 2>
> ${DATE}_${DB}.log
>
> Using -Z0 because pdf files are already compressed.

They aren't really PDF files in the export though; they're byteas
represented in a textual format.  Have you tested the difference between
using -Z0 and allowing compression to happen?
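
A quick way to test that on a single table, as a sketch (the table name 
below is made up; compare the file sizes and the wall-clock times):

$ time pg_dump -Fc -Z0 -t pdf_docs_2017 $DB --file=/tmp/pdf_docs_z0.dump
$ time pg_dump -Fc -Z1 -t pdf_docs_2017 $DB --file=/tmp/pdf_docs_z1.dump
$ ls -lh /tmp/pdf_docs_z0.dump /tmp/pdf_docs_z1.dump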

> Because of an intricate web of FK constraints and partitioned tables, the
> customer doesn't trust a set of "partitioned" backups using --table= and
> regular expressions (the names of those big tables all have the year in
> them), and so am stuck with a single-threaded backup.

All of the FKs will be re-checked when the data is imported into the new
instance.

> Are there any config file elements that I can tweak (extra points for not
> having to restart postgres) to make it run faster, or deeper knowledge of
> how pg_restore works so that I could convince them to let me do the
> partitioned backups?

pg_restore isn't doing much more than restoring what's in the backup
into the database using COPY commands.  Since it's an export/import, all
the FKs and other constraints will be re-checked and all indexes will be
rebuilt during the import.
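
Whatever is decided about the dump, the import side can at least be run 
in parallel as long as the archive is in custom or directory format; a 
sketch (the job count is a guess, tune it to the new server, and the 
target database must already exist):

$ pg_restore -j 8 -d $DB -v ${TARGET}/${DATE}_${DB}.dump 2> ${DATE}_${DB}_restore.log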

> Lastly, is there any way to not make the backups so large (maybe by using
> the --binary-upgrade option, even though the man page says, "in-place
> upgrades only")?

You could possibly upgrade the existing system from 8.4 to 9.6 in place
(which would require a bit of downtime, but typically on the order of
minutes instead of many hours), then take a filesystem-level backup
using a tool like pgBackRest, restore that at the new data center as a
replica, and use streaming replication until you're ready to cut over to
the new data center.  That's probably how I'd tackle this anyway, though
one nice thing about the dump/restore is that you could have checksums
enabled on the new cluster.

Thanks!

Stephen


Re: Improving pg_dump performance

From: Adrian Klaver
On 07/23/2018 12:23 AM, Ron wrote:
> Hi,
> 
> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that 
> needs to be migrated to a new data center and then restored to v9.6.9.
> 
> The database has many large tables full of bytea columns containing pdf 
> images, and so the dump file is going to be more than 2x larger than the 
> existing data/base...
> 
> 
> The command is:
> $ pg_dump -v -Z0 -Fc $DB --file=${TARGET}/${DATE}_${DB}.dump 2> 
> ${DATE}_${DB}.log
> 
> Using -Z0 because pdf files are already compressed.

This is not consistent with your statement that the dump file will be 
more than double the size of the database.  That would imply the data is 
being decompressed on dumping.  I would test -Z(>0) on one of the tables 
with PDFs to determine whether it would help or not.
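
One way to check whether TOAST is already compressing those bytea values 
in place, as a sketch (the table and column names below are invented; 
pg_column_size() reports the stored, possibly compressed, size while 
octet_length() reports the raw size):

$ psql -d $DB -c "SELECT octet_length(pdf_data)   AS raw_bytes,
                         pg_column_size(pdf_data) AS stored_bytes
                  FROM some_pdf_table_2017 LIMIT 10;"

If the two numbers are close, the PDFs really aren't compressing in 
storage and -Z0 costs nothing; if stored_bytes is much smaller, the dump 
will re-expand that data and -Z(>0) is worth testing.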


> 
> Because of an intricate web of FK constraints and partitioned tables, 
> the customer doesn't trust a set of "partitioned" backups using --table= 
> and regular expressions (the names of those big tables all have the year 
> in them), and so am stuck with a single-threaded backup.
> 
> Are there any config file elements that I can tweak (extra points for 
> not having to restart postgres) to make it run faster, or deeper 
> knowledge of how pg_restore works so that I could convince them to let 
> me do the partitioned backups?


> 
> Lastly, is there any way to not make the backups so large (maybe by 
> using the --binary-upgrade option, even though the man page says, 
> "in-place upgrades only")?
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Improving pg_dump performance

From: Ron
On 07/23/2018 08:27 AM, Andreas Kretschmer wrote:
>
> On 23.07.2018 at 15:06, Ron wrote:
>> On 07/23/2018 02:32 AM, Andreas Kretschmer wrote:
>>>
>>>
>>> On 23.07.2018 at 09:23, Ron wrote:
>>>> Hi,
>>>>
>>>> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database 
>>>> that needs to be migrated to a new data center and then restored to 
>>>> v9.6.9. 
>>>
>>> you can use the pg_dump from the newer version (9.6) to dump the old 
>>> database, over the net. In this way you can also use parallel backups 
>>> (directory format)
>>
>> That DC circuit is too slow, and also used by lots of other production data.
>>
>
> install the 9.6 also on the old server, 

Are there 9.6 packages for RHEL 5.10?

> or use an other server in the same DC.

An interesting idea.  To clarify: it's possible to parallel backup a running 
8.4 cluster remotely from a 9.6 system?

-- 
Angular momentum makes the world go 'round.


Re: Improving pg_dump performance

From: Stephen Frost
Greetings,

* Ron (ronljohnsonjr@gmail.com) wrote:
> An interesting idea.  To clarify: it's possible to parallel backup a running
> 8.4 cluster remotely from a 9.6 system?

Yes, you can do a parallel backup, but you won't be able to get a
consistent snapshot.  You'll need to pause all changes to the database
while the pg_dump processes connect and start their transactions to
have the backup be consistent.
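
A low-tech way to confirm nothing is writing while the worker connections 
start their transactions, as a sketch (these are the 8.4-era column names; 
this only observes, it doesn't block anything, and it will list itself and 
the pg_dump sessions):

$ psql -d $DB -c "SELECT procpid, usename, current_query
                  FROM pg_stat_activity
                  WHERE current_query <> '<IDLE>';"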

Thanks,

Stephen


Re: Improving pg_dump performance

From: Ron
On 07/23/2018 08:46 AM, Stephen Frost wrote:
> Greetings,
>
> * Ron (ronljohnsonjr@gmail.com) wrote:
>> An interesting idea.  To clarify: it's possible to parallel backup a running
>> 8.4 cluster remotely from a 9.6 system?
> Yes, you can do a parallel backup, but you won't be able to get a
> consistent snapshot.  You'll need to pause all changes to the database
> while the pg_dump processes connect and start their transactions to
> have the backup be consistent.

I can do that!!!

-- 
Angular momentum makes the world go 'round.


Re: Improving pg_dump performance

From: Adrian Klaver
On 07/23/2018 06:47 AM, Ron wrote:
> On 07/23/2018 08:46 AM, Stephen Frost wrote:
>> Greetings,
>>
>> * Ron (ronljohnsonjr@gmail.com) wrote:
>>> An interesting idea.  To clarify: it's possible to parallel backup a 
>>> running
>>> 8.4 cluster remotely from a 9.6 system?
>> Yes, you can do a parallel backup, but you won't be able to get a
>> consistent snapshot.  You'll need to pause all changes to the database
>> while the pg_dump processes connect and start their transactions to
>> have the backup be consistent.
> 
> I can do that!!!

Assuming you can get this set up, have you tested some subset of your 
data on 9.6.9?

1) Going from 8.4 --> 9.6 is jumping 7 major versions of Postgres. Do 
you know that the data/code will work in 9.6.9?

2) Does the transfer have time built in for fixing problems on the 9.6.9 
end?

3) If the answer to 2) is no, then is there a plan to deal with changes in 
the 8.4 database while working on the 9.6.9 database?


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Improving pg_dump performance

From: Andres Freund
Hi,

On 2018-07-23 02:23:45 -0500, Ron wrote:
> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
> needs to be migrated to a new data center and then restored to v9.6.9.

Have you considered using pg_upgrade instead?
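
For anyone weighing that option, the in-place route would look roughly 
like this (a sketch only, with placeholder paths; it assumes 9.6 binaries 
are installed on, or built for, that RHEL 5.10 box):

# dry run first; drop --check to actually perform the upgrade
$ /usr/pgsql-9.6/bin/pg_upgrade --check --link \
    -b /usr/pgsql-8.4/bin  -B /usr/pgsql-9.6/bin \
    -d /var/lib/pgsql/8.4/data  -D /var/lib/pgsql/9.6/data

--link uses hard links instead of copying the 2.9TB of data, so the 
downtime is mostly catalog work rather than data movement.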

Greetings,

Andres Freund


Re: Improving pg_dump performance

From: Ron

On 07/23/2018 08:56 AM, Adrian Klaver wrote:
> On 07/23/2018 06:47 AM, Ron wrote:
>> On 07/23/2018 08:46 AM, Stephen Frost wrote:
>>> Greetings,
>>>
>>> * Ron (ronljohnsonjr@gmail.com) wrote:
>>>> An interesting idea.  To clarify: it's possible to parallel backup a 
>>>> running
>>>> 8.4 cluster remotely from a 9.6 system?
>>> Yes, you can do a parallel backup, but you won't be able to get a
>>> consistent snapshot.  You'll need to pause all changes to the database
>>> while the pg_dump processes connect and start their transactions to
>>> have the backup be consistent.
>>
>> I can do that!!!
>
> Assuming you can get this setup, have you tested some subset of your data 
> on 9.6.9?:
>
> 1) Going from 8.4 --> 9.6 is jumping 7 major versions of Postgres. Do you 
> know that the data/code will work in 9.6.9?
>
> 2) Does the transfer have time built in for fixing problems on the 9.6.9 end?
>
> 3) If the answer 2) is no, then is there a plan to deal with changes in 
> the 8.4 database while working on the 9.6.9 database?

Yes, we've migrated CAT and Staging databases, and the application has been 
tested.

And this is a test conversion of the prod databases...

-- 
Angular momentum makes the world go 'round.


Re: Improving pg_dump performance

From: Ron
On 07/23/2018 09:11 AM, Andres Freund wrote:
> Hi,
>
> On 2018-07-23 02:23:45 -0500, Ron wrote:
>> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
>> needs to be migrated to a new data center and then restored to v9.6.9.
> Have you considered using pg_upgrade instead?

Yes, but:
1. I can't find 9.6 packages for the RHEL 5.10 server.
2. The window is only 8 hours, and the data also has to be moved to a new DC 
in that window.

-- 
Angular momentum makes the world go 'round.


Re: Improving pg_dump performance

From: Andres Freund
On 2018-07-23 09:17:41 -0500, Ron wrote:
> On 07/23/2018 09:11 AM, Andres Freund wrote:
> > Hi,
> > 
> > On 2018-07-23 02:23:45 -0500, Ron wrote:
> > > We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
> > > needs to be migrated to a new data center and then restored to v9.6.9.
> > Have you considered using pg_upgrade instead?
> 
> Yes, but:
> 1. can't find 9.6 packages on the RHEL 5.10 server.

If necessary you could just build it yourself.


> 2. The window is only 8 hours, and the data also has to be moved to a new DC
> in that window.

You could just use physical drives ;)

Greetings,

Andres Freund


RE: Improving pg_dump performance

From: Kevin Brannen
-----Original Message-----
From: Adrian Klaver [mailto:adrian.klaver@aklaver.com]
Sent: Monday, July 23, 2018 8:56 AM
To: Ron <ronljohnsonjr@gmail.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Improving pg_dump performance

On 07/23/2018 06:47 AM, Ron wrote:
> On 07/23/2018 08:46 AM, Stephen Frost wrote:
>> Greetings,
>>
>> * Ron (ronljohnsonjr@gmail.com) wrote:
>>> An interesting idea.  To clarify: it's possible to parallel backup a
>>> running
>>> 8.4 cluster remotely from a 9.6 system?
>> Yes, you can do a parallel backup, but you won't be able to get a
>> consistent snapshot.  You'll need to pause all changes to the
>> database while the pg_dump processes connect and start their
>> transactions to have the backup be consistent.
>
> I can do that!!!

Assuming you can get this setup, have you tested some subset of your data on 9.6.9?:

-------------------

+1 on that!  Case in point...

When we upgraded from 9.5 to 9.6 (only 1 version, so it doesn't sound all that 
bad, does it?) our application failed in 2 different places, which we traced 
down to SQL failing. Both instances were something along the lines of:

select fields from table1 join table2 on (key) where conditionA and conditionB;

What happened was that in 9.5 the planner reordered the WHERE and did 
conditionB first, which always failed (at least when it mattered). In 9.6 the 
planner did conditionA first. The problem came from conditionA needing a type 
conversion that didn't automatically exist, hence the failure. A simple cast 
fixed the issue, and we really should have had that in the original version, 
but the data we tested with never had the characteristics that would have 
triggered the problem (in our defense, the data that caused the failure had 
never shown itself in over 3 years of real usage, so I think I can call that 
rare).

The mistakes were ours, but the new version "tightened" some things and they 
caught us. The fixes were quite simple to make, but it was a real surprise to 
us.

So be aware that while Pg has been very good about being backward compatible, 
or it has for us, you can get bit in upgrades. Reading the release notes 
looking for changes is good, but in the end, running your code against the new 
version is the only way to find out.

HTH,
Kevin
