Thread: Slow pg_dump
We're having what seem like serious performance issues with pg_dump, and I hope someone can help.
We have several tables that are used to store binary data as bytea (in this example, image files), but we're having similar time issues with text tables as well.
In my most recent test, the sample table was about 5 GB in 1644 rows, with image file sizes between 1 MB and 35 MB. The server was a 3.0 GHz P4 running WinXP with 2 GB of RAM, the backup was stored to a separate disk from the data, and little else was running on the system.
We're doing the following:
pg_dump -i -h localhost -p 5432 -U postgres -F c -v -f "backupTest.backup" -t "public"."images" db_name
In the test above, this took 1 hr 45 min to complete. Since we expect to have users with 50-100 GB of data, if not more, backup times that approach an entire day are unacceptable.
We think there must be something we're doing wrong. A search turned up a similar thread (http://archives.postgresql.org/pgsql-performance/2007-12/msg00404.php), but our numbers are so much higher than those that we must be doing something very wrong. Hopefully there's a server setting or pg_dump option we need to change, but we're open to design changes if necessary.
Can anyone who has dealt with this before advise us?
Thanks!
Ryan
"Ryan Wells" <ryan.wells@soapware.com> writes: > We have several tables that are used to store binary data as bytea (in > this example image files), Precompressed image formats, no doubt? > pg_dump -i -h localhost -p 5432 -U postgres -F c -v -f "backupTest.backup" -t "public"."images" db_name Try it with -Z0, or even drop the -Fc completely, since it's certainly not very helpful on a single-table dump. Re-compressing already compressed data is not only useless but impressively slow ... Also, drop the -i, that's nothing but a foot-gun. regards, tom lane
The images are stored in whatever format our users load them as, so we don't have any control over their compression or lack thereof.

I ran pg_dump with the arguments you suggested, and my 4 GB test table finished backing up in about 25 minutes, which seems great. The only problem is that the resulting backup file was over 9 GB. Using -Z2 resulted in a 55-minute, 6 GB backup.

Here's my interpretation of those results: the TOAST tables for our image files are compressed by Postgres. During the backup, pg_dump uncompresses them, and if compression is turned on, recompresses the backup. Please correct me if I'm wrong there.

If we can't find a workable balance using pg_dump, then it looks like our next best alternative may be a utility to handle filesystem backups, which is a little scary for on-site, user-controlled servers.

Ryan
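One way to test that interpretation is to compare the table's actual on-disk size (heap plus TOAST plus indexes) against the dump size; if the two are close, the stored bytea data is not being compressed much by TOAST. A sketch, reusing the database and table names from the commands above and assuming a reasonably recent 8.x server where pg_total_relation_size() is available:

psql -U postgres -d db_name -c "SELECT pg_size_pretty(pg_total_relation_size('public.images'));"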
(Sorry, hit send too soon!)

> I ran pg_dump with the arguments you suggested, and my 4 GB test
> table finished backing up in about 25 minutes, which seems great.
> The only problem is that the resulting backup file was over 9 GB.
> Using -Z2 resulted in a 55-minute, 6 GB backup.
>
> Here's my interpretation of those results: the TOAST tables for
> our image files are compressed by Postgres. During the backup,
> pg_dump uncompresses them, and if compression is turned on,
> recompresses the backup. Please correct me if I'm wrong there.
>
> If we can't find a workable balance using pg_dump, then it looks
> like our next best alternative may be a utility to handle
> filesystem backups, which is a little scary for on-site,
> user-controlled servers.

How about a post-backup compress?

pg_dump -Z0 > uncompressed-backup.sql
gzip uncompressed-backup.sql
mv uncompressed-backup.sql.gz compressed-backup.sql.gz

Your backup completes in a reasonable time; you just handle the storage of the backup afterwards, while users can be using the system again...
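If keeping a large intermediate file on disk is a concern, the same idea works as a single pipeline (a sketch assuming gzip is available on the backup host, reusing the table and database names from earlier in the thread):

pg_dump -h localhost -p 5432 -U postgres -t "public"."images" db_name | gzip > backupTest.sql.gz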
Hi,
I have never dealt with tables that are made
of compressed data, but I back up the database
via crontab file like this:
<some environment variable setup>
.
.
filename=`date +%G%m%d.%w`.gz
/usr/local/pgsql/bin/pg_dumpall | gzip > /some_destination/$filename
.
.
Hope this helps.
Tena Sakai
tsakai@gallo.ucsf.edu
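For completeness, a nightly crontab entry along those lines might look like the following (a sketch with placeholder paths; note that literal % characters must be escaped inside a crontab entry):

# run at 02:00 every night
0 2 * * * /usr/local/pgsql/bin/pg_dumpall | gzip > /some_destination/`date +\%G\%m\%d.\%w`.gz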
"Phillip Smith" <phillip.smith@weatherbeeta.com.au> writes: >> Here's my interpretation of those results: the TOAST tables for >> our image files are compressed by Postgres. During the backup, >> pg_dump uncompresses them, and if compression is turned on, >> recompresses the backup. Please correct me if I'm wrong there. No, the TOAST tables aren't compressed, they're pretty much going to be the raw image data (plus a bit of overhead). What I think is happening is that COPY OUT is encoding the bytea data fairly inefficiently (one byte could go to \\nnn, five bytes) and the compression on the pg_dump side isn't doing very well at buying that back. I experimented a bit and noticed that pg_dump -Fc is a great deal smarter about storing large objects than big bytea fields --- it seems to be pretty nearly one-to-one with the original data size when storing a compressed file that was put into a large object. I dunno if it's practical for you to switch from bytea to large objects, but in the near term I think that's your only option if the dump file size is a showstopper problem for you. regards, tom lane
Thanks for the info on TOAST. We're still finding our legs with Postgres after several years on MySQL.
We do have the flexibility to adjust our data types and schema if we need to; we try to keep such changes to a minimum, but it's doable. I'm completely open to the possibility that we just have a very inefficient DB design or that we're misusing the data types.
We'll be running some more tests looking for the sweet spot between time and size. I expect we'll find a good balance somewhere.
Thanks!
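For the follow-up tests mentioned above, a simple sweep over compression levels makes the time/size trade-off explicit (a sketch assuming a Unix-like shell on the backup host, reusing the table and database names from the thread):

for z in 0 1 3 6 9; do
  echo "compression level $z"
  time pg_dump -h localhost -p 5432 -U postgres -F c -Z$z \
       -f backupTest.z$z.backup -t "public"."images" db_name
  ls -l backupTest.z$z.backup
done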