Thread: Slow pg_dump

Slow pg_dump

From: "Ryan Wells"

We're having what seem like serious performance issues with pg_dump, and I hope someone can help. 

We have several tables that are used to store binary data as bytea (in this example image files), but we're having similar time issues with text tables as well.

In my most recent test, the sample table was about 5 GB in 1644 rows, with image file sizes between 1 MB and 35 MB.  The server was a 3.0 GHz P4 running WinXP with 2 GB of RAM, the backup was stored on a separate disk from the data, and little else was running on the system.

We're doing the following:

pg_dump -i -h localhost -p 5432 -U postgres -F c -v -f "backupTest.backup" -t "public"."images" db_name

In the test above, this took 1 hr 45 min to complete.  Since we expect to have users with 50-100 GB of data, if not more, backup times approaching an entire day are unacceptable.

We think there must be something we're doing wrong. A search turned up a similar thread (http://archives.postgresql.org/pgsql-performance/2007-12/msg00404.php), but our numbers are so much higher than those that we must be doing something very wrong.  We hope there's a server setting or pg_dump option we need to change, but we're open to design changes if necessary.

Can anyone who has dealt with this before advise us?

Thanks!
Ryan

Re: Slow pg_dump

From: Tom Lane
"Ryan Wells" <ryan.wells@soapware.com> writes:
> We have several tables that are used to store binary data as bytea (in
> this example image files),

Precompressed image formats, no doubt?

> pg_dump -i -h localhost -p 5432 -U postgres -F c -v -f "backupTest.backup" -t "public"."images" db_name

Try it with -Z0, or even drop the -Fc completely, since it's certainly
not very helpful on a single-table dump.  Re-compressing already
compressed data is not only useless but impressively slow ...

Also, drop the -i, that's nothing but a foot-gun.

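Concretely, that would mean something like the following (a sketch based on the command quoted above; -Z0 keeps the custom format but disables its compression):

    pg_dump -h localhost -p 5432 -U postgres -F c -Z0 -v -f backupTest.backup -t public.images db_name

or, dropping -Fc entirely, a plain uncompressed SQL dump:

    pg_dump -h localhost -p 5432 -U postgres -v -f backupTest.sql -t public.images db_name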
            regards, tom lane

Re: Slow pg_dump

From: "Ryan Wells"
The images are stored in whatever format our users load them as, so we
don't have any control over their compression or lack thereof.

I ran pg_dump with the arguments you suggested, and my 4 GB test table
finished backing up in about 25 minutes, which seems great.  The only
problem is that the resulting backup file was over 9 GB.  Using -Z2
resulted in a 55-minute, 6 GB backup.

Here's my interpretation of those results: the TOAST tables for our
image files are compressed by Postgres.  During the backup, pg_dump
uncompresses them, and if compression is turned on, recompresses the
backup.  Please correct me if I'm wrong there.

If we can't find a workable balance using pg_dump, then it looks like
our next best alternative may be a utility to handle filesystem backups,
which is a little scary for on-site, user-controlled servers.

Ryan

Re: Slow pg_dump

From: "Phillip Smith"
> I ran pg_dump with the arguments you suggested, and my 4 GB test
> table finished backing up in about 25 minutes, which seems great.
> The only problem is that the resulting backup file was over 9 GB.
> Using -Z2 resulted in a 55-minute, 6 GB backup.
>
> Here's my interpretation of those results: the TOAST tables for
> our image files are compressed by Postgres.  During the backup,
> pg_dump uncompresses them, and if compression is turned on,
> recompresses the backup.  Please correct me if I'm wrong there.
>
> If we can't find a workable balance using pg_dump, then it looks
> like our next best alternative may be a utility to handle
> filesystem backups, which is a little scary for on-site,
> user-controlled servers.

Re: Slow pg_dump

From: "Phillip Smith"
(Sorry, hit send too soon!)

> I ran pg_dump with the arguments you suggested, and my 4 GB test
> table finished backing up in about 25 minutes, which seems great.
> The only problem is that the resulting backup file was over 9 GB.
> Using -Z2 resulted in a 55-minute, 6 GB backup.
>
> Here's my interpretation of those results: the TOAST tables for
> our image files are compressed by Postgres.  During the backup,
> pg_dump uncompresses them, and if compression is turned on,
> recompresses the backup.  Please correct me if I'm wrong there.
>
> If we can't find a workable balance using pg_dump, then it looks
> like our next best alternative may be a utility to handle
> filesystem backups, which is a little scary for on-site,
> user-controlled servers.

How about a post-backup compress?
    pg_dump -Z0 > uncompressed-backup.sql
    gzip uncompressed-backup.sql
    mv uncompressed-backup.sql.gz compressed-backup.sql.gz

Your backup completes in a reasonable time; you just handle the
compression and storage of the backup file afterwards, while users
can be using the system again...

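An alternative along the same lines is to pipe the dump straight through gzip, which avoids writing the intermediate uncompressed file (a sketch; add the usual host and user options), with the trade-off that the compression then happens during the dump window rather than after it:

    pg_dump -Z0 db_name | gzip > compressed-backup.sql.gz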


Re: Slow pg_dump

From: "Tena Sakai"

Hi,

I have never dealt with tables made of compressed data,
but I back up the database via a crontab entry like this:

     <some environment variable setup>
              .
              .
  filename=`date +%G%m%d.%w`.gz
  /usr/local/pgsql/bin/pg_dumpall | gzip > /some_destination/$filename
              .
              .

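To restore from a dump made this way, something along these lines should work (the target cluster must already exist; the script is simply fed back through psql):

  gunzip -c /some_destination/$filename | /usr/local/pgsql/bin/psql -U postgres postgres
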
Hope this helps.

Tena Sakai
tsakai@gallo.ucsf.edu


Re: Slow pg_dump

From: Tom Lane
"Phillip Smith" <phillip.smith@weatherbeeta.com.au> writes:
>> Here's my interpretation of those results: the TOAST tables for
>> our image files are compressed by Postgres.  During the backup,
>> pg_dump uncompresses them, and if compression is turned on,
>> recompresses the backup.  Please correct me if I'm wrong there.

No, the TOAST tables aren't compressed, they're pretty much going to be
the raw image data (plus a bit of overhead).

What I think is happening is that COPY OUT is encoding the bytea
data fairly inefficiently (one byte could go to \\nnn, five bytes)
and the compression on the pg_dump side isn't doing very well at buying
that back.

I experimented a bit and noticed that pg_dump -Fc is a great deal
smarter about storing large objects than big bytea fields --- it seems
to be pretty nearly one-to-one with the original data size when storing
a compressed file that was put into a large object.  I dunno if it's
practical for you to switch from bytea to large objects, but in the near
term I think that's your only option if the dump file size is a
showstopper problem for you.

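For anyone weighing that switch, here is a rough sketch of a large-object-based layout, driven through psql (the table, column, and file names are made up for illustration; lo_import() runs on the server, so the path must be readable by the server process):

    psql -d db_name -c "CREATE TABLE images_lo (id serial PRIMARY KEY, filename text, image_oid oid);"
    psql -d db_name -c "INSERT INTO images_lo (filename, image_oid) VALUES ('scan001.jpg', lo_import('/tmp/scan001.jpg'));"

Bear in mind that a single-table dump with -t may not pick up the large objects themselves, so check whether you need a whole-database dump or pg_dump's blob option before settling on this layout.
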
            regards, tom lane

Re: Slow pg_dump

From: "Ryan Wells"

Thanks for the info on TOAST.  We're still finding our legs with Postgres after several years on MySQL.

We do have the flexibility to adjust our data types and schema if we need to.  We try to keep it to a minimum, but it's doable.  I'm completely open to the possibility that we just have a very inefficient DB design or that we're misusing the data types.

We'll be running some more tests looking for the sweet spot between time and size.  I expect we'll find a good balance somewhere.

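One straightforward way to look for that balance is to time the same dump at a few compression levels and compare the resulting sizes, along these lines (a Unix-shell sketch reusing the names from the earlier tests):

    for z in 0 1 3 6 9; do
        time pg_dump -h localhost -p 5432 -U postgres -F c -Z $z \
            -f backupTest_z$z.backup -t public.images db_name
        ls -lh backupTest_z$z.backup
    done
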
Thanks!
