Thread: Backup strategies with significant bytea data

Backup strategies with significant bytea data

From
"Leigh Dyer"
Date:
Hi,

For years now I've simply backed up my databases by doing a nightly
pg_dump, but since we added the ability for users to import binary files
in to our application, which are stored in a bytea fields, the dump
sizes have gone through the roof — even with gzip compression, they're
significantly larger than the on-disk size of the database. My guess is
that this due to the way that the binary data from the bytea fields is
encoded in the dump file when it's produced.

Is there any way I can reduce the size of my pg_dump files?
Alternatively, are there other backup methods worth investigating? PITR
looks promising, but I have about a dozen databases on the machine, and
I'd need to be able to restore them individually.

Thanks
Leigh




Please consider the environment before printing this message

Re: Backup strategies with significant bytea data

From
Ivan Voras
Date:
Leigh Dyer wrote:
> Hi,
>
> For years now I've simply backed up my databases by doing a nightly
> pg_dump, but since we added the ability for users to import binary files
> in to our application, which are stored in a bytea fields, the dump
> sizes have gone through the roof — even with gzip compression, they're
> significantly larger than the on-disk size of the database. My guess is
> that this due to the way that the binary data from the bytea fields is
> encoded in the dump file when it's produced.

Have you tried another dump format? E.g. "-F c" argument to pg_dump?

Re: Backup strategies with significant bytea data

From
"Leigh Dyer"
Date:
Ivan Voras wrote:
> Leigh Dyer wrote:
>> Hi,
>>
>>For years now I've simply backed up my databases by doing a nightly
>>pg_dump, but since we added the ability for users to import binary
files
>>in to our application, which are stored in a bytea fields, the dump
>>sizes have gone through the roof — even with gzip compression,
they're
>>significantly larger than the on-disk size of the database. My guess
is
>>that this due to the way that the binary data from the bytea fields
is
>>encoded in the dump file when it's produced.
>
>Have you tried another dump format? E.g. "-F c" argument to pg_dump?

Hi Ivan,

I have tried the custom dump format, but it made no difference to the
backup sizes. Here are some numbers:

Database size on disk (according to pg_database_size): 1017MB
pg_dump size (standard format): 3475MB
pg_dump size (standard format, gzip-compressed): 1132MB
pg_dump size (custom format, ie: "-F c" option): 1134MB

This testing was done on my development server, with the database
restored a backup from the production server, so the database size on
disk might be a bit smaller than it would be if this was a running
database with some dead tuples in there. Still, there's a very big
difference between the raw data size and the dump sizes.

Thanks
Leigh



Please consider the environment before printing this message