pg_dump slow with bytea data - Mailing list pgsql-general

From chris r.
Subject pg_dump slow with bytea data
Date
Msg-id 4D6E0151.20402@gmx.net
Whole thread Raw
Responses Re: pg_dump slow with bytea data  ("chris r." <chricki@gmx.net>)
Re: pg_dump slow with bytea data  (Merlin Moncure <mmoncure@gmail.com>)
Re: pg_dump slow with bytea data  (Alban Hertroys <dalroi@solfertje.student.utwente.nl>)
List pgsql-general
Dear list,

As discussed extensively in the past [1], pg_dump tends to be slow for
tables that contain bytea columns with large contents. Starting with
postgres version 8.5 the COPY format of bytea was changed from escape to
hex [1], giving ~50% performance boost.

However, we experience heavy problems during our weekly backup of our
database recently. We suspect the reason for this is that we changed
some columns from text with base64-encoded binary stuff to bytea
columns. This change affected a large fraction of the database (~400
GB). Note that we ran VACUUM FULL on the tables affected.

After this change our backup procedure heavily slowed down. Whereas it
took about 8 hours before the change, pg_dump is still busy with the
first table (keeping roughly 50GB) after 12 hours of backup. If I
approximate the time to complete the backup based on this, the backup
procedure would require factor 10 the time it required before the
change. The command we run is simply:  pg_dump -f <outputfile> -F c <db>

The main reason for this immense slow-down was identified in [1] as the
conversion of bytea into a compatible format (i.e. hex). However, given
the size of the db, a factor 10 makes backups practically infeasible.

We do not see any good solution to our problem except COPYing all data
in BINARY format. We understand there is a tough trade-off between
backup portability and backup efficiency here. As Bernd mentioned in
[1], however, not in all cases portability is required - particularly
not in ours.

A switch for binary output in pg_dump, or some alternative way to export
data *consistently* in binary format would be ideal for us, and probably
some others storing bytea data. Or do you see an alternative way how we
could get around this issue? Obviously, having no backup or deleting the
binary stuff from the database are no serious options.

Thanks for any discussion input in advance,
Chris

pgsql-general by date:

Previous
From: Thom Brown
Date:
Subject: Re: Comparing md5 hash with md5 password hash
Next
From: "chris r."
Date:
Subject: Re: pg_dump slow with bytea data