Thread: Updating large postgresql database with blobs

Updating large postgresql database with blobs

From
"CAJ CAJ"
Date:
Hello,

I didn't get any response on the GENERAL list so I'm escalating this ....

We have several independent database servers with ~50GB+ databases running postgres 8.0.x. We are planning to upgrade these databases to postgres 8.2.x over the weekend.

We plan to use the following steps to upgrade each server,

1. Dump the 8.0.x database cluster using 8.2.x pg_dumpall
% ./pg_dumpall > pgdumpall_backup.sql

2. Dump the 8.0.x database including large objects in compressed custom format using 8.2.x pg_dump
% ./pg_dump -Fc -b -Z9 dbname > pgdump_lobs_backup

Restoring database
1. Initialize 8.2.x database
% initdb -D /data/pgdata

2. Restore template1 database from cluster dump
% ./psql -d template1 < pgdumpall_backup.sql

3. Delete database dbname, else restoring will give an error about the existing dbname
% dropdb dbname

4. Create fresh dbname
% createdb -O dbowner dbname

5. Restore database with lobs
% ./pg_restore -v -Fc -d dbname -e -U dbowner < pgdump_lobs_backup

Some of the problems we have are,
1. We are not sure if all of the data will be available after dump/restore with the above process
2. The dump and restore process is far too slow to be completed over the weekend (it takes approx 1GB/hr to dump on a dual G5 PPC 2GHz with 1GB RAM and RAID 1 disks)

What is the fastest way to upgrade postgres for large databases that have binary objects?

Thanks for all your help.

Re: Updating large postgresql database with blobs

From
Andrew Dunstan
Date:
CAJ CAJ wrote:
> Hello,
>
> I didn't get any response on the GENERAL list so I'm escalating this ....
>
> We have several independent database servers with ~50GB+ databases 
> running postgres 8.0.x. We are planning to upgrade these databases to 
> postgres 8.2.x over the weekend
>
> We plan to use the following steps to upgrade each server,
>
> 1. Dump the 8.0.x database cluster using 8.2.x pg_dumpall
> % ./pg_dumpall > pgdumpall_backup.sql
>
> 2. Dump the 8.0.x database including large objects in compressed
> custom format using 8.2.x pg_dump
> % ./pg_dump -Fc -b -Z9 dbname > pgdump_lobs_backup
>
>
> Restoring database
> 1. Initialize 8.2.x database
> % initdb -D /data/pgdata
>
> 2. Restore template1 database from cluster dump
> % ./psql -d template1 < pgdumpall_backup.sql
>
> 3. Delete database dbname, else restoring will give an error about the
> existing dbname
> % dropdb dbname
>
> 4. Create fresh dbname
> % createdb -O dbowner dbname
>
> 5. Restore database with lobs
> % ./pg_restore -v -Fc -d dbname -e -U dbowner < pgdump_lobs_backup
>
> Some of the problems we have are,
> 1. We are not sure if all of the data will be available after
> dump/restore with the above process
> 2. The dump and restore process is far too slow to be completed over
> the weekend (it takes approx 1GB/hr to dump on a dual G5 PPC 2GHz with
> 1GB RAM and RAID 1 disks)
>
> What is the fastest way to upgrade postgres for large databases that
> have binary objects?
>
> Thanks for all your help. 


Your procedure dumps and restores the databases twice. This seems less 
than sound. My prediction is that you could get a 50% speed improvement 
by fixing that ...

The only thing you really need pg_dumpall for is the globals (roles and 
tablespaces). I would just use pg_dumpall -g to get those, and then use 
pg_dump -Fc + pg_restore for each actual database.
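
Roughly, that looks something like this (dbname, dbowner and the file 
names are just placeholders - adjust for your setup):

# dump only the globals (roles, tablespaces) with the 8.2.x pg_dumpall
% ./pg_dumpall -g > globals.sql

# dump each database once, in compressed custom format (large objects included)
% ./pg_dump -Fc -Z9 dbname > dbname.dump

# on the new 8.2.x cluster: restore globals, recreate the database, restore it
% ./psql -d template1 -f globals.sql
% ./createdb -O dbowner dbname
% ./pg_restore -v -d dbname dbname.dump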

Another thing is to make sure that pg_dump/pg_restore are not competing 
with postgres for access to the same disk(s). One way to do that is to 
run them from a different machine - they don't have to be run on the 
server machine - of course then the network can become a bottleneck, so 
YMMV.
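
For example (the host names here are made up), both the dump and the 
restore can be driven from a third box, so the old and new servers only 
do database work:

# run from a separate machine; old-server and new-server are placeholder hosts
% ./pg_dump -h old-server -U dbowner -Fc -Z9 dbname > dbname.dump
% ./pg_restore -h new-server -U dbowner -d dbname dbname.dump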

cheers

andrew



Re: Updating large postgresql database with blobs

From
"CAJ CAJ"
Date:
<snip> 

> > What is the fastest way to upgrade postgres for large databases that
> > have binary objects?
>
> Your procedure dumps and restores the databases twice. This seems less
> than sound. My prediction is that you could get a 50% speed improvement
> by fixing that ...

Thanks for the response. This'd be wonderful if I can get my process right. My assumption (probably incorrect) is that pg_dump has to be executed twice on a database with blobs: once to get the data and once to get the blobs (using the -b flag).


> The only thing you really need pg_dumpall for is the globals (roles and
> tablespaces). I would just use pg_dumpall -g to get those, and then use
> pg_dump -Fc + pg_restore for each actual database.

This makes sense :) I assume that running pg_dump with -b will get all of the data, including the blobs?

> Another thing is to make sure that pg_dump/pg_restore are not competing
> with postgres for access to the same disk(s). One way to do that is to
> run them from a different machine - they don't have to be run on the
> server machine - of course then the network can become a bottleneck, so
> YMMV.

We are using separate servers for dump and restore.

Thanks again for your suggestions. This helps immensely.

Re: Updating large postgresql database with blobs

From
Andrew Dunstan
Date:
CAJ CAJ wrote:
>
>
> Thanks for the response. This'd be wonderful if I can get my process
> right. My assumption (probably incorrect) is that pg_dump has to be
> executed twice on a database with blobs: once to get the data and once
> to get the blobs (using the -b flag).
>
>

Why do you assume that? The pg_dump manual says:

  -b
  --blobs
    Include large objects in the dump. This is the default behavior
    except when --schema, --table, or --schema-only is specified, so the
    -b switch is only useful to add large objects to selective dumps.


IOW, in most cases you will just get the large objects for free, and 
don't even need to use this flag to get them. No need to run twice.
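
If you want a rough sanity check that the large objects made it across, 
one informal option (assuming you can read the pg_largeobject catalog on 
both servers) is to compare counts on the old and new databases:

# run against both the 8.0.x database and the restored 8.2.x database; counts should match
% ./psql -d dbname -c "SELECT count(DISTINCT loid) FROM pg_largeobject;"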


cheers

andrew