Re: Practical maximums (was Re: PostgreSQL theoretical - Mailing list pgsql-general

From Jeff Davis
Subject Re: Practical maximums (was Re: PostgreSQL theoretical
Date
Msg-id 1155055646.12968.109.camel@dogma.v10.wvs
In response to Re: Practical maximums (was Re: PostgreSQL theoretical  (Ron Johnson <ron.l.johnson@cox.net>)
Responses Re: Practical maximums (was Re: PostgreSQL theoretical  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-general
On Mon, 2006-08-07 at 21:10 -0500, Ron Johnson wrote:
> > What situation are you worried about here exactly?
>
> Backing up a large database in a limited amount of time.
>
> If the DB is large enough, and the time is short enough, then the
> single-reader pg_dump can not read the data fast enough (especially
> if other operations are pounding the disk system) to meet the time
> limit.
>
> Thus, the need (some times) for multiple readers.
>

Oh, I see. You're not only worried about write speed (to tape), but
also about whether pg_dump can read from all your disk spindles at once
to sustain a high enough read rate.

If you use PITR, then during the backup you can copy the data directory
in any order you want, which makes it pretty easy to copy with multiple
streams. You can use tar to create a single stream out of each
tablespace, and then multiplex each stream across multiple tapes so that
the writes can keep up.

It's not as easy to create multiple reading streams using pg_dump,
because each transaction can have at most one COPY going at a time. You
might be able to hack up something, but I think PITR is the way to go.
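
For what it's worth, the kind of hack I have in mind is running several
pg_dump sessions in parallel, each dumping a different table (the table
names below are hypothetical). Each pg_dump runs in its own transaction
with its own snapshot, so the pieces are not mutually consistent; that
is exactly why I'd lean on PITR instead:

$ # dump the two biggest tables in parallel sessions (separate snapshots!)
$ pg_dump -Fc -t big_table1 mydb > /backup/big_table1.dump &
$ pg_dump -Fc -t big_table2 mydb > /backup/big_table2.dump &
$ wait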

> But, you say: do PITR and weekly full backups.  Ok.  Say you do
> Saturday night backups and nightly PITR backups.  And the disk
> system barfs hard on Friday.  If the Tuesday night PITR tape has a
> parity error, you're hosed and have lost 3 days of data.
>

By modifying the scripts you use to multiplex the data, you can employ
some redundancy. I don't think it would be hard to include duplicate
blocks or parity blocks in the multiplex script.
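
For the duplicate-block variant, a minimal sketch is just tee; parity
blocks would need a small helper program in the pipeline:

$ # write the entire stream to both tapes, so a parity error on one
$ # tape is survivable (at the cost of doubling tape usage)
$ tar zcf - /path/to/pgdata | tee /dev/st0 > /dev/st1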

By creating multiple reading streams, one for each tablespace, and then
writing each stream to multiple tapes (with parity blocks), you should
be able to both back up and restore at the speed your hardware allows.
Furthermore, with PITR you can take incremental backups continuously
and wait 'til the weekend to do a full backup.
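
The continuous incremental part is just ordinary WAL archiving, e.g. in
postgresql.conf (assuming /archive is where you keep archived WAL
segments):

archive_command = 'cp %p /archive/%f'  # %p = WAL segment path, %f = file name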

I'm imagining something similar to the following:
mydb=# select pg_start_backup('mybackup');

$ # run the three tar streams in parallel, one per tablespace
$ tar zcf - /path/to/pgdata | mux.script /dev/st0 /dev/st1 /dev/st2 &
$ tar zcf - /path/to/tablespace1 | mux.script /dev/st3 /dev/st4 /dev/st5 &
$ tar zcf - /path/to/tablespace2 | mux.script /dev/st6 /dev/st7 /dev/st8 &
$ wait

mydb=# select pg_stop_backup(); -- in same connection as pg_start_backup

where mux.script writes its input stream across three tapes using
parity blocks; adjust for the number of tape drives you actually have.
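
On the restore side, a hypothetical demux.script (the inverse of
mux.script, reassembling each stream from its tapes and parity blocks)
plus ordinary PITR recovery would look roughly like:

$ demux.script /dev/st0 /dev/st1 /dev/st2 | tar zxf - -C /
$ # point recovery.conf at the WAL archive, then start the postmaster
$ echo "restore_command = 'cp /archive/%f %p'" > /path/to/pgdata/recovery.conf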

Hope this helps,
    Jeff Davis



