Re: > 16TB worth of data question - Mailing list pgsql-general

From scott.marlowe
Subject Re: > 16TB worth of data question
Date
Msg-id Pine.LNX.4.33.0304281552240.14549-100000@css120.ihs.com
In response to Re: > 16TB worth of data question  (Ron Johnson <ron.l.johnson@cox.net>)
Responses Re: > 16TB worth of data question
List pgsql-general
On 28 Apr 2003, Ron Johnson wrote:

> On Mon, 2003-04-28 at 10:42, scott.marlowe wrote:
> > On 28 Apr 2003, Jeremiah Jahn wrote:
> >
> > > On Fri, 2003-04-25 at 16:46, Jan Wieck wrote:
> > > > Jeremiah Jahn wrote:
> > > > >
> > > > > On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote:
> [snip]
> > Don't shut it down and backup at file system level, leave it up, restrict
> > access via pg_hba.conf if need be, and use pg_dump.  File system level
> > backups are not the best way to go, although for quick recovery they can
> > be added to full pg_dumps as an aid, but don't leave out the pg_dump,
> > it's the way you're supposed to backup postgresql, and it can do so when
> > the database is "hot and in use" and provide a consistent backup
> > snapshot.
>
> What's the problem with doing a file-level backup of a *cold* database?

There's no problem with doing it; the problem is that in order to get
anything back you pretty much have to have all of it for the restore to
work right, and any subtle problems with a partial copy might not be so
obvious.

Plus it ties you to one major rev of the database.  Pulling out a five-
year-old copy of the base directory can involve a fair bit of work
getting an older flavor of postgresql to run on a newer OS.

It's not that it's wrong; it's that it should be considered carefully
before being done.  Note that a month or so ago someone had Veritas
backing up their postgresql database while it was still live, and it was
corrupting data.  Admittedly, you're gonna bring down the database first,
so that's not an issue for you.

> The problem with pg_dump is that it's single-threaded, and it would take
> a whole lotta time to back up 16TB using 1 tape drive...

But you can run pg_dump against individual databases or tables on the
same postmaster, so you could theoretically write a script around pg_dump
to dump the databases or large tables to different drives.  We back up our
main server to our backup server that way, albeit with only one backup
process at a time; since we can back up about a gig a minute, that's
plenty fast for us.  If we needed to parallelize it, that would be pretty
easy.
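
Something along these lines, for instance (just a sketch, untested; the
database names and backup path here are made up, and the pg_dump options
would need adjusting for your setup):

  #!/usr/bin/env python
  # Dump each database on the postmaster to its own file with pg_dump.
  import subprocess

  # Hypothetical database names and backup location -- substitute your own.
  DATABASES = ["sales", "inventory", "archive"]
  BACKUP_DIR = "/backup/pgdumps"

  for db in DATABASES:
      outfile = "%s/%s.dump" % (BACKUP_DIR, db)
      # -Fc writes pg_dump's compressed custom format, restorable with pg_restore.
      subprocess.check_call(["pg_dump", "-Fc", "-f", outfile, db])

Run one of those per database (or fork a few at once) and point each
output file at whatever drive you like.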

> If a pg database is spread across multiple devices (using symlinks),
> then a cold database can be backed up, at the file level, using
> multiple tape drives.  (Of course, all symlinks would have to be
> recreated when/if the database files had to be restored.)

The same thing can be done with multiple pg_dumps running against the
databases / tables you select, while the database is still up.  It's a
toss-up, but know that the pg_dump format is the "supported" method, so to
speak, for backing up.
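
For the multiple-drive case, a parallel variant of the same sketch (again
untested; the table names, mount points, and database name are invented):

  #!/usr/bin/env python
  # Kick off several pg_dump processes at once, one big table per target drive.
  import subprocess

  # (table, destination file) pairs -- one destination per drive.
  JOBS = [
      ("big_table_1", "/mnt/drive1/big_table_1.dump"),
      ("big_table_2", "/mnt/drive2/big_table_2.dump"),
  ]

  procs = [subprocess.Popen(["pg_dump", "-Fc", "-t", table, "-f", dest, "mydb"])
           for table, dest in JOBS]
  for p in procs:
      if p.wait() != 0:
          raise RuntimeError("a pg_dump run failed")

One caveat: each pg_dump run takes its own snapshot, so tables dumped by
separate processes aren't guaranteed to be consistent with each other the
way a single pg_dump of the whole database is.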

Hopefully before it becomes a huge issue PITR will be done, eh?

