Re: > 16TB worth of data question - Mailing list pgsql-general
| From | scott.marlowe |
|---|---|
| Subject | Re: > 16TB worth of data question |
| Date | |
| Msg-id | Pine.LNX.4.33.0304281552240.14549-100000@css120.ihs.com |
| In response to | Re: > 16TB worth of data question (Ron Johnson <ron.l.johnson@cox.net>) |
| Responses | Re: > 16TB worth of data question |
| List | pgsql-general |
On 28 Apr 2003, Ron Johnson wrote:

> On Mon, 2003-04-28 at 10:42, scott.marlowe wrote:
> > On 28 Apr 2003, Jeremiah Jahn wrote:
> > > On Fri, 2003-04-25 at 16:46, Jan Wieck wrote:
> > > > Jeremiah Jahn wrote:
> > > > > On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote:
> [snip]
> > Don't shut it down and back up at the file system level; leave it up,
> > restrict access via pg_hba.conf if need be, and use pg_dump. File
> > system level backups are not the best way to go, although for quick
> > recovery they can be added to full pg_dumps as an aid. But don't
> > leave out the pg_dump: it's the way you're supposed to back up
> > PostgreSQL, and it can do so while the database is "hot and in use"
> > and still provide a consistent backup snapshot.
>
> What's the problem with doing a file-level backup of a *cold* database?

There's no problem with doing it. The problem is that in order to get
anything back you pretty much have to have all of it to make it work
right, and any subtle problems of a partial copy might not be so
obvious. Plus it sticks you to one major rev of the database: pulling
out a five-year-old copy of the base directory can involve a fair bit
of work getting an older flavor of PostgreSQL to run on a newer OS.
It's not that it's wrong, it's that it should be considered carefully
before being done.

Note that a month or so ago someone had Veritas backing up their
PostgreSQL database while it was still live, and it was corrupting
data. Admittedly, you're going to bring down the database first, so
that's not an issue for you.

> The problem with pg_dump is that it's single-threaded, and it would
> take a whole lotta time to back up 16TB using 1 tape drive...

But you can run pg_dump against individual databases or tables on the
same postmaster, so you could theoretically write a script around
pg_dump to dump the databases or large tables to different drives. We
back up our main server to our backup server that way, albeit with
only one backup process at a time; since we can back up about a gig a
minute, it's plenty fast for us. If we needed to parallelize it, that
would be pretty easy.

> If a pg database is spread across multiple devices (using symlinks),
> then a cold database can be backed up, at the file level, using
> multiple tape drives. (Of course, all symlinks would have to be
> recreated when/if the database files had to be restored.)

The same thing can be done with multiple pg_dumps running against the
databases / tables you select, while the database is still up. It's a
toss-up, but know that the pg_dump format is the "supported" method,
so to speak, for backing up. Hopefully PITR will be done before this
becomes a huge issue, eh?
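As a rough illustration of the "script around pg_dump" idea above, here is a minimal sketch that dumps each database in the background to a different output device. The database names (sales, orders, archive) and mount points (/backup/drive1, ...) are made up for the example; substitute your own, and adjust connection options as needed.

```sh
#!/bin/sh
# Hypothetical sketch: one pg_dump per database, each writing to a
# different drive, all running against the same postmaster in parallel.

dump() {
    db=$1
    dest=$2
    # Plain-format dump, compressed on the way out to keep files smaller.
    pg_dump "$db" | gzip > "$dest/$db.$(date +%Y%m%d).sql.gz"
}

dump sales   /backup/drive1 &
dump orders  /backup/drive2 &
dump archive /backup/drive3 &

wait    # block until every background dump has finished
echo "all dumps complete"
```

The same pattern works per table instead of per database (pg_dump's -t option), which is how you'd split one very large database across several drives.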