
From: Ron Johnson
Subject: Re: > 16TB worth of data question
Date:
Msg-id: 1051596099.16230.16.camel@haggis
In response to: Re: > 16TB worth of data question  ("scott.marlowe" <scott.marlowe@ihs.com>)
Responses: Re: > 16TB worth of data question
List: pgsql-general
On Mon, 2003-04-28 at 16:59, scott.marlowe wrote:
> On 28 Apr 2003, Ron Johnson wrote:
> [snip]
> > On Mon, 2003-04-28 at 10:42, scott.marlowe wrote:
> > > On 28 Apr 2003, Jeremiah Jahn wrote:
> > >
> > > > On Fri, 2003-04-25 at 16:46, Jan Wieck wrote:
> > > > > Jeremiah Jahn wrote:
> > > > > >
> > > > > > On Tue, 2003-04-22 at 10:31, Lincoln Yeoh wrote:
> > [snip]
> > > Don't shut it down and back up at the file system level; leave it
> > > up, restrict access via pg_hba.conf if need be, and use pg_dump.
> > > File system level backups are not the best way to go, although for
> > > quick recovery they can supplement full pg_dumps.  But don't leave
> > > out the pg_dump: it's the way you're supposed to back up
> > > postgresql, and it can do so while the database is "hot and in
> > > use" and still provide a consistent backup snapshot.
> >
> > What's the problem with doing a file-level backup of a *cold* database?
>
> There's no problem with doing it; the problem is that in order to get
> anything back, you pretty much have to have all of it to make it work
> right, and any subtle problems of a partial copy might not be so
> obvious.
>
> Plus, it ties you to one major rev of the database.  Pulling out
> five-year-old copies of the base directory can involve a fair bit of
> work getting an older flavor of postgresql to run on a newer OS.

Good point...

[snip]
> > The problem with pg_dump is that it's single-threaded, and it would take
> > a whole lotta time to back up 16TB using 1 tape drive...
>
> But you can run pg_dump against individual databases or tables on the
> same postmaster, so you could theoretically write a script around
> pg_dump to dump the databases or large tables to different drives.  We
> back up our main server to our backup server that way, albeit with
> only one backup process at a time; since we can back up about a gig a
> minute, that's plenty fast for us.  If we needed to parallelize it,
> that would be pretty easy.

But pg doesn't guarantee internal consistency across tables unless you
pg_dump the whole database in one command, "pg_dump db_name >
db_yyyymmdd.dmp"; each separate pg_dump invocation runs in its own
transaction, so each one takes its snapshot at a different moment.

Thus, no parallelism unless there are multiple databases, but if there's
only 1 database...
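
To make that concrete, here's a rough sketch (the table names are
made up):

    # One invocation == one transaction == one consistent snapshot:
    pg_dump db_name > db_`date +%Y%m%d`.dmp

    # A parallel, per-table variant is faster, but each backgrounded
    # pg_dump takes its own snapshot, so the pieces are not guaranteed
    # to be consistent with one another while writes are going on:
    for t in big_table1 big_table2 big_table3
    do
        pg_dump -t $t db_name > ${t}_`date +%Y%m%d`.dmp &
    done
    wait

So the parallel form only helps if the tables don't have to be
mutually consistent, or if you quiesce the database first.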

--
+-----------------------------------------------------------+
| Ron Johnson, Jr.     Home: ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson |
|                                                           |
| An ad currently being run by the NEA (the US's biggest    |
| public school TEACHERS UNION) asks a teenager if he can   |
| find sodium and *chloride* in the periodic table of the   |
| elements.                                                 |
| And they wonder why people think public schools suck...   |
+-----------------------------------------------------------+

