Re: why postgresql over other RDBMS - Mailing list pgsql-general

From Andrew Sullivan
Subject Re: why postgresql over other RDBMS
Date
Msg-id 20070525214419.GB1790@phlogiston.dyndns.org
In response to Re: why postgresql over other RDBMS  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Fri, May 25, 2007 at 05:28:43PM -0400, Tom Lane wrote:
> That's true at the level of DDL operations, but AFAIK we could
> parallelize table-loading and index-creation steps pretty effectively
> --- and that's where all the time goes.

I made a presentation at OSCON a few years ago about how we did it
that way when we imported .org.  We had limited time to work in, and
we had to do a lot of validation, so getting the data in quickly was
important.  So we split the data files up into segments and loaded
them in parallel (Chris Browne did most of the implementation of
this).  It was pretty helpful for loading, anyway.
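
For readers who want a picture of the split-and-load approach described
above, here is a minimal sketch in Python.  It is not the code used for
the .org import; the function names, the segment file naming, and the
"mydb"/"mytable" defaults are all hypothetical.  It assumes a
line-oriented data file (one row per line, as COPY text format expects)
and that psql is on the PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_into_segments(data_file, n_segments, out_dir):
    """Split a line-oriented data file into n_segments files, round-robin by line.

    Row order is not preserved across segments, which is fine for a bulk
    load into a table but not if the input order matters.
    """
    out_dir = Path(out_dir)
    paths = [out_dir / f"segment_{i:03d}.dat" for i in range(n_segments)]
    handles = [p.open("w") for p in paths]
    try:
        with open(data_file) as src:
            for lineno, line in enumerate(src):
                handles[lineno % n_segments].write(line)
    finally:
        for h in handles:
            h.close()
    return paths


def psql_copy(segment, dbname="mydb", table="mytable"):
    """Hypothetical loader: run one client-side \\copy per segment via psql.

    dbname and table are placeholders; each call opens its own connection,
    so the segments load in separate backends.
    """
    subprocess.run(
        ["psql", "-d", dbname, "-c", f"\\copy {table} FROM '{segment}'"],
        check=True,
    )


def load_in_parallel(segments, load_one, workers=4):
    """Call load_one(segment) for every segment, at most `workers` at a time.

    Threads are enough here because each worker only waits on an external
    process (or on I/O); results come back in segment order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_one, segments))
```

Usage would look something like
load_in_parallel(split_into_segments("dump.dat", 8, "/tmp/segs"), psql_copy, workers=8),
where the path and the segment count are again placeholders.  The number
of workers is the knob that decides how hard the disks get hit, so it is
worth tuning against the storage you actually have.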

> A more interesting question is what sort of hardware you need for that
> actually to be a win, though.  Loading a few tables in parallel sounds
> like an ideal recipe for oversaturating your disk bandwidth...

Right, you need to be prepared for that.  But of course, if you're in
the situation where you have to get a given database up and running,
who cares about the disk bandwidth? -- you don't have the database
running yet.  The kind of system that is busy enough to have that
size of database and that urgency of recovery is also the kind that
is likely to have dedicated storage hardware for that database.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
Unfortunately reformatting the Internet is a little more painful
than reformatting your hard drive when it gets out of whack.
        --Scott Morris
