Re: Review: Revise parallel pg_restore's scheduling heuristic - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Review: Revise parallel pg_restore's scheduling heuristic
Date
Msg-id 4A6455C602000025000289C2@gw.wicourts.gov
In response to Re: Review: Revise parallel pg_restore's scheduling heuristic  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List pgsql-hackers

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
>> My plan here would be to have
>> the dump on one machine, run pg_restore there, and push the data
>> to a database on another machine over the LAN on a 1Gb connection.
>> (This seems most likely to be what we'd be doing in real life.)
> you need to be careful here - in my latest round of benchmarking I
> actually had to test with the workload generator on the same box,
> because on fast boxes we can easily achieve >100MB/s total load
> rates these days.  At these load rates you are very close to or
> over the practical limits of GigE...

Yeah, I was concerned about that.  A 1Gb link works out to roughly
125MB/s raw, so a >100MB/s load rate is already at its practical
ceiling.  The problem was that with the 1.1TB database it's hard to
fit a dump and an extra copy of the database onto the drives
alongside the production data for which they exist.  We would likely
face the same constraint with real data when using parallel restore,
since it requires an interim backup file (i.e., you can't stream
directly from the source database).  There's also the issue of
reading from the same RAID you're targeting with the restore, which
is sometimes suboptimal.
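
For the over-the-LAN case, something along these lines is what I
have in mind (a rough sketch; the host name, paths, database names,
and job count are all placeholders):

    # On the machine holding the dump: take a custom-format archive,
    # which is the interim file that parallel pg_restore needs.
    pg_dump -Fc -f /raid2/bigdb.dump bigdb

    # Create the target database on the remote server, then restore
    # across the 1Gb link with several parallel jobs.
    createdb -h dbserver bigdb_restore
    pg_restore -h dbserver -d bigdb_restore -j 8 /raid2/bigdb.dump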

If I drop down an order of magnitude or more in database size, I
could put the backup file on a separate RAID on the same machine.
That leaves a lot of options, and I'm not sure which combinations of
configuration, file placement, and job count would yield the most
useful results.
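
For the job-count question, I could script the timing runs, roughly
like this (the job counts tried, and the names from the sketch
above, are placeholders):

    # Re-create the target database and time the restore at each
    # job count.
    for j in 2 4 6 8; do
        dropdb -h dbserver bigdb_restore
        createdb -h dbserver bigdb_restore
        /usr/bin/time pg_restore -h dbserver -d bigdb_restore \
            -j $j /raid2/bigdb.dump
    done
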
-Kevin

