Re: Benchmark Data requested --- pgloader CE design ideas - Mailing list pgsql-performance

From Greg Smith
Subject Re: Benchmark Data requested --- pgloader CE design ideas
Date
Msg-id Pine.GSO.4.64.0802061041230.15780@westnet.com
In response to Re: Benchmark Data requested --- pgloader CE design ideas  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Benchmark Data requested --- pgloader CE design ideas  (Luke Lonergan <llonergan@greenplum.com>)
Re: Benchmark Data requested --- pgloader CE design ideas  ("Jignesh K. Shah" <J.K.Shah@Sun.COM>)
Re: Benchmark Data requested --- pgloader CE design ideas  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-performance
On Wed, 6 Feb 2008, Simon Riggs wrote:

> For me, it would be good to see a --parallel=n parameter that would
> allow pg_loader to distribute rows in "round-robin" manner to "n"
> different concurrent COPY statements. i.e. a non-routing version.

Let me expand on this.  In many of these giant COPY situations the
bottleneck is plain old sequential I/O feeding a single process.  You can
almost predict how fast the rows will load by timing a read of the file
with dd.  Having a process that pulls rows in and distributes them
round-robin is good, but it won't
crack that bottleneck.  The useful approaches I've seen for other
databases all presume that the data files involved are large enough that
on big hardware, you can start multiple processes running at different
points in the file and beat anything possible with a single reader.
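
As a rough illustration of the dd comparison, here is a minimal Python
sketch that times one sequential pass over a file and reports MB/s.  The
function name and the 8 MB block size are my own choices for the example,
not anything from pgloader.  A file that is already in the OS cache will
read much faster than the disk can deliver, so treat the number as an
upper bound unless the file is far larger than RAM.

```python
import time


def sequential_read_mb_per_s(path, block_size=8 * 1024 * 1024):
    """Read the file front to back once and return throughput in MB/s.

    Hypothetical helper for illustration: it plays the role of timing
    `dd if=path of=/dev/null`, i.e. a single sequential reader.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 gives raw reads, so each read() maps to one OS request.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += len(block)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on tiny files.
    return total / (1024 * 1024) / max(elapsed, 1e-9)
```

If a single COPY loads at close to this rate, the load is bound by the
single reader, and spreading rows over more COPY sessions behind that one
reader is unlikely to help.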

If I'm loading a TB file, odds are good I can split that into 4 or more
contiguous pieces (say rows 0-25%, 25-50%, 50-75%, 75-100%), start 4
loaders at once, and get way more than 1 disk's worth of read throughput.
You have to play with the exact number, because if you push the split too
far you introduce seek slowdown instead of improvements, but that's the
basic design I'd like to see one day.  Until something like this comes
around, parallel loading isn't useful for the cases I'm thinking about.
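
To make the split idea concrete, here is a small Python sketch of how the
starting points could be picked.  It is not pgloader code; split_points
and rows_in_range are names made up for this example.  It assumes one row
per line with no newlines embedded inside a row (quoted multi-line CSV
fields would break it).  Each boundary is found by seeking to the target
byte offset and skipping ahead to the end of the row it lands in, so no
reader starts in the middle of a row.

```python
import os


def split_points(path, parts):
    """Return (start, end) byte ranges covering the file, cut at row ends.

    Assumes newline-terminated rows with no embedded newlines.
    May return fewer than `parts` ranges for very small files.
    """
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, parts):
            target = size * i // parts
            if target <= bounds[-1]:
                continue
            f.seek(target)
            f.readline()  # skip the rest of the row we landed in
            pos = f.tell()
            if pos >= size:
                break
            if pos > bounds[-1]:
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def rows_in_range(path, start, end):
    """Yield the rows that lie in [start, end), reading sequentially."""
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            yield f.readline()
```

Each range would then go to its own loader process running its own COPY;
that part is left out here.  The number of parts is the knob to play
with: each extra reader adds another position the disk heads have to seek
between.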

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
