Re: general questions postgresql performance config - Mailing list pgsql-general

From Craig Ringer
Subject Re: general questions postgresql performance config
Date
Msg-id 4B5E4F8A.2060008@postnewspapers.com.au
In response to general questions postgresql performance config  (Dino Vliet <dino_vliet@yahoo.com>)
Responses Re: general questions postgresql performance config  (Andy Colson <andy@squeakycode.net>)
List pgsql-general
On 26/01/2010 12:15 AM, Dino Vliet wrote:

> 5) Other considerations?


To get optimal performance for bulk loading you'll want to do concurrent
data loading over several connections - up to as many as you have disk
spindles. Each connection will individually be slower, but the overall
throughput will be much greater.

Just how many connections you'll want to use depends on your I/O
subsystem and to a lesser extent your CPU capacity.

Inserting data via multiple connections isn't something every data
loading tool supports, so make sure to consider this carefully.
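If your tool doesn't support it, the splitting itself is simple to do yourself. Here is a minimal sketch in Python: the round-robin chunking is plain stdlib code, while the per-connection loading step is only indicated in a comment, since it assumes a driver such as psycopg2 and a reachable server (neither is part of this post).

```python
def split_rows(rows, n_workers):
    """Distribute rows round-robin into n_workers chunks,
    one chunk per loader connection."""
    chunks = [[] for _ in range(n_workers)]
    for i, row in enumerate(rows):
        chunks[i % n_workers].append(row)
    return chunks

rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
chunks = split_rows(rows, 2)
print(chunks[0])  # [('a', 1), ('c', 3), ('e', 5)]

# Each chunk would then be loaded over its own connection, e.g. with
# concurrent.futures.ThreadPoolExecutor and one connection per worker.
```

Round-robin keeps the chunks roughly equal in size, so no one connection finishes long before the others.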


Another consideration is how you insert the data. It's vital to insert
your data in large transaction-batched chunks (or all in one
transaction); even with synchronous_commit = off you'll still see
rather poor performance if you do each INSERT in its own transaction.
Doing your inserts as prepared statements where each INSERT is multi-row
valued will help too.
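To make the multi-row-valued idea concrete, here is a small sketch that builds one parameterized INSERT for a whole batch. The table and column names are hypothetical; the statement it produces is the kind you'd prepare once and execute per batch.

```python
def multirow_insert(table, columns, n_rows):
    """Build a single parameterized INSERT covering n_rows rows,
    instead of issuing n_rows separate single-row INSERTs."""
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    return "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row] * n_rows))

sql = multirow_insert("measurements", ["ts", "value"], 3)
print(sql)
# INSERT INTO measurements (ts, value) VALUES (%s, %s), (%s, %s), (%s, %s)
```

Executed inside one transaction per batch, this cuts both the per-statement round trips and the per-transaction commit overhead.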

Even better is to use COPY to load large chunks of data. libpq provides
access to the COPY interface if you feel like some C coding. The JDBC
driver (dev version only so far) now provides access to the COPY API, so
you can bulk-load very efficiently from Java as well.
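Scripting languages expose COPY too. As a sketch, here is how you might prepare data for COPY's text format in Python: the buffer-building is plain stdlib code, while the actual copy_from() call is only shown in a comment, since it assumes a psycopg2 connection and a hypothetical "measurements" table. (Real data would also need COPY's escaping rules applied for embedded tabs, newlines and backslashes; this sketch skips that.)

```python
import io

def rows_to_copy_buffer(rows):
    """Serialize rows as tab-separated lines, the shape COPY's
    text format expects on STDIN."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(str(v) for v in row) + "\n")
    buf.seek(0)
    return buf

buf = rows_to_copy_buffer([(1, "alpha"), (2, "beta")])
print(buf.read())

# With a live connection (illustrative, psycopg2-style):
#   cur.copy_from(buf, "measurements", columns=("id", "label"))
```

One COPY of a large buffer replaces thousands of INSERT round trips, which is where the big win comes from.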

If your data needs little/no transformation and is externally validated
you can look into pg_bulkload as an alternative to all this.

--
Craig Ringer
