Re: Load experimentation - Mailing list pgsql-performance

From Ben Brehmer
Subject Re: Load experimentation
Date
Msg-id 4B1D538C.1090200@gmail.com
Whole thread Raw
In response to Re: Load experimentation  (Thom Brown <thombrown@gmail.com>)
Responses Re: Load experimentation  (Craig James <craig_james@emolecules.com>)
Re: Load experimentation  (Alan Hodgson <ahodgson@simkin.ca>)
Re: Load experimentation  (Greg Smith <greg@2ndquadrant.com>)
Re: Load experimentation  (Dimitri Fontaine <dfontaine@hi-media.com>)
Re: Load experimentation  (Scott Carey <scott@richrelevance.com>)
List pgsql-performance
Thanks for the quick responses. I will respond to all questions in one email:

By "Loading data" I am implying: "psql -U postgres -d somedatabase -f sql_file.sql".  The sql_file.sql contains table creates and insert statements. There are no indexes present nor created during the load.

OS: x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44)

PostgreSQL: I will try upgrading to latest version.

COPY command: Unfortunately I'm stuck with INSERTS due to the nature this data was generated (Hadoop/MapReduce).

Transactions: Have started a second load process with chunks of 1000 inserts wrapped in a transaction. Its dropped the load time for 1000 inserts from 1 Hour to 7 minutes :)

Disk Setup: Using a single disk Amazon image for the destination (database). Source is coming from an EBS volume. I didn't think there were any disk options in Amazon?


Thanks!

Ben





On 07/12/2009 10:39 AM, Thom Brown wrote:
2009/12/7 Kevin Grittner <Kevin.Grittner@wicourts.gov>
Ben Brehmer <benbrehmer@gmail.com> wrote:

> -7.5 GB memory
> -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units
>    each)
> -64-bit platform

What OS?

> (PostgreSQL 8.1.3)

Why use such an antiquated, buggy version?  Newer versions are
faster.

-Kevin


I'd agree with trying to use the latest version you can.

How are you loading this data?  I'd make sure you haven't got any indices, primary keys, triggers or constraints on your tables before you begin the initial load, just add them after.  Also use either the COPY command for loading, or prepared transactions.  Individual insert commands will just take way too long.

Regards

Thom

pgsql-performance by date:

Previous
From: Scott Carey
Date:
Subject: Re: RAID card recommendation
Next
From: Craig James
Date:
Subject: Re: Load experimentation