Re: Best COPY Performance - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Best COPY Performance
Date
Msg-id b42b73150610201811m84369f7kc7d58b39aed3ad73@mail.gmail.com
Whole thread Raw
In response to Best COPY Performance  ("Worky Workerson" <worky.workerson@gmail.com>)
Responses Re: Best COPY Performance
List pgsql-performance
On 10/21/06, Worky Workerson <worky.workerson@gmail.com> wrote:
> What is the best COPY performance that you have gotten on a "normal" table?
>
> I know that this is question is almost too general, but it might help
> me out a bit, or at least give me the right things to tweak.  Perhaps
> the question can be rewritten as "Where are the major bottlenecks in a
> COPY?" or "How can I compute the max theoretical COPY performance for
> my hardware?".  The two subquestions that I have from this are:
>   -Are my ETL scripts (perl) maximizing the database COPY speeds?
>   -Can I tweak my DB further to eek out a bit more performance?
>
> I'm using perl to ETL a decent sized data set (10 million records) and
> then loading it through perl::DBI's copy.  I am currently getting
> between 10K and 15K inserts/second.  I've profiled the ETL scripts a
> bit and have performance-improved a lot of the code, but I'd like to
> determine whether it makes sense to try and further optimize my Perl
> or count it as "done" and look for improvements elsewhere.
>
> I ran trivial little insert into a table with a single integer row and
> came close to 250K inserts/second using psql's \copy, so I'm thinking
> that my code could be optimized a bit more, but wanted to check around
> to see if that was the case.
>
> I am most interested in loading two tables, one with about 21 (small)
> VARCHARs where each record is about 200 bytes, and another with 7
> INTEGERs, 3 TIMESTAMPs, and 1 BYTEA where each record is about 350
> bytes.

indexes/keys?  more memory for sorting during index creation can have
a dramatic affect on bulk insert performance.  check for pg_tmp
folders popping up during copy run.

> I have implemented most of the various bits of PG config advice that I
> have seen, both here and with a some googling, such as:
>
>  wal_buffers=128
>  checkpoint_segments=128
>  checkpoint_timeout=3000
>
> Software:  PG 8.1.3 on RHEL 4.3 x86_64
> Hardware: Quad Dual-core Opteron, Fibre Channel SAN with 256M BBC

for table light on indexes, 10-15k for copy is pretty poor.  you can
get pretty close to that with raw inserts on good hardware. I would
suggest configuirng your perl script to read from stdin and write to
stdout, and pipe it to psql using copy from stdin.  then just
benchmark your perl script redirecting output to a file.

merlin

pgsql-performance by date:

Previous
From: "Merlin Moncure"
Date:
Subject: Re: Is ODBC that slow?
Next
From: Alvaro Herrera
Date:
Subject: Re: Is ODBC that slow?