Re: Best COPY Performance - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: Best COPY Performance
Date
Msg-id 20061025152209.GP26892@nasby.net
In response to Re: Best COPY Performance  ("Craig A. James" <cjames@modgraph-usa.com>)
Responses Re: Best COPY Performance
List pgsql-performance
On Tue, Oct 24, 2006 at 10:36:04PM -0700, Craig A. James wrote:
> Jim C. Nasby wrote:
> >Well, given that perl is using an entire CPU, it sounds like you should
> >start looking either at ways to remove some of the overhead from perl,
> >or to split that perl into multiple processes.
>
> I use Perl for big database copies (usually with some
> processing/transformation along the way) and I've never seen 100% CPU usage
> except for brief periods, even when copying BLOBS and such.  My typical
> copy divides operations into blocks, for example doing
>
>  N = 0
>  while (more rows to go) {
>      begin transaction
>      select ... where primary_key > N order by primary_key limit 1000
>      while (fetch a row)
>          insert into ...
>      N = (highest value found in last block)
>      commit
>  }
>
> Doing it like this in Perl should keep Postgres busy, with Perl using only
> moderate resources.  If you're seeing high Perl CPU usage, I'd look first
> at the Perl code.
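[The block-by-block loop described above could be sketched as follows. This is a hypothetical stand-in only: it uses Python's sqlite3 module in place of Perl/DBI and Postgres, and the table and column names (src, dst, id, val) are invented for illustration.]

```python
import sqlite3

# Set up two toy tables so the loop has something to copy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE dst (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [(i, "row%d" % i) for i in range(1, 2501)])
conn.commit()

BATCH = 1000
last_id = 0
while True:
    # Fetch the next block of rows past the highest key seen so far.
    rows = conn.execute(
        "SELECT id, val FROM src WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH)).fetchall()
    if not rows:
        break
    # Each row round-trips through the client and back into the database.
    conn.executemany("INSERT INTO dst VALUES (?, ?)", rows)
    last_id = rows[-1][0]  # highest value found in this block
    conn.commit()

count = conn.execute("SELECT COUNT(*) FROM dst").fetchone()[0]
```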

Wait... so you're using perl to copy data between two tables? And using
a cursor to boot? I can't think of any way that could be more
inefficient...

What's wrong with a plain old INSERT INTO ... SELECT? Or if you really
need to break it into multiple transaction blocks, at least don't
shuffle the data from the database into perl and then back into the
database; do an INSERT INTO ... SELECT with that same where clause.
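[A minimal sketch of that batched INSERT INTO ... SELECT approach, keeping the same keyed WHERE clause but never pulling rows into the client. This is a hypothetical stand-in using Python's sqlite3 module; the original discussion is about Postgres, and the table names (src, dst) are invented.]

```python
import sqlite3

# Toy tables standing in for the real source and destination.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE dst (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [(i, "row%d" % i) for i in range(1, 2501)])
conn.commit()

BATCH = 1000
last_id = 0
while True:
    # Same keyed batching, but each block moves entirely inside the
    # database via INSERT INTO ... SELECT; no rows reach the client.
    cur = conn.execute(
        "INSERT INTO dst SELECT id, val FROM src "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH))
    if cur.rowcount == 0:
        conn.commit()
        break
    last_id = conn.execute("SELECT MAX(id) FROM dst").fetchone()[0]
    conn.commit()

copied = conn.execute("SELECT COUNT(*) FROM dst").fetchone()[0]
```

Each iteration still commits one block at a time, so the work is broken into bounded transactions without shuffling the data out of the database and back in.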
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
