Re: COPY v. java performance comparison - Mailing list pgsql-general

From Steve Atkins
Subject Re: COPY v. java performance comparison
Date
Msg-id D02C467A-2B1E-4114-AFBB-6787D0080480@blighty.com
In response to Re: COPY v. java performance comparison  (Rob Sargent <robjsargent@gmail.com>)
List pgsql-general
On Apr 2, 2014, at 1:14 PM, Rob Sargent <robjsargent@gmail.com> wrote:

> On 04/02/2014 01:56 PM, Steve Atkins wrote:
>> On Apr 2, 2014, at 12:37 PM, Rob Sargent <robjsargent@gmail.com> wrote:
>>
>>>
>>> Impatience got the better of me and I killed the second COPY.  This time it had done 54% of the file in 6.75 hours,
>>> extrapolating to roughly 12 hours to do the whole thing.
>>>
>> That seems rather painfully slow. How exactly are you doing the bulk load? Are you CPU limited or disk limited?
>>
>> Have you read http://www.postgresql.org/docs/current/interactive/populate.html ?
>>
>> Cheers,
>>   Steve
>>
>>
> The copy command was pretty vanilla:
> copy oldstyle from '/export/home/rob/share/testload/<file-redacted>' with delimiter ' ';
> I've been to that page, but (as I read them) none sticks out as a sure thing.  I'm not so worried about the actual
> performance as I am with the relative throughput (sixes so far).
>
> I'm not cpu bound, but I confess I didn't look at io stats during the copy runs. I just assume it was pegged :)

If each row is, say, 100 bytes including the per-row overhead (plausible for a uuid and a couple of strings), and
you're inserting 800 rows a second, that's 80k/second, which would be fairly pathetic.
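
To spell out that back-of-the-envelope number, here is a minimal sketch. The 100 bytes per row is my guess from above, not a measured row size, and the function name is just something made up for illustration:

```python
def throughput_bytes_per_sec(row_bytes, rows_per_sec):
    """Raw data rate implied by a row size and an insert rate."""
    return row_bytes * rows_per_sec

# Assumed figures: ~100 bytes per row (a guess), 800 rows per second.
rate = throughput_bytes_per_sec(100, 800)
print(f"{rate / 1000:.0f} kB/s")
```

Any modern disk should sustain orders of magnitude more than that, so a rate like this points to a bottleneck somewhere other than raw I/O bandwidth.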

On my laptop (which has an SSD, sure, but it's still a laptop) I can insert 40M rows of data that has a few integers
and a few small strings in about 52 seconds. And that's just using a simple, single-threaded load using psql to run copy
from stdin, reading from the same disk as the DB is on, with no tuning of any parameters to speed up the load.
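
For scale, a rough comparison of the two rates, as a sketch only. The 800 rows a second is the figure assumed earlier for the slow load, and the row shapes differ between the two tests, so treat the ratio as an order-of-magnitude indication:

```python
# Laptop test: 40M rows loaded in about 52 seconds.
laptop_rows = 40_000_000
laptop_seconds = 52
laptop_rate = laptop_rows / laptop_seconds  # rows per second

# Assumed rate for the slow load (rows per second).
slow_rate = 800

speedup = laptop_rate / slow_rate
print(f"laptop: {laptop_rate:,.0f} rows/s, roughly {speedup:,.0f}x the slow load")
```

That is a gap of about three orders of magnitude, far more than an SSD or a bit of tuning would explain.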

12 hours suggests there's something fairly badly wrong with what you're doing. I'd definitely look at the server logs,
check system load and double check what you're actually running.

(Running the same thing on a tiny VM, one that shares a single RAID5 of 7200rpm drives with about 40 other VMs, takes a
shade under two minutes, mostly CPU bound).

Cheers,
  Steve


