Re: COPY v. java performance comparison - Mailing list pgsql-general

From Thomas Kellerer
Subject Re: COPY v. java performance comparison
Msg-id lhjt4g$rr6$1@ger.gmane.org
In response to COPY v. java performance comparison  (Rob Sargent <robjsargent@gmail.com>)
Responses Re: COPY v. java performance comparison  (Rob Sargent <robjsargent@gmail.com>)
List pgsql-general
Rob Sargent, 02.04.2014 21:37:
> I loaded 37M+ records using jOOQ (batching every 1000 lines) in 12+
> hours (800+ records/sec).  Then I tried COPY and killed that after
> 11.25 hours when I realised that I had added a non-unique index on
> the name fields after the first load. By that point it was on line
> 28301887, so ~0.75 done, which implies it would have taken ~15 hours to
> complete.
>
> Would the overhead of the index likely explain this decrease in
> throughput?
>
> Impatience got the better of me and I killed the second COPY.  This
> time it had done 54% of the file in 6.75 hours, extrapolating to
> roughly 12 hours to do the whole thing.
>
> That matches up with the java speed. Not sure if I should be elated
> with jOOQ or disappointed with COPY.
>

This is not what I see with COPY FROM STDIN.

When I load 2 million rows with plain JDBC using a batch size of 1000, that takes about 4 minutes.

Loading the same file through Java and COPY FROM STDIN takes about 4 seconds.
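
To put those two timings side by side, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above (2 million rows, about 4 minutes, about 4 seconds); the class and method names are made up for illustration.

```java
// Rough throughput comparison using only the approximate figures from this post.
final class LoadRates {

    /** Rows per second for a load of `rows` rows that took `seconds` seconds. */
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    public static void main(String[] args) {
        long rows = 2_000_000L;

        // Batched JDBC inserts: about 4 minutes.
        double jdbcBatch = rowsPerSecond(rows, 4 * 60);
        // COPY FROM STDIN: about 4 seconds.
        double copy = rowsPerSecond(rows, 4);

        System.out.printf("batched JDBC: %.0f rows/s%n", jdbcBatch);
        System.out.printf("COPY:         %.0f rows/s%n", copy);
        System.out.printf("speed-up:     %.0fx%n", copy / jdbcBatch);
    }
}
```

Since both timings are approximate, the result is only an order-of-magnitude indication: several thousand rows per second for batched inserts against several hundred thousand for COPY.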

The table looks like this:

                Table "public.products"
      Column       |          Type          | Modifiers
-------------------+------------------------+-----------
 product_id        | integer                | not null
 ean_code          | bigint                 | not null
 product_name      | character varying(100) | not null
 manufacturer_name | character varying      | not null
 price             | numeric(10,2)          | not null
 publish_date      | date                   | not null
Indexes:
    "products_pkey" PRIMARY KEY, btree (product_id)
    "idx_publish_date" btree (publish_date, product_id)


During the load both indexes are present.
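
For anyone who wants to try the COPY route from Java: the data sent to COPY FROM STDIN has to be in COPY's text format (tab-delimited by default, `\N` for NULL, backslash escapes for tab, newline, carriage return and backslash). Below is a minimal sketch of encoding one row of the table above. It is not the loader used for the timings; the class name, method names and sample values are invented for illustration. The resulting text could then be handed to the PostgreSQL JDBC driver's `CopyManager.copyIn(String, Reader)`, which is not shown here because it needs the driver and a live connection.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.StringJoiner;

// Sketch: encode rows in PostgreSQL's default COPY text format.
final class CopyTextRow {

    /** Escapes one field value for COPY text format; null becomes \N. */
    static String escape(Object value) {
        if (value == null) {
            return "\\N";
        }
        String s = value.toString();
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // backslash itself
                case '\t': sb.append("\\t");  break; // field delimiter
                case '\n': sb.append("\\n");  break; // row terminator
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Joins the escaped fields with tabs and terminates the row with a newline. */
    static String line(Object... fields) {
        StringJoiner joiner = new StringJoiner("\t", "", "\n");
        for (Object field : fields) {
            joiner.add(escape(field));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Made-up sample row in the column order of public.products.
        System.out.print(line(1, 4006381333931L, "Pencil HB", "Acme",
                new BigDecimal("1.20"), LocalDate.of(2014, 4, 2)));
    }
}
```

The matching statement would be something like `COPY products (product_id, ean_code, product_name, manufacturer_name, price, publish_date) FROM STDIN`, with the encoded lines streamed as the input.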

Regards
Thomas


