Re: Copy performance issues - Mailing list pgsql-performance

From Sergey Konoplev
Subject Re: Copy performance issues
Date
Msg-id AANLkTi=4og2H8RAubEQDndxn8bMbPe0jPgaD93NiK5ps@mail.gmail.com
In response to Copy performance issues  (Saadat Anwar <sanwar@asu.edu>)
List pgsql-performance
Hi,

Try splitting your data into small batches; it helped me in a similar
situation recently. I was loading about a million rows into a table
heavily populated with indexes and various triggers, and I used a batch
size of 100 (with COPY). The only DDL change I made was dropping the FKs
and recreating them afterwards.
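
Something along these lines (a rough sketch, not my exact script: the
constraint, table, and file names are made up, and it assumes the input
has already been split into 100-row chunks, e.g. with "split -l 100"):

    -- drop the FK before the load (hypothetical constraint name)
    ALTER TABLE big_table DROP CONSTRAINT big_table_ref_fkey;

    -- one COPY per 100-row chunk
    \copy big_table FROM 'chunk_aa'
    \copy big_table FROM 'chunk_ab'
    -- ... repeat for each chunk ...

    -- recreate the FK once the load is done (hypothetical definition)
    ALTER TABLE big_table ADD CONSTRAINT big_table_ref_fkey
        FOREIGN KEY (ref_id) REFERENCES ref_table (id);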

BTW, a question for the gurus: why, and in what cases, could loading in
small batches theoretically be faster than one huge load, if there is no
load on the database other than this?


On 18 August 2010 20:25, Saadat Anwar <sanwar@asu.edu> wrote:
> I am having severe COPY performance issues after adding indices. What used
> to take a few minutes (without indices) now takes several hours (with
> indices). I've tried to tweak the database configuration (based on the
> Postgres documentation and forums), but it hasn't helped yet; perhaps I
> haven't increased the limits sufficiently. Dropping and recreating the
> indices may not be an option, given the long time it takes to rebuild them all.
>
> I'll appreciate someone looking at my configuration and giving me a few
> ideas on how to increase the copy performance.
>
> Thanks.
> Saadat.
>
> Table structure:
> ===========
> table C:
>            Table "public.C"
>   Column  |       Type       | Modifiers
> ----------+------------------+-----------
>  sclk     | double precision | not null
>  chan     | smallint         | not null
>  det      | smallint         | not null
>  x        | real             | not null
>  y        | real             | not null
>  z        | real             | not null
>  r        | real             |
>  t        | real             |
>  lat      | real             |
>  lon      | real             |
>  a        | real             |
>  b        | real             |
>  c        | real             |
>  time     | real             |
>  qa       | smallint         | not null
>  qb       | smallint         | not null
>  qc       | smallint         | not null
> Indexes:
>     "C_pkey" PRIMARY KEY, btree (sclk, chan, det)
>
>
> Table C is partitioned into 19 sub-tables covering lat bands. For example:
>
> sub-table C0:
>    Inherits: C
>    Check constraints:
>        "C0_lat_check" CHECK (lat >= (-10::real) AND lat < 0::real)
>    Indexes:
>        "C0_pkey" PRIMARY KEY, btree (sclk, chan, det)
>        "C0_lat" btree (lat)
>        "C0_time" btree (time)
>        "C0_lon" btree (lon)
>
> sub-table C1:
>    Inherits: C
>    Check constraints:
>        "C1_lat_check" CHECK (lat >= (-20::real) AND lat < -10::real)
>    Indexes:
>        "C1_pkey" PRIMARY KEY, btree (sclk, chan, det)
>        "C1_lat" btree (lat)
>        "C1_time" btree (time)
>        "C1_lon" btree (lon)
>
> The partitions C? are each ~30G (328,000,000 rows), except one, which is
> ~65G (909,000,000 rows). There are no rows in the umbrella table C, from
> which C1, C2, ..., C19 inherit; the data is partitioned into C1, C2, ...,
> C19 to promote better access, as most people will access the data in C by
> specifying a lat range. Also, the C? tables can become quite large over time.
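>
> (For reference, each partition was created roughly along these lines; a
> sketch, with the insert-routing trigger or rule omitted:)
>
>     CREATE TABLE "C1" (
>         CHECK (lat >= -20::real AND lat < -10::real)
>     ) INHERITS ("C");
>
>     ALTER TABLE "C1" ADD CONSTRAINT "C1_pkey"
>         PRIMARY KEY (sclk, chan, det);
>     CREATE INDEX "C1_lat" ON "C1" (lat);
>     CREATE INDEX "C1_time" ON "C1" ("time");
>     CREATE INDEX "C1_lon" ON "C1" (lon);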
>
> The COPY operation loads one file per partition, for each of the 19
> partitions. Each file contains between 300,000 and 600,000 records.
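>
> (Each load cycle then amounts to something like this; the paths are
> placeholders:)
>
>     \copy "C0" FROM '/data/load/C0.dat'
>     \copy "C1" FROM '/data/load/C1.dat'
>     -- ... and so on, one \copy per partition, 19 in all ...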
>
>
> System configuration:
> ================
> 1. RHEL5 x86_64
> 2. 32G RAM
> 3. 8T RAID5 partition for database on a Dell PERC 5/E controller
>    (I understand that I'll never get fast inserts/updates on it based on
>     http://wiki.postgresql.org/wiki/SlowQueryQuestions but cannot change
>     to a RAID0+1 for now).
>     Database's filesystem is ext4 on LVM on RAID5.
> 4. Postgres 8.4.2
>     shared_buffers = 10GB
>     temp_buffers = 16MB
>     work_mem = 2GB
>     maintenance_work_mem = 256MB
>     max_files_per_process = 1000
>     effective_io_concurrency = 3
>     wal_buffers = 8MB
>     checkpoint_segments = 40
>     enable_seqscan = off
>     effective_cache_size = 16GB
> 5. analyze verbose; ran on the database before copy operation
>
> Bonnie++ output:
> =============
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> dbtest    64240M   78829  99 266172  42 47904   6 58410  72 116247   9 767.9  1
>                     ------Sequential Create------ --------Random Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                 256 16229  98 371704  99 20258  36 16115  97 445680  99 17966  36



--
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com /
Linkedin: http://ru.linkedin.com/in/grayhemp /
JID/GTalk: gray.ru@gmail.com / Skype: gray-hemp / ICQ: 29353802
