Insert performance slows down in large batch - Mailing list pgsql-performance

From Jeremy Haile
Subject Insert performance slows down in large batch
Date
Msg-id 1133459351.23834.248779098@webmail.messagingengine.com
Whole thread Raw
Responses Re: Insert performance slows down in large batch
List pgsql-performance
I am importing roughly 15 million rows in one batch transaction.  I am
currently doing this through batch inserts of around 500 at a time,
although I am looking at ways to do this via multiple (one-per-table)
copy commands for performance reasons.

I am currently running: PostgreSQL 8.0.4, Redhat Enterprise Linux 4,
ext3, all-on-one partition.  I am aware of methods of improving
performance by changing ext3 mounting options, splitting WAL, data, and
indexes to separate physical disks, etc.  I have also adjusted my
shared_buffers, work_mem, maintenance_work_mem, and checkpoint_segments
and can post their values if anyone thinks it is relevant to my question
(See questions at the bottom)

What confuses me is that at the beginning of the import, I am inserting
roughly 25,000 rows every 7 seconds..and by the time I get towards the
end of the import, it is taking 145 seconds for the same number of rows.
 The inserts are spread across 4 tables and I have dropped all indexes
and constraints on these tables, including foreign keys, unique keys,
and even primary keys (even though I think primary key doesn't improve
performance)  The entire bulk import is done in a single transaction.

The result is a table with 4.8 million rows, two tables with 4.8*2
million rows, and another table with several thousand rows.

So, my questions are:
1) Why does the performance degrade as the table sizes grow?  Shouldn't
the insert performance remain fairly constant if there are no indexes or
constraints?

2) Is there anything I can do to figure out where the time is being
spent?  Will postgres log any statistics or information to help me
diagnose the problem?  I have pasted a fairly representative sample of
vmstat below my e-mail in case it helps, although I'm not quite how to
interpret it in this case.

3) Any other advice, other than the things I listed above (I am aware of
using copy, ext3 tuning, multiple disks, tuning postgresql.conf
settings)?

Thanks in advance,
Jeremy Haile


#vmstat 2 20
procs -----------memory---------- ---swap-- -----io---- --system--
----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy
 id wa
 1  0   9368   4416   2536 1778784    0    0   124    51    3     2  2
 0 96  2
 1  0   9368   4416   2536 1778784    0    0     0     0 1005    53 25
 0 75  0
 1  1   9368   3904   2544 1779320    0    0 12164     6 1103   262 24
 1 59 16
 1  0   9368   3704   2552 1779380    0    0 16256    24 1140   344 23
 1 53 23
 1  1   9368   2936   2560 1780120    0    0 16832     6 1143   359 23
 1 52 24
 1  1   9368   3328   2560 1779712    0    0 13120     0 1111   285 24
 1 58 18
 1  0   9368   4544   2560 1778556    0    0  5184     0 1046   141 25
 0 67  8
 1  1   9368   3776   2568 1779296    0    0  7296     6 1064   195 24
 0 67  9
 1  0   9368   4480   2568 1778548    0    0  4096     0 1036   133 24
 0 69  6
 1  0   9368   4480   2576 1778608    0    0  7504     0 1070   213 23
 0 67 10
 1  0   9368   3136   2576 1779900    0    0  9536     0 1084   235 23
 0 66 10
 1  1   9368   3072   2584 1779960    0    0 13632     6 1118   313 24
 1 60 16
 1  0   9368   4480   2592 1778592    0    0  8576    24 1075   204 24
 0 63 12
 1  0   9368   4480   2592 1778592    0    0     0     6 1004    52 25
 0 75  0
 1  0   9368   4544   2600 1778652    0    0     0     6 1005    55 25
 0 75  0
 1  1   9368   3840   2600 1779332    0    0 11264     4 1098   260 24
 0 63 13
 1  1   9368   3072   2592 1780156    0    0 17088    14 1145   346 24
 1 51 24
 1  1   9368   4096   2600 1779128    0    0 16768     6 1140   360 23
 1 54 21
 1  1   9368   3840   2600 1779332    0    0 16960     0 1142   343 24
 1 54 22
 1  0   9368   3436   2596 1779676    0    0 16960     0 1142   352 24
 1 53 23

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: 15,000 tables
Next
From: "Gavin M. Roy"
Date:
Subject: Re: 15,000 tables