Re: 8.4 open item: copy performance regression? - Mailing list pgsql-hackers

From Stefan Kaltenbrunner
Subject Re: 8.4 open item: copy performance regression?
Date
Msg-id 4A3E65A4.5030101@kaltenbrunner.cc
Whole thread Raw
In response to Re: 8.4 open item: copy performance regression?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: 8.4 open item: copy performance regression?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> I wonder if using the small ring showed any benefit when the COPY is not 
>> WAL-logged? In that scenario block-on-WAL-flush behavior doesn't happen, 
>> so the small ring might have some L2 cache benefits.
> 
> I think the notion that we might get a cache win from a smaller ring
> is an illusion.  We're not expecting to go back and re-read from a
> previously filled page in this scenario.  In any case, all of the
> profiling results so far show that the CPU bottlenecks are elsewhere.
> Until we can squeeze an order of magnitude out of COPY's data parsing
> and/or XLogInsert, any possible cache effects will be down in the noise.

we also need to take a serious look at our locking overhead - WAL logged 
COPY is already taking a significant performance hit with just a second 
process running in parallel(into a seperate table).
I just did some testing using those 16MB buffer, the upthread mentioned 
postgresql.conf and a 20GB tmpfs.

The following copying 3M rows(each) into a seperate table of the same 
database.

processes    total time(s)    rows/s    rows/s - per core

1    17.5    171428.57    171428.57
2    20.8    288461.54    144230.77
4    25.5    470588.24    117647.06
6    31.1    578778.14    96463.02
8    41.4    579710.14    72463.77
10    63    476190.48    47619.05
12    89    404494.38    33707.87
14    116    362068.97    25862.07
16    151    317880.79    19867.55



the higher the process count the more erratic the box behaves - it will 
show a very high context switch rate (between 300000 and 400000/s) a 
large amount of idle time (>60%!).

example vmstat 5 output for the 12 process test:
 7  0      0 21654500  45436 12932516    0    0     0     3 1079 336941 
34  7 59  0  0 6  0      0 21354044  45444 13232444    0    0     0    52 1068 341836 
35  7 59  0  0 4  0      0 21053832  45452 13531472    0    0     0    23 1082 341672 
35  7 59  0  0 9  0      0 20751136  45460 13833336    0    0     0    41 1063 344117 
35  7 59  0  0 6  0      0 20443856  45468 14138116    0    0     0    14 1079 349398 
35  7 58  0  0 8  0      0 20136592  45476 14444644    0    0     0     8 1060 351569 
35  7 58  0  0
10  0      0 19836600  45484 14743320    0    0     0   144 1086 341533 
35  7 58  0  0 7  0      0 19540472  45492 15039616    0    0     0    94 1067 337731 
36  7 58  0  0 2  0      0 19258244  45500 15321156    0    0     0    15 1079 311394 
34  6 60  0  0



Stefan


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: 8.4 open item: copy performance regression?
Next
From: Tom Lane
Date:
Subject: Re: 8.4 open item: copy performance regression?