Re: COPY speedup - Mailing list pgsql-hackers
From | Pierre Frédéric Caillaud |
---|---|
Subject | Re: COPY speedup |
Date | |
Msg-id | op.uylh5fq8cke6l8@soyouz |
In response to | Re: COPY speedup (Alvaro Herrera <alvherre@commandprompt.com>) |
List | pgsql-hackers |
>> But when I see a big red button, I just press it to see what happens.
>> Ugly hacks are useful to know how fast the thing can go; then the
>> interesting part is to reimplement it cleanly, trying to reach the
>> same performance...
>
> Right -- now that you've shown a 6x speedup increase, it is clear that
> it makes sense to attempt a reimplementation. It also means it makes
> sense to have an additional pair or two of input/output functions.

Okay. Here are some numbers. The tables are the same as in the previous email, and they also include the results of "copy to patch 4" (aka the "API hack") for reference.

I benchmarked the following:

* p5 = no API changes, COPY TO optimized:
  - The optimizations in COPY (fast buffer, far fewer fwrite() calls, etc.) remain.
  - The SendFunction API is reverted to its original state (actually, the API changes are still there, but deactivated: fcinfo->context = NULL).
  => small performance gain; of course the lower per-row overhead is more visible on "test_one_int", because that table has only 1 column.
  => the (still huge) gap between p5 and the "API hack" is split between overhead in pq_send* + StringInfo (which we tackle below) and palloc() overhead (which the "API hack" removed by passing the destination buffer directly).

* p6 = p5 + optimization of pq_send*
  - inlining strategic functions
  - probably benefits many other code paths
  => small incremental performance gain

* p7 = p6 + optimization of StringInfo
  - inlining strategic functions
  - probably benefits many other code paths
  => small incremental performance gain (they start to add up nicely)

* p8 = p7 + optimization of palloc()
  - actually this is extremely dumb:
    - int4send and int2send simply palloc() 16 bytes instead of 1024
    - the initial size of the allocset is 64K instead of 8K
  => still, it gives interesting results...

The three patches above are quite simple (especially the inlines), and yet the speedup is already nice.
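The idea behind p6 and p7 (inlining the strategic functions) can be illustrated with a minimal sketch, assuming a simplified growable buffer rather than PostgreSQL's real StringInfo or pq_send* code: the capacity check and the copy sit in a small inline function, and only the rare growth case calls a separate function. MiniStringInfo, mini_init(), mini_enlarge(), mini_append_bytes() and mini_send_int32() are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a StringInfo-style growable buffer. */
typedef struct
{
    char   *data;
    int     len;      /* bytes currently used */
    int     maxlen;   /* bytes allocated */
} MiniStringInfo;

/* initial_size must be positive, otherwise the doubling below never grows. */
static void
mini_init(MiniStringInfo *buf, int initial_size)
{
    buf->data = malloc(initial_size);
    if (buf->data == NULL)
        abort();
    buf->len = 0;
    buf->maxlen = initial_size;
}

/* Slow path, kept in a separate function: only reached when the buffer must grow. */
static void
mini_enlarge(MiniStringInfo *buf, int needed)
{
    int     newlen = buf->maxlen;
    char   *p;

    while (buf->len + needed > newlen)
        newlen *= 2;
    p = realloc(buf->data, newlen);
    if (p == NULL)
        abort();
    buf->data = p;
    buf->maxlen = newlen;
}

/* Fast path, intended to be inlined: the common case is one compare plus a memcpy. */
static inline void
mini_append_bytes(MiniStringInfo *buf, const void *src, int n)
{
    if (buf->len + n > buf->maxlen)
        mini_enlarge(buf, n);
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
}

/* Appends a 32-bit integer in network (big-endian) byte order. */
static inline void
mini_send_int32(MiniStringInfo *buf, uint32_t value)
{
    unsigned char bytes[4];

    bytes[0] = (unsigned char) (value >> 24);
    bytes[1] = (unsigned char) (value >> 16);
    bytes[2] = (unsigned char) (value >> 8);
    bytes[3] = (unsigned char) value;
    mini_append_bytes(buf, bytes, 4);
}
```

With this split, appending a 4-byte integer to a buffer that already has room costs a comparison and a small memcpy at the call site, and the function-call overhead is only paid when the buffer actually grows.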
* p9 = p8 + monstrously ugly hack
  - COPY looks at the sendfunc, notices it's int{2,4}send, and replaces it with int{2,4}fastsend, which is called directly from C, bypassing the fmgr (urrrgghhhhhh). Of course it only works for ints.
  => This gives information about fmgr overhead: fmgr is pretty damn fast.

* p10 no copy
  - COPY does everything except calling the SendFuncs; it writes dummy data instead.
  => This gives the time used in everything except the SendFuncs (table scan, deform_tuple, file writes, etc.), which is an interesting thing to know.

RESULTS:

COPY annonces TO '/dev/null' BINARY:

Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
---------|---------|------------|---------|-----------|---------------------------------------
   2.149 |  2.60 x |     151.57 |  192.40 |      7.50 | copy to patch 4
   3.055 |  1.83 x |     106.64 |  135.37 |      5.28 | p8 = p7 + optimization of palloc()
   3.202 |  1.74 x |     101.74 |  129.15 |      5.04 | p7 = p6 + optimization of StringInfo
   3.754 |  1.49 x |      86.78 |  110.15 |      4.30 | p6 = p5 + optimization of pq_send*
   4.434 |  1.26 x |      73.47 |   93.26 |      3.64 | p5 no API changes, COPY TO optimized
   5.579 |     --- |      58.39 |   74.12 |      2.89 | compiled from source

COPY archive_data TO '/dev/null' BINARY:

Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
---------|---------|------------|---------|-----------|---------------------------------------
   5.372 |  3.75 x |      73.96 |  492.88 |     13.80 | copy to patch 4
   8.545 |  2.36 x |      46.49 |  309.83 |      8.68 | p8 = p7 + optimization of palloc()
  10.229 |  1.97 x |      38.84 |  258.82 |      7.25 | p7 = p6 + optimization of StringInfo
  12.869 |  1.57 x |      30.87 |  205.73 |      5.76 | p6 = p5 + optimization of pq_send*
  15.559 |  1.30 x |      25.54 |  170.16 |      4.76 | p5 no API changes, COPY TO optimized
  20.165 |     --- |      19.70 |  131.29 |      3.68 | 8.4.0 / compiled from source

COPY test_one_int TO '/dev/null' BINARY:

Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
---------|---------|------------|---------|-----------|---------------------------------------
   1.493 |  4.23 x |     205.25 | 6699.22 |      6.70 | p10 no copy
   1.660 |  3.80 x |     184.51 | 6022.33 |      6.02 | p9 monstrously ugly hack
   2.003 |  3.15 x |     152.94 | 4991.87 |      4.99 | copy to patch 4
   2.803 |  2.25 x |     109.32 | 3568.03 |      3.57 | p8 = p7 + optimization of palloc()
   2.976 |  2.12 x |     102.94 | 3360.05 |      3.36 | p7 = p6 + optimization of StringInfo
   3.165 |  2.00 x |      96.82 | 3160.05 |      3.16 | p6 = p5 + optimization of pq_send*
   3.698 |  1.71 x |      82.86 | 2704.43 |      2.70 | p5 no API changes, COPY TO optimized
   6.318 |     --- |      48.49 | 1582.85 |      1.58 | 8.4.0 / compiled from source

COPY test_many_ints TO '/dev/null' BINARY:

Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
---------|---------|------------|---------|-----------|---------------------------------------
   1.007 |  8.80 x |     127.23 |  993.34 |     25.83 | p10 no copy
   1.114 |  7.95 x |     114.95 |  897.52 |     23.34 | p9 monstrously ugly hack
   1.706 |  5.19 x |      75.08 |  586.23 |     15.24 | copy to patch 4
   3.396 |  2.61 x |      37.72 |  294.49 |      7.66 | p8 = p7 + optimization of palloc()
   4.588 |  1.93 x |      27.92 |  217.98 |      5.67 | p7 = p6 + optimization of StringInfo
   5.821 |  1.52 x |      22.00 |  171.80 |      4.47 | p6 = p5 + optimization of pq_send*
   6.890 |  1.29 x |      18.59 |  145.14 |      3.77 | p5 no API changes, COPY TO optimized
   8.861 |     --- |      14.45 |  112.85 |      2.93 | 8.4.0 / compiled from source
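Reading the tables: the Speedup column appears to be the unpatched baseline time divided by each run's time (for annonces, 5.579 / 2.149 is about 2.60). Since p10 skips the SendFuncs entirely, the difference between a run's time and the p10 time gives a rough estimate of how much time still goes into the SendFuncs and their surrounding overhead. A small C sketch of both calculations, assuming this reading of the columns; speedup() and sendfunc_share() are hypothetical helper names, not part of any patch.

```c
/*
 * Speedup relative to the unpatched baseline, assuming the Speedup
 * column is computed as baseline time / patched time.
 */
static double
speedup(double baseline_s, double patched_s)
{
    return baseline_s / patched_s;
}

/*
 * Rough fraction of a run's time not accounted for by the "p10 no copy"
 * run, which does everything except call the SendFuncs. Treating p10 as
 * a proxy for all non-SendFunc work is an assumption.
 */
static double
sendfunc_share(double run_s, double no_sendfunc_s)
{
    return (run_s - no_sendfunc_s) / run_s;
}
```

For example, on test_one_int, speedup(6.318, 2.803) gives about 2.25 for p8, and sendfunc_share(2.803, 1.493) gives about 0.47, which would suggest that roughly half of p8's time on that table is still spent calling the SendFuncs and in the palloc()/StringInfo work around them.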