Re: COPY speedup - Mailing list pgsql-hackers

From Pierre Frédéric Caillaud
Subject Re: COPY speedup
Msg-id op.uylh5fq8cke6l8@soyouz
In response to Re: COPY speedup  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
>> But when I see a big red button, I just press it to see what happens.
>> Ugly hacks are useful to know how fast the thing can go ; then the
>> interesting part is to reimplement it cleanly, trying to reach the
>> same performance...
>
> Right -- now that you've shown a 6x speedup increase, it is clear that
> it makes sense to attempt a reimplementation.  It also means it makes
> sense to have an additional pair or two of input/output functions.

Okay.

Here are some numbers. The tables are the same as in the previous email,
and the results also include "copy patch 4", aka "API hack", for
reference.

I benchmarked these:
* p5 = no API changes, COPY TO optimized:
- The optimizations in COPY (fast buffer, far fewer fwrite() calls, etc.)
remain.
- The SendFunction API is reverted to its original state (actually, the API
changes are still there, but deactivated: fcinfo->context = NULL).

=> small performance gain; of course, the lower per-row overhead is more
visible on "test_one_int", because that table has only 1 column.
=> the (still huge) gap between p5 and "API hack" is split between
overhead in pq_send*+StringInfo (which is tackled below) and palloc()
overhead (which the "API hack" removed by passing the destination
buffer directly).
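The "fast buffer, far fewer fwrite() calls" idea in p5 can be sketched roughly like this. It is a minimal illustration under stated assumptions, not the patch's code: OutBuf, outbuf_create, outbuf_write and outbuf_flush are hypothetical names, and the 64 KB buffer size is an assumption (the actual size is not given). The point is that the many tiny per-field writes are copied into one buffer, and stdio sees a single fwrite() per full buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer size; the size used by the patch is not stated. */
#define OUTBUF_SIZE 65536

typedef struct
{
    FILE   *fp;
    size_t  used;               /* bytes currently buffered */
    size_t  nflushes;           /* fwrite() calls issued so far */
    char    data[OUTBUF_SIZE];
} OutBuf;

static OutBuf *
outbuf_create(FILE *fp)
{
    OutBuf *ob = calloc(1, sizeof(OutBuf));

    assert(ob != NULL);
    ob->fp = fp;
    return ob;
}

/* Hand everything buffered so far to stdio in a single fwrite(). */
static void
outbuf_flush(OutBuf *ob)
{
    if (ob->used > 0)
    {
        size_t written = fwrite(ob->data, 1, ob->used, ob->fp);

        assert(written == ob->used);
        (void) written;
        ob->nflushes++;
        ob->used = 0;
    }
}

/* Append a (usually tiny) piece of a row; flush only when it would not fit. */
static void
outbuf_write(OutBuf *ob, const void *src, size_t len)
{
    if (len > OUTBUF_SIZE - ob->used)
        outbuf_flush(ob);
    if (len >= OUTBUF_SIZE)
    {
        /* Too big to buffer: write it straight through. */
        size_t written = fwrite(src, 1, len, ob->fp);

        assert(written == len);
        (void) written;
        ob->nflushes++;
        return;
    }
    memcpy(ob->data + ob->used, src, len);
    ob->used += len;
}
```

With this sketch, 100000 writes of 4 bytes each cost 7 fwrite() calls instead of 100000.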

* p6 = p5 + optimization of pq_send*
- inlining strategic functions
- probably benefits many other code paths

=> small incremental performance gain
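As a rough illustration of why inlining the pq_send* helpers pays off: once space is known to be available, sending a 4-byte integer in network byte order is just a few shifts and stores. The sketch below is hypothetical; Buf and buf_sendint32_noenlarge are illustrative names, not PostgreSQL's StringInfo or its pq_send* API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a StringInfo-style buffer (not the real struct). */
typedef struct
{
    char   *data;
    int     len;
    int     maxlen;
} Buf;

/*
 * Hypothetical inlinable "send a 4-byte integer" helper. The caller must
 * already have reserved enough space, so the body is only shifts and stores;
 * once inlined, there is no call overhead left at all.
 */
static inline void
buf_sendint32_noenlarge(Buf *buf, uint32_t value)
{
    unsigned char *p;

    assert(buf->len + 4 <= buf->maxlen);
    p = (unsigned char *) buf->data + buf->len;
    p[0] = (unsigned char) (value >> 24);   /* network (big-endian) order */
    p[1] = (unsigned char) (value >> 16);
    p[2] = (unsigned char) (value >> 8);
    p[3] = (unsigned char) value;
    buf->len += 4;
}
```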

* p7 = p6 + optimization of StringInfo
- inlining strategic functions
- probably benefits many other code paths

=> small incremental performance gain (they start to add up nicely)
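The usual shape of a StringInfo-style optimization is a split between a tiny inline fast path (one bounds check plus a memcpy) and a rarely taken out-of-line slow path that grows the buffer. The following is a sketch of that pattern under assumptions, not the p7 patch itself. Str, str_append and str_enlarge are hypothetical names, and the doubling growth policy is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified StringInfo-like buffer (illustrative, not PostgreSQL's struct). */
typedef struct
{
    char   *data;
    size_t  len;       /* bytes used, excluding the trailing NUL */
    size_t  maxlen;    /* bytes allocated */
} Str;

/* Slow path: rarely taken, so it is fine for it to be a real call. */
static void
str_enlarge(Str *s, size_t needed)
{
    size_t newlen = s->maxlen ? s->maxlen : 16;
    char  *p;

    while (newlen < s->len + needed + 1)
        newlen *= 2;
    p = realloc(s->data, newlen);
    assert(p != NULL);
    s->data = p;
    s->maxlen = newlen;
}

/* Fast path, small enough to inline: one comparison plus a memcpy. */
static inline void
str_append(Str *s, const char *src, size_t n)
{
    if (s->len + n + 1 > s->maxlen)
        str_enlarge(s, n);
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';     /* stay NUL-terminated, as StringInfo does */
}
```

In this sketch, ten 3-byte appends starting from an empty buffer trigger only two calls to the slow path (16 bytes, then 32 bytes).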

* p8 = p7 + optimization of palloc()
- actually this is extremely dumb:
- int4send and int2send simply palloc() 16 bytes instead of 1024...
- the initial size of the allocset is 64K instead of 8K

=> still, the results are interesting...
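Some back-of-the-envelope arithmetic shows why these two tweaks can matter for a wide table. The model below is deliberately simplified and is NOT PostgreSQL's AllocSet: it ignores chunk headers, power-of-two rounding and block growth, and it assumes one allocation per column per row with a hypothetical 26 int columns. Under those assumptions, 1024-byte requests need 26 KB per row, which spills over several 8 KB blocks. With 16-byte requests, a row needs only 416 bytes. The helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes requested per row if each column makes one request of request_size. */
static size_t
bytes_per_row(size_t ncols, size_t request_size)
{
    return ncols * request_size;
}

/*
 * Fixed-size blocks needed to hold one row's worth of requests, assuming a
 * bump allocator that carves whole requests out of blocks of block_size.
 */
static size_t
blocks_per_row(size_t ncols, size_t request_size, size_t block_size)
{
    size_t per_block;

    assert(request_size > 0 && request_size <= block_size);
    per_block = block_size / request_size;          /* requests per block */
    return (ncols + per_block - 1) / per_block;     /* ceiling division */
}
```

For 26 columns this model gives 4 blocks per row with 1024-byte requests and 8K blocks, and 1 block per row with either 64K blocks or 16-byte requests.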

The three patches above are quite simple (especially the inlines), and yet
the speedup is already nice.

* p9 = p8 + monstrously ugly hack
COPY looks at the sendfunc, notices it's int{2,4}send, and replaces it
with int{2,4}fastsend, which is called directly from C, bypassing the fmgr
(urrrgghhhhhh). Of course, it only works for ints.
This gives information about fmgr overhead: fmgr is pretty damn fast.
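The p9 trick amounts to recognizing one specific send function by its address once, before the row loop, and then calling a statically known fast path for each row. This is a hedged sketch of that dispatch pattern only. Here an ordinary function-pointer call stands in for the fmgr call path, which in reality involves more than a plain indirect call. All names (send_fn, ColumnOut, int4send_generic, int2send_generic, int4fastsend) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic send signature: datum in, bytes out, length returned. */
typedef size_t (*send_fn)(const void *datum, unsigned char *out);

static inline size_t
put_be32(uint32_t v, unsigned char *out)
{
    out[0] = (unsigned char) (v >> 24);     /* network (big-endian) order */
    out[1] = (unsigned char) (v >> 16);
    out[2] = (unsigned char) (v >> 8);
    out[3] = (unsigned char) v;
    return 4;
}

static size_t
int4send_generic(const void *datum, unsigned char *out)
{
    return put_be32((uint32_t) *(const int32_t *) datum, out);
}

static size_t
int2send_generic(const void *datum, unsigned char *out)
{
    uint16_t v = (uint16_t) *(const int16_t *) datum;

    out[0] = (unsigned char) (v >> 8);
    out[1] = (unsigned char) v;
    return 2;
}

/* Direct, statically known entry point: no indirection, easy to inline. */
static inline size_t
int4fastsend(int32_t value, unsigned char *out)
{
    return put_be32((uint32_t) value, out);
}

typedef struct
{
    send_fn fn;         /* what the catalog lookup returned */
    int     use_fast;   /* decided once, before the row loop */
} ColumnOut;

static void
column_setup(ColumnOut *col, send_fn fn)
{
    col->fn = fn;
    /* The "ugly hack": recognize one specific function by its address. */
    col->use_fast = (fn == int4send_generic);
}

static size_t
column_send(const ColumnOut *col, const void *datum, unsigned char *out)
{
    if (col->use_fast)
        return int4fastsend(*(const int32_t *) datum, out);
    return col->fn(datum, out);     /* generic indirect path */
}
```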

* p10 = no copy
Does everything except calling the SendFuncs; it writes dummy data instead.
This gives the time spent in everything except the SendFuncs (table scan,
deform_tuple, file writes, etc.), which is interesting to know.
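A p10-style row writer might look roughly like this, assuming that the dummy payload is still wrapped in the normal binary COPY tuple framing. That framing is a 16-bit field count, followed by a 32-bit length word and the data bytes for each field. Whether p10 keeps that framing is an assumption, and write_dummy_row, put_be16 and put_be32 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static size_t
put_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char) (v >> 8);
    p[1] = (unsigned char) v;
    return 2;
}

static size_t
put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char) (v >> 24);
    p[1] = (unsigned char) (v >> 16);
    p[2] = (unsigned char) (v >> 8);
    p[3] = (unsigned char) v;
    return 4;
}

/*
 * Emit one binary COPY tuple (field count, then length + bytes per field)
 * using a constant dummy payload instead of calling any send function.
 * Returns the number of bytes written to out.
 */
static size_t
write_dummy_row(unsigned char *out, uint16_t nfields,
                const unsigned char *dummy, uint32_t dummy_len)
{
    size_t off = put_be16(out, nfields);

    for (uint16_t i = 0; i < nfields; i++)
    {
        off += put_be32(out + off, dummy_len);
        memcpy(out + off, dummy, dummy_len);
        off += dummy_len;
    }
    return off;
}
```

In this sketch, a 3-field row with a 4-byte dummy payload takes 2 + 3 * (4 + 4) = 26 bytes.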

RESULTS:

COPY annonces TO '/dev/null' BINARY:

 Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
----------|---------|------------|---------|-----------|-------------------------------------
    2.149 |  2.60 x |     151.57 |  192.40 |      7.50 | copy to patch 4
    3.055 |  1.83 x |     106.64 |  135.37 |      5.28 | p8 = p7 + optimization of palloc()
    3.202 |  1.74 x |     101.74 |  129.15 |      5.04 | p7 = p6 + optimization of StringInfo
    3.754 |  1.49 x |      86.78 |  110.15 |      4.30 | p6 = p5 + optimization of pq_send*
    4.434 |  1.26 x |      73.47 |   93.26 |      3.64 | p5 no API changes, COPY TO optimized
    5.579 |     --- |      58.39 |   74.12 |      2.89 | compiled from source

COPY archive_data TO '/dev/null' BINARY:

 Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
----------|---------|------------|---------|-----------|-------------------------------------
    5.372 |  3.75 x |      73.96 |  492.88 |     13.80 | copy to patch 4
    8.545 |  2.36 x |      46.49 |  309.83 |      8.68 | p8 = p7 + optimization of palloc()
   10.229 |  1.97 x |      38.84 |  258.82 |      7.25 | p7 = p6 + optimization of StringInfo
   12.869 |  1.57 x |      30.87 |  205.73 |      5.76 | p6 = p5 + optimization of pq_send*
   15.559 |  1.30 x |      25.54 |  170.16 |      4.76 | p5 no API changes, COPY TO optimized
   20.165 |     --- |      19.70 |  131.29 |      3.68 | 8.4.0 / compiled from source

COPY test_one_int TO '/dev/null' BINARY:

 Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
----------|---------|------------|---------|-----------|-------------------------------------
    1.493 |  4.23 x |     205.25 | 6699.22 |      6.70 | p10 no copy
    1.660 |  3.80 x |     184.51 | 6022.33 |      6.02 | p9 monstrously ugly hack
    2.003 |  3.15 x |     152.94 | 4991.87 |      4.99 | copy to patch 4
    2.803 |  2.25 x |     109.32 | 3568.03 |      3.57 | p8 = p7 + optimization of palloc()
    2.976 |  2.12 x |     102.94 | 3360.05 |      3.36 | p7 = p6 + optimization of StringInfo
    3.165 |  2.00 x |      96.82 | 3160.05 |      3.16 | p6 = p5 + optimization of pq_send*
    3.698 |  1.71 x |      82.86 | 2704.43 |      2.70 | p5 no API changes, COPY TO optimized
    6.318 |     --- |      48.49 | 1582.85 |      1.58 | 8.4.0 / compiled from source

COPY test_many_ints TO '/dev/null' BINARY:

 Time (s) | Speedup | Table MB/s | KRows/s | MTuples/s | Name
----------|---------|------------|---------|-----------|-------------------------------------
    1.007 |  8.80 x |     127.23 |  993.34 |     25.83 | p10 no copy
    1.114 |  7.95 x |     114.95 |  897.52 |     23.34 | p9 monstrously ugly hack
    1.706 |  5.19 x |      75.08 |  586.23 |     15.24 | copy to patch 4
    3.396 |  2.61 x |      37.72 |  294.49 |      7.66 | p8 = p7 + optimization of palloc()
    4.588 |  1.93 x |      27.92 |  217.98 |      5.67 | p7 = p6 + optimization of StringInfo
    5.821 |  1.52 x |      22.00 |  171.80 |      4.47 | p6 = p5 + optimization of pq_send*
    6.890 |  1.29 x |      18.59 |  145.14 |      3.77 | p5 no API changes, COPY TO optimized
    8.861 |     --- |      14.45 |  112.85 |      2.93 | 8.4.0 / compiled from source
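The Speedup column appears to be the unpatched ("compiled from source") time divided by each variant's time. For example, for annonces 5.579 s / 2.149 s is about 2.60. The small helper below is a sketch under that assumption. It also shows a consistency check: each throughput column times its elapsed time should give roughly the same total for every row of a table. The function names are illustrative.

```c
#include <assert.h>

/* Speedup as it appears to be computed in the tables: baseline / variant. */
static double
speedup(double baseline_s, double variant_s)
{
    assert(variant_s > 0.0);
    return baseline_s / variant_s;
}

/* A per-second rate times elapsed seconds gives the implied total amount. */
static double
implied_total(double rate_per_s, double elapsed_s)
{
    return rate_per_s * elapsed_s;
}

/* Absolute-difference comparison, to tolerate the 2-decimal rounding. */
static int
approx_equal(double a, double b, double tol)
{
    double d = a - b;

    return (d < 0 ? -d : d) <= tol;
}
```

For annonces, both 192.40 KRows/s * 2.149 s and 74.12 KRows/s * 5.579 s come out at about 413.5 K rows, as expected for the same table.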

