Re: COPY speedup - Mailing list pgsql-hackers
From | Pierre Frédéric Caillaud |
---|---|
Subject | Re: COPY speedup |
Date | |
Msg-id | op.uyl227c3cke6l8@soyouz Whole thread Raw |
In response to | Re: COPY speedup (Alvaro Herrera <alvherre@commandprompt.com>) |
Responses |
Re: COPY speedup
|
List | pgsql-hackers |
In the previous mails I made a mistake, writing "MTuples/s" instead of "MDatums/s", sorry about that. It is the number of rows x columns. The title was wrong, but the data was right. I've been doing some tests on COPY FROM ... BINARY. - inlines in various pg_get* etc - a faster buffer handling for copy - that's about it... In the below tables, you have "p17" (ie test patch 17, the last one) and straight postgres compared. COPY annonces_2 FROM 'annonces.bin' BINARY : Time | Speedup | Table | KRows | MDatums | Name (s) | | MB/s| /s | /s | -------|---------|--------|--------|---------|---------------------------------------------------- 8.417 | 1.40 x | 38.70| 49.13 | 1.92 | 8.4.0 / p17 11.821 | --- | 27.56 | 34.98 | 1.36 | 8.4.0 / compiled from source COPY archive_data_2 FROM 'archive_data.bin' BINARY : Time | Speedup | Table | KRows | MDatums | Name (s) | | MB/s | /s | /s | -------|---------|-------|--------|---------|---------------------------------------------------- 15.314 | 1.93 x | 25.94 | 172.88 | 4.84 | 8.4.0 / p17 COPY FROM BINARY all 29.520 | --- | 13.46 | 89.69 | 2.51 | 8.4.0 / compiled from source COPY test_one_int_2 FROM 'test_one_int.bin' BINARY : Time | Speedup | Table | KRows | MDatums | Name (s) | | MB/s | /s | /s | -------|---------|--------|---------|---------|---------------------------------------------------- 10.003 | 1.39 x | 30.63 | 999.73 | 1.00 | 8.4.0 / p17 COPY FROM BINARY all 13.879 | --- | 22.08 | 720.53 | 0.72 | 8.4.0 / compiled from source COPY test_many_ints_2 FROM 'test_many_ints.bin' BINARY : Time | Speedup | Table | KRows | MDatums | Name (s) | | MB/s | /s | /s | -------|---------|-------|--------|---------|---------------------------------------------------- 6.009 | 2.08 x | 21.31| 166.42 | 4.33 | 8.4.0 / p17 COPY FROM BINARY all 12.516 | --- | 10.23 | 79.90 | 2.08 | 8.4.0 / compiled from source I thought it might be interesting to get split timings of the various steps in COPY FROM, so I simply commented out bits of code and ran tests. The "delta" columns are differences between two lines, that is the time taken in the step mentioned on the right. reading data only = reading all the data and parsing it into chunks, doing everything until the RecvFunc is called. RecvFuncs = same, + RecvFunc is called heap_form_tuple = same + heap_form_tuple is called triggers = same + triggers are applied insert = actual tuple insertion p17 = total time (post insert triggers, constraint check, etc) Time | Delta | Row delta | Datum delta | Name (s) | (s) | (us) | (us) | -------|-------|-----------|-------------|---------------------- 1.311 | --- | --- | --- | reading data only4.516 | 3.205 | 7.750 | 0.199 | RecvFuncs 4.534 | 0.018 | 0.043 | 0.001 | heap_form_tuple 5.323 |0.789 | 1.908 | 0.049 | triggers 8.182 | 2.858 | 6.912 | 0.177 | insert 8.417 | 0.236 | 0.570 | 0.015 | p17 COPY archive_data_2 FROM 'archive_data.bin' BINARY : Time | Delta | Row delta | Datum delta | Name (s) | (s) | (us) | (us) | -------|--------|-----------|-------------|--------------------- 4.729 | --- | --- | --- | reading dataonly 8.508 | 3.778 | 1.427 | 0.051 | RecvFuncs 8.567 | 0.059 | 0.022 | 0.001 | heap_form_tuple 10.804 | 2.237 | 0.845 | 0.030 | triggers 14.475 | 3.671 | 1.386 | 0.050 | insert 15.314 | 0.839 | 0.317 | 0.011 | p17 COPY test_one_int_2 FROM 'test_one_int.bin' BINARY : Time | Delta | Row delta | Datum delta | Name (s) | (s) | (us) | (us) | -------|-------|-----------|-------------|---------------------- 1.247 | --- | --- | --- | reading data only1.745 | 0.498 | 0.050 | 0.050 | RecvFuncs 1.750 | 0.004 | 0.000 | 0.000 | heap_form_tuple 3.114 |1.364 | 0.136 | 0.136 | triggers 9.984 | 6.870 | 0.687 | 0.687 | insert 10.003 | 0.019 | 0.002 | 0.002 | p17 COPY test_many_ints_2 FROM 'test_many_ints.bin' BINARY : Time | Delta | Row delta | Datum delta | Name (s) | (s) | (us) | (us) | -------|-------|-----------|-------------|---------------------- 1.701 | --- | --- | --- | reading data only3.122 | 1.421 | 1.421 | 0.055 | RecvFuncs 3.129 | 0.008 | 0.008 | 0.000 | heap_form_tuple 3.754 |0.624 | 0.624 | 0.024 | triggers 5.639 | 1.885 | 1.885 | 0.073 | insert 6.009 | 0.370 | 0.370 | 0.014 | p17 We can see that : - reading and parsing the data is still slow (actually, everything is copied something like 3-4 times) - RecvFuncs take quite long, too - triggers use some time, although the table has no triggers ....? This is suspicious... - the actual insertion (which is really what we are interested in when loading a table) is actually very fast Ideally in COPY the insertion time in the table should take most of the time used in the operation...
pgsql-hackers by date: