Thread: COPY speedup
Backups always take too long... COPY TO is CPU bound... A few days of
coding later, I think I'm on to something.

First, the results. All tables are cached in RAM (not in shared_buffers,
though). Timings are the best of 4 tries.

- test_one_int is a table with 1 INT column and 10,000,000 rows (from
  generate_series)

SELECT count(*) FROM test_one_int :

 Time  | Speedup | Table  | KRows   | MTuples | Name
 (s)   |         | MB/s   | /s      | /s      |
-------|---------|--------|---------|---------|-----------------------------
 2.040 | ---     | 150.22 | 4903.03 | 4.90    | 8.4.0 / compiled from source

* count(*) gives a reference timing for scanning a table

COPY test_one_int TO '/dev/null' BINARY :

 Time  | Speedup | Table  | KRows   | MTuples | Name
 (s)   |         | MB/s   | /s      | /s      |
-------|---------|--------|---------|---------|-----------------------------
 2.003 | 3.15 x  | 152.94 | 4991.87 | 4.99    | 8.4.0 / copy to patch 4
 6.318 | ---     | 48.49  | 1582.85 | 1.58    | 8.4.0 / compiled from source

* reduced per-row overhead
  -> COPY BINARY faster than count(*) for a 1-column table
  -> COPY BINARY faster than "SELECT * WHERE x=-1" for a 1-column table
     which doesn't contain the value "-1"
* reduced per-tuple overhead
  -> COPY BINARY 3.15 times faster

COPY test_one_int TO '/dev/null' :

 Time  | Speedup | Table | KRows   | MTuples | Name
 (s)   |         | MB/s  | /s      | /s      |
-------|---------|-------|---------|---------|-----------------------------
 4.879 | 1.48 x  | 62.80 | 2049.78 | 2.05    | 8.4.0 / copy to patch 4
 7.198 | ---     | 42.56 | 1389.25 | 1.39    | 8.4.0 / compiled from source

* reduced per-row and per-tuple overheads -> COPY 1.48x faster
* Patched binary mode is 3.4x faster than unpatched text mode

*******************************************************************

- test_many_ints is a table with 26 INT columns and 1,000,000 rows

SELECT count(*) FROM test_many_ints :

 Time  | Speedup | Table  | KRows   | MTuples | Name
 (s)   |         | MB/s   | /s      | /s      |
-------|---------|--------|---------|---------|-----------------------------
 0.275 | ---     | 465.88 | 3637.45 | 94.57   | 8.4.0 / copy to patch 4

COPY test_many_ints TO '/dev/null' BINARY :

 Time  | Speedup | Table | KRows  | MTuples | Name
 (s)   |         | MB/s  | /s     | /s      |
-------|---------|-------|--------|---------|-----------------------------
 1.706 | 5.19 x  | 75.08 | 586.23 | 15.24   | 8.4.0 / copy to patch 4
 8.861 | ---     | 14.45 | 112.85 | 2.93    | 8.4.0 / compiled from source

COPY test_many_ints TO '/dev/null' :

  Time  | Speedup | Table | KRows  | MTuples | Name
  (s)   |         | MB/s  | /s     | /s      |
--------|---------|-------|--------|---------|-----------------------------
  8.941 | 1.36 x  | 14.32 | 111.84 | 2.91    | 8.4.0 / copy to patch 4
 12.149 | ---     | 10.54 | 82.31  | 2.14    | 8.4.0 / compiled from source

* Patched binary mode is 7.1x faster than unpatched text mode

*******************************************************************

- annonces is a 340MB table with a mix of ints, smallints, bools, dates,
  timestamps, etc., and a text field averaging 230 bytes

SELECT count(*) FROM annonces :

 Time  | Speedup | Table  | KRows   | MTuples | Name
 (s)   |         | MB/s   | /s      | /s      |
-------|---------|--------|---------|---------|-----------------------------
 0.349 | ---     | 933.45 | 1184.91 | 46.21   | 8.4.0 / copy to patch 4

COPY annonces TO '/dev/null' BINARY :

 Time  | Speedup | Table  | KRows  | MTuples | Name
 (s)   |         | MB/s   | /s     | /s      |
-------|---------|--------|--------|---------|-----------------------------
 2.149 | 2.60 x  | 151.57 | 192.40 | 7.50    | 8.4.0 / copy to patch 4
 5.579 | ---     | 58.39  | 74.12  | 2.89    | 8.4.0 / compiled from source

* Patched binary mode is 4.7x faster than unpatched text mode

COPY annonces TO '/dev/null' :

  Time  | Speedup | Table | KRows | MTuples | Name
  (s)   |         | MB/s  | /s    | /s      |
--------|---------|-------|-------|---------|-----------------------------
  9.600 | 1.06 x  | 33.93 | 43.08 | 1.68    | 8.4.0 / copy to patch 4
 10.147 | ---     | 32.10 | 40.75 | 1.59    | 8.4.0 / compiled from source

* Here, COPY isn't much faster: most of the time is actually spent
  converting the DATE and TIMESTAMP columns to strings.
* In binary mode, such conversions are not needed.

*******************************************************************

- archive_data is 416MB, the same as annonces but without the text
  field, and with many more rows

SELECT count(*) FROM archive_data :

 Time  | Speedup | Table  | KRows   | MTuples | Name
 (s)   |         | MB/s   | /s      | /s      |
-------|---------|--------|---------|---------|-----------------------------
 0.844 | ---     | 470.60 | 3135.89 | 87.81   | 8.4.0 / copy to patch 4

COPY archive_data TO '/dev/null' BINARY :

  Time  | Speedup | Table | KRows  | MTuples | Name
  (s)   |         | MB/s  | /s     | /s      |
--------|---------|-------|--------|---------|-----------------------------
  5.372 | 3.75 x  | 73.96 | 492.88 | 13.80   | 8.4.0 / copy to patch 4
 20.165 | ---     | 19.70 | 131.29 | 3.68    | 8.4.0 / compiled from source

* Patched binary mode is 6.4x faster than unpatched text mode

COPY archive_data TO '/dev/null' :

  Time  | Speedup | Table | KRows | MTuples | Name
  (s)   |         | MB/s  | /s    | /s      |
--------|---------|-------|-------|---------|-----------------------------
 28.471 | 1.21 x  | 13.95 | 92.99 | 2.60    | 8.4.0 / copy to patch 4
 34.344 | ---     | 11.57 | 77.09 | 2.16    | 8.4.0 / compiled from source

* Most of the time is again spent converting the DATE and TIMESTAMP
  columns to strings.

*******************************************************************

* Why?

COPY in text mode should be "fast enough", but it will never be really
fast, because many types need complicated conversions. COPY BINARY has
drawbacks (it is not very portable...), so to justify its existence it
should compensate with a massive speed increase over text mode, which
is not the case in 8.4.

* How?

- Created a new "WBuf" auto-flushing buffer type. It looks like a
  StringInfo, but:
  - it has a flush callback
  - you add data to it in little pieces
  - when it is full, it sends the buffer contents to the flush callback
  - it never makes any palloc calls except on creation
- fmgr.c
  - new way of calling SendFuncs and OutFuncs which uses the existing
    "context" field
  - copy.c passes the WBuf through this context
  - SendFuncs check if they are called with a context:
    - if yes, they write directly to the buffer
    - if no, the previous behaviour remains: they return a BYTEA
- copy.c
  - creates a WBuf
  - sets the flush callback to do the right thing (write to file, send
    to frontend, etc.)
  - writes data like headers and delimiters to it
  - passes it to the SendFuncs
  - if a SendFunc returns a BYTEA (because it has not been updated to
    write directly to the buffer), uses the BYTEA
  - if not, does nothing; the data has already been sent
- pqcomm.c
  - removed memcpy on large blocks
- others
  - removed some memcpy
  - inlines added at strategic points (the total size of the executable
    is actually smaller with the patch)

* Results

See the performance numbers above. It generates the same output as the
old COPY (i.e. there don't seem to be any bugs). It should be 100%
backward compatible: no system catalog changes.

* Side effects

Uses less memory for big TEXT or BYTEA fields, since fewer copies are
made. This could be extended to make serialisation of query results
sent to the frontend faster. It breaks COPY CSV; I need to fix that
(it's simple, I just didn't have time). I have ideas for COPY FROM too.

* Thoughts

Not for commitfest ;) it's too late. The patch needs refactoring. Maybe
fuse StringInfo and WBuf together, with bits of pq_send*, maybe not; I
have to think about this.

Some types have extremely slow outfuncs (for instance, box).

COPY BINARY should include in its header (in the variable-length field
specified for this) a sample of all non-portable types used in the
table. For instance, put a TIMESTAMP of a known value. On reading,
check it: if it's wrong, perhaps the dump was generated by a postgres
with float timestamps?... This would have another advantage: with the
column types stored in the header, you'd no longer ask yourself
"hmmmm... how many ALTER TABLEs did I do since I made this dump that
doesn't seem to want to load?..." Also, currently you can load a binary
dump of INTs into a DATE column and it will work perfectly OK (except
the dates will be rubbish, of course).

With this patch, COPY BINARY gets fast enough to sometimes saturate a
gigabit ethernet link...

Patch is for 8.4.0, if someone wants to try it.
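A minimal sketch of how the WBuf described above might look (the names,
field layout, and buffer size here are my guesses for illustration, not
the actual patch code): appends go into a fixed buffer, and a callback
set by copy.c empties it whenever it fills, so no per-value allocation
ever happens.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical auto-flushing buffer: data is appended in little
 * pieces; when the buffer is full, its contents are handed to the
 * flush callback (write to file, send to frontend, ...).  No
 * allocation happens after creation. */
typedef struct WBuf
{
    char  *data;
    int    len;
    int    maxlen;
    void (*flush)(struct WBuf *buf);   /* set by the creator (copy.c) */
} WBuf;

static void
wbuf_append(WBuf *buf, const char *src, int n)
{
    while (n > 0)
    {
        int room  = buf->maxlen - buf->len;
        int chunk = (n < room) ? n : room;

        memcpy(buf->data + buf->len, src, chunk);
        buf->len += chunk;
        src      += chunk;
        n        -= chunk;

        if (buf->len == buf->maxlen)
        {
            buf->flush(buf);           /* callback consumes the contents */
            buf->len = 0;
        }
    }
}

/* Tiny self-check: count how often the callback fires. */
static int wbuf_demo_flushes = 0;

static void
wbuf_demo_flush(WBuf *buf)
{
    (void) buf;
    wbuf_demo_flushes++;
}

static int
wbuf_demo(void)
{
    char storage[8];
    WBuf buf = { storage, 0, 8, wbuf_demo_flush };

    wbuf_append(&buf, "0123456789abcdef", 16);   /* fills the buffer twice */
    return wbuf_demo_flushes;
}
```

The point of the design is that the SendFuncs can write many small
pieces without ever touching palloc or fwrite per row; flushing happens
in large blocks.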
Replying to myself...

I've been examining the code path for COPY FROM too, and I think it is
possible to get the same kind of speedups on COPY FROM that the patch in
the previous message got for COPY TO, that is to say perhaps 2-3x faster
in BINARY mode and 10-20% faster in TEXT mode (these figures are
ballparks, based only on very quick checks, however).

The idea is to avoid most (actually, all) palloc()'ing and memcpy()'ing
for types that are pass-by-value, like INT.

Is there interest in such a patch (for 8.6)?
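As a rough illustration of the pass-by-value idea (my own sketch, not
patch code): for a type like int4, the network-order bytes in the COPY
BINARY stream can be decoded straight into a value, with no
intermediate allocation or result copy at all.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a big-endian int4 from the wire directly into a value.
 * For a pass-by-value type, this value can go straight into the
 * tuple's Datum array -- no palloc, no memcpy of a result buffer. */
static int32_t
recv_int4_inplace(const unsigned char *wire)
{
    return ((int32_t) wire[0] << 24) |
           ((int32_t) wire[1] << 16) |
           ((int32_t) wire[2] << 8)  |
            (int32_t) wire[3];
}
```

The contrast is with the generic receive path, where each field first
lands in an allocated StringInfo before the type's recv function sees it.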
Pierre Frédéric Caillaud <lists@peufeu.com> writes:
> I've been examining the code path for COPY FROM too, and I think it is
> possible to get the same kind of speedups on COPY FROM that the patch in
> the previous message did for COPY TO, that is to say perhaps 2-3x faster
> in BINARY mode and 10-20% faster in TEXT mode (these figures are
> ballparks, only based on very quick checks however).
> The idea is to avoid most (actually, all) palloc()'ing and memcpy()'ing
> for types that are pass-by-value like INT.
> Is there interest in such a patch (for 8.6) ?

If you do as much damage to the I/O function API as the other patch did,
it will probably be rejected. We don't touch datatype APIs lightly,
because it affects too much code.

			regards, tom lane
> If you do as much damage to the I/O function API as the other patch
> did, it will probably be rejected.

You mean, as the COPY patch in my previous message, or as another patch?
(I just searched the archives and found one about CopyReadLine, but
that's probably not what you are talking about.)

> We don't touch datatype APIs
> lightly, because it affects too much code.

I definitely agree with that.
2009/8/12 Pierre Frédéric Caillaud <lists@peufeu.com>:
>> If you do as much damage to the I/O function API as the other patch
>> did, it will probably be rejected.
>
> You mean, as the COPY patch in my previous message, or as another
> patch ?
> (I just searched the archives and found one about CopyReadLine, but
> that's probably not what you are talking about)
>
>> We don't touch datatype APIs
>> lightly, because it affects too much code.
>
> I definitely agree with that.

Is there any way to do this that is not as invasive?

merlin
Merlin Moncure escribió:
> Is there any way to do this that is not as invasive?

Maybe add new methods, fastrecv/fastsend etc. Types that don't implement
them would simply use the slow methods, maintaining backwards
compatibility.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
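Alvaro's fallback idea could be pictured like this (all names here are
invented stand-ins; the real thing would involve pg_type entries and
fmgr, and int32_t stands in for a pass-by-value Datum): a type may
register a fast routine that writes into the caller's buffer, and the
dispatcher falls back transparently to the classic allocate-and-return
path for types that never registered one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Classic path: every value costs an allocation plus a copy. */
static unsigned char *
int4_send_slow(int32_t v, int *len)
{
    unsigned char *out = malloc(4);

    out[0] = (v >> 24) & 0xFF;
    out[1] = (v >> 16) & 0xFF;
    out[2] = (v >> 8) & 0xFF;
    out[3] = v & 0xFF;
    *len = 4;
    return out;
}

/* Hypothetical fast path: network-order bytes go straight into the
 * caller's output buffer, no allocation. */
static int
int4_fastsend(unsigned char *dst, int32_t v)
{
    dst[0] = (v >> 24) & 0xFF;
    dst[1] = (v >> 16) & 0xFF;
    dst[2] = (v >> 8) & 0xFF;
    dst[3] = v & 0xFF;
    return 4;
}

typedef struct TypeSendInfo
{
    int (*fastsend)(unsigned char *dst, int32_t v);  /* NULL if absent */
    unsigned char *(*send)(int32_t v, int *len);     /* always present */
} TypeSendInfo;

/* Dispatcher: prefer the fast method; types without one keep working
 * unchanged through the slow path. */
static int
send_datum(unsigned char *dst, int32_t v, const TypeSendInfo *t)
{
    if (t->fastsend != NULL)
        return t->fastsend(dst, v);

    int            len;
    unsigned char *tmp = t->send(v, &len);

    memcpy(dst, tmp, len);    /* the extra copy the fast path avoids */
    free(tmp);
    return len;
}
```

Both paths produce identical bytes, which is what makes the scheme
backward compatible: the fast function is purely an optimization.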
>>>> We don't touch datatype APIs
>>>> lightly, because it affects too much code.
>>>
>>> I definitely agree with that.

Actually, let me clarify: when I modified the datatype API, I was
feeling uneasy, like "I shouldn't really touch this". But when I see a
big red button, I just press it to see what happens. Ugly hacks are
useful to know how fast the thing can go; then the interesting part is
to reimplement it cleanly, trying to reach the same performance...

>> Is there any way to do this that is not as invasive?
>
> Maybe add new methods, fastrecv/fastsend etc. Types that don't
> implement them would simply use the slow methods, maintaining
> backwards compatibility.

Well, this would certainly work, and it would be even faster. I
considered doing it like this, but it is a lot more work: adding entries
to the system catalogs, creating all the new functions, deciding what to
do with getTypeBinaryOutputInfo (since there would be 2 variants), etc.
Types that don't support the new functions would need some form of
indirection to call the old functions instead, etc. In a word: doable,
but kludgy, and I would need help from a system catalog expert.

Also, on upgrade, does information about the new functions have to be
inserted in the system catalogs? (I don't know how this process works.)
If you want to help...

The way I see COPY BINARY is that its speed should be really something
massive. COPY foo FROM ... BINARY should be as fast as CREATE TABLE foo
AS SELECT * FROM bar (which is extremely fast). COPY foo TO ... BINARY
should be as fast as the disk allows. Why else would anyone use a binary
format if it's slower than portable text?

So, there are two other ways (besides fastsend/fastrecv) that I can see:

1- The way I implemented it.
   I'm not saying it's the definitive solution: just a simple way to see
   how much overhead is introduced by the current API, which returns
   BYTEAs and palloc()'s for every datum of every row.

   I think this approach gave two interesting answers:
   - Once COPY's output buffer has been made more efficient, with things
     like removing the fwrite() for every row etc. (see patch), all that
     remains is the API overhead, which is very significant for binary
     mode, since I could get massive speedups (3-4x!) by bypassing it.
     The table scan itself is super-fast.
   - However, for text mode, it is not so significant, as the speedups
     from bypassing the API were roughly 0-20%, since most of the time
     is spent in datum-to-text conversions.

   Now, I don't think the hack is so ugly. It does make me feel a bit
   uneasy, but:
   - The context field in the fcinfo struct is there for a reason, so I
     used it.
   - I checked every place in the code where SendFunctionCall() appears
     (which are quite few, actually).
   - The context field is never used for SendFuncs or ReceiveFuncs (it
     is always set to NULL).

2- Another way:
   - palloc() could be made faster for short blocks
   - a generous sprinkling of inlines
   - a few modifications to pq_send*
   - a few modifications to StringInfo
   - bits of my previous patch in copy.c (like not fwrite()'ing every
     row)
   This would be less fast, but you'd still get a substantial speedup.

As a conclusion, I think:
- Alvaro's fastsend/fastrecv provides the cleanest solution;
- method 2 is the easiest, but slower;
- method 1 is an intermediate, but the use of the context field is a
  touchy subject.

Also, I will work on COPY FROM ... BINARY. I should be able to make it
also much faster. This will be useful for big imports.

Regards,
Pierre
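The context-field convention of method 1 can be pictured with this tiny
sketch (the struct is a stand-in, not the real fmgr FunctionCallInfo):
the caller smuggles a buffer through the context pointer, and the send
function picks its path by checking it, which is backward compatible
precisely because the field is always NULL for SendFunc calls today.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the fmgr call-info struct, reduced to the one field
 * that matters for the convention. */
typedef struct FakeFunctionCallInfo
{
    void *context;  /* a WBuf * when called from patched COPY, else NULL */
} FakeFunctionCallInfo;

/* Returns 1 when the fast path runs (bytes would go straight into the
 * caller's buffer), 0 when the classic path runs (a palloc'd bytea
 * would be built and returned). */
static int
sketch_sendfunc(FakeFunctionCallInfo *fcinfo)
{
    if (fcinfo->context != NULL)
        return 1;       /* write directly to the WBuf */
    return 0;           /* old behaviour is untouched */
}
```

Unpatched callers never set the field, so every existing SendFunc call
site keeps its current semantics without modification.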
Pierre Frédéric Caillaud escribió:
> But when I see a big red button, I just press it to see what happens.
> Ugly hacks are useful to know how fast the thing can go; then the
> interesting part is to reimplement it cleanly, trying to reach the
> same performance...

Right -- now that you've shown a 6x speedup, it is clear that it makes
sense to attempt a reimplementation. It also means it makes sense to
have an additional pair or two of input/output functions.

>> Maybe add new methods, fastrecv/fastsend etc. Types that don't
>> implement them would simply use the slow methods, maintaining
>> backwards compatibility.
>
> I considered doing it like this, but it is a lot more work: adding
> entries to the system catalogs, creating all the new functions,
> deciding what to do with getTypeBinaryOutputInfo (since there would
> be 2 variants), etc. Types that don't support the new functions
> would need some form of indirection to call the old functions
> instead, etc. In a word: doable, but kludgy, and I would need help
> from a system catalog expert.

Right.

> Also, on upgrade, information about the new functions must be inserted
> in the system catalogs ? (I don't know how this process works.)

No, that's not how pg_migrator works. Catalog upgrades are handled by
pg_dump/pg_restore, so you don't need to worry about it at all.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>> But when I see a big red button, I just press it to see what happens.
>> Ugly hacks are useful to know how fast the thing can go; then the
>> interesting part is to reimplement it cleanly, trying to reach the
>> same performance...
>
> Right -- now that you've shown a 6x speedup, it is clear that it makes
> sense to attempt a reimplementation. It also means it makes sense to
> have an additional pair or two of input/output functions.

Okay. Here are some numbers. The tables are the same as in the previous
email, and the results also include "copy to patch 4", aka the "API
hack", for reference.

I benchmarked these:

* p5 = no API changes, COPY TO optimized:
  - The optimizations in COPY (fast buffer, far fewer fwrite() calls,
    etc.) remain.
  - The SendFunction API is reverted to its original state (actually,
    the API changes are still there, but deactivated:
    fcinfo->context = NULL).
  => Small performance gain; of course the lower per-row overhead is
     more visible on test_one_int, because that table has 1 column.
  => The (still huge) distance between p5 and the "API hack" is split
     between overhead in pq_send* + StringInfo (tackled below) and
     palloc() overhead (which the "API hack" removed by passing the
     destination buffer directly).

* p6 = p5 + optimization of pq_send*:
  - inlining strategic functions
  - probably benefits many other code paths
  => Small incremental performance gain.

* p7 = p6 + optimization of StringInfo:
  - inlining strategic functions
  - probably benefits many other code paths
  => Small incremental performance gain (they start to add up nicely).

* p8 = p7 + optimization of palloc():
  - actually this is extremely dumb:
    - int4send and int2send simply palloc() 16 bytes instead of 1024
    - the initial size of the allocset is 64K instead of 8K
  => Still, it has interesting results...

The three patches above are quite simple (especially the inlines) and
yet the speedup is already nice.

* p9 = p8 + monstrously ugly hack:
  COPY looks at the SendFunc, notices it's int{2,4}send, and replaces it
  with int{2,4}fastsend, which is called directly from C, bypassing the
  fmgr (urrrgghhhhhh). Of course it only works for ints. This gives
  information about fmgr overhead: fmgr is pretty damn fast.

* p10 = no copy:
  Does everything except calling the SendFuncs; it writes dummy data
  instead. This gives the time used in everything except the SendFuncs
  (table scan, deform_tuple, file writes, etc.), which is an interesting
  thing to know.

RESULTS :

COPY annonces TO '/dev/null' BINARY :

 Time  | Speedup | Table  | KRows  | MTuples | Name
 (s)   |         | MB/s   | /s     | /s      |
-------|---------|--------|--------|---------|---------------------------------------
 2.149 | 2.60 x  | 151.57 | 192.40 | 7.50    | copy to patch 4
 3.055 | 1.83 x  | 106.64 | 135.37 | 5.28    | p8 = p7 + optimization of palloc()
 3.202 | 1.74 x  | 101.74 | 129.15 | 5.04    | p7 = p6 + optimization of StringInfo
 3.754 | 1.49 x  | 86.78  | 110.15 | 4.30    | p6 = p5 + optimization of pq_send*
 4.434 | 1.26 x  | 73.47  | 93.26  | 3.64    | p5 no api changes, COPY TO optimized
 5.579 | ---     | 58.39  | 74.12  | 2.89    | compiled from source

COPY archive_data TO '/dev/null' BINARY :

  Time  | Speedup | Table | KRows  | MTuples | Name
  (s)   |         | MB/s  | /s     | /s      |
--------|---------|-------|--------|---------|---------------------------------------
  5.372 | 3.75 x  | 73.96 | 492.88 | 13.80   | copy to patch 4
  8.545 | 2.36 x  | 46.49 | 309.83 | 8.68    | p8 = p7 + optimization of palloc()
 10.229 | 1.97 x  | 38.84 | 258.82 | 7.25    | p7 = p6 + optimization of StringInfo
 12.869 | 1.57 x  | 30.87 | 205.73 | 5.76    | p6 = p5 + optimization of pq_send*
 15.559 | 1.30 x  | 25.54 | 170.16 | 4.76    | p5 no api changes, COPY TO optimized
 20.165 | ---     | 19.70 | 131.29 | 3.68    | 8.4.0 / compiled from source

COPY test_one_int TO '/dev/null' BINARY :

 Time  | Speedup | Table  | KRows   | MTuples | Name
 (s)   |         | MB/s   | /s      | /s      |
-------|---------|--------|---------|---------|---------------------------------------
 1.493 | 4.23 x  | 205.25 | 6699.22 | 6.70    | p10 no copy
 1.660 | 3.80 x  | 184.51 | 6022.33 | 6.02    | p9 monstrously ugly hack
 2.003 | 3.15 x  | 152.94 | 4991.87 | 4.99    | copy to patch 4
 2.803 | 2.25 x  | 109.32 | 3568.03 | 3.57    | p8 = p7 + optimization of palloc()
 2.976 | 2.12 x  | 102.94 | 3360.05 | 3.36    | p7 = p6 + optimization of StringInfo
 3.165 | 2.00 x  | 96.82  | 3160.05 | 3.16    | p6 = p5 + optimization of pq_send*
 3.698 | 1.71 x  | 82.86  | 2704.43 | 2.70    | p5 no api changes, COPY TO optimized
 6.318 | ---     | 48.49  | 1582.85 | 1.58    | 8.4.0 / compiled from source

COPY test_many_ints TO '/dev/null' BINARY :

 Time  | Speedup | Table  | KRows  | MTuples | Name
 (s)   |         | MB/s   | /s     | /s      |
-------|---------|--------|--------|---------|---------------------------------------
 1.007 | 8.80 x  | 127.23 | 993.34 | 25.83   | p10 no copy
 1.114 | 7.95 x  | 114.95 | 897.52 | 23.34   | p9 monstrously ugly hack
 1.706 | 5.19 x  | 75.08  | 586.23 | 15.24   | copy to patch 4
 3.396 | 2.61 x  | 37.72  | 294.49 | 7.66    | p8 = p7 + optimization of palloc()
 4.588 | 1.93 x  | 27.92  | 217.98 | 5.67    | p7 = p6 + optimization of StringInfo
 5.821 | 1.52 x  | 22.00  | 171.80 | 4.47    | p6 = p5 + optimization of pq_send*
 6.890 | 1.29 x  | 18.59  | 145.14 | 3.77    | p5 no api changes, COPY TO optimized
 8.861 | ---     | 14.45  | 112.85 | 2.93    | 8.4.0 / compiled from source
In the previous mails I made a mistake, writing "MTuples/s" instead of
"MDatums/s"; sorry about that. It is the number of rows x columns. The
title was wrong, but the data was right.

I've been doing some tests on COPY FROM ... BINARY :
- inlines in various pg_get* etc.
- faster buffer handling for COPY
- that's about it...

In the tables below, "p17" (i.e. test patch 17, the last one) and
straight postgres are compared.

COPY annonces_2 FROM 'annonces.bin' BINARY :

  Time  | Speedup | Table | KRows | MDatums | Name
  (s)   |         | MB/s  | /s    | /s      |
--------|---------|-------|-------|---------|------------------------------------
  8.417 | 1.40 x  | 38.70 | 49.13 | 1.92    | 8.4.0 / p17
 11.821 | ---     | 27.56 | 34.98 | 1.36    | 8.4.0 / compiled from source

COPY archive_data_2 FROM 'archive_data.bin' BINARY :

  Time  | Speedup | Table | KRows  | MDatums | Name
  (s)   |         | MB/s  | /s     | /s      |
--------|---------|-------|--------|---------|------------------------------------
 15.314 | 1.93 x  | 25.94 | 172.88 | 4.84    | 8.4.0 / p17 COPY FROM BINARY all
 29.520 | ---     | 13.46 | 89.69  | 2.51    | 8.4.0 / compiled from source

COPY test_one_int_2 FROM 'test_one_int.bin' BINARY :

  Time  | Speedup | Table | KRows  | MDatums | Name
  (s)   |         | MB/s  | /s     | /s      |
--------|---------|-------|--------|---------|------------------------------------
 10.003 | 1.39 x  | 30.63 | 999.73 | 1.00    | 8.4.0 / p17 COPY FROM BINARY all
 13.879 | ---     | 22.08 | 720.53 | 0.72    | 8.4.0 / compiled from source

COPY test_many_ints_2 FROM 'test_many_ints.bin' BINARY :

  Time  | Speedup | Table | KRows  | MDatums | Name
  (s)   |         | MB/s  | /s     | /s      |
--------|---------|-------|--------|---------|------------------------------------
  6.009 | 2.08 x  | 21.31 | 166.42 | 4.33    | 8.4.0 / p17 COPY FROM BINARY all
 12.516 | ---     | 10.23 | 79.90  | 2.08    | 8.4.0 / compiled from source

I thought it might be interesting to get split timings of the various
steps in COPY FROM, so I simply commented out bits of code and ran
tests. The "delta" columns are differences between two lines, that is,
the time taken by the step mentioned on the right.

reading data only = reading all the data and parsing it into chunks,
                    doing everything until the RecvFunc is called
RecvFuncs         = same, + RecvFunc is called
heap_form_tuple   = same, + heap_form_tuple is called
triggers          = same, + triggers are applied
insert            = actual tuple insertion
p17               = total time (post-insert triggers, constraint checks,
                    etc.)

COPY annonces_2 FROM 'annonces.bin' BINARY :

  Time  | Delta | Row delta | Datum delta | Name
  (s)   | (s)   | (us)      | (us)        |
--------|-------|-----------|-------------|----------------------
  1.311 | ---   | ---       | ---         | reading data only
  4.516 | 3.205 | 7.750     | 0.199       | RecvFuncs
  4.534 | 0.018 | 0.043     | 0.001       | heap_form_tuple
  5.323 | 0.789 | 1.908     | 0.049       | triggers
  8.182 | 2.858 | 6.912     | 0.177       | insert
  8.417 | 0.236 | 0.570     | 0.015       | p17

COPY archive_data_2 FROM 'archive_data.bin' BINARY :

  Time  | Delta | Row delta | Datum delta | Name
  (s)   | (s)   | (us)      | (us)        |
--------|-------|-----------|-------------|----------------------
  4.729 | ---   | ---       | ---         | reading data only
  8.508 | 3.778 | 1.427     | 0.051       | RecvFuncs
  8.567 | 0.059 | 0.022     | 0.001       | heap_form_tuple
 10.804 | 2.237 | 0.845     | 0.030       | triggers
 14.475 | 3.671 | 1.386     | 0.050       | insert
 15.314 | 0.839 | 0.317     | 0.011       | p17

COPY test_one_int_2 FROM 'test_one_int.bin' BINARY :

  Time  | Delta | Row delta | Datum delta | Name
  (s)   | (s)   | (us)      | (us)        |
--------|-------|-----------|-------------|----------------------
  1.247 | ---   | ---       | ---         | reading data only
  1.745 | 0.498 | 0.050     | 0.050       | RecvFuncs
  1.750 | 0.004 | 0.000     | 0.000       | heap_form_tuple
  3.114 | 1.364 | 0.136     | 0.136       | triggers
  9.984 | 6.870 | 0.687     | 0.687       | insert
 10.003 | 0.019 | 0.002     | 0.002       | p17

COPY test_many_ints_2 FROM 'test_many_ints.bin' BINARY :

  Time  | Delta | Row delta | Datum delta | Name
  (s)   | (s)   | (us)      | (us)        |
--------|-------|-----------|-------------|----------------------
  1.701 | ---   | ---       | ---         | reading data only
  3.122 | 1.421 | 1.421     | 0.055       | RecvFuncs
  3.129 | 0.008 | 0.008     | 0.000       | heap_form_tuple
  3.754 | 0.624 | 0.624     | 0.024       | triggers
  5.639 | 1.885 | 1.885     | 0.073       | insert
  6.009 | 0.370 | 0.370     | 0.014       | p17

We can see that :

- reading and parsing the data is still slow (actually, everything is
  copied something like 3-4 times)
- RecvFuncs take quite long, too
- triggers use some time, although the table has no triggers...? This
  is suspicious...
- the actual insertion (which is really what we are interested in when
  loading a table) is actually very fast

Ideally, in COPY, the insertion into the table should take most of the
time used in the operation...
I'm doing some more exploration with oprofile...

I've got the glibc-debug package installed (on kubuntu), but oprofile
doesn't seem to know about it. I wonder what part of glibc gets 60% of
the run time... do I have to set a magic option in the postgres config?

samples  %        image name    app name     symbol name
155312   61.7420  libc-2.7.so   libc-2.7.so  /lib/tls/i686/cmov/libc-2.7.so
35921    14.2799  postgres      postgres     CopyOneRowTo
7485      2.9756  postgres      postgres     CopySendData
5626      2.2365  postgres      postgres     MemoryContextAlloc
5174      2.0568  postgres      postgres     FunctionCall1
5167      2.0541  no-vmlinux    no-vmlinux   /no-vmlinux
5087      2.0223  postgres      postgres     AllocSetAlloc
4340      1.7253  postgres      postgres     int4out
3896      1.5488  postgres      postgres     heap_deform_tuple
On 8/18/09, Pierre Frédéric Caillaud <lists@peufeu.com> wrote:
> I'm doing some more exploration with oprofile...
>
> I've got the glibc-debug package installed (on kubuntu), but oprofile
> doesn't seem to know about it. I wonder what part of glibc gets 60% of
> the run time... do I have to set a magic option in the postgres config?

AFAIK you need to run the app with LD_LIBRARY_PATH=/usr/lib/debug,
otherwise the debug packages won't be used.

-- 
marko
> AFAIK you need to run the app with LD_LIBRARY_PATH=/usr/lib/debug,
> otherwise the debug packages won't be used.

I had stupidly put the LD_LIBRARY_PATH on make rather than on postgres,
ahem. OK, it works now, thanks.

I'm very carefully benchmarking everything every time I make a
modification: sometimes just a simple change creates an unexpected
performance loss. So the process is slow, but a performance patch should
not make things slower ;)