Re: COPY FROM performance improvements - Mailing list pgsql-patches

From Tom Lane
Subject Re: COPY FROM performance improvements
Date
Msg-id 29007.1123387696@sss.pgh.pa.us
Whole thread Raw
In response to Re: COPY FROM performance improvements  ("Luke Lonergan" <llonergan@greenplum.com>)
Responses Re: COPY FROM performance improvements
Re: COPY FROM performance improvements
List pgsql-patches
"Luke Lonergan" <llonergan@greenplum.com> writes:
>> I had some difficulty in generating test cases that weren't largely
>> I/O-bound, but AFAICT the patch as applied is about the same speed
>> as what you submitted.

> You achieve the important objective of knocking the parsing stage down a
> lot, but your parsing code is actually about 20% slower than Alon's.

I would like to see the exact test case you are using to make this
claim; the tests I did suggested my code is the same speed or faster.
The particular test case I was using was the "tenk1" data from the
regression database, duplicated out to about 600K rows so as to run
long enough to measure with some degree of repeatability.

As best I can tell, my version of CopyReadAttributes is significantly
quicker than Alon's, approximately balancing out the fact that my
version of CopyReadLine is slower.  I did the latter first, and would
now be tempted to rewrite it in the same style as CopyReadAttributes,
ie one pass of memory-to-memory copy using pointers rather than buffer
indexes.

BTW, late today I figured out a way to get fairly reproducible
non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
the actual inserts, thus:

create table foo ...
create function noway() returns trigger as
'begin return null; end' language plpgsql;
create trigger noway before insert on foo
  for each row execute procedure noway();
then repeat:
copy foo from '/tmp/foo.data';

If the source file is not too large to fit in kernel disk cache, then
after the first iteration there is no I/O at all.  I got numbers
that were reproducible within less than 1%, as opposed to 5% or more
variation when the thing was partially I/O bound.  Pretty useless in the
real world, of course, but great for timing COPY's data-pushing.

            regards, tom lane

pgsql-patches by date:

Previous
From: "Luke Lonergan"
Date:
Subject: Re: COPY FROM performance improvements
Next
From: "Luke Lonergan"
Date:
Subject: Re: COPY FROM performance improvements