Re: generic copy options - Mailing list pgsql-hackers

From: Emmanuel Cecchet
Subject: Re: generic copy options
Date:
Msg-id: 4AB2C6AE.7090806@asterdata.com
In response to: Re: generic copy options (Dan Colish <dan@unencrypted.org>)
List: pgsql-hackers
On that particular patch, as Robert mentioned, only the parsing has changed.
In the case of \copy, the parsing is now much lighter in psql (it remains
the same in the server). The bigger the COPY operation, the less you will
see the impact of the parsing, since it is done only once for the entire
operation.
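
For anyone who wants to see the difference concretely, here is a minimal
sketch of the new-style option list (the old WITH syntax keeps working;
exact option spellings may differ from the final committed form):

    -- old syntax, parsed keyword by keyword
    COPY t FROM '/some/file' WITH CSV HEADER;

    -- generic option list from the patch, parsed once as a whole
    COPY t FROM '/some/file' (FORMAT csv, HEADER true);

    -- \copy in psql now does much less parsing of its own: it mostly
    -- forwards the parenthesized option list to the server's COPY command
    \copy t from '/some/file' (format csv, header true)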

Emmanuel

Dan Colish wrote:
> On Thu, Sep 17, 2009 at 07:10:35PM -0400, Andrew Dunstan wrote:
>
>> Greg Smith wrote:
>>
>>> On Thu, 17 Sep 2009, Dan Colish wrote:
>>>
>>>>     - Performance appears to be the same, although I don't have a
>>>>       good way of testing this at the moment
>>> Here's what I do to generate simple COPY performance test cases:
>>>
>>> CREATE TABLE t (i integer);
>>> INSERT INTO t SELECT x FROM generate_series(1,100000) AS x;
>>> \timing
>>> COPY t TO '/some/file' WITH [options];
>>> BEGIN;
>>> TRUNCATE TABLE t;
>>> COPY t FROM '/some/file' WITH [options];
>>> COMMIT;
>>>
>>> You can adjust the size of the generated table based on whether you
>>> want to minimize (small number) or maximize (big number) the impact of
>>> the setup overhead relative to actual processing time.  Big numbers
>>> make sense if there's a per-row change; small ones make sense if it's
>>> mainly the COPY setup that's been changed and you want a small bit of
>>> data to test against.
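>>>
>>> For example (the row counts are arbitrary):
>>>
>>> -- stress the per-row path: lots of rows
>>> INSERT INTO t SELECT x FROM generate_series(1,10000000) AS x;
>>> -- stress the setup path: just a few rows
>>> INSERT INTO t SELECT x FROM generate_series(1,100) AS x;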
>>>
>>> An example with one column in it is a good test case for seeing  
>>> whether per-row impact has gone up.  You'd want something with a wider  
>>> row for other types of performance tests.
>>>
>>> The reason for the BEGIN/COMMIT there is that this form utilizes an
>>> optimization which lowers WAL volume when doing the COPY insertion
>>> (COPY into a table created or truncated in the same transaction can
>>> skip most WAL when archiving is off), which makes it more likely
>>> you'll be testing the performance of the right thing.
>>>
>> I usually prefer to test with a table that is more varied than anything  
>> you can make with generate_series. When I tested my ragged copy patch  
>> the other day I copied 1,000,000 rows out of a large table with a  
>> mixture of dates, strings, numbers and nulls.
>>
>> But then, it has a (tiny) per field overhead so I wanted to make sure  
>> that was well represented in the test.
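>>
>> If you don't have such a table handy, something along these lines can
>> fake one (just a sketch; the column mix is arbitrary):
>>
>> CREATE TABLE t2 AS
>>   SELECT x AS id,
>>          now() - (x || ' minutes')::interval AS ts,
>>          md5(x::text) AS str,
>>          random() * 1000 AS num,
>>          CASE WHEN x % 10 = 0 THEN NULL ELSE x END AS maybe_null
>>     FROM generate_series(1,1000000) AS g(x);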
>>
>> You are certainly right about wrapping it in begin/truncate/commit (and  
>> when you do make sure that archiving is not on).
>>
>> You probably want to make sure that the file is not on the same disk as  
>> the database, to avoid disk contention. Or, better, make sure that it is  
>> in OS file system cache, or on a RAM disk.
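>>
>> For example (paths are illustrative; /dev/shm is a tmpfs on most Linux
>> systems):
>>
>> COPY t TO '/dev/shm/copytest.dat' WITH [options];
>> COPY t FROM '/dev/shm/copytest.dat' WITH [options];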
>>
>> cheers
>>
>> andrew
>
> If someone with a more significant setup could run tests, that would be
> ideal. I only have my laptop, which has a single disk and is fairly
> underpowered.
>
> That said, here are my results from running the script above; it looks
> like the patch improves performance. I would be really interested to see
> results on a larger data set and heavier iron.
>
> --
> --Dan
>
> Without Patch:
>
>     CREATE TABLE
>     INSERT 0 100000
>     Timing is on.
>     COPY 100000
>     Time: 83.273 ms
>     BEGIN
>     Time: 0.412 ms
>     TRUNCATE TABLE
>     Time: 0.357 ms
>     COPY 100000
>     Time: 140.911 ms
>     COMMIT
>     Time: 4.909 ms
>
> With Patch:
>
>     CREATE TABLE
>     INSERT 0 100000
>     Timing is on.
>     COPY 100000
>     Time: 80.205 ms
>     BEGIN
>     Time: 0.351 ms
>     TRUNCATE TABLE
>     Time: 0.346 ms
>     COPY 100000
>     Time: 124.303 ms
>     COMMIT
>     Time: 4.130 ms
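>
> (For what it's worth, that is roughly a 4% improvement on COPY TO,
> 83.3 -> 80.2 ms, and about 12% on COPY FROM, 140.9 -> 124.3 ms, though
> single runs on a laptop are noisy.)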


-- 
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com


