Re: Improvements in Copy From - Mailing list pgsql-hackers

From vignesh C
Subject Re: Improvements in Copy From
Date
Msg-id CALDaNm0HioOL3f5+YjAywWWenDg=azTL8oC=7r56c7zCce12pA@mail.gmail.com
Whole thread Raw
In response to Re: Improvements in Copy From  (Surafel Temesgen <surafel3000@gmail.com>)
Responses Re: Improvements in Copy From  (Surafel Temesgen <surafel3000@gmail.com>)
List pgsql-hackers
On Mon, Sep 7, 2020 at 1:19 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
>
>
> Hi Vignesh
>
> On Wed, Jul 1, 2020 at 3:46 PM vignesh C <vignesh21@gmail.com> wrote:
>>
>> Hi,
>>
>> While reviewing copy from I identified few  improvements for copy from
>> that can be done :
>> a) copy from stdin copies lesser amount of data to buffer even though
>> space is available in buffer because minread was passed as 1 to
>> CopyGetData, Hence it only reads until the data read from libpq is
>> less than minread. This can be fixed by passing the actual space
>> available in buffer, this reduces the unnecessary frequent calls to
>> CopyGetData.
>
>
> why not applying the same optimization on file read ?

For file read this is already taken care as you can see from below code:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from COPY file: %m")));
if (bytesread == 0)
cstate->reached_eof = true;
break;

We do not have any condition to break unlike the case of stdin, we
read 1 * maxread size of data, So no need to do anything for it.

>
>>
>> c) Copy from reads header line and do nothing for the header line, we
>> need not clear EOL & need not convert to server encoding for the
>> header line.
>
>
> We have a patch for column matching feature [1] that may need a header line to be further processed. Even without
thatI think it is preferable to process the header line for nothing than adding those checks to the loop,
performance-wise.

I had seen that patch, I feel that change to match the header if the
header is specified can be addressed in this patch if that patch gets
committed first or vice versa. We are doing a lot of processing for
the data which we need not do anything. Shouldn't this be skipped if
not required. Similar check is present in NextCopyFromRawFields also
to skip header.

Thoughts?

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: SIGQUIT handling, redux
Next
From: Oleg Bartunov
Date:
Subject: Re: Yet another fast GiST build