Home > mailing lists

Re: Improvements in Copy From - Mailing list pgsql-hackers

From	vignesh C
Subject	Re: Improvements in Copy From
Date	September 10, 2020 10:16:58
Msg-id	CALDaNm0HioOL3f5+YjAywWWenDg=azTL8oC=7r56c7zCce12pA@mail.gmail.com Whole thread
In response to	Re: Improvements in Copy From (Surafel Temesgen <surafel3000@gmail.com>)
Responses	Re: Improvements in Copy From
List	pgsql-hackers

Tree view

On Mon, Sep 7, 2020 at 1:19 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
>
>
> Hi Vignesh
>
> On Wed, Jul 1, 2020 at 3:46 PM vignesh C <vignesh21@gmail.com> wrote:
>>
>> Hi,
>>
>> While reviewing copy from I identified few  improvements for copy from
>> that can be done :
>> a) copy from stdin copies lesser amount of data to buffer even though
>> space is available in buffer because minread was passed as 1 to
>> CopyGetData, Hence it only reads until the data read from libpq is
>> less than minread. This can be fixed by passing the actual space
>> available in buffer, this reduces the unnecessary frequent calls to
>> CopyGetData.
>
>
> why not applying the same optimization on file read ?

For file read this is already taken care as you can see from below code:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from COPY file: %m")));
if (bytesread == 0)
cstate->reached_eof = true;
break;

We do not have any condition to break unlike the case of stdin, we
read 1 * maxread size of data, So no need to do anything for it.

>
>>
>> c) Copy from reads header line and do nothing for the header line, we
>> need not clear EOL & need not convert to server encoding for the
>> header line.
>
>
> We have a patch for column matching feature [1] that may need a header line to be further processed. Even without
thatI think it is preferable to process the header line for nothing than adding those checks to the loop,
performance-wise.

I had seen that patch, I feel that change to match the header if the
header is specified can be addressed in this patch if that patch gets
committed first or vice versa. We are doing a lot of processing for
the data which we need not do anything. Shouldn't this be skipped if
not required. Similar check is present in NextCopyFromRawFields also
to skip header.

Thoughts?

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Kyotaro Horiguchi
Date: 10 September 2020, 10:02:06
Subject: Re: SIGQUIT handling, redux

From: Oleg Bartunov
Date: 10 September 2020, 10:21:09
Subject: Re: Yet another fast GiST build

Re: Improvements in Copy From - Mailing list pgsql-hackers

Previous

Next