Re: Improvements in Copy From - Mailing list pgsql-hackers

From vignesh C
Subject Re: Improvements in Copy From
Date
Msg-id CALDaNm1MhZFSFEuuMTHyBziJ5N5JUB337na5e9fpkVqPG29e9A@mail.gmail.com
In response to Re: Improvements in Copy From  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Improvements in Copy From  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Wed, Sep 9, 2020 at 12:24 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> My basic understanding of first part of your patch is that by
> adjusting the "minread" it now allows it to loop multiple times
> internally within the CopyGetData rather than calling CopyLoadRawBuf
> for every N lines. There doesn't seem to be much change to what other
> code gets executed so the saving is essentially whatever is the cost
> of making 2 x function calls (CopyLoadRawBuf + CopyGetData) x N. Is
> that understanding correct?

Yes, you are right. We avoid those function calls and try to get as
many records as possible from the buffer and insert them into the relation.
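
To make the saving concrete, here is a toy model of the loop (a sketch
only, not the actual copy.c code). ToySource, toy_get_data and
toy_count_calls are hypothetical names, and the fixed-size "messages"
are an assumption standing in for the frontend data stream. It only
illustrates how a larger minread lets one call loop internally instead
of returning after every message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the frontend stream: input arrives in
 * fixed-size messages, and one receive yields at most one message. */
typedef struct ToySource
{
	const char *data;		/* all bytes the "client" will send */
	size_t		len;		/* total number of bytes */
	size_t		pos;		/* bytes already delivered */
	size_t		msg_size;	/* bytes per message (assumed > 0) */
	int			reached_eof;	/* set once the end of input is seen */
} ToySource;

/* Loosely modelled on the loop shape in CopyGetData: keep receiving
 * until at least minread bytes are collected, the destination is full,
 * or the end of input is reached. */
static size_t
toy_get_data(ToySource *src, char *dest, size_t minread, size_t maxread)
{
	size_t		bytesread = 0;

	assert(minread <= maxread);
	assert(src->msg_size > 0);

	while (maxread > 0 && bytesread < minread && !src->reached_eof)
	{
		size_t		avail = src->len - src->pos;

		if (avail == 0)
		{
			/* Nothing left: this plays the role of the end marker. */
			src->reached_eof = 1;
			break;
		}
		if (avail > src->msg_size)
			avail = src->msg_size;
		if (avail > maxread)
			avail = maxread;

		memcpy(dest + bytesread, src->data + src->pos, avail);
		src->pos += avail;
		bytesread += avail;
		maxread -= avail;
	}
	return bytesread;
}

/* Drain the whole source and report how many toy_get_data calls that
 * took for a given minread. */
static int
toy_count_calls(const char *data, size_t msg_size, size_t minread)
{
	char		buf[64];
	ToySource	src = {data, strlen(data), 0, msg_size, 0};
	int			calls = 0;

	do
	{
		(void) toy_get_data(&src, buf, minread, sizeof(buf));
		calls++;
	} while (!src.reached_eof);

	return calls;
}
```

In this model, 40 bytes sent as 10-byte messages need five calls with
minread = 1 (four that return data plus one that discovers the end),
but a single call with minread = 64, which also consumes the end of
input in the same pass.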

> But with that change there seems to be opportunity for yet another
> tiny saving possible. IIUC, now you are processing a lot more data
> within the CopyGetData so it is now very likely that you will also
> gobble the COPY_NEW_FE's 'c' marker. So cstate->reached_eof will be
> set. So this means the calling code of CopyReadLineText may no longer
> need to call the CopyLoadRawBuf one last time just to discover there
> are no more bytes to read - something that it already knows if
> cstate->reached_eof == true.
>
> For example, with your change can't you also modify CopyReadLineText like below:
>
> BEFORE
>             if (!CopyLoadRawBuf(cstate))
>                 hit_eof = true;
>
> AFTER
>             if (cstate->reached_eof)
>             {
>                 cstate->raw_buf[0] = '\0';
>                 cstate->raw_buf_index = cstate->raw_buf_len = 0;
>                 hit_eof = true;
>             }
>             else if (!CopyLoadRawBuf(cstate))
>             {
>                 hit_eof = true;
>             }
>
> Whether such a micro-optimisation is worth doing is another question.

Yes, what you suggested can also be done, but I have the same question
as you. It saves just one function call, and the EOF check happens
right at the start of that call anyway. Should we include this or not?
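
For anyone weighing this up, a small standalone sketch of the trade-off
follows. It is a toy model under stated assumptions, not the copy.c
code: ToyCopyState, toy_load_raw_buf and toy_drain are hypothetical
names, and the loader is assumed to check reached_eof first and return
immediately when it is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the COPY state. */
typedef struct ToyCopyState
{
	int			chunks_left;	/* chunks the source can still deliver */
	bool		reached_eof;	/* set when the end marker was consumed */
	int			load_calls;		/* how often toy_load_raw_buf was called */
} ToyCopyState;

/* Returns true if a chunk was loaded.  The EOF check is the first thing
 * done, so a call made after EOF is cheap but still a call. */
static bool
toy_load_raw_buf(ToyCopyState *cs)
{
	cs->load_calls++;

	if (cs->reached_eof || cs->chunks_left == 0)
	{
		cs->reached_eof = true;
		return false;
	}

	cs->chunks_left--;
	if (cs->chunks_left == 0)
		cs->reached_eof = true; /* end marker arrived with the last chunk */
	return true;
}

/* Read until EOF and return the number of load calls.  With
 * short_circuit set, the caller tests reached_eof itself, mirroring the
 * shape of the AFTER snippet quoted above. */
static int
toy_drain(int chunks, bool short_circuit)
{
	ToyCopyState cs = {chunks, false, 0};
	bool		hit_eof = false;

	while (!hit_eof)
	{
		if (short_circuit && cs.reached_eof)
			hit_eof = true;		/* already known, skip the call */
		else if (!toy_load_raw_buf(&cs))
			hit_eof = true;
	}
	return cs.load_calls;
}
```

In this model the short-circuit saves exactly one call per COPY
regardless of input size (for three chunks, three calls instead of
four), which is why the benefit looks so small.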

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


