Re: Parallel copy - Mailing list pgsql-hackers

From vignesh C
Subject Re: Parallel copy
Date
Msg-id CALDaNm3x-isUdoppJPAxjegLsFGKuTrs0pTSJai7K_TAPu-+nA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel copy  (Ashutosh Sharma <ashu.coek88@gmail.com>)
Responses Re: Parallel copy
List pgsql-hackers
Thanks Ashutosh For your review, my comments are inline.
On Fri, Jun 19, 2020 at 5:41 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Hi,
>
> I just got some time to review the first patch in the list i.e.
0001-Copy-code-readjustment-to-support-parallel-copy.patch.As the patch name suggests, it is just trying to reshuffle
theexisting code for COPY command here and there. There is no extra changes added in the patch as such, but still I do
havesome review comments, please have a look: 
>
> 1) Can you please add some comments atop the new function PopulateAttributes() describing its functionality in
detail.Further, this new function contains the code from BeginCopy() to set attribute level options used with COPY FROM
suchas FORCE_QUOTE, FORCE_NOT_NULL, FORCE_NULL etc. in cstate and along with that it also copies the code from
BeginCopy()to set other infos such as client encoding type, encoding conversion etc. Hence, I think it would be good to
giveit some better name, basically something that matches with what actually it is doing. 
>

There is no new code added in this function, some part of code from
BeginCopy was made in to a new function as this part of code will also
be required for the parallel copy workers before the workers start the
actual copy operation. This code was made into a function to avoid
duplication. Changed the function name to PopulateGlobalsForCopyFrom &
added few comments.

> 2) Again, the name for the new function CheckCopyFromValidity() doesn't look good to me. From the function name it
appearsas if it does the sanity check of the entire COPY FROM command, but actually it is just doing the sanity check
forthe target relation specified with COPY FROM. So, probably something like CheckTargetRelValidity would look more
sensible,I think? TBH, I am not good at naming the functions so you can always ignore my suggestions about function and
variablenames :) 
>

Changed as suggested.
> 3) Any reason for not making CheckCopyFromValidity as a macro instead of a new function. It is just doing the sanity
checkfor the target relation. 
>

I felt there is reasonable number of lines in the function & it is not
in performance intensive path, so I preferred function over macro.
Your thoughts?

> 4) Earlier in CopyReadLine() function while trying to clear the EOL marker from cstate->line_buf.data (copied data),
wewere not checking if the line read by CopyReadLineText() function is a header line or not, but I can see that your
patchchecks that before clearing the EOL marker. Any reason for this extra check? 
>

If you see the caller of CopyReadLine, i.e. NextCopyFromRawFields does
nothing for the header line, server basically calls CopyReadLine
again, it is a kind of small optimization. Anyway server is not going
to do anything with header line, I felt no need to clear EOL marker
for header lines.
/* on input just throw the header line away */
if (cstate->cur_lineno == 0 && cstate->header_line)
{
cstate->cur_lineno++;
if (CopyReadLine(cstate))
return false; /* done */
}

cstate->cur_lineno++;

/* Actually read the line into memory here */
done = CopyReadLine(cstate);
I think no need to make a fix for this. Your thoughts?

> 5) I noticed the below spurious line removal in the patch.
>
> @@ -3839,7 +3953,6 @@ static bool
>  CopyReadLine(CopyState cstate)
>  {
>     bool        result;
> -
>

Fixed.
I have attached the patch for the same with the fixes.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Parallel Seq Scan vs kernel read ahead
Next
From: Amit Kapila
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions