Re: Parallel copy - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | CALDaNm3x-isUdoppJPAxjegLsFGKuTrs0pTSJai7K_TAPu-+nA@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel copy (Ashutosh Sharma <ashu.coek88@gmail.com>) |
Responses |
Re: Parallel copy
|
List | pgsql-hackers |
Thanks Ashutosh For your review, my comments are inline. On Fri, Jun 19, 2020 at 5:41 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > Hi, > > I just got some time to review the first patch in the list i.e. 0001-Copy-code-readjustment-to-support-parallel-copy.patch.As the patch name suggests, it is just trying to reshuffle theexisting code for COPY command here and there. There is no extra changes added in the patch as such, but still I do havesome review comments, please have a look: > > 1) Can you please add some comments atop the new function PopulateAttributes() describing its functionality in detail.Further, this new function contains the code from BeginCopy() to set attribute level options used with COPY FROM suchas FORCE_QUOTE, FORCE_NOT_NULL, FORCE_NULL etc. in cstate and along with that it also copies the code from BeginCopy()to set other infos such as client encoding type, encoding conversion etc. Hence, I think it would be good to giveit some better name, basically something that matches with what actually it is doing. > There is no new code added in this function, some part of code from BeginCopy was made in to a new function as this part of code will also be required for the parallel copy workers before the workers start the actual copy operation. This code was made into a function to avoid duplication. Changed the function name to PopulateGlobalsForCopyFrom & added few comments. > 2) Again, the name for the new function CheckCopyFromValidity() doesn't look good to me. From the function name it appearsas if it does the sanity check of the entire COPY FROM command, but actually it is just doing the sanity check forthe target relation specified with COPY FROM. So, probably something like CheckTargetRelValidity would look more sensible,I think? TBH, I am not good at naming the functions so you can always ignore my suggestions about function and variablenames :) > Changed as suggested. > 3) Any reason for not making CheckCopyFromValidity as a macro instead of a new function. It is just doing the sanity checkfor the target relation. > I felt there is reasonable number of lines in the function & it is not in performance intensive path, so I preferred function over macro. Your thoughts? > 4) Earlier in CopyReadLine() function while trying to clear the EOL marker from cstate->line_buf.data (copied data), wewere not checking if the line read by CopyReadLineText() function is a header line or not, but I can see that your patchchecks that before clearing the EOL marker. Any reason for this extra check? > If you see the caller of CopyReadLine, i.e. NextCopyFromRawFields does nothing for the header line, server basically calls CopyReadLine again, it is a kind of small optimization. Anyway server is not going to do anything with header line, I felt no need to clear EOL marker for header lines. /* on input just throw the header line away */ if (cstate->cur_lineno == 0 && cstate->header_line) { cstate->cur_lineno++; if (CopyReadLine(cstate)) return false; /* done */ } cstate->cur_lineno++; /* Actually read the line into memory here */ done = CopyReadLine(cstate); I think no need to make a fix for this. Your thoughts? > 5) I noticed the below spurious line removal in the patch. > > @@ -3839,7 +3953,6 @@ static bool > CopyReadLine(CopyState cstate) > { > bool result; > - > Fixed. I have attached the patch for the same with the fixes. Thoughts? Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: