Re: Parallel copy - Mailing list pgsql-hackers
From | Ashutosh Sharma |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | CAE9k0P=478yAADdJ0nYqm3USjuWR4H2VTQmJxhO18rQC5=oZ5Q@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel copy (vignesh C <vignesh21@gmail.com>) |
Responses |
Re: Parallel copy
|
List | pgsql-hackers |
Some review comments (mostly) from the leader side code changes:
1) Do we need a DSM key for the FORCE_QUOTE option? I think FORCE_QUOTE option is only used with COPY TO and not COPY FROM so not sure why you have added it.
PARALLEL_COPY_KEY_FORCE_QUOTE_LIST
2) Should we be allocating the parallel copy data structure only when it is confirmed that the parallel copy is allowed?
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
Or, if you want it to be allocated before confirming if Parallel copy is allowed or not, then I think it would be good to allocate it in *cstate->copycontext* memory context so that when EndCopy is called towards the end of the COPY FROM operation, the entire context itself gets deleted thereby freeing the memory space allocated for pcdata. In fact it would be good to ensure that all the local memory allocated inside the ctstate structure gets allocated in the *cstate->copycontext* memory context.
3) Should we allow Parallel Copy when the insert method is CIM_MULTI_CONDITIONAL?
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
I know we have added checks in CopyFrom() to ensure that if any trigger (before row or instead of) is found on any of partition being loaded with data, then COPY FROM operation would fail, but does it mean that we are okay to perform parallel copy on partitioned table. Have we done some performance testing with the partitioned table where the data in the input file needs to be routed to the different partitions?
4) There are lot of if-checks in IsParallelCopyAllowed function that are checked in CopyFrom function as well which means in case of Parallel Copy those checks will get executed multiple times (first by the leader and from second time onwards by each worker process). Is that required?
5) Should the worker process be calling this function when the leader has already called it once in ExecBeforeStmtTrigger()?
/* Verify the named relation is a valid target for INSERT */
CheckValidResultRel(resultRelInfo, CMD_INSERT);
6) I think it would be good to re-write the comments atop ParallelCopyLeader(). From the present comments it appears as if you were trying to put the information pointwise but somehow you ended up putting in a paragraph. The comments also have some typos like *line beaks* which possibly means line breaks. This is applicable for other comments as well where you
7) Is the following checking equivalent to IsWorker()? If so, it would be good to replace it with an IsWorker like macro to increase the readability.
1) Do we need a DSM key for the FORCE_QUOTE option? I think FORCE_QUOTE option is only used with COPY TO and not COPY FROM so not sure why you have added it.
PARALLEL_COPY_KEY_FORCE_QUOTE_LIST
2) Should we be allocating the parallel copy data structure only when it is confirmed that the parallel copy is allowed?
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
Or, if you want it to be allocated before confirming if Parallel copy is allowed or not, then I think it would be good to allocate it in *cstate->copycontext* memory context so that when EndCopy is called towards the end of the COPY FROM operation, the entire context itself gets deleted thereby freeing the memory space allocated for pcdata. In fact it would be good to ensure that all the local memory allocated inside the ctstate structure gets allocated in the *cstate->copycontext* memory context.
3) Should we allow Parallel Copy when the insert method is CIM_MULTI_CONDITIONAL?
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
I know we have added checks in CopyFrom() to ensure that if any trigger (before row or instead of) is found on any of partition being loaded with data, then COPY FROM operation would fail, but does it mean that we are okay to perform parallel copy on partitioned table. Have we done some performance testing with the partitioned table where the data in the input file needs to be routed to the different partitions?
4) There are lot of if-checks in IsParallelCopyAllowed function that are checked in CopyFrom function as well which means in case of Parallel Copy those checks will get executed multiple times (first by the leader and from second time onwards by each worker process). Is that required?
5) Should the worker process be calling this function when the leader has already called it once in ExecBeforeStmtTrigger()?
/* Verify the named relation is a valid target for INSERT */
CheckValidResultRel(resultRelInfo, CMD_INSERT);
6) I think it would be good to re-write the comments atop ParallelCopyLeader(). From the present comments it appears as if you were trying to put the information pointwise but somehow you ended up putting in a paragraph. The comments also have some typos like *line beaks* which possibly means line breaks. This is applicable for other comments as well where you
7) Is the following checking equivalent to IsWorker()? If so, it would be good to replace it with an IsWorker like macro to increase the readability.
(IsParallelCopy() && !IsLeader())
On Fri, Jul 17, 2020 at 2:09 PM vignesh C <vignesh21@gmail.com> wrote:
>
> Please find the updated patch with the fixes included.
>
Patch 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch
had few indentation issues, I have fixed and attached the patch for
the same.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: