Re: Parallel copy - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | CALj2ACVHo7+zTOFg814kFdm65w7QiXwdADUXdOE+dpruwM-X4g@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel copy (vignesh C <vignesh21@gmail.com>) |
Responses |
Re: Parallel copy
Re: Parallel copy Re: Parallel copy |
List | pgsql-hackers |
Hi Vignesh, I took a look at the v8 patch set. Here are some comments: 1. PopulateCommonCstateInfo() -- can we use PopulateCommonCStateInfo() or PopulateCopyStateInfo()? And also EstimateCstateSize() -- EstimateCStateSize(), PopulateCstateCatalogInfo() -- PopulateCStateCatalogInfo()? 2. Instead of mentioning numbers like 1024, 64K, 10240 in the comments, can we represent them in terms of macros? /* It can hold 1024 blocks of 64K data in DSM to be processed by the worker. */ #define MAX_BLOCKS_COUNT 1024 /* * It can hold upto 10240 record information for worker to process. RINGSIZE 3. How about " Each worker at once will pick the WORKER_CHUNK_COUNT records from the DSM data blocks and store them in it's local memory. This is to make workers not contend much while getting record information from the DSM. Read RINGSIZE comments before changing this value. " instead of /* * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data * block to process to avoid lock contention. Read RINGSIZE comments before * changing this value. */ 4. How about one line gap before and after for comments: "Leader should operate in the following order:" and "Worker should operate in the following order:" 5. Can we move RAW_BUF_BYTES macro definition to the beginning of the copy.h where all the macro are defined? 6. I don't think we need the change in toast_internals.c with the temporary hack Assert(!(IsParallelWorker() && !currentCommandIdUsed)); in GetCurrentCommandId() 7. I think /* Can't perform copy in parallel */ if (parallel_workers <= 0) return NULL; can be /* Can't perform copy in parallel */ if (parallel_workers == 0) return NULL; as parallel_workers can never be < 0 since we enter BeginParallelCopy only if cstate->nworkers > 0 and also we are not allowed to have negative values for max_worker_processes. 8. Do we want to pfree(cstate->pcdata) in case we failed to start any parallel workers, we would have allocated a good else { /* * Reset nworkers to -1 here. This is useful in cases where user * specifies parallel workers, but, no worker is picked up, so go * back to non parallel mode value of nworkers. */ cstate->nworkers = -1; *processed = CopyFrom(cstate); /* copy from file to database */ } 9. Instead of calling CopyStringToSharedMemory() for each string variable, can't we just create a linked list of all the strings that need to be copied into shm and call CopyStringToSharedMemory() only once? We could avoid 5 function calls? 10. Similar to above comment: can we fill all the required cstate->variables inside the function CopyNodeFromSharedMemory() and call it only once? In each worker we could save overhead of 5 function calls. 11. Looks like CopyStringFromSharedMemory() and CopyNodeFromSharedMemory() do almost the same things except stringToNode() and pfree(destptr);. Can we have a generic function CopyFromSharedMemory() or something else and handle with flag "bool isnode" to differentiate the two use cases? 12. Can we move below check to the end in IsParallelCopyAllowed()? /* Check parallel safety of the trigger functions. */ if (cstate->rel->trigdesc != NULL && !CheckRelTrigFunParallelSafety(cstate->rel->trigdesc)) return false; 13. CacheLineInfo(): Instead of goto empty_data_line_update; how about having this directly inside the if block as it's being used only once? 14. GetWorkerLine(): How about avoiding goto statements and replacing the common code with a always static inline function or a macro? 15. UpdateSharedLineInfo(): Below line is misaligned. lineInfo->first_block = blk_pos; lineInfo->start_offset = offset; 16. ParallelCopyFrom(): Do we need CHECK_FOR_INTERRUPTS(); at the start of for (;;)? 17. Remove extra lines after #define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1) in copy.h With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: