Re: Parallel copy - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel copy
Date
Msg-id CALj2ACVHo7+zTOFg814kFdm65w7QiXwdADUXdOE+dpruwM-X4g@mail.gmail.com
Whole thread Raw
In response to Re: Parallel copy  (vignesh C <vignesh21@gmail.com>)
Responses Re: Parallel copy
Re: Parallel copy
Re: Parallel copy
List pgsql-hackers
Hi Vignesh,

I took a look at the v8 patch set. Here are some comments:

1. PopulateCommonCstateInfo() -- can we use PopulateCommonCStateInfo()
or PopulateCopyStateInfo()? And also EstimateCstateSize() --
EstimateCStateSize(), PopulateCstateCatalogInfo() --
PopulateCStateCatalogInfo()?

2. Instead of mentioning numbers like 1024, 64K, 10240 in the
comments, can we represent them in terms of macros?
/* It can hold 1024 blocks of 64K data in DSM to be processed by the worker. */
#define MAX_BLOCKS_COUNT 1024
/*
 * It can hold upto 10240 record information for worker to process. RINGSIZE

3. How about
"
Each worker at once will pick the WORKER_CHUNK_COUNT records from the
DSM data blocks and store them in it's local memory.
This is to make workers not contend much while getting record
information from the DSM. Read RINGSIZE comments before
 changing this value.
"
instead of
/*
 * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
 * block to process to avoid lock contention. Read RINGSIZE comments before
 * changing this value.
 */

4.  How about one line gap before and after for comments: "Leader
should operate in the following order:" and "Worker should operate in
the following order:"

5. Can we move RAW_BUF_BYTES macro definition to the beginning of the
copy.h where all the macro are defined?

6. I don't think we need the change in toast_internals.c with the
temporary hack Assert(!(IsParallelWorker() && !currentCommandIdUsed));
in GetCurrentCommandId()

7. I think
    /* Can't perform copy in parallel */
    if (parallel_workers <= 0)
        return NULL;
can be
    /* Can't perform copy in parallel */
    if (parallel_workers == 0)
        return NULL;
as parallel_workers can never be < 0 since we enter BeginParallelCopy
only if cstate->nworkers > 0 and also we are not allowed to have
negative values for max_worker_processes.

8. Do we want to pfree(cstate->pcdata) in case we failed to start any
parallel workers, we would have allocated a good
    else
        {
            /*
             * Reset nworkers to -1 here. This is useful in cases where user
             * specifies parallel workers, but, no worker is picked up, so go
             * back to non parallel mode value of nworkers.
             */
            cstate->nworkers = -1;
            *processed = CopyFrom(cstate);    /* copy from file to database */
        }

9. Instead of calling CopyStringToSharedMemory() for each string
variable, can't we just create a linked list of all the strings that
need to be copied into shm and call CopyStringToSharedMemory() only
once? We could avoid 5 function calls?

10. Similar to above comment: can we fill all the required
cstate->variables inside the function CopyNodeFromSharedMemory() and
call it only once? In each worker we could save overhead of 5 function
calls.

11. Looks like CopyStringFromSharedMemory() and
CopyNodeFromSharedMemory() do almost the same things except
stringToNode() and pfree(destptr);. Can we have a generic function
CopyFromSharedMemory() or something else and handle with flag "bool
isnode" to differentiate the two use cases?

12. Can we move below check to the end in IsParallelCopyAllowed()?
    /* Check parallel safety of the trigger functions. */
    if (cstate->rel->trigdesc != NULL &&
        !CheckRelTrigFunParallelSafety(cstate->rel->trigdesc))
        return false;

13. CacheLineInfo(): Instead of goto empty_data_line_update; how about
having this directly inside the if block as it's being used only once?

14. GetWorkerLine(): How about avoiding goto statements and replacing
the common code with a always static inline function or a macro?

15. UpdateSharedLineInfo(): Below line is misaligned.
                lineInfo->first_block = blk_pos;
        lineInfo->start_offset = offset;

16. ParallelCopyFrom(): Do we need CHECK_FOR_INTERRUPTS(); at the
start of  for (;;)?

17. Remove extra lines after #define IsHeaderLine()
(cstate->header_line && cstate->cur_lineno == 1) in copy.h

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Transactions involving multiple postgres foreign servers, take 2
Next
From: Michael Banck
Date:
Subject: Re: Online verification of checksums