Re: Parallel copy - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | CALDaNm3vatWC0cVzV9UXKiRdEjoBO9hGam66TTasuiqk-Cx5Ew@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel copy (Ashutosh Sharma <ashu.coek88@gmail.com>) |
List | pgsql-hackers |
Thanks Ashutosh for reviewing and providing your comments. On Fri, Oct 23, 2020 at 5:43 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > Hi Vignesh, > > Thanks for the updated patches. Here are some more comments that I can > find after reviewing your latest patches: > > +/* > + * This structure helps in storing the common data from CopyStateData that are > + * required by the workers. This information will then be allocated and stored > + * into the DSM for the worker to retrieve and copy it to CopyStateData. > + */ > +typedef struct SerializedParallelCopyState > +{ > + /* low-level state data */ > + CopyDest copy_dest; /* type of copy source/destination */ > + int file_encoding; /* file or remote side's character encoding */ > + bool need_transcoding; /* file encoding diff from server? */ > + bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ > + > ... > ... > + > + /* Working state for COPY FROM */ > + AttrNumber num_defaults; > + Oid relid; > +} SerializedParallelCopyState; > > Can the above structure not be part of the CopyStateData structure? I > am just asking this question because all the fields present in the > above structure are also present in the CopyStateData structure. So, > including it in the CopyStateData structure will reduce the code > duplication and will also make CopyStateData a bit shorter. > I have removed the common members from the structure, now there are no common members between CopyStateData & the new structure. I'm using CopyStateData to copy to/from directly in the new patch. > -- > > + pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist, > + relid); > > Do we need to pass cstate->nworkers and relid to BeginParallelCopy() > function when we are already passing cstate structure, using which > both of these information can be retrieved ? > nworkers need not be passed as you have suggested but relid need to be passed as we will be setting it to pcdata, modified nworkers as suggested. > -- > > +/* DSM keys for parallel copy. */ > +#define PARALLEL_COPY_KEY_SHARED_INFO 1 > +#define PARALLEL_COPY_KEY_CSTATE 2 > +#define PARALLEL_COPY_WAL_USAGE 3 > +#define PARALLEL_COPY_BUFFER_USAGE 4 > > DSM key names do not appear to be consistent. For shared info and > cstate structures, the key name is prefixed with "PARALLEL_COPY_KEY", > but for WalUsage and BufferUsage structures, it is prefixed with > "PARALLEL_COPY". I think it would be better to make them consistent. > Modified as suggested > -- > > if (resultRelInfo->ri_TrigDesc != NULL && > (resultRelInfo->ri_TrigDesc->trig_insert_before_row || > resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) > { > /* > * Can't support multi-inserts when there are any BEFORE/INSTEAD OF > * triggers on the table. Such triggers might query the table we're > * inserting into and act differently if the tuples that have already > * been processed and prepared for insertion are not there. > */ > insertMethod = CIM_SINGLE; > } > else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL && > resultRelInfo->ri_TrigDesc->trig_insert_new_table) > { > /* > * For partitioned tables we can't support multi-inserts when there > * are any statement level insert triggers. It might be possible to > * allow partitioned tables with such triggers in the future, but for > * now, CopyMultiInsertInfoFlush expects that any before row insert > * and statement level insert triggers are on the same relation. > */ > insertMethod = CIM_SINGLE; > } > else if (resultRelInfo->ri_FdwRoutine != NULL || > cstate->volatile_defexprs) > { > ... > ... > > I think, if possible, all these if-else checks in CopyFrom() can be > moved to a single function which can probably be named as > IdentifyCopyInsertMethod() and this function can be called in > IsParallelCopyAllowed(). This will ensure that in case of Parallel > Copy when the leader has performed all these checks, the worker won't > do it again. I also feel that it will make the code look a bit > cleaner. > In the recent patch posted we have changed it to simplify the check for parallel copy, it is not an exact match. I feel this comment is not applicable on the latest patch > -- > > +void > +ParallelCopyMain(dsm_segment *seg, shm_toc *toc) > +{ > ... > ... > + InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber], > + &walusage[ParallelWorkerNumber]); > + > + MemoryContextSwitchTo(oldcontext); > + pfree(cstate); > + return; > +} > > It seems like you also need to delete the memory context > (cstate->copycontext) here. > Added it. > -- > > +void > +ExecBeforeStmtTrigger(CopyState cstate) > +{ > + EState *estate = CreateExecutorState(); > + ResultRelInfo *resultRelInfo; > > This function has a lot of comments which have been copied as it is > from the CopyFrom function, I think it would be good to remove those > comments from here and mention that this code changes done in this > function has been taken from the CopyFrom function. If any queries > people may refer to the CopyFrom function. This will again avoid the > unnecessary code in the patch. > Changed as suggested. > -- > > As Heikki rightly pointed out in his previous email, we need some high > level description of how Parallel Copy works somewhere in > copyparallel.c file. For reference, please see how a brief description > about parallel vacuum has been added in the vacuumlazy.c file. > > * Lazy vacuum supports parallel execution with parallel worker processes. In > * a parallel vacuum, we perform both index vacuum and index cleanup with > * parallel worker processes. Individual indexes are processed by one vacuum > ... Added it in copyparallel.c This is addressed in v9 patch shared at [1]. [1] - https://www.postgresql.org/message-id/CALDaNm1cAONkFDN6K72DSiRpgqNGvwxQL7TjEiHZ58opnp9VoA@mail.gmail.com Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: