Re: Parallel copy - Mailing list pgsql-hackers

From vignesh C
Subject Re: Parallel copy
Date
Msg-id CALDaNm3vatWC0cVzV9UXKiRdEjoBO9hGam66TTasuiqk-Cx5Ew@mail.gmail.com
Whole thread Raw
In response to Re: Parallel copy  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-hackers
Thanks Ashutosh for reviewing and providing your comments.

On Fri, Oct 23, 2020 at 5:43 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Hi Vignesh,
>
> Thanks for the updated patches. Here are some more comments that I can
> find after reviewing your latest patches:
>
> +/*
> + * This structure helps in storing the common data from CopyStateData that are
> + * required by the workers. This information will then be allocated and stored
> + * into the DSM for the worker to retrieve and copy it to CopyStateData.
> + */
> +typedef struct SerializedParallelCopyState
> +{
> +   /* low-level state data */
> +   CopyDest    copy_dest;      /* type of copy source/destination */
> +   int         file_encoding;  /* file or remote side's character encoding */
> +   bool        need_transcoding;   /* file encoding diff from server? */
> +   bool        encoding_embeds_ascii;  /* ASCII can be non-first byte? */
> +
> ...
> ...
> +
> +   /* Working state for COPY FROM */
> +   AttrNumber  num_defaults;
> +   Oid         relid;
> +} SerializedParallelCopyState;
>
> Can the above structure not be part of the CopyStateData structure? I
> am just asking this question because all the fields present in the
> above structure are also present in the CopyStateData structure. So,
> including it in the CopyStateData structure will reduce the code
> duplication and will also make CopyStateData a bit shorter.
>

I have removed the common members from the structure, now there are no
common members between CopyStateData & the new structure. I'm using
CopyStateData to copy to/from directly in the new patch.

> --
>
> +           pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
> +                                    relid);
>
> Do we need to pass cstate->nworkers and relid to BeginParallelCopy()
> function when we are already passing cstate structure, using which
> both of these information can be retrieved ?
>

nworkers need not be passed as you have suggested but relid need to be
passed as we will be setting it to pcdata, modified nworkers as
suggested.

> --
>
> +/* DSM keys for parallel copy.  */
> +#define PARALLEL_COPY_KEY_SHARED_INFO              1
> +#define PARALLEL_COPY_KEY_CSTATE                   2
> +#define PARALLEL_COPY_WAL_USAGE                    3
> +#define PARALLEL_COPY_BUFFER_USAGE                 4
>
> DSM key names do not appear to be consistent. For shared info and
> cstate structures, the key name is prefixed with "PARALLEL_COPY_KEY",
> but for WalUsage and BufferUsage structures, it is prefixed with
> "PARALLEL_COPY". I think it would be better to make them consistent.
>

Modified as suggested

> --
>
>     if (resultRelInfo->ri_TrigDesc != NULL &&
>         (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
>          resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
>     {
>         /*
>          * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
>          * triggers on the table. Such triggers might query the table we're
>          * inserting into and act differently if the tuples that have already
>          * been processed and prepared for insertion are not there.
>          */
>         insertMethod = CIM_SINGLE;
>     }
>     else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
>              resultRelInfo->ri_TrigDesc->trig_insert_new_table)
>     {
>         /*
>          * For partitioned tables we can't support multi-inserts when there
>          * are any statement level insert triggers. It might be possible to
>          * allow partitioned tables with such triggers in the future, but for
>          * now, CopyMultiInsertInfoFlush expects that any before row insert
>          * and statement level insert triggers are on the same relation.
>          */
>         insertMethod = CIM_SINGLE;
>     }
>     else if (resultRelInfo->ri_FdwRoutine != NULL ||
>              cstate->volatile_defexprs)
>     {
> ...
> ...
>
> I think, if possible, all these if-else checks in CopyFrom() can be
> moved to a single function which can probably be named as
> IdentifyCopyInsertMethod() and this function can be called in
> IsParallelCopyAllowed(). This will ensure that in case of Parallel
> Copy when the leader has performed all these checks, the worker won't
> do it again. I also feel that it will make the code look a bit
> cleaner.
>

In the recent patch posted we have changed it to simplify the check
for parallel copy, it is not an exact match. I feel this comment is
not applicable on the latest patch

> --
>
> +void
> +ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
> +{
> ...
> ...
> +   InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
> +                         &walusage[ParallelWorkerNumber]);
> +
> +   MemoryContextSwitchTo(oldcontext);
> +   pfree(cstate);
> +   return;
> +}
>
> It seems like you also need to delete the memory context
> (cstate->copycontext) here.
>

Added it.

> --
>
> +void
> +ExecBeforeStmtTrigger(CopyState cstate)
> +{
> +   EState     *estate = CreateExecutorState();
> +   ResultRelInfo *resultRelInfo;
>
> This function has a lot of comments which have been copied as it is
> from the CopyFrom function, I think it would be good to remove those
> comments from here and mention that this code changes done in this
> function has been taken from the CopyFrom function. If any queries
> people may refer to the CopyFrom function. This will again avoid the
> unnecessary code in the patch.
>

Changed as suggested.

> --
>
> As Heikki rightly pointed out in his previous email, we need some high
> level description of how Parallel Copy works somewhere in
> copyparallel.c file. For reference, please see how a brief description
> about parallel vacuum has been added in the vacuumlazy.c file.
>
>  * Lazy vacuum supports parallel execution with parallel worker processes.  In
>  * a parallel vacuum, we perform both index vacuum and index cleanup with
>  * parallel worker processes.  Individual indexes are processed by one vacuum
> ...

Added it in copyparallel.c

This is addressed in v9 patch shared at [1].
[1] - https://www.postgresql.org/message-id/CALDaNm1cAONkFDN6K72DSiRpgqNGvwxQL7TjEiHZ58opnp9VoA@mail.gmail.com

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Parallel copy
Next
From: Stephen Frost
Date:
Subject: Re: automatic analyze: readahead - add "IO read time" log message