Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CALj2ACWjymmyTvvhU20Er-LPLaBjzBQxMJwr4nzO7XWmOkxhsg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS
List pgsql-hackers
Thanks for the comments.

How about naming like below more generically and placing them in
parallel.h so that it will also be used for refresh materialized view?

+typedef enum ParallelInsertTupleCostOpt
+{
+ PINS_SELECT_QUERY = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ PINS_CAN_IGN_TUP_COST = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ PINS_TUP_COST_IGNORED = 1 << 2

My plan was to get the main design idea of pushing the dest receiver
to gather reviewed and once agreed, then I thought of making few
functions common and place them in parallel.h and parallel.c so that
they can be used for Parallel Inserts in REFRESH MATERIALIZED VIEW
because the same design idea can be applied there as well.

For instance my thoughts are: add the below structures, functions and
other macros to parallel.h and parallel.c:
typedef enum ParallelInsertKind
{
    PINS_UNDEF = 0,
    PINS_CREATE_TABLE_AS,
    PINS_REFRESH_MAT_VIEW
} ParallelInsertKind;

typedef struct ParallelInsertCTASInfo
{
    IntoClause *intoclause;
    Oid objectid;
} ParallelInsertCTASInfo;

typedef struct ParallelInsertRMVInfo
{
    Oid objectid;
} ParallelInsertRMVInfo;

ExecInitParallelPlan(PlanState *planstate, EState *estate,
                      Bitmapset *sendParams, int nworkers,
-                     int64 tuples_needed)
+                     int64 tuples_needed, ParallelInsertKind pinskind,
+                     void *pinsinfo)

Change ExecParallelInsertInCTAS to

+static void
+ExecParallelInsert(GatherState *node)
+{

Change SetCTASParallelInsertState to
+void
+SetParallelInsertState(QueryDesc *queryDesc)

Change IsParallelInsertionAllowedInCTAS to

+bool
+IsParallelInsertionAllowed(ParallelInsertKind pinskind, IntoClause *into)
+{

Thoughts?

If okay, I can work on these points and add a new patch into the patch
set that will have changes for parallel inserts in REFRESH
MATERIALIZED VIEW.

On Wed, Dec 30, 2020 at 3:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Some comments in 0002
>
> 1.
> +/*
> + * Information sent to the planner from CTAS to account for the cost
> + * calculations in cost_gather. We need to do this because, no tuples will be
> + * received by the Gather node if the workers insert the tuples in parallel.
> + */
> +typedef enum CTASParallelInsertOpt
> +{
> + CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
> + CTAS_PARALLEL_INS_SELECT = 1 << 0, /* turn on this before planning */
> + /*
> + * Turn on this while planning for upper Gather path to ignore parallel
> + * tuple cost in cost_gather.
> + */
> + CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
> + /* Turn on this after the cost is ignored. */
> + CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
> +} CTASParallelInsertOpt;
>
>
> I don't like the naming of these flags.  Especially no need to define
> CTAS_PARALLEL_INS_UNDEF, we can directl use 0
> for that purpose instead of giving some weird name.  So I suggest
> first, just get rid of CTAS_PARALLEL_INS_UNDEF.

+1. I will change it in the next version of the patch.

> 2.
> + /*
> + * Turn on a flag to ignore parallel tuple cost by the Gather path in
> + * cost_gather if the SELECT is for CTAS and we are generating an upper
> + * level Gather path.
> + */
> + bool ignore = ignore_parallel_tuple_cost(root);
> +
>   generate_useful_gather_paths(root, rel, false);
>
> + /*
> + * Reset the ignore flag, in case we turned it on but
> + * generate_useful_gather_paths returned without reaching cost_gather.
> + * If we reached cost_gather, we would have been reset it there.
> + */
> + if (ignore && (root->parse->CTASParallelInsInfo &
> + CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
> + {
> + root->parse->CTASParallelInsInfo &=
> + ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
> + }
>
> I think th way we are using these cost ignoring flag, doesn't look clean.
>
> I mean first, CTAS_PARALLEL_INS_SELECT is set if it is coming from
> CTAS and then ignore_parallel_tuple_cost will
> set the CTAS_PARALLEL_INS_TUP_COST_CAN_IGN if it satisfies certain
> condition which is fine.  Now, internally cost
> gather will add CTAS_PARALLEL_INS_TUP_COST_IGNORED and remove
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN and if
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is not removed then we will remove
> it outside.  Why do we need to remove
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN flag at all?

Yes we don't need to remove the CTAS_PARALLEL_INS_TUP_COST_CAN_IGN
flag. I will change it in the next version.

> 3.
> + if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
> + Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
>
> Instead of adding Assert inside an IF statement, you can convert whole
> statement as an assert.  Lets not add unnecessary
> if in the release mode.

+1. I will change it in the version.

> 4.
> + if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
> + (root->parse->CTASParallelInsInfo &
> + CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
> + {
> + ignore_tuple_cost = true;
> + root->parse->CTASParallelInsInfo &=
> + ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
> + root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
> + }
> +
> + if (!ignore_tuple_cost)
> + run_cost += parallel_tuple_cost * path->path.rows;
>
> Changes this to (if, else) as shown below, because if it goes to the
> IF part then ignore_tuple_cost will always be true
> so no need to have an extra if check.
>
> if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
> (root->parse->CTASParallelInsInfo &
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
> {
> ignore_tuple_cost = true;
> root->parse->CTASParallelInsInfo &=
> ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
> root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
> }
> else
> run_cost += parallel_tuple_cost * path->path.rows;

+1. I will change it in the next version.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "赵锐"
Date:
Subject: Re: adding partitioned tables to publications
Next
From: Peter Eisentraut
Date:
Subject: Re: dynamic result sets support in extended query protocol