Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Zhihong Yu
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CALNJ-vTXr_y0cA6kYDtTC0fTnRhdgWKyq48TOmczzRzTUfJo8A@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Thanks for the clarification.

w.r.t. the commit message, maybe I was momentarily sidetracked by the phrase: With this change.
It seems the scenarios you listed are known parallel safety constraints.

Probably rephrase that sentence a little bit to make this clearer.

Cheers

On Wed, Jan 6, 2021 at 8:01 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> Hi,
> For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :
>
> workers to Gather node to 0. With this change, there are chances
> that the planner may choose the parallel plan.
>
> It would be nice if the scenarios where a parallel plan is not chosen are listed.

There are many reasons, the planner may not choose a parallel plan for
the select part, for instance if there are temporary tables, parallel
unsafe functions, or the parallelism GUCs are not set properly,
foreign tables and so on. see
https://www.postgresql.org/docs/devel/parallel-safety.html. I don't
think so, we will add all the scenarios into the commit message.

Having said that, we have extensive comments in the code(especially in
the function SetParallelInsertState) about when parallel inserts are
chosen.

+     * Parallel insertions are possible only if the upper node is Gather.
      */
+    if (!IsA(gstate, GatherState))
         return;

+     * Parallelize inserts only when the upper Gather node has no projections.
      */
+    if (!gstate->ps.ps_ProjInfo)
+    {
+        /* Okay to parallelize inserts, so mark it. */
+        if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+            ((DR_intorel *) dest)->is_parallel = true;
+
+        /*
+         * For parallelizing inserts, we must send some information so that the
+         * workers can build their own dest receivers. For CTAS, this info is
+         * into clause, object id (to open the created table).
+         *
+         * Since the required information is available in the dest receiver,
+         * store a reference to it in the Gather state so that it will be used
+         * in ExecInitParallelPlan to pick the information.
+         */
+        gstate->dest = dest;
+    }
+    else
+    {
+        /*
+         * Upper Gather node has projections, so parallel insertions are not
+         * allowed.
+         */

> +   if ((root->parse->parallelInsCmdTupleCostOpt &
> +        PARALLEL_INSERT_SELECT_QUERY) &&
> +       (root->parse->parallelInsCmdTupleCostOpt &
> +        PARALLEL_INSERT_CAN_IGN_TUP_COST))
> +   {
> +       /* We are ignoring the parallel tuple cost, so mark it. */
> +       root->parse->parallelInsCmdTupleCostOpt |=
> +                                           PARALLEL_INSERT_TUP_COST_IGNORED;
>
> If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED is implied.
>
> Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting (1) of the first two bits should suffice.

The way these flags work is as follows: before planning in CTAS, we
set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper
gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we
actually ignored the tuple cost in cost_gather we set
PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set
PARALLEL_INSERT_CAN_IGN_TUP_COST before calling
generate_useful_gather_paths, and the function
generate_useful_gather_paths can return before reaching cost_gather,
see below snippets. So, we need the 3 flags.

void
generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
override_rows)
{
    ListCell   *lc;
    double        rows;
    double       *rowsp = NULL;
    List       *useful_pathkeys_list = NIL;
    Path       *cheapest_partial_path = NULL;

    /* If there are no partial paths, there's nothing to do here. */
    if (rel->partial_pathlist == NIL)
        return;

    /* Should we override the rel's rowcount estimate? */
    if (override_rows)
        rowsp = &rows;

    /* generate the regular gather (merge) paths */
    generate_gather_paths(root, rel, override_rows);

void
generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
{
    Path       *cheapest_partial_path;
    Path       *simple_gather_path;
    ListCell   *lc;
    double        rows;
    double       *rowsp = NULL;

    /* If there are no partial paths, there's nothing to do here. */
    if (rel->partial_pathlist == NIL)
        return;


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Single transaction in the tablesync worker?
Next
From: Fujii Masao
Date:
Subject: Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion