RE: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Hou, Zhijie
Subject RE: Parallel Inserts in CREATE TABLE AS
Date
Msg-id bc400813bc8a4a0194d21098c1b8276f@G08CNEXMBPEKD05.g08.fujitsu.local
Whole thread Raw
In response to RE: Parallel Inserts in CREATE TABLE AS  ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS
List pgsql-hackers
> From: Hou, Zhijie [mailto:houzj.fnst@cn.fujitsu.com]
> Sent: Tuesday, December 15, 2020 7:30 PM
> To: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
> Cc: Amit Kapila <amit.kapila16@gmail.com>; Luc Vlaming <luc@swarm64.com>;
> PostgreSQL-development <pgsql-hackers@postgresql.org>; Zhihong Yu
> <zyu@yugabyte.com>; Dilip Kumar <dilipbalaut@gmail.com>
> Subject: RE: Parallel Inserts in CREATE TABLE AS
> 
> > Thanks for the append use case.
> >
> > Here's my analysis on pushing parallel inserts down even in case the
> > top node is Append.
> >
> > For union cases which need to remove duplicate tuples, we can't push
> > the inserts or CTAS dest receiver down. If I'm not wrong, Append node
> > is not doing duplicate removal(??), I saw that it's the HashAggregate
> > node (which is the top node that removes the duplicate tuples). And
> > also for except/except all/intersect/intersect all cases we receive
> > HashSetOp nodes on top of Append. So for both cases, our check for
> > Gather or Append at the top node is enough to detect this to not allow
> parallel inserts.
> >
> > For union all:
> > case 1: We can push the CTAS dest receiver to each Gather node  Append
> >      ->Gather
> >          ->Parallel Seq Scan
> >      ->Gather
> >          ->Parallel Seq Scan
> >       ->Gather
> >          ->Parallel Seq Scan
> >
> > case 2: We can still push the CTAS dest receiver to each Gather node.
> > Non-Gather nodes will do inserts as they do now i.e. by sending tuples
> > to Append and from there to CTAS dest receiver.
> >  Append
> >      ->Gather
> >          ->Parallel Seq Scan
> >      ->Seq Scan / Join / any other non-Gather node
> >      ->Gather
> >          ->Parallel Seq Scan
> >      ->Seq Scan / Join / any other non-Gather node
> >
> > case 3:  We can push the CTAS dest receiver to Gather  Gather
> >      ->Parallel Append
> >          ->Parallel Seq Scan
> >          ->Parallel Seq Scan
> >
> > case 4: We can push the CTAS dest receiver to Gather  Gather
> >      ->Parallel Append
> >          ->Parallel Seq Scan
> >          ->Parallel Seq Scan
> >          ->Seq Scan / Join / any other non-Gather node
> >
> > Please let me know if I'm missing any other possible use case.
> >
> > Thoughts?
> 
> 
> Yes, The analysis looks right to me.
> 
> 
> > As suggested by Amit earlier, I kept the 0001 patch(so far) such that
> > it doesn't have the code to influence the planner to consider parallel
> > tuple cost as 0. It works on the plan whatever gets generated and
> > decides to allow parallel inserts or not. And in the 0002 patch, I
> > added the code for influencing the planner to consider parallel tuple
> > cost as 0. Maybe we can have a 0003 patch for tests alone.
> >
> > Once we are okay with the above analysis and use cases, we can
> > incorporate the Append changes to respective patches.
> >
> > Hope that's okay.
> 
> A little explanation about how to push down the ctas info in append.
> 
> 1. about how to ignore tuple cost in this case.
> IMO, it create gather path under append like the following:
> query_planner
> -make_one_rel
> --set_base_rel_sizes
> ---set_rel_size
> ----set_append_rel_size (*)
> -----set_rel_size
> ------set_subquery_pathlist
> -------subquery_planner
> --------grouping_planner
> ---------apply_scanjoin_target_to_paths
> ----------generate_useful_gather_paths
> 
> set_append_rel_size seems the right place where we can check and set a flag
> to ignore tuple cost later.
> We can set the flag for two cases when there is no parent path will be
> created(such as : limit,sort,distinct...):
> i) query_level is 1
> ii) query_level > 1 and we have set the flag in the parent_root.
> 
> The case ii) is to check append under append:
> Append
>    ->Append
>        ->Gather
>    ->Other plan
> 
> 2.about how to push ctas info down.
> 
> We traversing the whole plans tree, and we only care Append and Gather type.
> Gather: It set the ctas dest info and returned true at once if the gathernode
> does not have projection.
> Append: It will recursively traversing the subplan of Appendnode and will
> reture true if one of the subplan can be parallel.
> 
> +PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps) {
> +    bool parallel = false;
> +
> +    if(ps == NULL)
> +        return parallel;
> +
> +    if(IsA(ps, AppendState))
> +    {
> +        AppendState *aps = (AppendState *) ps;
> +        for(int i = 0; i < aps->as_nplans; i++)
> +        {
> +            parallel |=
> PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
> +        }
> +    }
> +    else if(IsA(ps, GatherState) && !ps->ps_ProjInfo)
> +    {
> +        GatherState *gstate = (GatherState *) ps;
> +        parallel = true;
> +
> +        ((DR_intorel *) dest)->is_parallel = true;
> +        gstate->dest = dest;
> +        ps->plan->plan_rows = 0;
> +    }
> +
> +    return parallel;
> +}

So sorry for my miss, my last patch has some mistakes.
Attatch the new one.

Best regards,
houzj



Attachment

pgsql-hackers by date:

Previous
From: "Hou, Zhijie"
Date:
Subject: RE: Parallel Inserts in CREATE TABLE AS
Next
From: Bharath Rupireddy
Date:
Subject: Re: Parallel Inserts in CREATE TABLE AS