Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAJcOf-eOEfVtNt5VmUd3f8CVSMB8NJVBJA-jYX8N+Eekmc=ohw@mail.gmail.com
Whole thread Raw
In response to RE: Parallel INSERT (INTO ... SELECT ...)  ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>)
List pgsql-hackers
On Fri, Jan 22, 2021 at 1:16 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
>
> Hi
>
> I took a look at v12-0001 patch, here are some comments:
>
> 1.
> +       /*
> +        * Setup the context used in finding the max parallel-mode hazard.
> +        */
> +       Assert(initial_max_parallel_hazard == 0 ||
> +                  initial_max_parallel_hazard == PROPARALLEL_SAFE ||
> +                  initial_max_parallel_hazard == PROPARALLEL_RESTRICTED);
> +       context.max_hazard = initial_max_parallel_hazard == 0 ?
> +               PROPARALLEL_SAFE : initial_max_parallel_hazard;
>
> I am not quiet sure when will " max_parallel_hazard == 0"
> Does it means the case max_parallel_hazard_context not initialized ?
>

That function doesn't accept a "max_parallel_hazard_context". It
accepts an initial "max_parallel_hazard" value (char).
The "0" value is just a way of specifying "use the default"
(PROPARALLEL_SAFE). It is not currently used, since currently we just
always pass the "context.max_parallel_hazard" value resulting from the
previous parallel-safety checks for SELECT (and otherwise don't call
that function anywhere else).

>
> 2.
> Some tiny code style suggestions
>
> +               if (con->contype == CONSTRAINT_CHECK)
> +               {
> +                       char       *conbin;
> +                       Datum           val;
> +                       bool            isnull;
> +                       Expr       *check_expr;
>
> How about :
>
> if (con->contype != CONSTRAINT_CHECK)
>         continue;
>
> 3.
> +                               if (keycol == 0)
> +                               {
> +                                       /* Found an index expression */
> +
> +                                       Node       *index_expr;
>
> Like 2, how about:
>
> If (keycol != 0)
>         Continue;
>

This is really a programmer style preference (plenty of discussions on
the internet about it), but it can be argued that use of "continue"
here is not quite as clear as the explicit "if" condition, especially
in this very simple one-condition case.
I'm inclined to leave it as is.

>
> 4.
> +                       ListCell   *index_expr_item = list_head(index_info->ii_Expressions);
> ...
> +                                       index_expr = (Node *) lfirst(index_expr_item);
> +                                       index_expr = (Node *) expression_planner((Expr *) index_expr);
>
> It seems BuildIndexInfo has already called eval_const_expressions for ii_Expressions,
> Like the flow: BuildIndexInfo--> RelationGetIndexExpressions--> eval_const_expressions
>
> So, IMO, we do not need to call expression_planner for the expr again.
>
>
> And there seems another solution for this:
>
> In the patch, We only use the { ii_Expressions , ii_NumIndexAttrs , ii_IndexAttrNumbers } from the IndexInfo,
> which seems can get from "Relation-> rd_index".
>
> Based on above, May be we do not need to call BuildIndexInfo to build the IndexInfo.
> It can avoid some unnecessary cost.
> And in this solution we do not need to remove expression_planner.
>

OK, maybe this is a good idea, but I do recall trying to minimize this
kind of processing before, but there were cases which broke it.
Have you actually tried your idea and run all regression tests and
verified that they passed?
In any case, I'll look into it.

Regards,
Greg Nancarrow
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: PoC/WIP: Extended statistics on expressions
Next
From: vignesh C
Date:
Subject: Re: Added schema level support for publication.