Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAA4eK1Kb=LRDUFow7238Mr_1VRK-dnpk2jpYkX7x1jDyx2qEKQ@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
List pgsql-hackers
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Posting an updated set of patches to address recent feedback:
>

Here is an additional review of
v11-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. There are
quite a few comments raised on the V11-0001* patch. I suggest first
post a revised version of V11-0001* patch addressing those comments
and then you can separately post a revised version of
v11-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO.

Few comments:
==============
1.
+char
+max_parallel_hazard_for_modify(Query *parse, const char
*initial_max_parallel_hazard)
{
..
+ return (rel_max_parallel_hazard_for_modify(rte->relid,
parse->commandType, &context, NoLock));
..
}

rel_max_parallel_hazard_for_modify()
{
..
+ rel = table_open(relid, lockmode);
..
+ if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
..
+ /*
+ * Column default expressions for columns in the target-list are
+ * already being checked for parallel-safety in the
+ * max_parallel_hazard() scan of the query tree in standard_planner().
+ */
+
+ tupdesc = RelationGetDescr(rel);
}

Here, it seems we are accessing the relation descriptor without any
lock on the table which is dangerous considering the table can be
modified in a parallel session. Is there a reason why you think this
is safe? Did you check anywhere else such a coding pattern?

2.
+ /*
+ * If there are any index expressions, check that they are parallel-mode
+ * safe.
+ */
+ max_hazard = index_expr_max_parallel_hazard_for_modify(rel, context);
+ if (max_parallel_hazard_test(max_hazard, context))
+ {
+ table_close(rel, lockmode);
+ return context->max_hazard;
+ }

Here and at all other similar places, the call to
max_parallel_hazard_test seems redundant because
index_expr_max_parallel_hazard_for_modify would have already done
that. Why can't we just return true/false from
index_expr_max_parallel_hazard_for_modify? The context would have been
already updated for max_hazard.

3.
@@ -342,6 +343,18 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  /* all the cheap tests pass, so scan the query tree */
  glob->maxParallelHazard = max_parallel_hazard(parse);
  glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+
+ /*
+ * Additional parallel-mode safety checks are required in order to
+ * allow an underlying parallel query to be used for a
+ * table-modification command that is supported in parallel-mode.
+ */
+ if (glob->parallelModeOK &&
+ IsModifySupportedInParallelMode(parse->commandType))
+ {
+ glob->maxParallelHazard = max_parallel_hazard_for_modify(parse,
&glob->maxParallelHazard);
+ glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
+ }
  }

I don't like this way of checking parallel_hazard for modify. This not
only duplicates some code in max_parallel_hazard_for_modify from
max_parallel_hazard but also appears quite awkward. Can we move
max_parallel_hazard_for_modify inside max_parallel_hazard? Basically,
after calling max_parallel_hazard_walker, we can check for modify
statement and call the new function.

4.
domain_max_parallel_hazard_for_modify()
{
..
+ if (isnull)
+ {
+ /*
+ * This shouldn't ever happen, but if it does, log a WARNING
+ * and return UNSAFE, rather than erroring out.
+ */
+ elog(WARNING, "null conbin for constraint %u", con->oid);
+ context->max_hazard = PROPARALLEL_UNSAFE;
+ break;
+ }
..
}
index_expr_max_parallel_hazard_for_modify()
{
..
+ if (index_expr_item == NULL) /* shouldn't happen */
+ {
+ index_close(index_rel, lockmode);
+ context->max_hazard = PROPARALLEL_UNSAFE;
+ return context->max_hazard;
+ }
..
}

It is not clear why the above two are shouldn't happen cases and if so
why we want to treat them as unsafe. Ideally, there should be an
Assert if these can't happen but it is difficult to decide without
knowing why you have considered them unsafe?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Wrong HINT during database recovery when occur a minimal wal.
Next
From: Julien Rouhaud
Date:
Subject: Re: list of extended statistics on psql