Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CA+HiwqHJ3sp9hAZF1vM3F55=8kuDnMkESrTACN5A4TMw6JznUg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)
Re: Parallel INSERT (INTO ... SELECT ...)
List pgsql-hackers
Greg, all

Thanks a lot for your work on this.

On Mon, Feb 8, 2021 at 3:53 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> Posting an updated set of patches.

I've been looking at these patches, initially with an intention to
review mainly any partitioning-related concerns, but have some general
thoughts as well concerning mostly the patches 0001 and 0002.

* I've seen review comments on this thread where I think it's been
suggested that whatever max_parallel_hazard_for_modify() does had
better have been integrated into max_parallel_hazard() such that
there's no particular need for that function to exist.  For example,
the following:

+   /*
+    * UPDATE is not currently supported in parallel-mode, so prohibit
+    * INSERT...ON CONFLICT...DO UPDATE...
+    * In order to support update, even if only in the leader, some
+    * further work would need to be done. A mechanism would be needed
+    * for sharing combo-cids between leader and workers during
+    * parallel-mode, since for example, the leader might generate a
+    * combo-cid and it needs to be propagated to the workers.
+    */
+   if (parse->onConflict != NULL && parse->onConflict->action ==
ONCONFLICT_UPDATE)
+       return PROPARALLEL_UNSAFE;

could be placed in the following block in max_parallel_hazard():

    /*
     * When we're first invoked on a completely unplanned tree, we must
     * recurse into subqueries so to as to locate parallel-unsafe constructs
     * anywhere in the tree.
     */
    else if (IsA(node, Query))
    {
        Query      *query = (Query *) node;

        /* SELECT FOR UPDATE/SHARE must be treated as unsafe */
        if (query->rowMarks != NULL)
        {
            context->max_hazard = PROPARALLEL_UNSAFE;
            return true;
        }

Furthermore, the following:

+   rte = rt_fetch(parse->resultRelation, parse->rtable);
+
+   /*
+    * The target table is already locked by the caller (this is done in the
+    * parse/analyze phase).
+    */
+   rel = table_open(rte->relid, NoLock);
+   (void) rel_max_parallel_hazard_for_modify(rel, parse->commandType,
&context);
+   table_close(rel, NoLock);

can itself be wrapped in a function that's called from
max_parallel_hazard() by adding a new block for RangeTblEntry nodes
and passing QTW_EXAMINE_RTES_BEFORE to query_tree_walker().

That brings me to to this part of the hunk:

+   /*
+    * If there is no underlying SELECT, a parallel table-modification
+    * operation is not possible (nor desirable).
+    */
+   hasSubQuery = false;
+   foreach(lc, parse->rtable)
+   {
+       rte = lfirst_node(RangeTblEntry, lc);
+       if (rte->rtekind == RTE_SUBQUERY)
+       {
+           hasSubQuery = true;
+           break;
+       }
+   }
+   if (!hasSubQuery)
+       return PROPARALLEL_UNSAFE;

The justification for this given in:

https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com

seems to be that the failure of a test case in
partition-concurrent-attach isolation suite is prevented if finding no
subquery RTEs in the query is flagged as parallel unsafe, which in
turn stops max_parallel_hazard_modify() from locking partitions for
safety checks in such cases.  But it feels unprincipled to have this
code to work around a specific test case that's failing.  I'd rather
edit the failing test case to disable parallel execution as
Tsunakawa-san suggested.

* Regarding function names:

+static bool trigger_max_parallel_hazard_for_modify(TriggerDesc *trigdesc,
+
max_parallel_hazard_context *context);
+static bool index_expr_max_parallel_hazard_for_modify(Relation rel,
+
max_parallel_hazard_context *context);
+static bool domain_max_parallel_hazard_for_modify(Oid typid,
max_parallel_hazard_context *context);
+static bool rel_max_parallel_hazard_for_modify(Relation rel,
+                                              CmdType command_type,
+
max_parallel_hazard_context *context)

IMO, it would be better to name these
target_rel_trigger_max_parallel_hazard(),
target_rel_index_max_parallel_hazard(), etc. rather than have
_for_modify at the end of these names to better connote that they
check the parallel safety of applying the modify operation to a given
target relation.  Also, put these prototypes just below that of
max_parallel_hazard() to have related things close by.

Attached please see v15_delta.diff showing the changes suggested above.

* I suspect that the following is broken in light of concurrent
attachment of partitions.

+
+       /* Recursively check each partition ... */
+       pdesc = RelationGetPartitionDesc(rel);

I think we'd need to use CreatePartitionDirectory() and retrieve the
PartitionDesc using PartitionDirectoryLookup().  Something we already
do when opening partitions for SELECT planning.

* I think that the concerns raised by Tsunakawa-san in:

https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com

regarding how this interacts with plancache.c deserve a look.
Specifically, a plan that uses parallel insert may fail to be
invalidated when partitions are altered directly (that is without
altering their root parent).  That would be because we are not adding
partition OIDs to PlannerGlobal.invalItems despite making a plan
that's based on checking their properties.  See this (tested with all
patches applied!):

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
create table foo (a) as select generate_series(1, 1000000);
prepare q as insert into rp select * from foo where a%2 = 0;
explain execute q;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather  (cost=1000.00..13041.54 rows=5642 width=0)
   Workers Planned: 2
   ->  Insert on rp  (cost=0.00..11477.34 rows=0 width=0)
         ->  Parallel Seq Scan on foo  (cost=0.00..11477.34 rows=2351 width=4)
               Filter: ((a % 2) = 0)
(5 rows)

-- create a parallel unsafe trigger (that's actually marked so)
directly on a partition
create or replace function make_table () returns trigger language
plpgsql as $$ begin create table bar(); return null; end; $$ parallel
unsafe;
create trigger ai_rp2 after insert on rp2 for each row execute
function make_table();CREATE TRIGGER

-- plan still parallel
explain execute q;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather  (cost=1000.00..13041.54 rows=5642 width=0)
   Workers Planned: 2
   ->  Insert on rp  (cost=0.00..11477.34 rows=0 width=0)
         ->  Parallel Seq Scan on foo  (cost=0.00..11477.34 rows=2351 width=4)
               Filter: ((a % 2) = 0)
(5 rows)

-- and because it is
execute q;
ERROR:  cannot start commands during a parallel operation
CONTEXT:  SQL statement "create table bar()"
PL/pgSQL function make_table() line 1 at SQL statement

-- OTOH, altering parent correctly discards the parallel plan
create trigger ai_rp after insert on rp for each row execute function
make_table();
explain execute q;
                           QUERY PLAN
----------------------------------------------------------------
 Insert on rp  (cost=0.00..19425.00 rows=0 width=0)
   ->  Seq Scan on foo  (cost=0.00..19425.00 rows=5000 width=4)
         Filter: ((a % 2) = 0)
(3 rows)

It's fair to argue that it would rarely make sense to use PREPARE for
bulk loads, but we need to tighten things up a bit here regardless.


--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: [POC] verifying UTF-8 using SIMD instructions
Next
From: "Mead, Scott"
Date:
Subject: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay