RE: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Hou, Zhijie
Subject RE: Parallel Inserts in CREATE TABLE AS
Msg-id f4af0f3439b24ad48aceac3520c9160a@G08CNEXMBPEKD05.g08.fujitsu.local
In response to Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS
List pgsql-hackers
Hi

+    /*
+     * Flag to let the planner know that the SELECT query is for CTAS. This is
+     * used to calculate the tuple transfer cost from workers to gather node(in
+     * case parallelism kicks in for the SELECT part of the CTAS), to zero as
+     * each worker will insert its share of tuples in parallel.
+     */
+    if (IsParallelInsertInCTASAllowed(into, NULL))
+        query->isForCTAS = true;


+    /*
+     * We do not compute the parallel_tuple_cost for CTAS because the number of
+     * tuples that are transferred from workers to the gather node is zero as
+     * each worker, in parallel, inserts the tuples that are resulted from its
+     * chunk of plan execution. This change may make the parallel plan cheap
+     * among all other plans, and influence the planner to consider this
+     * parallel plan.
+     */
+    if (!(root->parse->isForCTAS &&
+        root->query_level == 1))
+        run_cost += parallel_tuple_cost * path->path.rows;

I noticed that parallel_tuple_cost is still ignored
even when Gather is not the top node.

Example:
    Create table test(i int);
    insert into test values(generate_series(1,10000000,1));
    explain create table ntest3 as select * from test where i < 200 limit 10000;

                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Limit  (cost=1000.00..97331.33 rows=1000 width=4)
   ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
         Workers Planned: 2
         ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
               Filter: (i < 200)


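isForCTAS is true because the statement is CREATE TABLE AS, and
query_level is always 1 because there is no subquery.
So the check above passes and parallel_tuple_cost is skipped, even though
Gather is not the top node here: the Limit sits above it, so the tuples
still have to be transferred from the workers to the leader.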
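
To make the point concrete, here is a minimal standalone sketch of the check. It is not the patch code: SketchPlannerState and sketch_gather_transfer_cost are hypothetical names that only mirror the two fields the quoted condition reads (isForCTAS and query_level). It shows that nothing in the condition depends on whether Gather is the top node.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the planner state the quoted condition reads. */
typedef struct SketchPlannerState
{
    bool isForCTAS;    /* set when the SELECT belongs to CREATE TABLE AS */
    int  query_level;  /* 1 for the outermost query */
} SketchPlannerState;

/*
 * Mirrors the quoted condition: the per-tuple transfer cost is skipped
 * for any top-level CTAS query. Note that the position of Gather in the
 * plan tree is not an input at all.
 */
double
sketch_gather_transfer_cost(const SketchPlannerState *root,
                            double parallel_tuple_cost, double rows)
{
    assert(rows >= 0.0);

    if (!(root->isForCTAS && root->query_level == 1))
        return parallel_tuple_cost * rows;

    return 0.0;     /* charged nothing, with or without a Limit on top */
}
```

With isForCTAS = true and query_level = 1 the function returns 0 for the plan above, just as it would if Gather were the top node.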

Does that work as expected?

Best regards,
houzj





