Home > mailing lists

RE: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From	Hou, Zhijie
Subject	RE: Parallel Inserts in CREATE TABLE AS
Date	January 11, 2021 04:07:03
Msg-id	b7be257dbc784001af779e8d75bea5c2@G08CNEXMBPEKD05.g08.fujitsu.local Whole thread Raw
In response to	Re: Parallel Inserts in CREATE TABLE AS (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses	Re: Parallel Inserts in CREATE TABLE AS (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List	pgsql-hackers

Tree view

> Attaching v21 patch set, which has following changes:
> 1) 0001 - changed fpes->ins_cmd_type ==
> PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
> PARALLEL_INSERT_CMD_UNDEF
> 2) 0002 - reworded the commit message.
> 3) 0003 - added cmin, xmin test case to one of the parallel insert cases
> to ensure leader and worker insert the tuples in the same xact and replaced
> memory usage output in numbers like 25kB to NkB to make the tests stable.
> 4) 0004 - updated one of the test output to be in NkB and made the assertion
> in SetParallelInsertState to be not under an if condition.
> 
> There's one open point [1] on selective skipping of error "cannot insert
> tuples in a parallel worker" in heap_prepare_insert(), thoughts are
> welcome.
> 
> Please consider the v21 patch set for further review.

Hi,

I took a look into the new patch and have some comments.

1.
+    /*
+     * Do not consider tuple cost in case of we intend to perform parallel
+     * inserts by workers. We would have turned on the ignore flag in
+     * apply_scanjoin_target_to_paths before generating Gather path for the
+     * upper level SELECT part of the query.
+     */
+    if ((root->parse->parallelInsCmdTupleCostOpt &
+         PARALLEL_INSERT_SELECT_QUERY) &&
+        (root->parse->parallelInsCmdTupleCostOpt &
+         PARALLEL_INSERT_CAN_IGN_TUP_COST))

Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here ?
IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set.


2.
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+                       void *ins_info)
...
+        info = (ParallelInsertCTASInfo *) ins_info;
+        intoclause_str = nodeToString(info->intoclause);
+        intoclause_len = strlen(intoclause_str) + 1;

+static void
+SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+                       void *ins_info)
...
+        info = (ParallelInsertCTASInfo *)ins_info;
+        intoclause_str = nodeToString(info->intoclause);
+        intoclause_len = strlen(intoclause_str) + 1;
+        intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);

I noticed the above code will call nodeToString and strlen twice which seems unnecessary.
Do you think it's better to store the result of nodetostring and strlen first and pass them when used ?

3.
+    if (node->need_to_scan_locally || node->nworkers_launched == 0)
+    {
+        EState       *estate = node->ps.state;
+        TupleTableSlot *outerTupleSlot;
+
+        for(;;)
+        {
+            /* Install our DSA area while executing the plan. */
+            estate->es_query_dsa =
+                    node->pei ? node->pei->area : NULL;
...
+            node->ps.state->es_processed++;
+        }

How about use the variables estate like 'estate-> es_processed++;'
Instead of node->ps.state->es_processed++;

Best regards,
houzj

pgsql-hackers by date:

From: "tsunakawa.takay@fujitsu.com"
Date: 11 January 2021, 03:14:04
Subject: RE: Disable WAL logging to speed up data loading

From: Thomas Munro
Date: 11 January 2021, 05:34:53
Subject: Re: pg_preadv() and pg_pwritev()

RE: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

Previous

Next