RE: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id TYAPR01MB29902F37B88951CFA7B9AB51FE8D9@TYAPR01MB2990.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Parallel INSERT (INTO ... SELECT ...)  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: Parallel INSERT (INTO ... SELECT ...)  ("Hou, Zhijie" <houzj.fnst@cn.fujitsu.com>)
List pgsql-hackers
From: tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com>
> From: Hou, Zhijie/侯 志杰 <houzj.fnst@cn.fujitsu.com>
> > It did have performance gain, but I think it's not huge enough to ignore the
> > extra's index cost.
> > What do you think ?
> 
> Yes... as you suspect, I'm afraid the benefit from parallel bitmap scan may not
> compensate for the loss of the parallel insert operation.
> 
> The loss is probably due to 1) more index page splits, 2) more buffer writes
> (table and index), and 3) internal locks for things such as relation extension
> and page content protection.  To investigate 3), we should want something
> like [1], which tells us the wait event statistics (wait count and time for each
> wait event) per session or across the instance like Oracke, MySQL and EDB
> provides.  I want to continue this in the near future.

What would the result look like if you turn off parallel_leader_participation?  If the leader is freed from
reading/writingthe table and index, the index page splits and internal lock contention may decrease enough to recover
partof the loss.
 

https://www.postgresql.org/docs/devel/parallel-plans.html

"In a parallel bitmap heap scan, one process is chosen as the leader. That process performs a scan of one or more
indexesand builds a bitmap indicating which table blocks need to be visited. These blocks are then divided among the
cooperatingprocesses as in a parallel sequential scan. In other words, the heap scan is performed in parallel, but the
underlyingindex scan is not."
 


BTW, the following sentences seem to be revisited, because "the work to be done" is not the same for parallel INSERT as
forserial INSERT - the order of rows stored, table and index sizes, and what else?
 

https://www.postgresql.org/docs/devel/using-explain.html#USING-EXPLAIN-ANALYZE

"It's worth noting that although the data-modifying node can take a considerable amount of run time (here, it's
consumingthe lion's share of the time), the planner does not currently add anything to the cost estimates to account
forthat work. That's because the work to be done is the same for every correct query plan, so it doesn't affect
planningdecisions."
 


Regards
Takayuki Tsunakawa



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Improvements and additions to COPY progress reporting
Next
From: Peter Geoghegan
Date:
Subject: Re: 64-bit XIDs in deleted nbtree pages