I have just notice that the parallelism is off even for the select part of the query mentioned in the $subject. I see the only reason it is not getting parallel because we block the parallelism if the query type is not SELECT. I don't see any reason for not selecting the parallelism for this query. I have quickly hacked the code to enable the parallelism for this query. I can see there is a significant improvement if we can enable the parallelism in this case. For an experiment, I have just relaxed a couple of checks, maybe if we think that it's good to enable the parallelism for this case we can try to put better checks which are specific for this quey.
+1 for the idea. For the given example also it shows a good performance
gain and I also don't any reason on restrict the parallel case for INSERT INTO SELECT.
No parallel: postgres[36635]=# explain analyze insert into t2 select * from t where a < 100; Insert on t2 (cost=0.00..29742.00 rows=100 width=105) (actual time=278.300..278.300 rows=0 loops=1) -> Seq Scan on t (cost=0.00..29742.00 rows=100 width=105) (actual time=0.061..277.330 rows=99 loops=1) Filter: (a < 100) Rows Removed by Filter: 999901 Planning Time: 0.093 ms Execution Time: 278.330 ms (6 rows)
With parallel Insert on t2 (cost=1000.00..23460.33 rows=100 width=105) (actual time=108.410..108.410 rows=0 loops=1) -> Gather (cost=1000.00..23460.33 rows=100 width=105) (actual time=0.306..108.973 rows=99 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on t (cost=0.00..22450.33 rows=42 width=105) (actual time=66.396..101.979 rows=33 loops=3) Filter: (a < 100) Rows Removed by Filter: 333300 Planning Time: 0.154 ms Execution Time: 110.158 ms (9 rows)