Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Greg Nancarrow |
---|---|
Subject | Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com Whole thread Raw |
Responses |
Re: Parallel INSERT (INTO ... SELECT ...)
Re: Parallel INSERT (INTO ... SELECT ...) Re: Parallel INSERT (INTO ... SELECT ...) Re: Parallel INSERT (INTO ... SELECT ...) Re: Parallel INSERT (INTO ... SELECT ...) |
List | pgsql-hackers |
Hi Hackers, Following on from Dilip Kumar's POC patch for allowing parallelism of the SELECT part of "INSERT INTO ... SELECT ...", I have attached a POC patch for allowing parallelism of both the INSERT and SELECT parts, where it can be allowed. For cases where it can't be allowed (e.g. INSERT into a table with foreign keys, or INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE ...") it at least allows parallelism of the SELECT part. Obviously I've had to update the planner and executor and parallel-worker code to make this happen, hopefully not breaking too many things along the way. Examples with patch applied: (1) non-parallel: test=# explain analyze insert into primary_tbl select * from third_tbl; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Insert on primary_tbl (cost=0.00..154.99 rows=9999 width=12) (actual time=108.445..108.446 rows=0 loops=1) -> Seq Scan on third_tbl (cost=0.00..154.99 rows=9999 width=12) (actual time=0.009..5.282 rows=9999 loops=1) Planning Time: 0.132 ms Execution Time: 108.596 ms (4 rows) (2) parallel: test=# explain analyze insert into primary_tbl select * from third_tbl; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Gather (cost=0.00..16.00 rows=9999 width=12) (actual time=69.870..70.310 rows=0 loops=1) Workers Planned: 5 Workers Launched: 5 -> Parallel Insert on primary_tbl (cost=0.00..16.00 rows=500 width=12) (actual time=59.948..59.949 rows=0 loops=6) -> Parallel Seq Scan on third_tbl (cost=0.00..80.00 rows=2500 width=12) (actual time=0.014..0.922 rows=1666 loops=6) Planning Time: 0.121 ms Execution Time: 70.438 ms (7 rows) (3) parallel select only (insert into table with foreign key) test=# explain analyze insert into secondary_tbl select * from third_tbl; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Insert on secondary_tbl (cost=0.00..80.00 rows=9999 width=12) (actual time=33.864..33.926 rows=0 loops=1) -> Gather (cost=0.00..80.00 rows=9999 width=12) (actual time=0.451..5.201 rows=9999 loops=1) Workers Planned: 4 Workers Launched: 4 -> Parallel Seq Scan on third_tbl (cost=0.00..80.00 rows=2500 width=12) (actual time=0.013..0.717 rows=2000 loops=5) Planning Time: 0.127 ms Trigger for constraint secondary_tbl_index_fkey: time=331.834 calls=9999 Execution Time: 367.342 ms (8 rows) Known issues/TODOs: - Currently only for "INSERT INTO ... SELECT ...". To support "INSERT INTO ... VALUES ..." would need additional Table AM functions for dividing up the INSERT work amongst the workers (currently only exists for scans). - When INSERTs are made parallel, currently the reported row-count in the "INSERT 0 <row-count>" status only reflects the rows that the leader has processed (not the workers) - so it is obviously less than the actual number of rows inserted. - Functions relating to computing the number of parallel workers for an INSERT, and the cost of an INSERT, need work. - "force_parallel_mode" handling was updated so that it only affects SELECT (not INSERT) - can't allow it for INSERT because we're only supporting "INSERT INTO .. SELECT ..." and don't support other types of INSERTs, and also can't allow attempted parallel UPDATEs resulting from "INSERT INTO ... SELECT ... ON CONFLICT ... DO UPDATE" etc. Thoughts and feedback? Regards, Greg Nancarrow Fujitsu Australia
Attachment
pgsql-hackers by date: