Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
| From | Bharath Rupireddy |
| --- | --- |
| Subject | Parallel Inserts in CREATE TABLE AS |
| Date | |
| Msg-id | CALj2ACWFq6Z4_jd9RPByURB8-Y8wccQWzLf+0-Jg+KYT7ZO-Ug@mail.gmail.com |
| Responses | Re: Parallel Inserts in CREATE TABLE AS; Re: Parallel Inserts in CREATE TABLE AS |
| List | pgsql-hackers |
Hi,
The idea of this patch is to allow the leader and each worker to insert tuples in parallel if the SELECT part of the CTAS is parallelizable. Along with the parallel inserts, if the CTAS code path is also allowed to do table_multi_insert() [1], the gain we achieve is as follows:
For a table with 2 integer columns and 100 million tuples (more test results are at [2]), the execution time on HEAD is 120 sec, whereas with the parallelism patch proposed here plus the multi-insert patch [1], with 3 workers and leader participation, the execution time is 22 sec (5.45X). With the current CTAS code, which does single-tuple inserts (see intorel_receive()), the performance gain is limited to ~1.7X with parallelism, because the workers contend more for locks on buffer pages while extending the table. So the maximum benefit we could get for CTAS is from combining parallelism with multi-tuple inserts.
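To illustrate why the multi inserts matter here, below is a minimal sketch of the buffering idea in the CTAS receive callback. MAX_BUFFERED_SLOTS and the buffered_slots/nslots fields are illustrative additions to DR_intorel, not taken from the actual patch [1]:
-----------------------------------------------------------------------------
/*
 * Sketch only: buffer tuples in the CTAS dest receiver and flush them
 * with one table_multi_insert() call, instead of calling
 * table_tuple_insert() once per tuple as intorel_receive() does today.
 * Assumes the surrounding createas.c context (DR_intorel etc.).
 */
#define MAX_BUFFERED_SLOTS 1000		/* illustrative buffer size */

static bool
intorel_receive_buffered(TupleTableSlot *slot, DestReceiver *self)
{
    DR_intorel *myState = (DR_intorel *) self;

    /* Copy the incoming tuple into the next buffered slot. */
    ExecCopySlot(myState->buffered_slots[myState->nslots++], slot);

    if (myState->nslots == MAX_BUFFERED_SLOTS)
    {
        /*
         * One multi-insert call extends the relation by whole pages at
         * a time, which is what reduces buffer-lock contention.
         */
        table_multi_insert(myState->rel,
                           myState->buffered_slots,
                           myState->nslots,
                           myState->output_cid,
                           myState->ti_options,
                           myState->bistate);
        myState->nslots = 0;
    }
    return true;
}
-----------------------------------------------------------------------------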
The design:
Let the planner know that the SELECT is for CTAS in createas.c, so that it can set the number of tuples transferred from the workers to the Gather node to 0. With this change, there is a chance that the planner chooses a parallel plan. After planning, check in createas.c whether the upper plan node is a Gather, and if so, mark a parallelism flag in the CTAS dest receiver. Pass the into clause, object id, and command id from the leader to the workers, so that each worker can create its own CTAS dest receiver. The leader inserts its share of tuples if instructed to do so, and so do the workers. Each worker atomically writes its number of inserted tuples into a shared memory variable; the leader combines this with its own count and reports the total to the client.
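A minimal sketch of the shared state this design implies, assuming a struct placed in the DSM segment; the struct, field, and function names here are illustrative, not taken from the patch:
-----------------------------------------------------------------------------
#include "postgres.h"
#include "nodes/nodes.h"
#include "nodes/primnodes.h"
#include "port/atomics.h"

/* Illustrative shared state; the real patch's layout may differ. */
typedef struct ParallelCTASState
{
    Oid              intorel_oid;   /* OID of the just-created table */
    CommandId        output_cid;    /* command id for the inserts */
    pg_atomic_uint64 processed;     /* tuples inserted by all participants */
} ParallelCTASState;

/* Leader side: flatten the IntoClause so each worker can rebuild
 * its own CTAS dest receiver from the string (sketch only). */
static char *
ctas_serialize_intoclause(IntoClause *into)
{
    return nodeToString((Node *) into);
}

/* Worker side: rebuild the IntoClause from the flattened string. */
static IntoClause *
ctas_restore_intoclause(char *intoclause_str)
{
    return (IntoClause *) stringToNode(intoclause_str);
}

/* Each participant atomically adds its local insert count; the
 * leader reads the combined total to report to the client. */
static void
ctas_report_tuples(ParallelCTASState *state, uint64 ntuples)
{
    pg_atomic_fetch_add_u64(&state->processed, ntuples);
}
-----------------------------------------------------------------------------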
The following items are still pending; thoughts are most welcome:
1. How best can we lift the "cannot insert tuples in a parallel worker" restriction from heap_prepare_insert() for only the CTAS case, or, for that matter, for parallel copy? How about having a variable in one of the worker's global contexts and using that (a sketch follows the plan output below)? Of course, we can remove this restriction entirely if we fully allow parallelism for INSERT INTO SELECT, CTAS, and COPY.
2. How should the parallel insert for CTAS be represented in explain plans? EXPLAIN for CTAS shows the plan for only the SELECT part. How about having some textual info along with the Gather node?
-----------------------------------------------------------------------------
Gather (cost=1000.00..108738.90 rows=0 width=8)
Workers Planned: 2
-> Parallel Seq Scan on t_test (cost=0.00..106748.00 rows=4954 width=8)
Filter: (many < 10000)
-----------------------------------------------------------------------------
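For item 1, here is a minimal sketch of the flag idea. is_parallel_insert_allowed is a hypothetical global that the worker entry point would set from shared state, not an existing PostgreSQL symbol; only the ereport() check itself is from heapam.c:
-----------------------------------------------------------------------------
/* Hypothetical flag, set in the parallel worker's entry point when
 * the leader marks inserts as safe for the command being run. */
extern bool is_parallel_insert_allowed;

/* In heap_prepare_insert(): relax the existing check instead of
 * rejecting all inserts in parallel workers unconditionally. */
if (IsParallelWorker() && !is_parallel_insert_allowed)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
             errmsg("cannot insert tuples in a parallel worker")));
-----------------------------------------------------------------------------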
3. Need to restrict parallel inserts if CTAS tries to create temp/global tables, as the workers will not have access to those tables. Need to analyze whether to allow parallelism if CTAS has prepared statements or WITH NO DATA.
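As a strawman for item 3, parallelism could be disabled up front based on the target relation's persistence; the helper name below is hypothetical:
-----------------------------------------------------------------------------
#include "postgres.h"
#include "catalog/pg_class.h"
#include "nodes/primnodes.h"

/* Hypothetical helper for createas.c: decide whether parallel
 * inserts are safe for this CTAS target before planning. */
static bool
ctas_parallel_insert_safe(IntoClause *into)
{
    /* Workers cannot access the leader's temporary namespace, so
     * fall back to serial inserts for temp tables. */
    if (into->rel->relpersistence == RELPERSISTENCE_TEMP)
        return false;

    /* WITH NO DATA inserts nothing, so parallel inserts buy nothing;
     * whether to still allow a parallel SELECT is an open question. */
    if (into->skipData)
        return false;

    return true;
}
-----------------------------------------------------------------------------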
4. Need to stop unnecessary parallel shared state, such as the tuple queues, from being created and shared with the workers.
5. Addition of new test cases. Testing with more scenarios and different data sets, sizes, tablespaces, and SELECT INTO. Analysis of the 2 mismatches in the write_parallel.sql regression test.
Thoughts?
Credits:
1. Thanks to Dilip Kumar for the main design idea and the discussions. Thanks to Vignesh for the discussions.
2. Patch development and testing are by me.
3. Thanks to the authors of table_multi_insert() in CTAS patch [1].
[1] - For table_multi_insert() in CTAS, I used an in-progress patch available at https://www.postgresql.org/message-id/CAEET0ZG31mD5SWjTYsAt0JTLReOejPvusJorZ3kGZ1%3DN1AC-Fw%40mail.gmail.com
[2] - Table with 2 integer columns, 100 million tuples, with leader participation, with the default postgresql.conf file. All readings are triplets of the form (workers, exec time in sec, improvement).
case 1: no multi inserts - (0,120,1X),(1,91,1.32X),(2,75,1.6X),(3,67,1.79X),(4,72,1.66X),(5,77,1.56X),(6,83,1.44X)
case 2: with multi inserts - (0,59,1X),(1,32,1.84X),(2,28,2.1X),(3,25,2.36X),(4,23,2.56X),(5,22,2.68X),(6,22,2.68X)
case 3: same table but unlogged with multi inserts - (0,50,1X),(1,28,1.78X),(2,25,2X),(3,22,2.27X),(4,21,2.38X),(5,21,2.38X),(6,20,2.5X)
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com