Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers
| From | Bharath Rupireddy |
|---|---|
| Subject | Re: Parallel Inserts in CREATE TABLE AS |
| Date | |
| Msg-id | CALj2ACVydhFUX1vNAb5zgwxepz1DVfop-L+Jm5+_hDjSBNhEhg@mail.gmail.com |
| In response to | Re: Parallel Inserts in CREATE TABLE AS (Amit Kapila <amit.kapila16@gmail.com>) |
| List | pgsql-hackers |
On Sat, May 29, 2021 at 9:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> While looking at results, I have observed one more thing that we are
> trying to parallelize I/O due to which we might not be seeing benefit
> in such cases. I think even for non-write queries there won't be any
> (much) benefit if we can't parallelize CPU usage. Basically, the test
> you are doing is for statement: explain analyze verbose create table
> test as select * from tenk1;. Now, in this statement, there is no
> qualification and still, the Gather node is generated for it, this
> won't be the case if we check "select * from tenk1". Is it due to the
> reason that the patch completely ignores the parallel_tuple_cost? But
> still, it should prefer a serial plan due parallel_setup_cost, why is
> that not happening? Anyway, I think we should not parallelize such
> queries where we can't parallelize CPU usage. Have you tried the cases
> without changing any of the costings for parallelism?

Hi,

I measured the execution timings for parallel inserts in CTAS in cases where the planner chooses parallelism for the selects naturally. This means I used only the 0001 patch from the v23 patch set at [1]; I did not use the 0002 patch that makes parallel_tuple_cost 0. The postgresql.conf changes I made are at [2]. The query used for all the tests is below, and the table creation SQL is attached in the file "test_cases" (a rough, illustrative sketch of the test case 1 setup is also included at the end of this mail).

EXPLAIN (ANALYZE, VERBOSE) create table test1 as select * from tenk1 t1, tenk2 t2 where t1.c1 = t2.d2;

All results are of the form (number of workers, execution time in milliseconds).

Test case 1: tenk1 and tenk2 each have 1 integer (4-byte) column; tuple size 28 bytes; 100mn tuples.
master: (0, 277886.951 ms), (2, 171183.221 ms), (4, 159703.496 ms)
with parallel inserts CTAS patch: (0, 264709.186 ms), (2, 128354.448 ms), (4, 111533.731 ms)

Test case 2: tenk1 and tenk2 each have 2 integer (4-byte) columns and 3 varchar(8) columns; tuple size 59 bytes; 100mn tuples.
master: (0, 453505.228 ms), (2, 236762.759 ms), (4, 219038.126 ms)
with parallel inserts CTAS patch: (0, 470483.818 ms), (2, 219374.198 ms), (4, 203543.681 ms)

Test case 3: tenk1 and tenk2 each have 2 bigint (8-byte) columns, 3 name (64-byte) columns and 1 varchar(8) column; tuple size 241 bytes; 100mn tuples.
master: (0, 1052725.928 ms), (2, 592968.486 ms), (4, 562137.891 ms)
with parallel inserts CTAS patch: (0, 1019086.805 ms), (2, 634448.322 ms), (4, 680793.305 ms)

Test case 4: tenk1 and tenk2 each have 2 bigint (8-byte) columns and 16 name (64-byte) columns; tuple size 1064 bytes; 10mn tuples.
master: (0, 371931.497 ms), (2, 247206.841 ms), (4, 241959.839 ms)
with parallel inserts CTAS patch: (0, 396342.329 ms), (2, 333860.472 ms), (4, 317895.558 ms)

Observation: parallel insert + parallel select gives a good benefit with smaller tuple sizes (cases 1 and 2), while with bigger tuple sizes serial insert + parallel select fares better (cases 3 and 4).

In the coming days, I will work on more performance analysis and clarify some of the points raised upthread.

[1] - https://www.postgresql.org/message-id/CALj2ACXVWr1o%2BFZrkQt-2GvYfuMQeJjWohajmp62Wr6BU8Y4VA%40mail.gmail.com

[2] - postgresql.conf changes I made:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = on
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
wal_level = replica

With Regards,
Bharath Rupireddy.
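PS: as mentioned above, here is a rough sketch of the shape of the test case 1 setup. The exact DDL/DML is in the attached "test_cases" file; the generate_series-based population shown here is only an illustrative assumption, not the attachment's contents (the column names c1/d2 and the 100mn row count come from the query and the test description above).

-- Test case 1 (illustrative sketch, not the attached script):
-- one 4-byte integer column per table, 100mn rows each.
DROP TABLE IF EXISTS tenk1, tenk2;
CREATE TABLE tenk1 (c1 int);
CREATE TABLE tenk2 (d2 int);

-- Row population method here is assumed for illustration.
INSERT INTO tenk1 SELECT i FROM generate_series(1, 100000000) AS i;
INSERT INTO tenk2 SELECT i FROM generate_series(1, 100000000) AS i;
ANALYZE tenk1;
ANALYZE tenk2;

-- Statement under test, same as quoted above:
EXPLAIN (ANALYZE, VERBOSE)
CREATE TABLE test1 AS SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.c1 = t2.d2;

The other three cases follow the same pattern, with the extra varchar/name/bigint columns described in the respective test case lines.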
Attachment: test_cases