Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CALj2ACVydhFUX1vNAb5zgwxepz1DVfop-L+Jm5+_hDjSBNhEhg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Sat, May 29, 2021 at 9:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> While looking at results, I have observed one more thing that we are
> trying to parallelize I/O due to which we might not be seeing benefit
> in such cases. I think even for non-write queries there won't be any
> (much) benefit if we can't parallelize CPU usage. Basically, the test
> you are doing is for statement: explain analyze verbose create table
> test as select * from tenk1;. Now, in this statement, there is no
> qualification and still, the Gather node is generated for it, this
> won't be the case if we check "select * from tenk1". Is it due to the
> reason that the patch completely ignores the parallel_tuple_cost? But
> still, it should prefer a serial plan due parallel_setup_cost, why is
> that not happening? Anyway, I think we should not parallelize such
> queries where we can't parallelize CPU usage. Have you tried the cases
> without changing any of the costings for parallelism?

Hi,

I measured the execution timings for parallel inserts in CTAS in cases
where the planner chooses parallelism for selects naturally. This
means, I have used only 0001 patch from v23 patch set at [1]. I have
not used the 0002 patch that makes parallel_tuple_cost 0.

Query used for all these tests is below. Also, attached table creation
sqls in the file "test_cases".
EXPLAIN (ANALYZE, VERBOSE) create table test1 as select * from tenk1
t1, tenk2 t2 where t1.c1 = t2.d2;

All the results are of the form (number of workers, exec time in milli sec).

Test case 1: both tenk1 and tenk2 are of tables with 1 integer(of 4
bytes) columns, tuple size 28 bytes, 100mn tuples
master: (0, 277886.951 ms), (2, 171183.221 ms), (4, 159703.496 ms)
with parallel inserts CTAS patch: (0, 264709.186 ms), (2, 128354.448
ms), (4, 111533.731 ms)

Test case 2: both tenk1 and tenk2 are of tables with 2 integer(of 4
bytes each) columns, 3 varchar(8), tuple size 59 bytes, 100mn tuples
master: (0, 453505.228 ms), (2, 236762.759 ms), (4, 219038.126 ms)
with parallel inserts CTAS patch: (0, 470483.818 ms), (2, 219374.198
ms), (4, 203543.681 ms)

Test case 3: both tenk1 and tenk2 are of tables with 2 bigint(of 8
bytes each) columns, 3 name(of 64 bytes each) columns, 1 varchar(8),
tuple size 241 bytes, 100mn tuples
master: (0, 1052725.928 ms), (2, 592968.486 ms), (4, 562137.891 ms)
with parallel inserts CTAS patch: (0, 1019086.805 ms), (2, 634448.322
ms), (4, 680793.305 ms)

Test case 4: both tenk1 and tenk2 are of tables with 2 bigint(of 8
bytes each) columns, 16 name(of 64 bytes each) columns, tuple size
1064 bytes, 10mn tuples
master: (0, 371931.497 ms), (2, 247206.841 ms), (4, 241959.839 ms)
with parallel inserts CTAS patch: (0, 396342.329 ms), (2, 333860.472
ms), (4, 317895.558 ms)

Observation: parallel insert + parallel select gives good benefit wIth
very lesser tuple sizes, cases 1 and 2. If the tuple size is bigger
serial insert + parallel select fares better, cases 3 and 4.

In the coming days, I will try to work on more performance analysis
and clarify some of the points raised upthread.

[1] - https://www.postgresql.org/message-id/CALj2ACXVWr1o%2BFZrkQt-2GvYfuMQeJjWohajmp62Wr6BU8Y4VA%40mail.gmail.com
[2] - postgresql.conf changes I made:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = on
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
wal_level = replica

With Regards,
Bharath Rupireddy.

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Multiple hosts in connection string failed to failover in non-hot standby mode
Next
From: Dilip Kumar
Date:
Subject: Re: [BUG]Update Toast data failure in logical replication