RE: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Tang, Haiying
Subject RE: Parallel Inserts in CREATE TABLE AS
Msg-id 34549865667a4a3bb330ebfd035f85d3@G08CNEXMBPEKD05.g08.fujitsu.local
In response to Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Hi Bharath,

I'm trying to take some performance measurements on your patch v23.
But when I started, I found an issue with unbalanced tuple distribution among workers (99% of the tuples are read by one worker) in a specific case, which makes the "parallel select" part bring no performance gain.
 
Then I found it's not introduced by your patch, because it also happens on master (HEAD). But I don't know how to deal with it, so I'm posting it here to see if anybody knows what's going wrong here or has good ideas to deal with this issue.

Here are the conditions to produce the issue:
1. A high-CPU-spec environment (say, above 20 processors). With fewer CPUs it also happens, but less obviously (40% of tuples on one worker in my tests).
2. The query plan is "serial insert + parallel select". I have reproduced this behavior with CTAS, SELECT INTO, and INSERT INTO SELECT.
3. The select part needs to read a large amount of data (e.g. 100 million rows out of 200 million).
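
For reference, the three statement shapes from condition 2 that reproduced it for me look like this (just a sketch; the target table names t1/t2/t3 are placeholders, and x is the test table from the setup below):

  -- All three get a "serial insert + parallel select" plan:
  create table t1 as select * from x where b%2=0;   -- CTAS
  select * into t2 from x where b%2=0;              -- SELECT INTO
  insert into t3 select * from x where b%2=0;       -- INSERT INTO SELECT
                                                    -- (t3 created beforehand)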

Given the above, IMHO, I guess it may be caused by the leader's write rate not keeping up with the workers' read rate: the tuples of one worker get blocked in its queue and accumulate more and more.
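
One way to check this guess (just a sketch, using the test table x from the setup below) is to run the same parallel select without the serial insert on top; under EXPLAIN ANALYZE the leader discards the tuples instead of writing them, so if the leader-side insert is the bottleneck, the per-worker row counts should come out much more balanced:

  explain (analyze, verbose)
  select a, floor(random()*(10001-1)+1), c from x where b%2=0;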

Below is my test info:
1. test spec environment
  CentOS 8.2, 128GB RAM, 40 processors, SAS disk

2. test data prepare
  create table x(a int, b int, c int);
  create index on x(a);
  insert into x select generate_series(1,200000000),floor(random()*(10001-1)+1),floor(random()*(10001-1)+1);
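
(The parallel settings I used are not shown above; "Workers Planned: 4" in the plans suggests something like the following, though the exact value here is my assumption:)

  set max_parallel_workers_per_gather = 4;  -- assumed; matches "Workers Planned: 4"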

3. test execute results
  *Patched CTAS*: please look at worker 2; 99% of the tuples were read by it.
  explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.203..24023.686 rows=100006268 loops=1)
   Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.016..4367.035 rows=20001254 loops=5)
         Output: a, c
         Filter: ((x.b % 2) = 0)
         Rows Removed by Filter: 19998746
         Worker 0:  actual time=0.016..19.265 rows=94592 loops=1
         Worker 1:  actual time=0.027..31.422 rows=94574 loops=1
         Worker 2:  actual time=0.014..21744.549 rows=99627749 loops=1
         Worker 3:  actual time=0.015..19.347 rows=94586 loops=1
 Planning Time: 0.098 ms
 Execution Time: 91054.828 ms

  *Non-patched CTAS*: please look at worker 0; again, 99% of the tuples were read by it.
  explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.283..19216.157 rows=100003148 loops=1)
   Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.020..4380.360 rows=20000630 loops=5)
         Output: a, c
         Filter: ((x.b % 2) = 0)
         Rows Removed by Filter: 19999370
         Worker 0:  actual time=0.013..21805.647 rows=99624833 loops=1
         Worker 1:  actual time=0.016..19.790 rows=94398 loops=1
         Worker 2:  actual time=0.013..35.340 rows=94423 loops=1
         Worker 3:  actual time=0.035..19.849 rows=94679 loops=1
 Planning Time: 0.083 ms
 Execution Time: 91151.097 ms

I'm still working on the performance tests for your patch; if I make some progress, I will post my results here.

Regards,
Tang


