Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CALj2ACVv=r9VzzJPePA+Hj5iAyNxqnp_AJJ4qXE9iBixcSE87Q@mail.gmail.com
In response to RE: Parallel Inserts in CREATE TABLE AS  ("Tang, Haiying" <tanghy.fnst@cn.fujitsu.com>)
Responses RE: Parallel Inserts in CREATE TABLE AS  ("Tang, Haiying" <tanghy.fnst@cn.fujitsu.com>)
List pgsql-hackers
On Fri, Jan 22, 2021 at 5:16 PM Tang, Haiying <tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Bharath,
>
> I'm trying to take some performance measurements on your patch v23.
> But when I started, I found an issue with unbalanced tuple distribution among workers (99% of the tuples read by one worker) in a specific case, which means the "parallel select" part gives no performance gain.
> Then I found it's not introduced by your patch, because it also happens on master (HEAD). But I don't know how to deal with it, so I'm posting it here to see if anybody knows what's going wrong or has good ideas to deal with this issue.
>
> Here are the conditions to reproduce the issue:
> 1. A high-spec CPU environment (say, above 20 processors). With fewer CPUs it also happens, but less obviously (40% of the tuples on one worker in my tests).
> 2. The query plan is "serial insert + parallel select"; I have reproduced this behavior with CTAS, SELECT INTO, and INSERT INTO ... SELECT.
> 3. The select part has to read a large amount of data (e.g. select 100 million rows out of 200 million).
>
> Based on the above, IMHO, I guess it may be caused by the leader's write rate not keeping up with the workers' read rates, so the tuples of one worker get blocked in its queue and pile up more and more.
>
> Below is my test info:
> 1. test spec environment
>   CentOS 8.2, 128 GB RAM, 40 processors, SAS disk
>
> 2. test data prepare
>   create table x(a int, b int, c int);
>   create index on x(a);
>   insert into x select generate_series(1,200000000),floor(random()*(10001-1)+1),floor(random()*(10001-1)+1);
>
> 3. test execute results
>   *Patched CTAS*: please look at worker 2; 99% of the tuples were read by it.
>   explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
>                                                                 QUERY PLAN
>   -------------------------------------------------------------------------------------------------------------------------------
>  Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.203..24023.686 rows=100006268 loops=1)
>    Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
>    Workers Planned: 4
>    Workers Launched: 4
>    ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.016..4367.035 rows=20001254 loops=5)
>          Output: a, c
>          Filter: ((x.b % 2) = 0)
>          Rows Removed by Filter: 19998746
>          Worker 0:  actual time=0.016..19.265 rows=94592 loops=1
>          Worker 1:  actual time=0.027..31.422 rows=94574 loops=1
>          Worker 2:  actual time=0.014..21744.549 rows=99627749 loops=1
>          Worker 3:  actual time=0.015..19.347 rows=94586 loops=1
>  Planning Time: 0.098 ms
>  Execution Time: 91054.828 ms
>
>   *Non-patched CTAS*: please look at worker 0; again, 99% of the tuples were read by it.
>   explain analyze verbose create table test(a,b,c) as select a,floor(random()*(10001-1)+1),c from x where b%2=0;
>                                                                 QUERY PLAN
>   -------------------------------------------------------------------------------------------------------------------------------
>  Gather  (cost=1000.00..1942082.77 rows=1000001 width=16) (actual time=0.283..19216.157 rows=100003148 loops=1)
>    Output: a, floor(((random() * '10000'::double precision) + '1'::double precision)), c
>    Workers Planned: 4
>    Workers Launched: 4
>    ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=250000 width=8) (actual time=0.020..4380.360 rows=20000630 loops=5)
>          Output: a, c
>          Filter: ((x.b % 2) = 0)
>          Rows Removed by Filter: 19999370
>          Worker 0:  actual time=0.013..21805.647 rows=99624833 loops=1
>          Worker 1:  actual time=0.016..19.790 rows=94398 loops=1
>          Worker 2:  actual time=0.013..35.340 rows=94423 loops=1
>          Worker 3:  actual time=0.035..19.849 rows=94679 loops=1
>  Planning Time: 0.083 ms
>  Execution Time: 91151.097 ms
>
> I'm still working on the performance tests of your patch; if I make some progress, I will post my results here.

Thanks a lot for the tests. In your test case, parallel insertions are not being picked because the Gather node has a projection (floor(((random() * '10000'::double precision) + '1'::double precision))) to perform. That's expected. Whenever parallel insertions are chosen for CTAS, we should see a "Create <target table>" node under the Gather node [1], and the Gather node's actual row count should be 0 (but in your test it is rows=100006268) in the explain analyze output. Coming to your test case, if it's modified to something like [1], where the Gather node has no projections, then parallel insertions will be chosen.
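
As a side note (my own quick check, not something the patch depends on): I believe the projection stays at the Gather node here because random() is marked parallel restricted, so it has to be evaluated in the leader rather than in the workers. That can be confirmed from the catalog, for instance:

postgres=# select proname, proparallel from pg_proc where proname = 'random';

proparallel = 'r' (restricted) is what keeps such an expression above the parallel part of the plan.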

[1] - I did this test on my development system; I will run it on a performance system and post my observations.
postgres=# explain (analyze, verbose) create table test(a,b,c) as select a,b,c from x where b%2=0;
                                                         QUERY PLAN                                                        
----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..3846.71 rows=1000 width=12) (actual time=5581.308..5581.379 rows=0 loops=1)
   Output: a, b, c
   Workers Planned: 1
   Workers Launched: 1
 ->  Create test
   ->  Parallel Seq Scan on public.x  (cost=0.00..2846.71 rows=588 width=12) (actual time=0.014..29.512 rows=50023 loops=2)
         Output: a, b, c
         Filter: ((x.b % 2) = 0)
         Rows Removed by Filter: 49977
         Worker 0:  actual time=0.015..29.751 rows=49419 loops=1
 Planning Time: 1574.584 ms
 Execution Time: 6437.562 ms
(12 rows)
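
Note that with the parallel-insert plan the Gather node itself reports rows=0, since (as far as I understand the behaviour) the workers insert the tuples into the target table directly instead of sending them back to the leader. So to sanity-check such a run, it is easier to compare row counts than to read the Gather row count, e.g. with the tables used above:

postgres=# select count(*) from test;
postgres=# select count(*) from x where b%2=0;

Both should return the same number.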

With Regards,
Bharath Rupireddy.
