Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CALj2ACXQ_wfQBzaVqrTa+Tg2PwVke78-Y_qQZYgVijtM1S_GEA@mail.gmail.com
Whole thread Raw
In response to RE: Parallel Inserts in CREATE TABLE AS  ("Tang, Haiying" <tanghy.fnst@cn.fujitsu.com>)
Responses Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Wed, Jan 27, 2021 at 1:25 PM Tang, Haiying
<tanghy.fnst@cn.fujitsu.com> wrote:
> I choose 5 cases which pick parallel insert plan in CTAS to measure the patched performance. Each case run 30 times.
>
> Most of the tests execution become faster with this patch.
>
> However, Test NO 4(create table xxx as table xxx.) appears performance degradation. I tested various table
size(2/10/20millions), they all have a 6%-10% declines. I think it may need some check at this problem.
 
>
>
>
> Below are my test results. 'Test NO' is corresponded to 'Test NO' in attached test_ctas.sql file.
>
> reg%=(patched-master)/master
>
> Test NO |  Test Case                     |reg%  | patched(ms)  | master(ms)
>
> --------|--------------------------------|------|--------------|-------------
>
> 1       |  CTAS select from table        | -9%  | 16709.50477  | 18370.76660
>
> 2       |  Append plan                   | -14% | 16542.97807  | 19305.86600
>
> 3       |  initial plan under Gather node| -5%  | 13374.27187  | 14120.02633
>
> 4       |  CTAS table                    | 10%  | 20835.48800  | 18986.40350
>
> 5       |  CTAS select from execute      | -6%  | 16973.73890  | 18008.59789
>
>
>
> About Test NO 4:
>
> In master(HEAD), this test case picks serial seq scan.
>
> query plan likes:
>
>
----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Seq Scan on public.tenk1  (cost=0.00..444828.12 rows=10000012 width=244) (actual time=0.005..1675.268 rows=10000000
loops=1)
>
>    Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even,
stringu1,stringu2, string4  Planning Time: 0.053 ms  Execution Time: 20165.023 ms
 
>
>
>
> With this patch, it will choose parallel seq scan and parallel insert.
>
> query plan likes:
>
>
----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Gather  (cost=1000.00..370828.03 rows=10000012 width=244) (actual time=20428.823..20437.143 rows=0 loops=1)
>
>    Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even,
stringu1,stringu2, string4
 
>
>    Workers Planned: 4
>
>    Workers Launched: 4
>
> ->  Create test
>
>    ->  Parallel Seq Scan on public.tenk1  (cost=0.00..369828.03 rows=2500003 width=244) (actual time=0.021..411.094
rows=2000000loops=5)
 
>
>          Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd,
even,stringu1, stringu2, string4
 
>
>          Worker 0:  actual time=0.023..390.856 rows=1858407 loops=1
>
>          Worker 1:  actual time=0.024..468.587 rows=2264494 loops=1
>
>          Worker 2:  actual time=0.023..473.170 rows=2286580 loops=1
>
>          Worker 3:  actual time=0.027..373.727 rows=1853216 loops=1  Planning Time: 0.053 ms  Execution Time:
20437.643ms
 
>
>
>
> test machine spec:
>
> CentOS 8.2, 128G RAM, 40 processors, disk SAS

Thanks a lot for the performance tests and test cases. I will analyze
why the performance is degrading one case and respond soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: [PATCH] remove pg_standby
Next
From: Takashi Menjo
Date:
Subject: Re: [PoC] Non-volatile WAL buffer