Re: Parallel Inserts in CREATE TABLE AS - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel Inserts in CREATE TABLE AS
Date
Msg-id CALj2ACW9BUoFqWkmTSeHjFD-W7_00s3orqSvtvUk+KD2H7ZmRg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Inserts in CREATE TABLE AS  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses RE: Parallel Inserts in CREATE TABLE AS  ("tanghy.fnst@fujitsu.com" <tanghy.fnst@fujitsu.com>)
Re: Parallel Inserts in CREATE TABLE AS  (Greg Nancarrow <gregn4422@gmail.com>)
Re: Parallel Inserts in CREATE TABLE AS  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Jan 27, 2021 at 1:25 PM Tang, Haiying
> <tanghy.fnst@cn.fujitsu.com> wrote:
> > I choose 5 cases which pick parallel insert plan in CTAS to measure the patched performance. Each case run 30
times.
> >
> > Most of the tests execution become faster with this patch.
> >
> > However, Test NO 4(create table xxx as table xxx.) appears performance degradation. I tested various table
size(2/10/20millions), they all have a 6%-10% declines. I think it may need some check at this problem.
 
> >
> >
> >
> > Below are my test results. 'Test NO' is corresponded to 'Test NO' in attached test_ctas.sql file.
> >
> > reg%=(patched-master)/master
> >
> > Test NO |  Test Case                     |reg%  | patched(ms)  | master(ms)
> >
> > --------|--------------------------------|------|--------------|-------------
> >
> > 1       |  CTAS select from table        | -9%  | 16709.50477  | 18370.76660
> >
> > 2       |  Append plan                   | -14% | 16542.97807  | 19305.86600
> >
> > 3       |  initial plan under Gather node| -5%  | 13374.27187  | 14120.02633
> >
> > 4       |  CTAS table                    | 10%  | 20835.48800  | 18986.40350
> >
> > 5       |  CTAS select from execute      | -6%  | 16973.73890  | 18008.59789
> >
> >
> >
> > About Test NO 4:
> >
> > In master(HEAD), this test case picks serial seq scan.
> >
> > query plan likes:
> >
> >
----------------------------------------------------------------------------------------------------------------------------------------------------------
> >
> > Seq Scan on public.tenk1  (cost=0.00..444828.12 rows=10000012 width=244) (actual time=0.005..1675.268 rows=10000000
loops=1)
> >
> >    Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd,
even,stringu1, stringu2, string4  Planning Time: 0.053 ms  Execution Time: 20165.023 ms
 
> >
> >
> >
> > With this patch, it will choose parallel seq scan and parallel insert.
> >
> > query plan likes:
> >
> >
----------------------------------------------------------------------------------------------------------------------------------------------------------
> >
> > Gather  (cost=1000.00..370828.03 rows=10000012 width=244) (actual time=20428.823..20437.143 rows=0 loops=1)
> >
> >    Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd,
even,stringu1, stringu2, string4
 
> >
> >    Workers Planned: 4
> >
> >    Workers Launched: 4
> >
> > ->  Create test
> >
> >    ->  Parallel Seq Scan on public.tenk1  (cost=0.00..369828.03 rows=2500003 width=244) (actual time=0.021..411.094
rows=2000000loops=5)
 
> >
> >          Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous,
odd,even, stringu1, stringu2, string4
 
> >
> >          Worker 0:  actual time=0.023..390.856 rows=1858407 loops=1
> >
> >          Worker 1:  actual time=0.024..468.587 rows=2264494 loops=1
> >
> >          Worker 2:  actual time=0.023..473.170 rows=2286580 loops=1
> >
> >          Worker 3:  actual time=0.027..373.727 rows=1853216 loops=1  Planning Time: 0.053 ms  Execution Time:
20437.643ms
 
> >
> >
> >
> > test machine spec:
> >
> > CentOS 8.2, 128G RAM, 40 processors, disk SAS
>
> Thanks a lot for the performance tests and test cases. I will analyze
> why the performance is degrading one case and respond soon.

I analyzed performance of parallel inserts in CTAS for different cases
with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We could
gain if the tuple sizes are lower. But if the tuple size is larger
i..e 1064bytes, there's a regression with parallel inserts. Upon
further analysis, it turned out that the parallel workers are
requiring frequent extra blocks addition while concurrently extending
the relation(in RelationAddExtraBlocks) and the majority of the time
spent is going into flushing those new empty pages/blocks onto the
disk. I saw no regression when I incremented(for testing purpose) the
rate at which the extra blocks are added in RelationAddExtraBlocks to
extraBlocks = Min(1024, lockWaiters * 512); (currently it is
extraBlocks = Min(512, lockWaiters * 20); Incrementing the extra
blocks addition rate is not a practical solution to this problem
though.

In an offlist discussion with Robert and Dilip, using fallocate to
extend the relation may help to extend the relation faster. In regards
to this, it looks like the AIO/DIO patch set of Andres [1]  which
involves using fallocate() to extend files will surely be helpful.
Until then, we honestly feel that the parallel inserts in CTAS patch
set be put on hold and revive it later.

[1] - https://www.postgresql.org/message-id/flat/20210223100344.llw5an2aklengrmn%40alap3.anarazel.de

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: fdatasync performance problem with large number of DB files
Next
From: Ajin Cherian
Date:
Subject: Re: Logical Replication vs. 2PC