Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAA4eK1K4xwT4gcO2AJF7i3xF6kpg70ZjuhjBJh8UG+YjKEGHzA@mail.gmail.com
Whole thread Raw
In response to RE: Parallel INSERT (INTO ... SELECT ...)  ("Tang, Haiying" <tanghy.fnst@cn.fujitsu.com>)
Responses RE: Parallel INSERT (INTO ... SELECT ...)  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
<tanghy.fnst@cn.fujitsu.com> wrote:
>
> Hi Tsunakawa-san
>
> > From: Tang, Haiying <tanghy.fnst@cn.fujitsu.com>
> > > (does this patch make some optimizes in serial insert? I'm a little
> > > confused here, Because the patched execution time is less than
> > > unpatched, but I didn't find information in commit messages about it.
> > > If I missed something, please kindly let me know.)
> >
> > I haven't thought of anything yet.  Could you show us the output of
> > EXPLAIN (ANALYZE, BUFFERS, VERBOSE) of 1,000 partitions case for the
> > patched and unpatched?  If it doesn't show any difference, the output
> > of perf may be necessary next.
>
> Execute EXPLAIN on Patched:
> postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
>                                                        QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=44.139..44.140 rows=0 loops=1)
>    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.007..0.201 rows=1000
loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5
>  Planning:
>    Buffers: shared hit=27011
>  Planning Time: 24.526 ms
>  Execution Time: 44.981 ms
>
> Execute EXPLAIN on non-Patched:
> postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part select * from test_data1;
>                                                        QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual time=72.656..72.657 rows=0 loops=1)
>    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
>    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.010..0.175 rows=1000
loops=1)
>          Output: test_data1.a, test_data1.b
>          Buffers: shared hit=5
>  Planning:
>    Buffers: shared hit=72
>  Planning Time: 0.135 ms
>  Execution Time: 79.058 ms
>

So, the results indicate that after the patch we touch more buffers
during planning which I think is because of accessing the partition
information, and during execution, the patch touches fewer buffers for
the same reason. But why this can reduce the time with patch? I think
this needs some investigation.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: "Tang, Haiying"
Date:
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Next
From: Peter Eisentraut
Date:
Subject: Re: popcount