RE: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id TYAPR01MB2990DA4BCAEB495F999E0C36FEB39@TYAPR01MB2990.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Parallel INSERT (INTO ... SELECT ...)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Parallel INSERT (INTO ... SELECT ...)  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
List pgsql-hackers
From: Amit Kapila <amit.kapila16@gmail.com>
> On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
> <tanghy.fnst@cn.fujitsu.com> wrote:
> > Execute EXPLAIN on Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> >                                                        QUERY PLAN
> >
> ---------------------------------------------------------------------------
> ---------------------------------------------
> >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=44.139..44.140 rows=0 loops=1)
> >    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
> >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.007..0.201 rows=1000 loops=1)
> >          Output: test_data1.a, test_data1.b
> >          Buffers: shared hit=5
> >  Planning:
> >    Buffers: shared hit=27011
> >  Planning Time: 24.526 ms
> >  Execution Time: 44.981 ms
> >
> > Execute EXPLAIN on non-Patched:
> > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> select * from test_data1;
> >                                                        QUERY PLAN
> >
> ---------------------------------------------------------------------------
> ---------------------------------------------
> >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> time=72.656..72.657 rows=0 loops=1)
> >    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
> >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> width=8) (actual time=0.010..0.175 rows=1000 loops=1)
> >          Output: test_data1.a, test_data1.b
> >          Buffers: shared hit=5
> >  Planning:
> >    Buffers: shared hit=72
> >  Planning Time: 0.135 ms
> >  Execution Time: 79.058 ms
> >
> 
> So, the results indicate that after the patch we touch more buffers
> during planning which I think is because of accessing the partition
> information, and during execution, the patch touches fewer buffers for
> the same reason. But why this can reduce the time with patch? I think
> this needs some investigation.

I guess another factor other than shared buffers is relcache and catcache.  The patched version loads those cached
entriesfor all partitions of the insert target table during the parallel-safety check in planning, while the unpatched
versionhas to gradually build those cache entries during execution.  How can wee confirm its effect?
 


Regards
Takayuki Tsunakawa



pgsql-hackers by date:

Previous
From: Zhihong Yu
Date:
Subject: Re: WIP: BRIN multi-range indexes
Next
From: Tomas Vondra
Date:
Subject: Re: WIP: BRIN multi-range indexes