Re: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id CAA4eK1+q0-greWrcgpsHU-QJZg7EGBDYcsnL9cqbn81A8V_Mpg@mail.gmail.com
Whole thread Raw
In response to RE: Parallel INSERT (INTO ... SELECT ...)  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers
On Thu, Feb 4, 2021 at 6:26 AM tsunakawa.takay@fujitsu.com
<tsunakawa.takay@fujitsu.com> wrote:
>
>         From: Amit Kapila <amit.kapila16@gmail.com>
> > On Mon, Jan 18, 2021 at 2:40 PM Tang, Haiying
> > <tanghy.fnst@cn.fujitsu.com> wrote:
> > > Execute EXPLAIN on Patched:
> > > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> > select * from test_data1;
> > >                                                        QUERY PLAN
> > >
> > ---------------------------------------------------------------------------
> > ---------------------------------------------
> > >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> > time=44.139..44.140 rows=0 loops=1)
> > >    Buffers: shared hit=1005 read=1000 dirtied=3000 written=2000
> > >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> > width=8) (actual time=0.007..0.201 rows=1000 loops=1)
> > >          Output: test_data1.a, test_data1.b
> > >          Buffers: shared hit=5
> > >  Planning:
> > >    Buffers: shared hit=27011
> > >  Planning Time: 24.526 ms
> > >  Execution Time: 44.981 ms
> > >
> > > Execute EXPLAIN on non-Patched:
> > > postgres=# explain (ANALYZE, BUFFERS, VERBOSE) insert into test_part
> > select * from test_data1;
> > >                                                        QUERY PLAN
> > >
> > ---------------------------------------------------------------------------
> > ---------------------------------------------
> > >  Insert on public.test_part  (cost=0.00..15.00 rows=0 width=0) (actual
> > time=72.656..72.657 rows=0 loops=1)
> > >    Buffers: shared hit=22075 read=1000 dirtied=3000 written=2000
> > >    ->  Seq Scan on public.test_data1  (cost=0.00..15.00 rows=1000
> > width=8) (actual time=0.010..0.175 rows=1000 loops=1)
> > >          Output: test_data1.a, test_data1.b
> > >          Buffers: shared hit=5
> > >  Planning:
> > >    Buffers: shared hit=72
> > >  Planning Time: 0.135 ms
> > >  Execution Time: 79.058 ms
> > >
> >
> > So, the results indicate that after the patch we touch more buffers
> > during planning which I think is because of accessing the partition
> > information, and during execution, the patch touches fewer buffers for
> > the same reason. But why this can reduce the time with patch? I think
> > this needs some investigation.
>
> I guess another factor other than shared buffers is relcache and catcache.  The patched version loads those cached
entriesfor all partitions of the insert target table during the parallel-safety check in planning, while the unpatched
versionhas to gradually build those cache entries during execution. 
>

Right.

>  How can wee confirm its effect?
>

I am not sure but if your theory is correct then won't in consecutive
runs both should have the same performance?


--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: logical replication worker accesses catalogs in error context callback
Next
From: David Rowley
Date:
Subject: Re: Tid scan improvements