RE: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers

From Hou, Zhijie
Subject RE: Parallel INSERT (INTO ... SELECT ...)
Date
Msg-id f47a6023dde14a4bb09a01bc17dfc41c@G08CNEXMBPEKD05.g08.fujitsu.local
Whole thread Raw
In response to RE: Parallel INSERT (INTO ... SELECT ...)  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: Parallel INSERT (INTO ... SELECT ...)
List pgsql-hackers
> > ----------------------
> > postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
> >  pg_size_pretty
> > ----------------
> >  4048 kB
> > (1 row)
> >
> > postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
> >  pg_size_pretty
> > ----------------
> >  4768 kB
> > (1 row)
> 
> Which of the above shows the table size?  What does pg_indexes_size()
> against an index (testscan_index) return?

Sorry, Maybe the tablename is a little confused,
but 'testscan_index' is actually a table's name.

pg_indexes_size will return the index's size attatched to the table.
pg_relation_size will return the table's size.

Did I miss something ?



> > IMO, due to the difference of inserts with parallel execution, the
> > btree insert's cost is more than serial.
> >
> > At the same time, the parallel does not have a huge performance gain
> > with bitmapscan, So the extra cost of btree index will result in
> > performance degradation.
> 
> How did you know that the parallelism didn't have a huge performance gain
> with bitmap scan?
> 
> [serial]
>    ->  Bitmap Heap Scan on public.x  (cost=3272.20..3652841.26 rows=79918
> width=8) (actual time=8.096..41.005 rows=129999 loops=1)
> 
> [parallel]
>          ->  Parallel Bitmap Heap Scan on public.x
> (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787
> rows=26000 loops=5)

I tested the case without insert(Just the query use bitmapscan):

[serial]:
postgres=# explain analyze select a from x where a<80000 or (a%2=0 and a>199900000);
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on x  (cost=3258.59..3647578.53 rows=81338 width=4) (actual time=8.091..34.222 rows=129999 loops=1)
   Recheck Cond: ((a < 80000) OR (a > 199900000))
   Filter: ((a < 80000) OR (((a % 2) = 0) AND (a > 199900000)))
   Rows Removed by Filter: 50000
   Heap Blocks: exact=975
   ->  BitmapOr  (cost=3258.59..3258.59 rows=173971 width=0) (actual time=7.964..7.965 rows=0 loops=1)
         ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.451..3.451 rows=79999
loops=1)
               Index Cond: (a < 80000)
         ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.513..4.513
rows=100000loops=1)
 
               Index Cond: (a > 199900000)
 Planning Time: 0.108 ms
 Execution Time: 38.136 ms

[parallel]
postgres=# explain analyze select a from x where a<80000 or (a%2=0 and a>199900000);
                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=4258.59..1266704.42 rows=81338 width=4) (actual time=9.177..22.592 rows=129999 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Bitmap Heap Scan on x  (cost=3258.59..1257570.62 rows=20334 width=4) (actual time=6.402..12.882
rows=26000loops=5)
 
         Recheck Cond: ((a < 80000) OR (a > 199900000))
         Filter: ((a < 80000) OR (((a % 2) = 0) AND (a > 199900000)))
         Rows Removed by Filter: 10000
         Heap Blocks: exact=1
         ->  BitmapOr  (cost=3258.59..3258.59 rows=173971 width=0) (actual time=8.785..8.786 rows=0 loops=1)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.871..3.871
rows=79999loops=1)
 
                     Index Cond: (a < 80000)
               ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.914..4.914
rows=100000loops=1)
 
                     Index Cond: (a > 199900000)
 Planning Time: 0.158 ms
 Execution Time: 26.951 ms
(15 rows)

It did have performance gain, but I think it's not huge enough to ignore the extra's index cost.
What do you think ?

Best regards,
houzj



pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Next
From: Amit Kapila
Date:
Subject: Re: Single transaction in the tablesync worker?