RE: Parallel INSERT (INTO ... SELECT ...) - Mailing list pgsql-hackers
From | Hou, Zhijie |
---|---|
Subject | RE: Parallel INSERT (INTO ... SELECT ...) |
Date | |
Msg-id | 65b078232071452694b3b0d0f113040e@G08CNEXMBPEKD05.g08.fujitsu.local Whole thread Raw |
In response to | RE: Parallel INSERT (INTO ... SELECT ...) ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>) |
Responses |
RE: Parallel INSERT (INTO ... SELECT ...)
|
List | pgsql-hackers |
> > Till now, what I found is that: > > With tang's conf, when doing parallel insert, the walrecord is more > > than serial insert (IMO, this is the main reason why it has > > performance degradation) See the attatchment for the plan info. > > > > I have tried alter the target table to unlogged and then the > > performance degradation will not happen any more. > > > > And the additional walrecord seems related to the index on the target > table. > > If the target table does not have any index, the wal record is the > > same between parallel plan and serial plan. > > Also, it does not have performance degradation without index. > > > [serial] > Insert on public.testscan (cost=3272.20..3652841.26 rows=0 width=0) > (actual time=360.474..360.476 rows=0 loops=1) > Buffers: shared hit=392569 read=3 dirtied=934 written=933 > WAL: records=260354 bytes=16259841 > > [parallel] > -> Insert on public.testscan (cost=3272.20..1260119.35 rows=0 > width=0) (actual time=378.227..378.229 rows=0 loops=5) > Buffers: shared hit=407094 read=4 dirtied=1085 written=1158 > WAL: records=260498 bytes=17019359 > > > More pages are dirtied and written in the parallel execution. Aren't the > index and possibly the target table bigger with parallel execution than > with serial execution? That may be due to the difference of inserts of > index keys. Yes, the table size and index size is bigger with parallel execution. table and index's size after parallel insert ---------------------- postgres=# select pg_size_pretty(pg_indexes_size('testscan_index')); pg_size_pretty ---------------- 4048 kB (1 row) postgres=# postgres=# select pg_size_pretty(pg_relation_size('testscan_index')); pg_size_pretty ---------------- 4768 kB (1 row) ---------------------- table and index's size after serial insert ---------------------- postgres=# select pg_size_pretty(pg_indexes_size('testscan_index')); pg_size_pretty ---------------- 2864 kB (1 row) postgres=# select pg_size_pretty(pg_relation_size('testscan_index')); pg_size_pretty ---------------- 4608 kB ---------------------- To Amit: > I think you might want to see which exact WAL records are extra by using pg_waldump? Yes, thanks for the hint, I was doing that and the result is as follow: Heap wal record is the same between parallel and serial: (129999 which is the number count of the query result). parallel Btree walrecord(130500 record): ---------------------- INSERT_LEAF:129500 INSERT_UPPER:497 SPLIT_L:172 SPLIT_R:328 INSERT_POST:0 DEDUP:0 VACUUM:0 DELETE:0 MARK_PAGE_HALFDEAD:0 UNLINK_PAGE:0 UNLINK_PAGE_META:0 NEWROOT:3 REUSE_PAGE:0 META_CLEANUP:0 ---------------------- serial Btree walrecord(130355 record): ---------------------- INSERT_LEAF:129644 INSERT_UPPER:354 SPLIT_L:0 SPLIT_R:355 INSERT_POST:0 DEDUP:0 VACUUM:0 DELETE:0 MARK_PAGE_HALFDEAD:0 UNLINK_PAGE:0 UNLINK_PAGE_META:0 NEWROOT:2 REUSE_PAGE:0 META_CLEANUP:0 ---------------------- IMO, due to the difference of inserts with parallel execution, the btree insert's cost is more than serial. At the same time, the parallel does not have a huge performance gain with bitmapscan, So the extra cost of btree index will result in performance degradation. Does it make sense ? Best regards, Houzj
pgsql-hackers by date: