Bharath-san, all,
Hmm, I didn't experience performance degradation on my poor-man's Linux VM (4 CPU, 4 GB RAM, HDD)...
[benchmark preparation]
autovacuum = off
shared_buffers = 1GB
checkpoint_timeout = 1h
max_wal_size = 8GB
min_wal_size = 8GB
(other settings to enable parallelism)
CREATE UNLOGGED TABLE a (c char(1100));
INSERT INTO a SELECT i FROM generate_series(1, 300000) i;
(the table size is 335 MB)
[benchmark]
CREATE TABLE b AS SELECT * FROM a;
DROP TABLE a;
CHECKPOINT;
(measure only CTAS)
[results]
parallel_leader_participation = off
workers  time(ms)
0        3921
2        3290
4        3132
parallel_leader_participation = on
workers  time(ms)
2        3266
4        3247
Although this may be a controversial and perhaps crazy idea, the following change brought a 4-11% speedup. I tried
it because I thought parallel workers might contend for WAL flush: each of them uses the limited ring buffer and has
to flush dirty buffers once the ring buffer is filled. Can we take advantage of this?
[GetBulkInsertState]
/* bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);*/
bistate->strategy = NULL;
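To give a rough feel for how often the ring would be recycled in this benchmark, here is a back-of-the-envelope sketch in Python. The 16 MB ring size for BAS_BULKWRITE and the 8 kB block size are my assumptions (defaults as I understand them), not something measured here. Only the 335 MB table size comes from the benchmark above.

```python
# Rough estimate of how many times a bulk-write ring is recycled while
# writing a table of the size used in this benchmark.

BLOCK_SIZE = 8192                 # assumption: default BLCKSZ (8 kB)
RING_BYTES = 16 * 1024 * 1024     # assumption: BAS_BULKWRITE ring is 16 MB
TABLE_BYTES = 335 * 1024 * 1024   # table size reported in the benchmark

ring_buffers = RING_BYTES // BLOCK_SIZE    # buffers in one ring
table_pages = TABLE_BYTES // BLOCK_SIZE    # pages written by the CTAS
ring_cycles = table_pages / ring_buffers   # times the ring wraps around

print(f"ring holds {ring_buffers} buffers")
print(f"table is about {table_pages} pages")
print(f"ring is recycled roughly {ring_cycles:.1f} times")
```

Under these assumptions the table is many times larger than the ring, so a writer keeps reusing buffers it has just dirtied, and each reuse of a dirty buffer needs the WAL for that page to be flushed first. With strategy = NULL the pages go through shared_buffers (1GB here) instead, which is larger than the whole table.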
[results]
parallel_leader_participation = off
workers  time(ms)
0        3695  (5% reduction)
2        3135  (4% reduction)
4        2767  (11% reduction)
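For reference, the reduction figures can be recomputed from the two result sets with parallel_leader_participation = off. This is just a small Python sketch of the arithmetic; the dictionary and function names are made up for illustration.

```python
# Elapsed times in ms, keyed by number of workers, copied from the results.
baseline_ms = {0: 3921, 2: 3290, 4: 3132}   # with the BAS_BULKWRITE ring
patched_ms = {0: 3695, 2: 3135, 4: 2767}    # with bistate->strategy = NULL


def reduction_pct(before_ms, after_ms):
    """Percentage by which the elapsed time dropped."""
    return (before_ms - after_ms) / before_ms * 100.0


reductions = {
    workers: reduction_pct(baseline_ms[workers], patched_ms[workers])
    for workers in sorted(baseline_ms)
}

for workers, pct in reductions.items():
    print(f"{workers} workers: {pct:.1f}% reduction")
```

The percentages quoted in the results are these values truncated to whole numbers.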
Regards
Takayuki Tsunakawa