Hello hackers,
Parallel sequential scan relies on the kernel detecting sequential
access, but we don't make the job easy: workers claim blocks one at a
time, so each process skips over the blocks taken by the others. The
resulting striding pattern works terribly on strict next-block systems
like FreeBSD UFS, and degrades rapidly when you add too many workers on
sliding window systems like Linux.
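
To make the striding pattern concrete, here is a toy model (not
PostgreSQL code). It assumes workers take turns in strict round-robin
order, which real workers don't do, and that the kernel judges
sequentiality per process. The names hand_out_blocks and
sequential_fraction are made up for illustration.

```python
from itertools import cycle


def hand_out_blocks(nblocks, nworkers, chunk_size=1):
    """Assign blocks 0..nblocks-1 to workers, chunk_size blocks per turn.

    Assumption: workers take turns in strict round-robin order.
    """
    per_worker = [[] for _ in range(nworkers)]
    next_block = 0
    for w in cycle(range(nworkers)):
        if next_block >= nblocks:
            break
        end = min(next_block + chunk_size, nblocks)
        per_worker[w].extend(range(next_block, end))
        next_block = end
    return per_worker


def sequential_fraction(blocks):
    """Fraction of reads that hit the block right after the previous one."""
    if len(blocks) < 2:
        return 1.0
    hits = sum(1 for a, b in zip(blocks, blocks[1:]) if b == a + 1)
    return hits / (len(blocks) - 1)


if __name__ == "__main__":
    for chunk in (1, 64):
        streams = hand_out_blocks(nblocks=1024, nworkers=2, chunk_size=chunk)
        print(chunk, [round(sequential_fraction(s), 3) for s in streams])
```

With chunk_size=1 and two workers, no read in either stream follows
its predecessor, so a strict next-block heuristic never sees a
sequential access. Larger chunks make most reads in each stream
sequential again.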
Demonstration using FreeBSD on UFS on a virtual machine, taking
ballpark figures from iostat:

  create table t as select generate_series(1, 200000000)::int i;

  set max_parallel_workers_per_gather = 0;
  select count(*) from t;
  -> execution time 13.3s, average read size = ~128kB, ~500MB/s

  set max_parallel_workers_per_gather = 1;
  select count(*) from t;
  -> execution time 24.9s, average read size = ~32kB, ~250MB/s

Note the small read size, which means that there was no read
clustering happening at all: that's the logical block size of this
filesystem.
That explains some complaints I've heard about PostgreSQL performance
on that filesystem: parallel query destroys I/O performance.
As a quick experiment, I tried teaching the block allocator to
allocate ranges of up to 64 blocks at a time, ramping up incrementally,
and ramping down at the end, and I got:

  set max_parallel_workers_per_gather = 1;
  select count(*) from t;
  -> execution time 7.5s, average read size = ~128kB, ~920MB/s

  set max_parallel_workers_per_gather = 3;
  select count(*) from t;
  -> execution time 5.2s, average read size = ~128kB, ~1.2GB/s

I've attached the quick and dirty patch I used for that.
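
For anyone who wants the idea without reading the patch, the ramping
allocation can be sketched like this. It is a rough model and not the
attached patch: only the 64-block cap comes from the experiment above.
The doubling ramp-up and the "at most half of what remains" ramp-down
are assumptions chosen for illustration, and chunk_ranges is a made-up
name.

```python
MAX_CHUNK = 64  # cap on range size, as in the experiment above


def chunk_ranges(nblocks, max_chunk=MAX_CHUNK):
    """Yield (start, length) ranges that cover blocks 0..nblocks-1 in order."""
    next_block = 0
    size = 1  # start small so short scans are still shared between workers
    while next_block < nblocks:
        remaining = nblocks - next_block
        # Ramp down (assumed rule): hand out at most half of what is left,
        # so the last ranges are small and workers finish close together.
        limit = max(1, remaining // 2)
        length = min(size, max_chunk, limit)
        yield (next_block, length)
        next_block += length
        # Ramp up (assumed rule): double the range size until the cap.
        size = min(size * 2, max_chunk)


if __name__ == "__main__":
    ranges = list(chunk_ranges(1000))
    print(ranges[:8])
    print(ranges[-8:])
```

Each worker would claim the next range instead of the next block, so
most of its reads are contiguous. The small ranges at the end are
meant to keep one worker from being left with a big range while the
others sit idle.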