Parallel Seq Scan vs kernel read ahead - Mailing list pgsql-hackers

From Thomas Munro
Subject Parallel Seq Scan vs kernel read ahead
Date
Msg-id CA+hUKGJ_EErDv41YycXcbMbCBkztA34+z1ts9VQH+ACRuvpxig@mail.gmail.com
Whole thread Raw
Responses Re: Parallel Seq Scan vs kernel read ahead
List pgsql-hackers
Hello hackers,

Parallel sequential scan relies on the kernel detecting sequential
access, but we don't make the job easy.  The resulting striding
pattern works terribly on strict next-block systems like FreeBSD UFS,
and degrades rapidly when you add too many workers on sliding window
systems like Linux.

Demonstration using FreeBSD on UFS on a virtual machine, taking ball
park figures from iostat:

  create table t as select generate_series(1, 200000000)::int i;

  set max_parallel_workers_per_gather = 0;
  select count(*) from t;
  -> execution time 13.3s, average read size = ~128kB, ~500MB/s

  set max_parallel_workers_per_gather = 1;
  select count(*) from t;
  -> execution time 24.9s, average read size = ~32kB, ~250MB/s

Note the small read size, which means that there was no read
clustering happening at all: that's the logical block size of this
filesystem.

That explains some complaints I've heard about PostgreSQL performance
on that filesystem: parallel query destroys I/O performance.

As a quick experiment, I tried teaching the block allocated to
allocate ranges of up 64 blocks at a time, ramping up incrementally,
and ramping down at the end, and I got:

  set max_parallel_workers_per_gather = 1;
  select count(*) from t;
  -> execution time 7.5s, average read size = ~128kB, ~920MB/s

  set max_parallel_workers_per_gather = 3;
  select count(*) from t;
  -> execution time 5.2s, average read size = ~128kB, ~1.2GB/s

I've attached the quick and dirty patch I used for that.

Attachment

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: pg_stat_wal_receiver and flushedUpto/writtenUpto
Next
From: Amit Kapila
Date:
Subject: Re: Parallel Seq Scan vs kernel read ahead