Hi,
While discussing the next steps for AIO writes in Postgres, Andres
suggested that a good starting point would be to begin evicting more
than one buffer at a time in some of the buffer access strategies that
perform writes. This would make it easier to later combine these
writes and, eventually, issue them asynchronously.
The attached patch implements this behavior for the BAS_BULKWRITE
strategy. With the patch applied, I observe average performance
improvements of about 15-20% for parallel COPY FROM operations on the
same table.
After some analysis, this improvement appears to be primarily due to
reduced time spent by each backend waiting on the lock to flush WAL.
Since backends now issue more data file writes before each WAL flush
(using a heuristic that avoids eviction when it would require flushing
WAL), there is less interleaving between WAL flushes and data file
writes. With the patch applied, I observe client backends waiting
significantly less on the WALWriteLock. I also see lower f_await times
in iostat, suggesting reduced flush-related waiting at the kernel
level as well.
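For anyone who wants to look at the same kind of wait behavior, one way to sample it is to poll pg_stat_activity while the benchmark runs. This is only a sketch and not necessarily how the observations above were gathered. The helper names wal_wait_query and sample_wal_waits are made up for illustration. The lock shows up as wait_event 'WALWrite' on PostgreSQL 13 and later and as 'WALWriteLock' on older releases, so the query matches both.

```shell
# Hypothetical helper: print a query that counts client backends that are
# currently waiting on the WAL write lock.
wal_wait_query() {
    cat <<'SQL'
SELECT count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND wait_event_type = 'LWLock'
  AND wait_event IN ('WALWrite', 'WALWriteLock');
SQL
}

# Hypothetical helper: run the query once per second, N times (default 60),
# printing one count per line.
sample_wal_waits() {
    n=${1:-60}
    i=0
    while [ "$i" -lt "$n" ]; do
        psql -X -At -c "$(wal_wait_query)"
        sleep 1
        i=$((i + 1))
    done
}
```

Running something like `sample_wal_waits 300 > waits.txt` in a second terminal during the benchmark, once on master and once with the patch, gives two series that can be compared. `iostat -x 1` alongside it shows f_await on sysstat versions that report that column.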
It's worth noting that for the serial COPY case (a single COPY FROM),
performance remains essentially unchanged with the patch. The benefit
seems to emerge only when multiple backends are concurrently writing
data and flushing WAL. In fact, the benefit shrinks as the number of
concurrent COPY FROM operations decreases.
The benchmark I ran was simple:
# make 16 source data files that are >= 1GB each
initdb
pg_ctl start
createdb
sudo fstrim -v /mnt/data
psql -c "drop table if exists foo; create table foo(a int, b int) with (autovacuum_enabled = off);"
time pgbench \
--no-vacuum \
-c 16 \
-j 16 \
-t 4 \
-f- <<EOF
COPY foo FROM '/mnt/data/foo:client_id.data';
EOF
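The step that produces the source files is not shown in the script. Below is a sketch of one way to do it, assuming two integer columns in COPY's default text format (tab-separated) and the file names the script expects, foo0.data through foo15.data, since pgbench's :client_id starts at 0. The helper name make_copy_files, the row contents, and the row count are illustrative assumptions, not what was used for the numbers in this mail.

```shell
# Hypothetical helper: write NFILES files named foo<i>.data into DIR,
# each containing ROWS lines of "a<TAB>b".
make_copy_files() {
    dir=$1
    nfiles=$2
    rows=$3
    i=0
    while [ "$i" -lt "$nfiles" ]; do
        # Column a counts up from 1; column b is a modulo 1000.
        awk -v n="$rows" 'BEGIN {
            for (r = 1; r <= n; r++) printf "%d\t%d\n", r, r % 1000
        }' > "$dir/foo$i.data"
        i=$((i + 1))
    done
}
```

Something like `make_copy_files /mnt/data 16 100000000` should give files a little over 1GB each at roughly 13 bytes per row, but it is worth confirming the sizes with `ls -l`. The files have to be readable by the server process, because COPY FROM with a file name reads on the server side.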
master -> patch
6.2 minutes -> 5 minutes : ~20% reduction
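For reference, the reduction is computed relative to the master runtime. A small sketch of the arithmetic, where pct_reduction is a made-up helper name:

```shell
# Hypothetical helper: percent reduction when going from BEFORE to AFTER.
pct_reduction() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

pct_reduction 6.2 5    # prints 19.4
```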
With the same benchmark but only 4 clients, the improvement is about 15%.
- Melanie