Changing default value of wal_sync_method to open_datasync on Linux - Mailing list pgsql-hackers
From | Tsunakawa, Takayuki |
---|---|
Subject | Changing default value of wal_sync_method to open_datasync on Linux |
Date | |
Msg-id | 0A3221C70F24FB45833433255569204D1F8CDEE0@G01JPEXMBYT05 |
Responses | Re: Changing default value of wal_sync_method to open_datasync on Linux |
List | pgsql-hackers |
Hello,

I propose changing the default value of wal_sync_method from fdatasync to open_datasync on Linux. The patch is attached. I suspect this may be controversial, so I'd like to hear your opinions.

The reason for the change is better performance. Robert Haas said open_datasync was much faster than fdatasync with NVRAM in this thread:

https://www.postgresql.org/message-id/flat/C20D38E97BCB33DAD59E3A1@lab.ntt.co.jp#C20D38E97BCB33DAD59E3A1@lab.ntt.co.jp

pg_test_fsync shows higher figures for open_datasync:

[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--------------------------------------------------
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                     50829.597 ops/sec      20 usecs/op
        fdatasync                         42094.381 ops/sec      24 usecs/op
        fsync                             42209.972 ops/sec      24 usecs/op
        fsync_writethrough                              n/a
        open_sync                         48669.605 ops/sec      21 usecs/op
--------------------------------------------------

[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--------------------------------------------------
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                     34648.778 ops/sec      29 usecs/op
        fdatasync                         31570.947 ops/sec      32 usecs/op
        fsync                             27783.283 ops/sec      36 usecs/op
        fsync_writethrough                              n/a
        open_sync                         35238.866 ops/sec      28 usecs/op
--------------------------------------------------

pgbench shows only marginally better results, and the difference is within the margin of error. The following is the tps of the default read/write workload of pgbench.
I ran the test with all the tables and indexes (except pgbench_history) preloaded with pg_prewarm, and with no checkpoint happening during the run. I ran a write workload before running the benchmark so that no new WAL file would be created during the benchmark run.

[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--------------------------------------------------
                    1      2      3    avg
fdatasync       17610  17164  16678  17150
open_datasync   17847  17457  17958  17754 (+3%)

[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--------------------------------------------------
                    1      2      3    avg
fdatasync        4911   5225   5198   5111
open_datasync    4996   5284   5317   5199 (+1%)

As the comment removed by the patch describes, when wal_sync_method is open_datasync (or open_sync), open() fails with errno=EINVAL if the ext4 volume is mounted with data=journal. That's because open() specifies O_DIRECT in that case. I don't think that's a problem in practice, because data=journal is not a setting chosen for performance, and O_DIRECT is used only when wal_level is changed from its default of replica to minimal and max_wal_senders is set to 0.

Regards
Takayuki Tsunakawa
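
For readers who want to see what the two settings mean at the system-call level, the following is a minimal Python sketch, not the pg_test_fsync implementation. It assumes Linux (os.O_DSYNC and os.fdatasync are Unix-only), and the helper names ops_per_sec and compare are hypothetical. It rewrites one 8kB block either through a descriptor opened with O_DSYNC (the open_datasync style) or with a plain write followed by fdatasync(). It does not use O_DIRECT, so its figures are only loosely comparable to the ones above.

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # one 8kB write, the unit pg_test_fsync reports on


def ops_per_sec(path, use_open_datasync, seconds=0.2):
    """Rewrite one 8kB block repeatedly and return completed writes per second."""
    flags = os.O_RDWR | os.O_CREAT
    if use_open_datasync:
        # open_datasync style: each write() returns only once the data is durable.
        flags |= os.O_DSYNC
    fd = os.open(path, flags, 0o600)
    try:
        count = 0
        start = time.monotonic()
        while time.monotonic() - start < seconds:
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, BLOCK)
            if not use_open_datasync:
                # fdatasync style: buffered write followed by an explicit flush.
                os.fdatasync(fd)
            count += 1
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return count / elapsed


def compare(directory):
    """Measure both methods on a scratch file in `directory`, then remove it."""
    path = os.path.join(directory, "sync_test.bin")
    try:
        return {
            "open_datasync": ops_per_sec(path, True),
            "fdatasync": ops_per_sec(path, False),
        }
    finally:
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, rate in compare(d).items():
            print(f"{name:15s} {rate:12.3f} ops/sec {1e6 / rate:8.1f} usecs/op")
```

To get meaningful numbers, point compare() at a directory on the volume that holds the WAL; the default temporary directory may be a memory-backed file system.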