Thread: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture
From: Salvatore Dipietro
Date:
Hi,

we would like to propose the removal of the Instruction Synchronization Barrier (isb) for aarch64 architectures. Based on our testing on Graviton instances (m7g.16xlarge), we can see on average over multiple iterations up to 12% better performance using PGBench select-only and up to 9% with Sysbench oltp_read_only workloads. On Graviton4 (m8g.24xlarge) results are up to 8% better using PGBench select-only and up to 6% with Sysbench oltp_read_only workloads.

We have also tested it putting more pressure on the spin_delay function, enabling pg_stat_statements.track_planning with PGBench read-only [0] and, on average, the patch shows up to 27% better performance on m6g.16xlarge and up to 37% on m7g.16xlarge.

Testing environment:
- PostgreSQL version: 17.2
- Operating System: Ubuntu 22.04
- Test Platform: AWS Graviton instances (m6g.16xlarge, m7g.16xlarge and m8g.24xlarge)

Our benchmark results on PGBench select-only without pg_stat_statements.track_planning:

```
# Load DB on m7g.16xlarge
$ pgbench -i --fillfactor=90 --scale=5644 --host=172.31.32.85 --username=postgres pgtest

# Without patch
$ pgbench --host 172.31.32.85 --username=postgres --protocol=prepared -P 10 -b select-only --time=600 --client=256 --jobs=96 pgtest
...
"transaction type: <builtin: select only>",
"scaling factor: 5644",
"query mode: prepared",
"number of clients: 256",
"number of threads: 96",
"duration: 600 s",
"number of transactions actually processed: 359864937",
"latency average = 0.420 ms",
"latency stddev = 1.755 ms",
"tps = 599770.727316 (including connections establishing)",
"tps = 599826.788919 (excluding connections establishing)"

# With patch
$ pgbench --host 172.31.32.85 --username=postgres --protocol=prepared -P 10 -b select-only --time=600 --client=256 --jobs=96 pgtest
...
"transaction type: <builtin: select only>",
"scaling factor: 5644",
"query mode: prepared",
"number of clients: 256",
"number of threads: 96",
"duration: 600 s",
"number of transactions actually processed: 405891881",
"latency average = 0.371 ms",
"latency stddev = 0.569 ms",
"tps = 676480.900049 (including connections establishing)",
"tps = 676523.557293 (excluding connections establishing)"
```

[0] https://www.postgresql.org/message-id/ZxgDEb_VpWyNZKB_%40nathan