Postgres IO sweet spot - Mailing list pgsql-performance

From Riaan Stander
Subject Postgres IO sweet spot
Msg-id df8a9898-0ff9-4805-abc2-f469670ce92d@exa.co.za
List pgsql-performance
Good day

We host our own Postgres (v17) server on-prem as the backbone of our SaaS 
application. It's a fairly busy OLTP application with a database-per-tenant 
strategy, which obviously complicates our setup.
Our hosting platform is as follows:
3 x host servers running Microsoft Storage Spaces in a 3-way mirror
Ubuntu VM hosting Postgres

A few months ago we had severe performance issues, with lots of queries 
and write operations just pending. After some deep investigation we 
realised that disk IO was causing the issue: iostat showed write await 
above 30ms, sometimes spiking much higher. This was resolved by moving 
our backups (made with Veeam) off the primary and onto a replica on 
other infrastructure. Our current happy state, where clients are not 
experiencing issues, is an iostat write await of 5ms or lower.
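On the monitoring side, a quick way to watch for that condition continuously is to filter iostat's extended output. A minimal sketch, assuming the data device is sdb and that w_await is the ninth field on your sysstat version (column layout varies between versions, so verify against the header line first):

```shell
# Sample extended device stats every second and flag any interval where
# write await exceeds the 5ms threshold mentioned above.
# NOTE: the device name (sdb) and the w_await field position ($9) are
# assumptions -- check `iostat -dx 1 | head` on your sysstat version.
iostat -dx 1 sdb | awk '$1 == "sdb" && $9 + 0 > 5 { print "w_await high:", $9, "ms" }'
```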

All was good for a few months until recently, when the issue started 
again. This time it could not be the backups. We had various hardware 
vendors involved, but at some point it came to light that the Storage 
Spaces hardware is all mechanical disks, with NVMe used only for Storage 
Spaces journaling and caching. There are now discussions about upgrading 
the drives to SSD, but my concern is that this is not guaranteed to 
solve the issue. Especially with the 3-way mirror, it seems every write 
must be acknowledged by the other hosts before returning, so the latency 
is almost impossible to remove.
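To put rough numbers on that concern: in a synchronous 3-way mirror, a write completes only when the slowest copy has acknowledged, so per-write latency is roughly the network round trip plus the slowest medium's write latency. A back-of-envelope sketch (all figures below are illustrative assumptions, not measurements from this setup):

```shell
awk 'BEGIN {
  rtt_ms = 0.2   # assumed host-to-host network round trip
  hdd_ms = 8.0   # assumed 7.2k RPM mechanical write latency
  ssd_ms = 0.1   # assumed enterprise SSD write latency
  printf "3-way mirror on HDD: ~%.1f ms per sync write\n", rtt_ms + hdd_ms
  printf "3-way mirror on SSD: ~%.1f ms per sync write\n", rtt_ms + ssd_ms
}'
```

Even under these assumptions the mirror never removes the network hop, but swapping HDD for SSD removes the dominant term, which is why the upgrade may still help substantially despite the mirroring overhead.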

So now my question. I started running some IO tests using fio, 
pg_test_fsync & pg_test_timing. Before we spend days or months trying to 
tune Postgres settings, I'm trying to find some definitive published 
information about what IO numbers I should expect when running plain 
hardware tests with Postgres completely out of the loop. I've seen some 
information suggesting that 1ms or less write latency is what you want 
for WAL. My logic says that if you have a stiffy (floppy) drive for 
storage you can tune it, but you still have a stiffy drive.
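The 1ms figure has a concrete consequence: with synchronous_commit = on, each commit waits for its WAL flush, so a single session can commit at most 1000 / (WAL sync latency in ms) times per second. A quick illustration (the latencies are round illustrative values, not measurements):

```shell
# Upper bound on per-session commit rate at various WAL sync latencies.
awk 'BEGIN { for (l = 0.5; l <= 8; l *= 2)
  printf "%.1f ms WAL sync -> at most %d commits/sec per session\n", l, 1000 / l }'
```

Group commit lets concurrent sessions share a flush, so aggregate throughput can exceed this, but per-session commit latency cannot beat the sync latency.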

These are the tests I've run so far:
1. WAL-Style Latency Test (4K random sync writes)
fio --name=wal-latency --filename=$TESTDIR/fio_wal_test --size=2G \
    --rw=randwrite --bs=4k --iodepth=1 --ioengine=libaio --direct=1 \
    --fsync=1 --runtime=60 --group_reporting

2. Random Read IOPS Test (index lookup simulation)
fio --name=index-read --filename=$TESTDIR/fio_index_test --size=8G \
    --rw=randread --bs=4k --iodepth=32 --ioengine=libaio --direct=1 \
    --runtime=60 --group_reporting

3. Mixed OLTP Test (70% read / 30% write)
fio --name=oltp-mixed --filename=$TESTDIR/fio_oltp_mixed --size=8G \
    --rw=randrw --rwmixread=70 --bs=8k --iodepth=32 --ioengine=libaio \
    --direct=1 --runtime=60 --group_reporting

4. Checkpoint Burst Test (sequential write pressure)
fio --name=checkpoint-burst --filename=$TESTDIR/fio_checkpoint \
    --size=20G --rw=write --bs=1M --iodepth=64 --ioengine=libaio \
    --direct=1 --runtime=60 --group_reporting

5. PostgreSQL fsync Code Path Test
pg_test_fsync -f $TESTDIR/pg_test_fsync

6. Timer / Scheduling Jitter Test
pg_test_timing -d 3
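For interpreting test 5: pg_test_fsync reports throughput in ops/sec per sync method, so per-write latency is simply 1000 divided by that figure. A one-liner conversion (the 250 ops/sec input is an illustrative assumption, not a result from this hardware):

```shell
# Convert a pg_test_fsync ops/sec figure into per-fsync latency in ms.
echo 250 | awk '{ printf "%d ops/sec ~= %.2f ms per fsync\n", $1, 1000 / $1 }'
```

Against the 1ms-for-WAL guideline above, that means the fastest sync method should report roughly 1000 ops/sec or better.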

Regards
Riaan



