Re: Should we update the random_page_cost default value? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Should we update the random_page_cost default value?
Msg-id aqz34bqmh6v6r6bplgflid3buhdkv45dkkbx6y6gq34dx4gp42@rcz2est27arz
In response to Re: Should we update the random_page_cost default value?  (Tomas Vondra <tomas@vondra.me>)
Responses Re: Should we update the random_page_cost default value?
List pgsql-hackers
Hi,

On 2025-10-07 16:23:36 +0200, Tomas Vondra wrote:
> On 10/7/25 14:08, Tomas Vondra wrote:
> > ...
> >>>>>> I think doing this kind of measurement via normal SQL query processing is
> >>>>>> almost always going to have too much other influences. I'd measure using fio
> >>>>>> or such instead.  It'd be interesting to see fio numbers for your disks...
> >>>>>>
> >>>>>> fio --directory /srv/fio --size=8GiB --name test --invalidate=0 --bs=$((8*1024)) --rw read --buffered 0 --time_based=1 --runtime=5 --ioengine pvsync --iodepth 1
> >>>>>> vs --rw randread
> >>>>>>
> >>>>>> gives me 51k/11k for sequential/rand on one SSD and 92k/8.7k for another.
> >>>>>>
> >>>>>
> >>>>> I can give it a try. But do we really want to strip "our" overhead with
> >>>>> reading data?
> >
> > I got this on the two RAID devices (NVMe and SATA):
> >
> > NVMe: 83.5k / 15.8k
> > SATA: 28.6k /  8.5k
> >
> > So the same ballpark / ratio as your test. Not surprising, really.
> >
>
> FWIW I do see about this number in iostat. There's a 500M test running
> right now, and iostat reports this:
>
>   Device      r/s     rkB/s  ...  rareq-sz  ...  %util
>   md1    15273.10 143512.80  ...      9.40  ...  93.64
>
> So it's not like we're issuing far fewer I/Os than the SSD can handle.

Not really related to this thread:

IME iostat's utilization is pretty much useless for anything other than "is
something happening at all", and even that is not reliable. I don't know the
full reason for it, but I long ago learned to just discount it.

I ran
fio --directory /srv/fio --size=8GiB --name test --invalidate=0 --bs=$((8*1024)) --rw read --buffered 0 --time_based=1 --runtime=100 --ioengine pvsync --iodepth 1 --rate_iops=40000

a few times in a row, while watching iostat. Sometimes utilization is 100%,
sometimes it's 0.2%, whereas if I run it without rate limiting, utilization
never goes above 71%, despite doing more iops.
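
(For reference, the utilization numbers here are just iostat's %util column,
watched in another terminal with something along the lines of
"iostat -x <device> 1".)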


And it gets completely useless if you use a deeper iodepth, because there's
just no good way to compute something like a utilization number once you take
parallel IO processing into account.
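
IIRC %util is, roughly, just field 10 of /proc/diskstats (cumulative
milliseconds the device had at least one request in flight), differenced over
the sampling interval - something like this sketch, assuming a device named
md1:

t0=$(awk '$3 == "md1" {print $13}' /proc/diskstats); sleep 1
t1=$(awk '$3 == "md1" {print $13}' /proc/diskstats)
# ms busy out of a 1000ms interval, expressed as a percentage
echo "util: $(( (t1 - t0) / 10 ))%"

That counter only tracks whether *any* request was outstanding, not how many,
so with a deeper iodepth it pegs at 100% well before the device is actually
saturated.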

fio --directory /srv/fio --size=8GiB --name test --invalidate=0 --bs=$((8*1024)) --rw read --buffered 0 --time_based=1 --runtime=100 --ioengine io_uring --iodepth 1 --rw randread

iodepth        util    iops
1               94%     9.3k
2               99.6%   18.4k
4               100%    35.9k
8               100%    68.0k
16              100%    123k
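
Something along these lines reproduces that sweep, i.e. the randread command
from above rerun with --iodepth varied (assuming the same /srv/fio test file):

# run the same 8kB random-read workload at increasing queue depths
for d in 1 2 4 8 16; do
    fio --directory /srv/fio --size=8GiB --name test --invalidate=0 \
        --bs=$((8*1024)) --rw randread --buffered 0 --time_based=1 \
        --runtime=100 --ioengine io_uring --iodepth $d
done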

Greetings,

Andres Freund


