On Wed, Feb 12, 2025 at 1:03 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2025-02-11 13:12:17 +1300, Thomas Munro wrote:
> > Tomas queried[1] the limit of 256kB (or really 32 blocks) for
> > io_combine_limit. Yeah, I think we should increase it and allow
> > experimentation with larger numbers. Note that real hardware and
> > protocols have segment and size limits that can force the kernel to
> > split your I/Os, so it's not at all a given that it'll help much or at
> > all to use very large sizes, but YMMV.
+0.02 to the initiative; I've always been wondering why the I/Os were
so capped :)
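
(For anyone experimenting with bigger sizes once this lands: the
kernel/device split limits Thomas mentions are visible in sysfs. A
quick sketch; nvme0n1 below is just an example device name, adjust
for your box:

    for f in max_hw_sectors_kb max_sectors_kb max_segments max_segment_size; do
        echo -ne "$f\t"; cat /sys/block/nvme0n1/queue/$f   # where the kernel will split large I/Os anyway
    done
)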
> FWIW, I see substantial performance *regressions* with *big* IO sizes using
> fio. Just looking at cached buffered IO.
>
> for s in 4 8 16 32 64 128 256 512 1024 2048 4096 8192;do echo -ne "$s\t\t"; numactl --physcpubind 3 fio --directory
> /srv/dev/fio/ --size=32GiB --overwrite 1 --time_based=0 --runtime=10 --name test --rw read --buffered 0 --ioengine psync
> --buffered 1 --invalidate 0 --output-format json --bs=$((1024*${s})) |jq '.jobs[] | .read.bw_mean';done
>
> io size kB throughput in MB/s
[..]
> 256 16864
> 512 19114
> 1024 12874
[..]
> It's worth noting that if I boot with mitigations=off clearcpuid=smap I get
> *vastly* better performance:
>
> io size kB throughput in MB/s
[..]
> 128 23133
> 256 23317
> 512 25829
> 1024 15912
[..]
> Most of the gain isn't due to mitigations=off but clearcpuid=smap. Apparently
> SMAP, which requires explicit code to allow kernel space to access userspace
> memory, to make exploitation harder, reacts badly to copying lots of memory.
>
> This seems absolutely bonkers to me.
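
(Side note for anyone reproducing this: a quick, non-authoritative
sanity check that SMAP is actually in play on a given box:

    grep -m1 -wo smap /proc/cpuinfo   # present if the CPU advertises SMAP
    cat /proc/cmdline                 # clearcpuid=smap here means it was disabled at boot
    # clearcpuid=smap is a boot-time kernel parameter; it cannot be toggled at runtime
)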
There are two bizarre things there: a +35% perf boost, just like that,
due to security mitigations, and io_size=512kB being special enough to
give a 10-13% boost in your case. Any ideas why? I ran this on an Lsv2
individual MS NVMe under Hyper-V, on ext4, which seems much closer to
the real-world, average-Joe situation. It is much slower overall, and
it shows no advantage for block sizes beyond, let's say, 128kB:
io size kB      throughput in MB/s
4               1070
8               1117
16              1231
32              1264
64              1249
128             1313
256             1323
512             1257
1024            1216
2048            1271
4096            1304
8192            1214
The top profile hitters were, of course, things like clear_page_rep [k]
and rep_movs_alternative [k] (that was with mitigations=on).
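
(Roughly how those hitters were collected; the exact invocation in my
run may have differed:

    perf record -a -g -- sleep 10     # sample system-wide while the fio loop runs
    perf report --sort symbol         # clear_page_rep / rep_movs_alternative at the top
)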
-J.