On Wed, 8 Jan 2025 at 22:58, Andres Freund <andres@anarazel.de> wrote:
> master: ~18 GB/s
> patch, buffered: ~20 GB/s
> patch, direct, worker: ~28 GB/s
> patch, direct, uring: ~35 GB/s
>
>
> This was with io_workers=32, io_max_concurrency=128,
> effective_io_concurrency=1000 (doesn't need to be that high, but it's what I
> still have the numbers for).
>
>
> This was without data checksums enabled as otherwise the checksum code becomes
> a *huge* bottleneck.

I'm curious about this because the checksum code should be fast enough
to easily handle that throughput. I remember checksum overhead being
negligible even when pulling in pages from page cache. Is it just that
the calculation is slow, or is it the fact that checksumming needs to
bring the page into the CPU cache? Did you notice any hints as to which
might be the case? I don't really have a machine at hand that can do
anywhere close to this amount of I/O.

I'm asking because if the calculation itself is the slow part, then it
seems like it's time to compile several ISA extension variants of the
checksum code and select the best one at runtime.
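
To make that concrete, here is a minimal sketch of what runtime
selection could look like. This is not the actual checksum code: all
names (page_checksum, checksum_body, checksum_choose and so on) are
hypothetical, the mixing loop is only a stand-in for the real
algorithm, and it assumes GCC or Clang on x86 for the target attribute
and __builtin_cpu_supports(). Other compilers and architectures fall
back to the generic build.

```c
#include <stddef.h>
#include <stdint.h>

#define N_LANES   32            /* independent partial sums (assumed) */
#define MIX_PRIME 16777619u     /* FNV prime, used here as a stand-in */

/*
 * Shared loop body. Each wrapper below inlines it, so the compiler can
 * vectorize it for that wrapper's target ISA. nwords is assumed to be a
 * multiple of N_LANES; any remainder is ignored in this sketch.
 */
static inline uint32_t
checksum_body(const uint32_t *words, size_t nwords)
{
    uint32_t sums[N_LANES];
    uint32_t result = 0;

    for (int j = 0; j < N_LANES; j++)
        sums[j] = (uint32_t) j + 1;

    for (size_t i = 0; i + N_LANES <= nwords; i += N_LANES)
    {
        for (int j = 0; j < N_LANES; j++)
        {
            uint32_t tmp = sums[j] ^ words[i + j];

            sums[j] = (tmp * MIX_PRIME) ^ (tmp >> 17);
        }
    }

    for (int j = 0; j < N_LANES; j++)
        result ^= sums[j];
    return result;
}

/* Baseline variant, built with the default compiler flags. */
static uint32_t
checksum_generic(const uint32_t *words, size_t nwords)
{
    return checksum_body(words, nwords);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* Same source, but the compiler may use AVX2 instructions here. */
static __attribute__((target("avx2"))) uint32_t
checksum_avx2(const uint32_t *words, size_t nwords)
{
    return checksum_body(words, nwords);
}
#endif

typedef uint32_t (*checksum_fn) (const uint32_t *words, size_t nwords);

static uint32_t checksum_choose(const uint32_t *words, size_t nwords);

/* Starts at the chooser; the first call replaces it with a variant. */
static checksum_fn checksum_impl = checksum_choose;

static uint32_t
checksum_choose(const uint32_t *words, size_t nwords)
{
    checksum_impl = checksum_generic;
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        checksum_impl = checksum_avx2;
#endif
    return checksum_impl(words, nwords);
}

/* Public entry point: one indirect call per page. */
uint32_t
page_checksum(const uint32_t *words, size_t nwords)
{
    return checksum_impl(words, nwords);
}
```

The cost after the first call is a single indirect call per page, which
should be noise next to the work of checksumming an 8 kB page. All
variants do the same integer arithmetic, so they must return identical
results regardless of which one gets selected.
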
--
Ants Aasma