Thanks for the extensive testing! Did you see the same syscall pattern in the strace output as I did?
If yes, then the only explanation I can think of for the regression with my patch is that the SATA interface was
maxed out when reading sequentially, while the very short latency of SSDs allows thousands of seek() operations per
second.
I was using an HDD, and in older measurements I was using a VM with a volume mounted over iSCSI. The first imposes
physical limits on the number of seeks, and the second imposes network round-trip limits.
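
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes a very simple model in which a seek costs one fixed latency and sequential reading runs at a fixed bandwidth; the function names and the example figures are my own illustrative assumptions, not measurements from either of our setups.

```python
def max_seeks_per_second(seek_latency_s: float) -> float:
    """Upper bound on serialized seeks per second for a given per-seek latency."""
    return 1.0 / seek_latency_s


def break_even_skip_bytes(seek_latency_s: float, sequential_bytes_per_s: float) -> float:
    """Skip distance at which seeking and reading through cost the same time.

    Skipping fewer bytes than this is cheaper by reading sequentially,
    skipping more is cheaper by seeking (under this simplified model).
    """
    return seek_latency_s * sequential_bytes_per_s


if __name__ == "__main__":
    # Assumed round numbers for illustration only: (latency in s, bandwidth in bytes/s).
    devices = {
        "hdd-like": (10e-3, 150e6),
        "ssd-like": (0.1e-3, 500e6),
    }
    for name, (latency, bandwidth) in devices.items():
        print(name,
              f"seeks/s <= {max_seeks_per_second(latency):.0f},",
              f"break-even skip ~ {break_even_skip_bytes(latency, bandwidth):.0f} bytes")
```

Under these assumptions the break-even skip distance differs by well over an order of magnitude between the two device classes, which would be consistent with the same patch helping on one platform and hurting on another.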
So you are right, it's probably very platform-dependent, and the most important fix was to enlarge the underlying block
size, which you have done.
Dimitris