So I think it's kind of cute that you've implemented these as
allocator-agnostic wrappers that work with any allocator ... but why?
I would have expected this to be added directly to the allocator, as
a way to explicitly request whole aligned pages. IIRC the allocator is
already capable of producing them; it just doesn't offer any way to
ask for them explicitly.
DirectIO doesn't really need a wide variety of allocation sizes or
alignments. The required alignment is always going to be the physical
block size, which apparently can be as low as 512 bytes, but I'm
guessing we're always going to be using 4kB alignment and allocations
in multiples of 8kB.
Wouldn't a pool of 8kB pages, all aligned on 4kB or 8kB boundaries,
be simpler and more efficient than working around misaligned pointers
with all these branches and all this arithmetic?
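
To make the suggestion concrete, here is a rough sketch of the kind of
pool I have in mind. It is not taken from the patch or from the
existing allocator: the DioPagePool type, the dio_pool_* functions, and
the 8kB/4kB constants are all made up for illustration, and it uses
plain C11 aligned_alloc() rather than any allocator API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values for illustration; the alignment must divide the size. */
#define DIO_PAGE_SIZE  8192
#define DIO_PAGE_ALIGN 4096

/* Hypothetical pool: one aligned slab carved into fixed-size pages. */
typedef struct DioPagePool
{
	char	   *slab;		/* start of the aligned slab */
	void	   *free_head;	/* free list threaded through unused pages */
	size_t		npages;		/* number of pages in the slab */
} DioPagePool;

/* Allocate the slab and put every page on the free list; 0 on success. */
static int
dio_pool_init(DioPagePool *pool, size_t npages)
{
	if (npages == 0 || npages > SIZE_MAX / DIO_PAGE_SIZE)
		return -1;

	/* The size is a multiple of the alignment, as aligned_alloc() expects. */
	pool->slab = aligned_alloc(DIO_PAGE_ALIGN, npages * DIO_PAGE_SIZE);
	if (pool->slab == NULL)
		return -1;

	pool->npages = npages;
	pool->free_head = NULL;

	/* Push pages in reverse so the lowest address is handed out first. */
	for (size_t i = npages; i-- > 0;)
	{
		void	   *page = pool->slab + i * DIO_PAGE_SIZE;

		*(void **) page = pool->free_head;
		pool->free_head = page;
	}
	return 0;
}

/* Pop one page off the free list, or return NULL if the pool is empty. */
static void *
dio_pool_alloc(DioPagePool *pool)
{
	void	   *page = pool->free_head;

	if (page != NULL)
		pool->free_head = *(void **) page;
	return page;
}

/* Return a page to the pool; it must be a page this pool handed out. */
static void
dio_pool_free(DioPagePool *pool, void *page)
{
	size_t		off = (size_t) ((char *) page - pool->slab);

	assert(off < pool->npages * DIO_PAGE_SIZE && off % DIO_PAGE_SIZE == 0);
	*(void **) page = pool->free_head;
	pool->free_head = page;
}

/* Release the slab and reset the pool. */
static void
dio_pool_destroy(DioPagePool *pool)
{
	free(pool->slab);
	pool->slab = NULL;
	pool->free_head = NULL;
	pool->npages = 0;
}
```

Every page comes out of the slab at a multiple of 8kB from a 4kB-aligned
base, so allocation and free are a pointer pop and push with no
per-request alignment arithmetic. A real version would obviously need
to deal with growth and concurrency, which this sketch ignores.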