On Wed Aug 13, 2025 at 8:59 PM EDT, Tomas Vondra wrote:
> On 8/14/25 01:50, Peter Geoghegan wrote:
>> I first made the order of the table random, except among groups of index tuples
>> that have exactly the same value. Those will still point to the same 1 or 2 heap
>> blocks in virtually all cases, so we have "heap clustering without any heap
>> correlation" in the newly rewritten table. To set things up this way, I first
>> made another index, and then clustered the table using that new index:
> Interesting. It's really surprising random I/O beats the sequential.
It should be noted that the effect seems to be limited to io_method=io_uring.
I find that with io_method=worker, the execution time of the original
"sequential heap access" backwards scan is very similar to the execution time
of the variant with the index that exhibits "heap clustering without any heap
correlation" (the variant where individual heap blocks appear in random order).
Benchmark that includes both io_uring and worker
================================================
I followed the usual procedure: prewarm the index, evict the heap relation, and
then run the relevant query through EXPLAIN ANALYZE. Direct I/O was used
throughout.
io_method=worker
----------------
Original backwards scan: 1498.024 ms (shared read=48.080)
"No heap correlation" backwards scan: 1483.348 ms (shared read=22.036)
Original forwards scan: 656.884 ms (shared read=19.904)
"No heap correlation" forwards scan: 578.076 ms (shared read=10.159)
io_method=io_uring
------------------
Original backwards scan: 1052.807 ms (shared read=187.876)
"No heap correlation" backwards scan: 649.473 ms (shared read=365.802)
Original forwards scan: 593.126 ms (shared read=55.837)
"No heap correlation" forwards scan: 429.888 ms (shared read=188.619)
Summary
-------
As of this morning, io_method=io_uring also shows that the forwards scan is
faster with random heap accesses than without (not just the backwards scan).
I double-checked, to make sure that the effect was real; it seems to be.
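For a quick sense of scale, here is a minimal sketch that turns the timings
listed above into speedup ratios (original execution time divided by the
"no heap correlation" execution time). The dict layout and the names
timings_ms, speedup, and speedups are just illustrative choices of mine; only
the millisecond values come from the results above.

```python
# Execution times in ms, copied from the results above.
# The nesting (io_method -> scan direction -> variant) is an illustrative layout.
timings_ms = {
    "worker": {
        "backwards": {"original": 1498.024, "no_heap_correlation": 1483.348},
        "forwards": {"original": 656.884, "no_heap_correlation": 578.076},
    },
    "io_uring": {
        "backwards": {"original": 1052.807, "no_heap_correlation": 649.473},
        "forwards": {"original": 593.126, "no_heap_correlation": 429.888},
    },
}


def speedup(original_ms, variant_ms):
    """Return how many times faster the variant ran than the original."""
    return original_ms / variant_ms


# One ratio per (io_method, scan direction) pair.
speedups = {
    (method, direction): speedup(t["original"], t["no_heap_correlation"])
    for method, scans in timings_ms.items()
    for direction, t in scans.items()
}

for (method, direction), ratio in speedups.items():
    print(f"{method:9s} {direction:9s} {ratio:.2f}x")
```

With these inputs the ratios come out to roughly 1.01x (backwards) and 1.14x
(forwards) for worker, versus roughly 1.62x (backwards) and 1.38x (forwards)
for io_uring.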
I'm aware that some of these numbers (those for the original/sequential
forwards scan case) don't match what I reported on Tuesday. I believe that
this is due to changes I made to my SSD's readahead using blockdev, though
it's possible that there's some other explanation. (In case it matters, I'm
running Debian unstable with liburing2 "2.9-1".)
The important point remains: at least with io_uring, the backwards scan query
is much faster with random I/O than it is with descending sequential I/O
(649.473 ms versus 1052.807 ms above). Parity between the two might make
sense, but clearly that's not what we see.
--
Peter Geoghegan