On 7/23/25 02:59, Andres Freund wrote:
> Hi,
>
> On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote:
>> But I don't see why would this have any effect on the prefetch distance,
>> queue depth etc. Or why decreasing INDEX_SCAN_MAX_BATCHES should improve
>> that. I'd have expected exactly the opposite behavior.
>>
>> Could be bug, of course. But it'd be helpful to see the dataset/query.
>
> Pgbench scale 500, with the simpler query from my message.
>
I tried to reproduce this, but I'm not seeing that behavior. I'm not
sure how
you monitor the queue depth (presumably iostat?), but I added a basic
prefetch info to explain (see the attached WIP patch), reporting the
average prefetch distance, number of stalls (with distance=0) and stream
resets (after filling INDEX_SCAN_MAX_BATCHES).
And I see this (there's a complete explain output attached) for the two
queries from your message [1]. The
simple query:
SELECT max(abalance) FROM (SELECT * FROM pgbench_accounts ORDER BY aid
LIMIT 10000000);
complex query:
SELECT max(abalance), min(abalance), sum(abalance::numeric),
avg(abalance::numeric), avg(aid::numeric), avg(bid::numeric) FROM
(SELECT * FROM pgbench_accounts ORDER BY aid LIMIT 10000000);
The stats actually look *exactly* the same, which makes sense because
it's reading the same index.
 max_batches   distance    stalls   resets   stalls/reset
 ---------------------------------------------------------
          64        272         3        3              1
          32         59    122939      653            188
          16         36    108101     1190             90
           8         21     98775     2104             46
           4         11     95627     4556             20
I think this behavior mostly matches my expectations, although it's
interesting how much the stalls jump between 64 and 32 batches.
I did test both with buffered I/O (io_method=sync) and direct I/O
(io_method=worker), and the results are exactly the same for me. Not the
timings, of course, but the prefetch stats.
Of course, maybe there's something wrong in how the stats are collected.
I wonder if maybe we should update the distance in get_block() and not
in next_buffer().
Or maybe there's some interference from having to read the leaf pages
sooner. But I don't see why that would affect the queue depth; fewer
resets should keep the queues fuller, I think.
I'll think about adding some sort of distance histogram to the stats.
Maybe something like tinyhist [2] would work here.
[1]
https://www.postgresql.org/message-id/h2n7d7zb2lbkdcemopvrgmteo35zzi5ljl2jmk32vz5f4pziql%407ppr6r6yfv4z
[2] https://github.com/tvondra/tinyhist
regards
--
Tomas Vondra