Re: index prefetching - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: index prefetching
Date
Msg-id 01fa8229-3235-4ce4-85a5-2adbaa90da64@vondra.me
Whole thread Raw
In response to Re: index prefetching  (Andres Freund <andres@anarazel.de>)
Responses Re: index prefetching
List pgsql-hackers
On 7/23/25 02:59, Andres Freund wrote:
> Hi,
> 
> On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote:
>> But I don't see why would this have any effect on the prefetch distance,
>> queue depth etc. Or why decreasing INDEX_SCAN_MAX_BATCHES should improve
>> that. I'd have expected exactly the opposite behavior.
>>
>> Could be bug, of course. But it'd be helpful to see the dataset/query.
> 
> Pgbench scale 500, with the simpler query from my message.
> 

I tried to reproduce this, but I'm not seeing behavior. I'm not sure how
you monitor the queue depth (presumably iostat?), but I added a basic
prefetch info to explain (see the attached WIP patch), reporting the
average prefetch distance, number of stalls (with distance=0) and stream
resets (after filling INDEX_SCAN_MAX_BATCHES).

And I see this (there's a complete explain output attached) for the two
queries from your message [1]. The

simple query:

SELECT max(abalance) FROM (SELECT * FROM pgbench_accounts ORDER BY aid
LIMIT 10000000);

complex query:

SELECT max(abalance), min(abalance), sum(abalance::numeric),
avg(abalance::numeric), avg(aid::numeric), avg(bid::numeric) FROM
(SELECT * FROM pgbench_accounts ORDER BY aid LIMIT 10000000);

The stats actually look *exactly* the same, which makes sense because
it's reading the same index.


   max_batches      distance      stalls      resets      stalls/reset
  --------------------------------------------------------------------
            64           272           3           3                 1
            32            59      122939         653               188
            16            36      108101         1190               90
             8            21       98775         2104               46
             4            11       95627         4556               20

I think this behavior mostly matches my expectations, although it's
interesting the stalls jump so much between 64 and 32 batches.

I did test both with buffered I/O (io_method=sync) and direct I/O
(io_method=worker), and the results are exactly the same for me. Not the
timings, of course, but the prefetch stats.

Of course, maybe there's something wrong in how the stats are collected.
I wonder if maybe we should update the distance in get_block() and not
in next_buffer().

Or maybe there's some interference from having to read the leaf pages
sooner. But I don't see why that would affect the queue depth, fewer
reset should keep the queues fuller I think.


I'll think about adding some sort of distance histogram to the stats.
Maybe something like tinyhist [2] would work here.



[1]
https://www.postgresql.org/message-id/h2n7d7zb2lbkdcemopvrgmteo35zzi5ljl2jmk32vz5f4pziql%407ppr6r6yfv4z

[2] https://github.com/tvondra/tinyhist


regards

-- 
Tomas Vondra

Attachment

pgsql-hackers by date:

Previous
From: Frédéric Yhuel
Date:
Subject: Re: vacuumdb changes for stats import/export
Next
From: Ashutosh Bapat
Date:
Subject: Re: Document transition table triggers are not allowed on views/foreign tables