Re: index prefetching - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: index prefetching
Date:
Msg-id: cffa6e3e-f9c4-468f-b218-a3a5e172c226@vondra.me
In response to: Re: index prefetching (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On 7/23/25 17:09, Andres Freund wrote:
> Hi,
>
> On 2025-07-23 14:50:15 +0200, Tomas Vondra wrote:
>> On 7/23/25 02:59, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote:
>>>> But I don't see why would this have any effect on the prefetch distance,
>>>> queue depth etc. Or why decreasing INDEX_SCAN_MAX_BATCHES should improve
>>>> that. I'd have expected exactly the opposite behavior.
>>>>
>>>> Could be bug, of course. But it'd be helpful to see the dataset/query.
>>>
>>> Pgbench scale 500, with the simpler query from my message.
>>>
>>
>> I tried to reproduce this, but I'm not seeing that behavior. I'm not sure how
>> you monitor the queue depth (presumably iostat?)
>
> Yes, iostat, since I was looking at what the "actually required" lookahead
> distance is.
>
> Do you actually get the query to be entirely CPU bound? What amount of IO
> waiting do you see EXPLAIN (ANALYZE, TIMING OFF) with track_io_timing=on
> report?
>

No, it definitely needs to wait for I/O (FWIW it's on the xeon, with a
single NVMe SSD).

> Ah - I was using a very high effective_io_concurrency. With a high
> effective_io_concurrency value I see a lot of stalls, even at
> INDEX_SCAN_MAX_BATCHES = 64. And a lower prefetch distance, which seems
> somewhat odd.
>

I think that's a bug in the explain patch. The counters were updated at the
beginning of _next_buffer(), but that's wrong - a single call to
_next_buffer() can prefetch multiple blocks. This skewed the stats, as the
prefetches are not counted with "distance=0". With higher eic this happens
sooner, so the average distance seemed to decrease.

The attached patch does the updates in _get_block(), which I think is
better. And "stall" now means (distance == 1), which I think detects
requests without prefetching.

I also added a separate "Count" for the actual number of prefetched blocks,
and "Skipped" for duplicate blocks skipped (which the read stream never even
sees, because it's skipped in the callback).

>
> FWIW, in my tests I was just evicting lineitem from shared buffers, since I
> wanted to test the heap prefetching, without stalls induced by blocking on
> index reads. But what I described happens with either.
>
> ;SET effective_io_concurrency = 256;SELECT pg_buffercache_evict_relation('pgbench_accounts'); explain (analyze, costs off,timing off) SELECT max(abalance) FROM (SELECT * FROM pgbench_accounts ORDER BY aid LIMIT 10000000);
> ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │                                                QUERY PLAN                                                │
> ├────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Aggregate (actual rows=1.00 loops=1)                                                                      │
> │   Buffers: shared hit=27369 read=164191                                                                   │
> │   I/O Timings: shared read=358.795                                                                        │
> │   ->  Limit (actual rows=10000000.00 loops=1)                                                             │
> │         Buffers: shared hit=27369 read=164191                                                             │
> │         I/O Timings: shared read=358.795                                                                  │
> │         ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1)  │
> │               Index Searches: 1                                                                           │
> │               Prefetch Distance: 256.989                                                                  │
> │               Prefetch Stalls: 3                                                                          │
> │               Prefetch Resets: 3                                                                          │
> │               Buffers: shared hit=27369 read=164191                                                       │
> │               I/O Timings: shared read=358.795                                                            │
> │ Planning Time: 0.086 ms                                                                                   │
> │ Execution Time: 4194.845 ms                                                                               │
> └────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
>
> ;SET effective_io_concurrency = 512;SELECT pg_buffercache_evict_relation('pgbench_accounts'); explain (analyze, costs off,timing off) SELECT max(abalance) FROM (SELECT * FROM pgbench_accounts ORDER BY aid LIMIT 10000000);
> ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │                                                QUERY PLAN                                                │
> ├────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ Aggregate (actual rows=1.00 loops=1)                                                                      │
> │   Buffers: shared hit=27368 read=164190                                                                   │
> │   I/O Timings: shared read=832.515                                                                        │
> │   ->  Limit (actual rows=10000000.00 loops=1)                                                             │
> │         Buffers: shared hit=27368 read=164190                                                             │
> │         I/O Timings: shared read=832.515                                                                  │
> │         ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1)  │
> │               Index Searches: 1                                                                           │
> │               Prefetch Distance: 56.778                                                                   │
> │               Prefetch Stalls: 160569                                                                     │
> │               Prefetch Resets: 423                                                                        │
> │               Buffers: shared hit=27368 read=164190                                                       │
> │               I/O Timings: shared read=832.515                                                            │
> │ Planning Time: 0.084 ms                                                                                   │
> │ Execution Time: 4413.058 ms                                                                               │
> └────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
>
> Greetings,
>

The attached v2 explain patch should fix that.

I'm also attaching logs from my explain, for 64 and 16 batches. I think the
output makes much more sense now.

cheers

--
Tomas Vondra
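
To illustrate the accounting described earlier in this message, here is a
small standalone C sketch: statistics are updated in the block-producing
callback (once per block handed out, rather than once per next-buffer call),
a "stall" is counted when a request is made with lookahead distance == 1,
and duplicate blocks are skipped before the stream ever sees them. All
names, the toy block array, and the simplistic distance ramp are invented
for illustration; this is not the PostgreSQL read stream API and not the
attached patch.

/*
 * Toy model of the accounting scheme: counters are bumped in the
 * block-producing callback, i.e. once per block handed out.  A "stall"
 * is a request made with lookahead distance == 1; duplicate blocks are
 * skipped before the stream ever sees them.
 */
#include <stdio.h>

typedef struct PrefetchStats
{
    long    count;          /* blocks actually handed to the stream */
    long    stalls;         /* requests issued with no lookahead */
    long    skipped;        /* duplicate blocks filtered in the callback */
    double  distance_sum;   /* to report an average distance */
} PrefetchStats;

/* callback: produce the next block number, updating the statistics */
static long
toy_get_block(const long *blocks, int nblocks, int *pos,
              int distance, PrefetchStats *stats)
{
    while (*pos < nblocks)
    {
        long    blkno = blocks[(*pos)++];

        /* duplicate of the previous block: skip before the stream sees it */
        if (*pos > 1 && blkno == blocks[*pos - 2])
        {
            stats->skipped++;
            continue;
        }

        /* count this request at its current lookahead distance */
        stats->count++;
        stats->distance_sum += distance;
        if (distance == 1)
            stats->stalls++;
        return blkno;
    }
    return -1;              /* end of stream */
}

int
main(void)
{
    long    blocks[] = {10, 10, 11, 12, 12, 13, 20, 21};
    int     nblocks = sizeof(blocks) / sizeof(blocks[0]);
    int     pos = 0;
    int     distance = 1;   /* lookahead ramps up as the consumer keeps up */
    PrefetchStats stats = {0, 0, 0, 0.0};

    while (toy_get_block(blocks, nblocks, &pos, distance, &stats) >= 0)
    {
        if (distance < 4)
            distance++;
    }

    printf("count=%ld stalls=%ld skipped=%ld avg distance=%.3f\n",
           stats.count, stats.stalls, stats.skipped,
           stats.count > 0 ? stats.distance_sum / stats.count : 0.0);
    return 0;
}

With the sample block sequence above, this prints count=6 stalls=1 skipped=2
avg distance=3.000.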