Re: index prefetching - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | DC0RJDW1PEAX.NDGIXR6WPM94@bowt.ie Whole thread Raw |
In response to | Re: index prefetching (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: index prefetching
Re: index prefetching Re: index prefetching |
List | pgsql-hackers |
On Tue Aug 12, 2025 at 1:06 AM EDT, Thomas Munro wrote: > I'd be interested to hear more about reverse scans. Bilal was > speculating about backwards I/O combining in read_stream.c a while > back, but we didn't have anything interesting to use it yet. You'll > probably see a flood of uncombined 8KB IOs in the pg_aios view while > travelling up the heap with cache misses today. I suspect Linux does > reverse sequential prefetching with buffered I/O (less sure about > other OSes) which should help but we'd still have more overheads than > we could if we combined them, not to mention direct I/O. Doesn't look like Linux will do this, if what my local testing shows is anything to go on. I'm a bit surprised by this (I also thought that OS readahead on linux was quite sophisticated). There does seem to be something fishy going on with the patch here. I can see strange inconsistencies in EXPLAIN ANALYZE output when the server is started with --debug_io_direct=data with the master, compared to what I see with the patch. Test case ========= My test case is a minor refinement of Tomas' backwards scan test case from earlier today, though with one important difference: I ran "alter index idx set (deduplicate_items = off); reindex index idx;" to get a pristine index without any posting lists (since the unrelated issue with posting list TIDs otherwise risks obscuring something relevant). master ------ pg@regression:5432 [2390630]=# select pg_buffercache_evict_relation('t'); select pg_prewarm('idx'); ***SNIP*** pg@regression:5432 [2390630]=# EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a; ┌────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├────────────────────────────────────────────────────────────────────────────────┤ │ Index Scan using idx on t (actual time=0.117..982.469 rows=1048576.00 loops=1) │ │ Index Cond: ((a >= 16336) AND (a <= 49103)) │ │ Index Searches: 1 │ │ Buffers: shared hit=10353 read=49933 │ │ I/O Timings: shared read=861.953 │ │ Planning: │ │ Buffers: shared hit=63 read=20 │ │ I/O Timings: shared read=1.898 │ │ Planning Time: 2.131 ms │ │ Execution Time: 1015.679 ms │ └────────────────────────────────────────────────────────────────────────────────┘ (10 rows) pg@regression:5432 [2390630]=# select pg_buffercache_evict_relation('t'); select pg_prewarm('idx'); ***SNIP*** pg@regression:5432 [2390630]=# EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a desc; ┌──────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├──────────────────────────────────────────────────────────────────────────────────────────┤ │ Index Scan Backward using idx on t (actual time=7.919..6340.579 rows=1048576.00 loops=1) │ │ Index Cond: ((a >= 16336) AND (a <= 49103)) │ │ Index Searches: 1 │ │ Buffers: shared hit=10350 read=49933 │ │ I/O Timings: shared read=6219.776 │ │ Planning: │ │ Buffers: shared hit=5 │ │ Planning Time: 0.076 ms │ │ Execution Time: 6374.008 ms │ └──────────────────────────────────────────────────────────────────────────────────────────┘ (9 rows) Notice that readahead seems to be effective with the forwards scan only (even though I'm using debug_io_direct=data for this). Also notice that each query shows identical "Buffers:" output -- that detail is exactly as expected. Prefetch patch -------------- Same pair of queries/prewarming/eviction steps with my working copy of the prefetching patch: pg@regression:5432 [2400564]=# select pg_buffercache_evict_relation('t'); select pg_prewarm('idx'); ***SNIP*** pg@regression:5432 [2400564]=# EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a; ┌────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├────────────────────────────────────────────────────────────────────────────────┤ │ Index Scan using idx on t (actual time=0.136..298.301 rows=1048576.00 loops=1) │ │ Index Cond: ((a >= 16336) AND (a <= 49103)) │ │ Index Searches: 1 │ │ Buffers: shared hit=6619 read=49933 │ │ I/O Timings: shared read=45.313 │ │ Planning: │ │ Buffers: shared hit=63 read=20 │ │ I/O Timings: shared read=2.232 │ │ Planning Time: 2.634 ms │ │ Execution Time: 330.379 ms │ └────────────────────────────────────────────────────────────────────────────────┘ (10 rows) pg@regression:5432 [2400564]=# select pg_buffercache_evict_relation('t'); select pg_prewarm('idx'); ***SNIP*** pg@regression:5432 [2400564]=# EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a desc; ┌──────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├──────────────────────────────────────────────────────────────────────────────────────────┤ │ Index Scan Backward using idx on t (actual time=7.926..1201.988 rows=1048576.00 loops=1) │ │ Index Cond: ((a >= 16336) AND (a <= 49103)) │ │ Index Searches: 1 │ │ Buffers: shared hit=10350 read=49933 │ │ I/O Timings: shared read=194.774 │ │ Planning: │ │ Buffers: shared hit=5 │ │ Planning Time: 0.097 ms │ │ Execution Time: 1236.655 ms │ └──────────────────────────────────────────────────────────────────────────────────────────┘ (9 rows) It looks like the patch does significantly better with the forwards scan, compared to the backwards scan (though both are improved by a lot). But that's not the main thing about these results that I find interesting. The really odd thing is that we get "shared hit=6619 read=49933" for the forwards scan, and "shared hit=10350 read=49933" for the backwards scan. The latter matches master (regardless of the scan direction used on master), while the former just looks wrong. What explains the "missing buffer hits" seen with the forwards scan? Discrepancies ------------- All 4 query executions agree that "rows=1048576.00", so the patch doesn't appear to simply be broken/giving wrong answers. Might it be that the "Buffers" instrumentation is broken? The premise of my original complaint was that big inconsistencies in performance shouldn't happen between similar forwards and backwards scans (at least not with direct I/O). I now have serious doubts about that premise, since it looks like OS readahead remains a big factor with direct I/O. Did I just miss something obvious? >> I wonder if today's commit b4212231 from Thomas ("Fix rare bug in >> read_stream.c's split IO handling") fixed the issue, without anyone >> realizing that the bug in question could manifest like this. > > I can't explain that. If you can consistently reproduce the change at > the two base commits, maybe bisect? Commit b4212231 was a wild guess on my part. Probably should have refrained from that. -- Peter Geoghegan
pgsql-hackers by date: