Re: index prefetching - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | x3b5pjpttpwz74fpr5zw7avhjmiti3us5g57f2jizabrv23e57@lmo6yiuxnnjj |
In response to | Re: index prefetching (Tomas Vondra <tomas@vondra.me>) |
List | pgsql-hackers |
Hi,

On 2025-08-12 18:53:13 +0200, Tomas Vondra wrote:
> I'm running some tests looking for these weird changes, not just with
> the patches, but on master too. And I don't think b4212231 changed the
> situation very much.
>
> FWIW this issue is not caused by the index prefetching patches, I can
> reproduce it with master (on b227b0bb4e032e19b3679bedac820eba3ac0d1cf
> from yesterday). So maybe we should split this into a separate thread.
>
> Consider for example the dataset built by create.sql - it's randomly
> generated, but the idea is that it's correlated, but not perfectly. The
> table is ~3.7GB, and it's a cold run - caches dropped + restart).
>
> Anyway, a simple range query look like this:
>
> EXPLAIN (ANALYZE, COSTS OFF)
> SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a ASC;
>
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Index Scan using idx on t
>    (actual time=0.584..433.208 rows=1048576.00 loops=1)
>    Index Cond: ((a >= 16336) AND (a <= 49103))
>    Index Searches: 1
>    Buffers: shared hit=7435 read=50872
>    I/O Timings: shared read=332.270
>  Planning:
>    Buffers: shared hit=78 read=23
>    I/O Timings: shared read=2.254
>  Planning Time: 3.364 ms
>  Execution Time: 463.516 ms
> (10 rows)
>
> EXPLAIN (ANALYZE, COSTS OFF)
> SELECT * FROM t WHERE a BETWEEN 16336 AND 49103 ORDER BY a DESC;
>
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Index Scan Backward using idx on t
>    (actual time=0.566..22002.780 rows=1048576.00 loops=1)
>    Index Cond: ((a >= 16336) AND (a <= 49103))
>    Index Searches: 1
>    Buffers: shared hit=36131 read=50872
>    I/O Timings: shared read=21217.995
>  Planning:
>    Buffers: shared hit=82 read=23
>    I/O Timings: shared read=2.375
>  Planning Time: 3.478 ms
>  Execution Time: 22231.755 ms
> (10 rows)
>
> That's a pretty massive difference ... this is on my laptop, and the
> timing changes quite a bit, but it's always a multiple of the first
> query with forward scan.

I suspect what you're mainly seeing here is that the OS can do readahead for
us for forward scans, but not for backward scans. Indeed, if I look at
iostat, the forward scan shows:

Device        r/s    rMB/s   rrqm/s    %rrqm  r_await rareq-sz      w/s    wMB/s   wrqm/s    %wrqm  w_await wareq-sz      d/s    dMB/s   drqm/s    %drqm  d_await dareq-sz      f/s  f_await   aqu-sz    %util
nvme6n1   3352.00   400.89     0.00     0.00     0.18   122.47     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.62    47.90

whereas the backward scan shows:

Device        r/s    rMB/s   rrqm/s    %rrqm  r_await rareq-sz      w/s    wMB/s   wrqm/s    %wrqm  w_await wareq-sz      d/s    dMB/s   drqm/s    %drqm  d_await dareq-sz      f/s  f_await   aqu-sz    %util
nvme6n1  10958.00    85.57     0.00     0.00     0.06     8.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.69    63.80

Note the different read sizes...

> I did look into pg_aios, but there's only 8kB requests in both cases. I
> didn't have time to look closer yet.

That's what we'd expect, right? There's nothing on master that'd perform read
combining for index scans...

Greetings,

Andres Freund
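
As a cross-check on the read sizes in the two iostat samples above, the average request size can be recomputed from the r/s and rMB/s columns. This is only a sketch and not part of the original message: the helper name `avg_read_size_kb` is hypothetical, and the 8 kB block size is an assumption (the PostgreSQL default), although it is consistent with the 8kB requests reported from pg_aios.

```python
# Values copied from the two iostat samples quoted in the message.
PG_BLOCK_KB = 8  # assumption: default PostgreSQL block size (BLCKSZ = 8192 bytes)


def avg_read_size_kb(reads_per_sec: float, read_mb_per_sec: float) -> float:
    """Average read request size in kB, derived from iostat's r/s and rMB/s."""
    return read_mb_per_sec * 1024 / reads_per_sec


forward = avg_read_size_kb(3352.00, 400.89)    # forward index scan
backward = avg_read_size_kb(10958.00, 85.57)   # backward index scan

for label, size_kb in (("forward", forward), ("backward", backward)):
    blocks = size_kb / PG_BLOCK_KB
    print(f"{label:8s} {size_kb:7.2f} kB per request, about {blocks:.1f} blocks")
```

The results agree with the rareq-sz column: roughly 122 kB per request for the forward scan, i.e. many blocks merged into each device read, against roughly 8 kB for the backward scan, i.e. one block per device read. That is the pattern one would expect if kernel readahead only helps in the ascending direction.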