Re: index prefetching - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: index prefetching
Date
Msg-id 51b5f71b-5f19-4453-91ff-2b9f2a840c58@vondra.me
In response to Re: index prefetching  ("Peter Geoghegan" <pg@bowt.ie>)
List pgsql-hackers
On 8/12/25 23:22, Peter Geoghegan wrote:
> ...
>
> It looks like the patch does significantly better with the forwards scan,
> compared to the backwards scan (though both are improved by a lot).  But that's
> not the main thing about these results that I find interesting.
> 
> The really odd thing is that we get "shared hit=6619 read=49933" for the
> forwards scan, and "shared hit=10350 read=49933" for the backwards scan.  The
> latter matches master (regardless of the scan direction used on master), while
> the former just looks wrong.  What explains the "missing buffer hits" seen with
> the forwards scan?
> 
> Discrepancies
> -------------
> 
> All 4 query executions agree that "rows=1048576.00", so the patch doesn't appear
> to simply be broken/giving wrong answers.  Might it be that the "Buffers"
> instrumentation is broken?
> 

I think a bug in the prefetch patch is more likely. I tried with a patch
that adds various prefetch-related counters to EXPLAIN, and I see this:


test=# EXPLAIN (ANALYZE, VERBOSE, COSTS OFF) SELECT * FROM t WHERE a
BETWEEN 16336 AND 49103 ORDER BY a;

                                QUERY PLAN
------------------------------------------------------------------------
 Index Scan using idx on public.t (actual time=0.682..527.055
rows=1048576.00 loops=1)
   Output: a, b
   Index Cond: ((t.a >= 16336) AND (t.a <= 49103))
   Index Searches: 1
   Prefetch Distance: 271.263
   Prefetch Count: 60888
   Prefetch Stalls: 1
   Prefetch Skips: 991211
   Prefetch Resets: 3
   Prefetch Histogram: [2,4) => 2, [4,8) => 8, [8,16) => 17, [16,32) =>
24, [32,64) => 34, [64,128) => 52, [128,256) => 82, [256,512) => 60669
   Buffers: shared hit=5027 read=50872
   I/O Timings: shared read=33.528
 Planning:
   Buffers: shared hit=78 read=23
   I/O Timings: shared read=2.349
 Planning Time: 3.686 ms
 Execution Time: 559.659 ms
(17 rows)


test=# EXPLAIN (ANALYZE, VERBOSE, COSTS OFF) SELECT * FROM t WHERE a
BETWEEN 16336 AND 49103 ORDER BY a DESC;
                                QUERY PLAN
------------------------------------------------------------------------
 Index Scan Backward using idx on public.t (actual time=1.110..4116.201
rows=1048576.00 loops=1)
   Output: a, b
   Index Cond: ((t.a >= 16336) AND (t.a <= 49103))
   Index Searches: 1
   Prefetch Distance: 271.061
   Prefetch Count: 118806
   Prefetch Stalls: 1
   Prefetch Skips: 962515
   Prefetch Resets: 3
   Prefetch Histogram: [2,4) => 2, [4,8) => 7, [8,16) => 12, [16,32) =>
17, [32,64) => 24, [64,128) => 3, [128,256) => 4, [256,512) => 118737
   Buffers: shared hit=30024 read=50872
   I/O Timings: shared read=581.353
 Planning:
   Buffers: shared hit=82 read=23
   I/O Timings: shared read=3.168
 Planning Time: 4.289 ms
 Execution Time: 4185.407 ms
(17 rows)

These two parts are interesting. For the forward scan:

   Prefetch Count: 60888
   Prefetch Skips: 991211

and for the backward scan:

   Prefetch Count: 118806
   Prefetch Skips: 962515
It looks like the backwards scan skips far fewer blocks, and therefore
issues roughly twice as many prefetches. The skipping is based on the
lastBlock optimization, i.e. looking for runs of the same block number.
I don't quite see why it would affect just the backwards scan, though.
Seems weird.
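To make the counters above concrete, here is a minimal sketch of the
lastBlock de-duplication as I described it (not the actual patch code;
the struct and function names are made up for illustration). Each TID's
block number is compared against the immediately preceding one, and only
a change of block triggers a prefetch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Counters mirroring "Prefetch Count" / "Prefetch Skips" above. */
typedef struct PrefetchStats
{
    uint64_t prefetch_count;    /* blocks actually prefetched */
    uint64_t prefetch_skips;    /* duplicates suppressed by lastBlock */
} PrefetchStats;

/*
 * Simulate the lastBlock optimization: a TID whose block equals the
 * immediately preceding block is skipped, everything else is prefetched.
 * Only *runs* of identical blocks are suppressed; revisiting a block
 * after an intervening one is prefetched again.
 */
static void
simulate_prefetch(const BlockNumber *blocks, size_t n, PrefetchStats *stats)
{
    BlockNumber lastBlock = InvalidBlockNumber;

    for (size_t i = 0; i < n; i++)
    {
        if (blocks[i] == lastBlock)
        {
            stats->prefetch_skips++;
            continue;
        }
        lastBlock = blocks[i];
        stats->prefetch_count++;
    }
}
```

The point being: if the backwards scan visits the same heap blocks in a
less "run-like" order, the same number of TIDs produces fewer skips and
more prefetches, which matches the counters, but doesn't explain why the
order would differ between directions.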

> The premise of my original complaint was that big inconsistencies in performance
> shouldn't happen between similar forwards and backwards scans (at least not with
> direct I/O).  I now have serious doubts about that premise, since it looks like
> OS readahead remains a big factor with direct I/O.  Did I just miss something
> obvious?
> 

I don't think you missed anything. The assumption relies on the OS
handling the underlying I/O patterns equally well in both directions,
and unfortunately that does not seem to be the case. Maybe we could
"invert" the data set, i.e. make it "descending" instead of "ascending"?
That would make the heap access direction "forward" again for the
backwards index scan ...
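Something along these lines, perhaps (hypothetical sketch, since the
original data-generation script isn't quoted here; `t_desc` and
`idx_desc` are made-up names) - the same values, just laid out in
descending heap order:

```sql
-- same value distribution, but rows stored in descending order of "a",
-- so a backward scan on the new index walks the heap in ascending
-- (i.e. "forward") block order
CREATE TABLE t_desc AS
SELECT * FROM t ORDER BY a DESC;

CREATE INDEX idx_desc ON t_desc (a);
```

Comparing ORDER BY a DESC on that table against ORDER BY a on the
original should isolate whether the regression follows the heap access
direction rather than the index scan direction.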

regards

-- 
Tomas Vondra



