Re: index prefetching - Mailing list pgsql-hackers

From Andres Freund
Subject Re: index prefetching
Date
Msg-id obwrjsclfm2dcymo6izs3dz55h364nifkcqof7hjw4fmw3kdqe@yddoyzbrizly
Whole thread Raw
In response to Re: index prefetching  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: index prefetching
Re: index prefetching
List pgsql-hackers
Hi,

On 2025-07-16 14:30:05 -0400, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 2:27 PM Andres Freund <andres@anarazel.de> wrote:
> > Could you share the current version of the complex patch (happy with a git
> > tree)? Afaict it hasn't been posted, which makes this pretty hard follow along
> > / provide feedback on, for others.
>
> Sure:
>
> https://github.com/petergeoghegan/postgres/tree/index-prefetch-2025-pg-revisions-v0.11
>
> I think that the version that Tomas must have used is a few days old,
> and might be a tiny bit different. But I don't think that that's
> likely to matter, especially not if you just want to get the general
> idea.

As a first thing I just wanted to get a feel for the improvements we can get.
I had a scale 5 tpch already loaded, so I ran a bogus query on that to see.

The improvement with either of the patchsets with a quick trial query is
rather impressive when using direct IO (presumably also with an empty cache,
but DIO is more predictable).

As Peter's branch doesn't seem to have an enable_* GUC, I used
SET effective_io_concurrency=0 to test the non-prefetching results (and
verified with master that the results are similar).

Test:

Peter's:

Without prefetching:

SET effective_io_concurrency=0;SELECT pg_buffercache_evict_relation('lineitem');EXPLAIN ANALYZE SELECT * FROM lineitem
ORDERBY l_shipdate LIMIT 10000;
 

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                     QUERY PLAN
                             │
 

├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=0.44..2332.06 rows=10000 width=106) (actual time=0.611..957.874 rows=10000.00 loops=1)
                             │
 
│   Buffers: shared hit=1213 read=8626
                             │
 
│   I/O Timings: shared read=943.344
                             │
 
│   ->  Index Scan using i_l_shipdate on lineitem  (cost=0.44..6994824.33 rows=29999796 width=106) (actual
time=0.611..956.593rows=10000.00 loops=1) │
 
│         Index Searches: 1
                             │
 
│         Buffers: shared hit=1213 read=8626
                             │
 
│         I/O Timings: shared read=943.344
                             │
 
│ Planning Time: 0.083 ms
                             │
 
│ Execution Time: 958.508 ms
                             │
 

└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)


With prefetching:

SET effective_io_concurrency=64;SELECT pg_buffercache_evict_relation('lineitem');EXPLAIN ANALYZE SELECT * FROM lineitem
ORDERBY l_shipdate LIMIT 10000;
 

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                     QUERY PLAN
                            │
 

├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=0.44..2332.06 rows=10000 width=106) (actual time=0.497..67.737 rows=10000.00 loops=1)
                            │
 
│   Buffers: shared hit=1227 read=8667
                            │
 
│   I/O Timings: shared read=48.473
                            │
 
│   ->  Index Scan using i_l_shipdate on lineitem  (cost=0.44..6994824.33 rows=29999796 width=106) (actual
time=0.496..66.471rows=10000.00 loops=1) │
 
│         Index Searches: 1
                            │
 
│         Buffers: shared hit=1227 read=8667
                            │
 
│         I/O Timings: shared read=48.473
                            │
 
│ Planning Time: 0.090 ms
                            │
 
│ Execution Time: 68.965 ms
                            │
 

└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

Tomas':

With prefetching:

SET effective_io_concurrency=64;SELECT pg_buffercache_evict_relation('lineitem');EXPLAIN ANALYZE SELECT * FROM lineitem
ORDERBY l_shipdate LIMIT 10000;
 

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                     QUERY PLAN
                            │ 

├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=0.44..2332.06 rows=10000 width=106) (actual time=0.278..70.609 rows=10000.00 loops=1)
                            │
 
│   Buffers: shared hit=1227 read=8668
                            │
 
│   I/O Timings: shared read=52.578
                            │
 
│   ->  Index Scan using i_l_shipdate on lineitem  (cost=0.44..6994824.33 rows=29999796 width=106) (actual
time=0.277..69.304rows=10000.00 loops=1) │
 
│         Index Searches: 1
                            │
 
│         Buffers: shared hit=1227 read=8668
                            │
 
│         I/O Timings: shared read=52.578
                            │
 
│ Planning Time: 0.072 ms
                            │
 
│ Execution Time: 71.549 ms
                            │
 

└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

The wins are similar without DIO and a cold OS cache, but i don't like
emptying out the entire OS cache all the time...


I call that a hell of an impressive improvement with either patch - it's
really really hard to find order of magnitude improvements in anything close
to realistic cases.

And that's on a local reasonably fast NVMe - with networked storage we'll see
much bigger wins.

This also doesn't just repro with toy queries, e.g. TPCH Q02 shows a 2X
improvement too (with either patch) - the only reason it's not bigger is that
all the remaining IO time is on the inner side of a nestloop that isn't
currently prefetchable.


Peter, it'd be rather useful if your patch also had an enable/disable GUC,
otherwise it's more work to study the performance effects. The
effective_io_concurrency approach isn't great, because it also affects
bitmap scans, seqscans etc.


Just playing around, there are many cases where there is effectively no
difference between the two approaches, from a runtime perspective.  There,
unsurprisingly, are some where the complex approach clearly wins, mostly
around IN(list-of-constants) so far.


Looking at the actual patches now.


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Fix PQport to never return NULL if the connection is valid
Next
From: "DINESH NAIR"
Date:
Subject: Re: Composite types for updatable views