Re: index prefetching - Mailing list pgsql-hackers

From Alexandre Felipe
Subject Re: index prefetching
Date
Msg-id CAE8JnxOn4+xUAnce+M7LfZWOqfrMMxasMaEmSKwiKbQtZr65uA@mail.gmail.com
In response to Re: index prefetching  (Andres Freund <andres@anarazel.de>)
Responses Re: index prefetching
List pgsql-hackers


On Fri, Feb 27, 2026 at 4:18 AM Andres Freund <andres@anarazel.de> wrote:

> I'm planning to do some reviewing in the next days. In preparation I just
> retried a benchmark and saw some odd results.  After a while I was able to
> reproduce even with a simpler setup:

Since we are talking about results, I will share mine too :)

The bottom line is: prefetch is working, but it can make some things slower.


It is obvious that this should better exploit IO for a single heavy query on a
single table. It is not so obvious, to me, how it behaves when there are
multiple concurrent queries, or when multiple tables are queried at the same
time. My feeling is that it should be a great improvement on a disk with a
mechanical head, since it performs the same reads while reducing the number of
times the head has to jump from one location to another. Is there much
interest in special optimisations for those, or is the focus more on SSDs?


On my previous review I wasted way too much time trying to improve
read_stream, only to end up with mixed results. This time I tried to step back
and look at the various functions that could have changed. Initially I tried
compiler function instrumentation, but then the profiling overhead of 33k
functions dominated.


This time, what I did: (1) added an indexscan_prefetch_distance setting
(a better name might be just prefetch_distance), which limits the growth of
the look-ahead distance in read_stream (distance-limit.diff); (2) captured
execution statistics for 15 functions (profiling-instrumentation.diff). At
exit, each process writes a log with its configuration and call statistics.
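To illustrate the idea behind the distance limit, here is a toy Python model of how read_stream-style look-ahead growth could be capped. This is only a sketch under my assumptions (roughly, distance grows on buffer misses and decays on hits); the real logic lives in C in read_stream.c, and `max_distance` stands in for the indexscan_prefetch_distance setting.

```python
def next_distance(distance, hit, max_distance):
    """Toy model (an assumption, not the actual read_stream.c logic):
    grow the look-ahead distance on a buffer miss, decay it on a hit,
    and never let it exceed the configured cap."""
    if hit:
        return max(1, distance - 1)          # shrink toward 1 on cache hits
    return min(distance * 2, max_distance)   # grow on misses, but clamp

# With a cap of 16, repeated misses saturate at 16 instead of growing
# toward the stream's usual maximum.
d = 1
for _ in range(10):
    d = next_distance(d, hit=False, max_distance=16)
print(d)  # 16
```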


The benchmark was a full index scan on a sequential column, executed
repeatedly with no cache eviction: the buffer-hit path.


BENCHMARK RESULTS


MacOS in normal (for me) use

Prefetch     Avg Time     Min Time     Max Time
------------------------------------------------
off           6.03s        5.12s        11.70s
  1          59.44s       25.33s       257.60s
  4          19.74s       12.66s        44.36s
 16          11.87s        7.49s        19.13s
 64           8.77s        6.05s        13.97s
128           6.40s        4.33s        11.74s


MacOS idle, after reboot

Prefetch     Avg Time     Min Time     Max Time
------------------------------------------------
off           2.17s        2.12s         2.26s
  1           5.53s        5.44s         5.57s
  4           3.17s        3.04s         3.39s
 16           3.13s        3.04s         3.29s
 64           2.82s        2.66s         2.88s
128           2.83s        2.69s         2.90s


Docker on MacOS, idle, after reboot

Prefetch     Avg Time     Min Time     Max Time
------------------------------------------------
off          1.38s       1.36s       1.46s
  1          3.65s       3.56s       3.70s
  4          2.00s       1.98s       2.09s
 16          1.56s       1.53s       1.59s
 64          1.29s       1.25s       1.33s
128          1.28s       1.26s       1.32s


Docker on Linux

Prefetch     Avg Time     Min Time     Max Time
------------------------------------------------
off           6.07s        5.92s        6.29s
  1           6.85s        6.67s        7.04s
  4           6.26s        6.10s        6.41s
 16           6.14s        5.95s        6.30s
 64           5.74s        5.62s        5.91s
128           5.72s        5.63s        5.86s


The Linux execution showed very little degradation. On the macOS host the
degradation was more noticeable than in Docker running Debian on macOS,
suggesting that the software ecosystem contributes. Docker on macOS (arm) was
slower than Docker on native Linux (x86_64); here it could be CPU architecture
or OS kernel differences.



WHAT CHANGED


The benchmark produced 195 autovac_worker logs, 3293 backend logs, and one
bgworker log. For prefetch off, the number of calls is constant. For prefetch
on, they vary widely, but I am looking at the total time per function,
assuming that the differences in the number of calls change only how the work
was partitioned, not the total work done.


With Docker version 28.3.0, build 38b7060, Python 3.10.18


$ docker compose up --build benchmark

$ docker cp docker-postgres-1:/tmp/profiling ./docker-profiling

$ python compare_profiles.py docker-profiling
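For reference, a minimal Python sketch of the kind of aggregation a script like compare_profiles.py could do: sum the per-function totals over the logs of each run, then compare two configurations with a two-sample z-statistic on the means. The data layout and helper names here are my own assumptions, not the actual log format produced by profiling-instrumentation.diff.

```python
import math
from collections import defaultdict

def total_per_function(runs):
    """runs: list of dicts, one per benchmark run, mapping function name
    to total seconds (each run already summed over its process logs)."""
    totals = defaultdict(list)
    for run in runs:
        for func, secs in run.items():
            totals[func].append(secs)
    return totals

def z_statistic(a, b):
    """Two-sample z for the difference of mean total time between
    two configurations (e.g. prefetch off vs. on)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / max(len(a) - 1, 1)
    vb = sum((x - mb) ** 2 for x in b) / max(len(b) - 1, 1)
    se = math.sqrt(va / len(a) + vb / len(b)) or 1e-12
    return (mb - ma) / se

# Hypothetical per-run totals for one function, two runs per configuration.
off = [{"IndexNext": 17.0}, {"IndexNext": 17.1}]
on = [{"IndexNext": 10.3}, {"IndexNext": 10.5}]
t_off = total_per_function(off)["IndexNext"]
t_on = total_per_function(on)["IndexNext"]
print(z_statistic(t_off, t_on))  # large negative z: IndexNext got cheaper
```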


Function                  off,d=0 on,d=128     Diff        % z-statistic
------------------------------------------------------------------------
read_stream_next_buffer       0.0   3944.9  +3944.9      NEW  +654.88
read_stream_look_ahead        0.0   2999.3  +2999.3      NEW  +624.00
WaitReadBuffers              98.3    754.6   +656.3  +667.7%  +414.56
_bt_next                    748.7   1072.8   +324.1   +43.3%   +20.35
heapam_batch_getnext        788.4   1114.6   +326.2   +41.4%   +20.18
btgetbatch                  777.0   1096.7   +319.7   +41.2%   +20.14
IndexNext                 17031.7  10400.3  -6631.5   -38.9%  -249.51
_bt_first                    17.2     13.0     -4.2   -24.6%    10.56


Function                  off,d=0   on,d=1      Diff        % z-statistic
------------------------------------------------------------------------
read_stream_look_ahead        0.0  28135.9  +28135.9      NEW      N/A
read_stream_next_buffer       0.0 199245.7 +199245.7      NEW      N/A
IndexNext                 17031.7 211861.0 +194829.2    12.4x  +283.00
WaitReadBuffers              98.3 169641.9 +169543.6    1724x  +275.63
heapam_index_fetch_tuple  13564.5 205828.2 +192263.7      15x  +172.56
heapam_batch_getnext        788.4   1944.0   +1155.6  +146.6%   +25.39
_bt_next                    748.7   1833.9   +1085.2  +144.9%   +24.85
btgetbatch                  777.0   1881.2   +1104.2  +142.1%   +24.74
_bt_first                    17.2     19.3      +2.1   +12.0%   +10.33

PS: cache eviction in the Docker environment still requires adjustments.


Attachment
