Hi,

On 2/15/26 01:13, Alexandre Felipe wrote:
> Hi,
> I decided to test this PR.
>
> I didn't take much time to go through the thread or the code in detail
> yet. But I have my first benchmark results and I would like to share.
>

I'm quite confused by the scripts you shared; they seem incomplete.
run_regression.py is meant to call purge_cache.sh (which is missing),
and run_benchmark tries to call all sorts of missing .sql scripts.
So how do we use that?

> EXPERIMENT
>
> I tested [CF 4351] v10 - Index Prefetching
>
> I created a table with 100k rows and
> Sequential is, as guessed, 1,2,3,4 (indexed)
> Periodic is a quasi random (i * jump) % num_rows, where gcd(jump,
> num_rows) = 1, guarantee that there are no repeated entries (indexed)
> Random is a `row_number() over (order by random())` (indexed)
> The payload is a fixed 200 character long string, just to make it more
> realistic.
>
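
If I read the description right, the "periodic" column is a permutation
of 0..num_rows-1 obtained by stepping through the range with a stride
coprime to num_rows. A minimal sketch of that, assuming 0-based row
numbers (the function name and the jump value are made up for
illustration, they are not from the shared scripts):

```python
from math import gcd


def periodic_keys(num_rows, jump):
    """Return (i * jump) % num_rows for i in 0..num_rows-1.

    When gcd(jump, num_rows) == 1 the result is a permutation of
    0..num_rows-1, i.e. no value repeats.
    """
    if gcd(jump, num_rows) != 1:
        raise ValueError("jump and num_rows must be coprime")
    return [(i * jump) % num_rows for i in range(num_rows)]


# Hypothetical stride: 7919 is prime and does not divide 100000.
keys = periodic_keys(100_000, 7919)
print(len(set(keys)) == len(keys))
```
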
> For the tests, I disable sorting, sequential scans, index only scans and
> bitmap scans.
> Since buffer cache always has a significant impact on the query
> performance, I shuffled the tests, and tried to adjust for the number of
> buffer hit/read, but later I found that the best way to control that was
> to use a table small enough to be entirely held in cache, and evict the
> buffers.
>

That seems a bit bizarre. The whole point of index prefetching is better
I/O scheduling (issuing reads ahead of time), but if you "control" the
impact of the cache by making sure everything fits in it, that kinda
defeats the whole thing.

A table that is just 24MB and fits into shared buffers is a bit useless.
It means that even with the random pattern (which is generally about the
best case for prefetching), only ~1/30 of the heap fetches will require
I/O. Each page has ~32 items, but only the first item fetched from each
page will incur an I/O.

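
The back-of-the-envelope arithmetic behind that estimate, as a rough
sketch. It assumes the default 8kB block size and that every heap page
is read at most once after eviction (because the whole table stays in
cache afterwards). The table size and row count are the approximate
figures from this thread, not measurements:

```python
BLOCK_SIZE = 8192                  # default PostgreSQL page size, bytes
table_bytes = 24 * 1024 * 1024     # ~24MB heap (approximate)
num_rows = 100_000                 # rows in the test table

pages = table_bytes // BLOCK_SIZE
rows_per_page = num_rows / pages

# Assumption: one physical read per page, every later fetch from the
# same page is a buffer hit.
io_fraction = pages / num_rows

print(pages, round(rows_per_page, 1), round(io_fraction, 3))
```

So even in the best case only about 3% of the fetches can benefit from
prefetching at all.
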
> * off: buffers are kept in cache
> * pg: buffers evicted from postgres pg_buffercache_evict from
> pg_buffercache extension.
> * os: supported only in python, I separated the buffer eviction in
> purge_cache as it requires sudo (tested only in MacOS).
>
> I varied
> * max_parallel_workers_per_gather (although I guess it wasn't exploited),
> * enable_index_prefetch
> * the column used as sorting key and, as a result, the index used.
> * and buffer eviction mode.
>
> Running from python with psycopg
>

On what kind of hardware? And how much variance is there in the results
between runs?
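
By variance I mean something as simple as the spread over repeated runs
of the same query and configuration. A minimal sketch, assuming the
timings are collected as a list of milliseconds per run (the function
name is hypothetical and the numbers below are placeholders, not
results from any benchmark):

```python
import statistics


def summarize(timings_ms):
    """Summarize repeated timings of one query/configuration."""
    mean = statistics.mean(timings_ms)
    stdev = statistics.stdev(timings_ms)   # sample standard deviation
    return {
        "n": len(timings_ms),
        "min": min(timings_ms),
        "max": max(timings_ms),
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,                # relative spread
    }


# Placeholder values only, to show the shape of the output.
summary = summarize([10.0, 12.0, 11.0, 13.0, 14.0])
print(summary)
```

If the relative spread is comparable to the difference between
prefetching on and off, the comparison does not tell us much.
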
regards
--
Tomas Vondra