Re: index prefetching - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: index prefetching
Date	February 15 10:57:19
Msg-id	9411f220-007d-4f1e-9c8f-ca8eb09e6788@vondra.me
In response to	Re: index prefetching (Alexandre Felipe <o.alexandre.felipe@gmail.com>)
Responses	Re: index prefetching
List	pgsql-hackers

Hi,

On 2/15/26 01:13, Alexandre Felipe wrote:
> Hi,
> I decided to test this PR.
> 
> I haven't had time to go through the thread or the code in detail
> yet, but I have my first benchmark results and would like to share them.
> 

I'm quite confused by the scripts you shared; they seem incomplete. The
run_regression.py script is meant to call purge_cache.sh (which is missing),
and run_benchmark tries to call all sorts of .sql scripts that are also missing.

So how do we use that?

> EXPERIMENT
> 
> I tested [CF 4351] v10 - Index Prefetching
> 
> I created a table with 100k rows and three indexed key columns:
> * Sequential: as you'd expect, 1, 2, 3, 4, ...
> * Periodic: a quasi-random (i * jump) % num_rows, where gcd(jump,
>   num_rows) = 1 guarantees that there are no repeated entries
> * Random: `row_number() over (order by random())`
> The payload is a fixed 200-character string, just to make it more
> realistic.
> 
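The three key columns can be sketched in Python as below, assuming NUM_ROWS = 100000 and a hypothetical JUMP of 7919 (a prime that does not divide 100000); the jump actually used is not given. The point is that (i * jump) % num_rows is a permutation of 0..num_rows-1 exactly when gcd(jump, num_rows) = 1.

```python
import math
import random

NUM_ROWS = 100_000
JUMP = 7919  # hypothetical: the jump used in the benchmark is not stated


def sequential_keys(n):
    """Sequential: 1, 2, 3, ..., n in insertion order."""
    return list(range(1, n + 1))


def periodic_keys(n, jump):
    """Periodic: (i * jump) % n for i in 0..n-1.

    When gcd(jump, n) == 1 this visits every residue 0..n-1 exactly once,
    so the column has no repeated entries.
    """
    if math.gcd(jump, n) != 1:
        raise ValueError("jump must be coprime with n, otherwise keys repeat")
    return [(i * jump) % n for i in range(n)]


def random_keys(n, seed=0):
    """Random: emulates row_number() over (order by random()), a shuffled 1..n."""
    keys = list(range(1, n + 1))
    random.Random(seed).shuffle(keys)
    return keys


if __name__ == "__main__":
    columns = {
        "sequential": sequential_keys(NUM_ROWS),
        "periodic": periodic_keys(NUM_ROWS, JUMP),
        "random": random_keys(NUM_ROWS),
    }
    for name, keys in columns.items():
        # Each column must hold NUM_ROWS distinct values.
        assert len(set(keys)) == NUM_ROWS, name
        print(name, keys[:5])
```

Only the periodic column depends on the coprimality condition; a jump such as 10 would repeat keys, which is why periodic_keys() rejects it.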
> For the tests, I disabled sorting, sequential scans, index-only scans and
> bitmap scans.
> Since the buffer cache always has a significant impact on query
> performance, I shuffled the tests and tried to adjust for the number of
> buffer hits/reads, but I later found that the best way to control for it
> was to use a table small enough to be held entirely in cache, and to evict
> the buffers.
> 

That seems a bit bizarre. The whole point of index prefetching is better
I/O scheduling (ahead of time), but if you "control" the impact of cache
by making sure everything is cached, that kinda defeats the whole thing.

A table that is just 24MB and fits into buffers is a bit useless. It
means that even with a random pattern (which is generally about the best
case for prefetching), only about 1/30 of the heap page accesses will
require I/O. Each page has ~32 items, but only the first access to each
page will incur an I/O; the rest are buffer hits.
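The estimate can be checked with a little arithmetic. This sketch assumes the default 8 kB block size and takes the ~32 rows per heap page above as given; it also assumes the whole table stays in shared buffers after the initial eviction, so each page is read from disk once and every later fetch from that page is a hit, regardless of access order.

```python
PAGE_SIZE = 8192        # default PostgreSQL block size (BLCKSZ)
NUM_ROWS = 100_000      # table size from the benchmark description
ROWS_PER_PAGE = 32      # approximate, per the estimate above


def heap_pages(num_rows, rows_per_page):
    # Ceiling division: a partially filled last page still counts.
    return -(-num_rows // rows_per_page)


def io_fraction(num_rows, rows_per_page):
    """Fraction of heap fetches that miss, if each page is read once and then stays cached."""
    return heap_pages(num_rows, rows_per_page) / num_rows


pages = heap_pages(NUM_ROWS, ROWS_PER_PAGE)
size_mib = pages * PAGE_SIZE / 2**20
print(f"{pages} pages, {size_mib:.1f} MiB, I/O fraction {io_fraction(NUM_ROWS, ROWS_PER_PAGE)}")
```

With these inputs there are 3125 heap pages (about 24 MiB), and 3125 of the 100000 heap fetches miss, i.e. 1/32, so prefetching has very few reads to schedule ahead.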

> * off: buffers are kept in cache
> * pg: buffers evicted from PostgreSQL's shared buffers using
> pg_buffercache_evict() from the pg_buffercache extension.
> * os: supported only in Python; I moved the OS cache eviction into a
> separate purge_cache script, as it requires sudo (tested only on macOS).
> 
> I varied:
>  * max_parallel_workers_per_gather (although I suspect parallelism wasn't used),
>  * enable_index_prefetch,
>  * the column used as the sort key and, as a result, the index used,
>  * and the buffer eviction mode.
> 
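A shuffled test matrix like the one described can be sketched as follows. The GUCs used to disable sorts, sequential, index-only and bitmap scans are standard PostgreSQL planner settings, and enable_index_prefetch is the setting added by the patch under test; the value lists, column labels and function names here are hypothetical.

```python
import itertools
import random

# Planner settings held fixed for every run, per the description above.
FIXED_SETTINGS = {
    "enable_sort": "off",
    "enable_seqscan": "off",
    "enable_indexonlyscan": "off",
    "enable_bitmapscan": "off",
}

# Dimensions that were varied; the concrete values are hypothetical.
DIMENSIONS = {
    "max_parallel_workers_per_gather": [0, 2],
    "enable_index_prefetch": ["off", "on"],  # GUC from the patch under test
    "sort_column": ["sequential", "periodic", "random"],
    "eviction": ["off", "pg", "os"],
}


def build_runs(seed=0):
    """Return every combination of DIMENSIONS as a dict, in shuffled order."""
    names = list(DIMENSIONS)
    runs = [dict(zip(names, combo))
            for combo in itertools.product(*(DIMENSIONS[n] for n in names))]
    # Shuffling spreads slow drift (cache warm-up, background activity)
    # across all configurations instead of biasing the last ones.
    random.Random(seed).shuffle(runs)
    return runs


def set_statements(run):
    """SET commands to issue on the session before timing the query."""
    gucs = dict(FIXED_SETTINGS)
    gucs["max_parallel_workers_per_gather"] = str(run["max_parallel_workers_per_gather"])
    gucs["enable_index_prefetch"] = run["enable_index_prefetch"]
    return [f"SET {name} = {value};" for name, value in gucs.items()]


if __name__ == "__main__":
    runs = build_runs()
    print(len(runs), "runs")
    for stmt in set_statements(runs[0]):
        print(stmt)
```

Each statement could then be sent through psycopg (for example with cursor.execute()) before timing the query, with the eviction mode deciding whether pg_buffercache_evict() or the external purge_cache script runs first.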
> Everything was run from Python with psycopg.
> 

On what kind of hardware? How much variance is in the results?
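One common way to report variance is to repeat each configuration several times and give the mean, sample standard deviation and coefficient of variation of the timings. A minimal sketch using Python's statistics module; the function names are hypothetical, and the overlap check is only a rough heuristic, not a significance test.

```python
import statistics


def summarize(timings_ms):
    """Mean, sample standard deviation and coefficient of variation of repeated runs."""
    if len(timings_ms) < 2:
        raise ValueError("need at least two runs to estimate variance")
    mean = statistics.mean(timings_ms)
    stdev = statistics.stdev(timings_ms)  # sample (n - 1) standard deviation
    return {"n": len(timings_ms), "mean": mean, "stdev": stdev, "cv": stdev / mean}


def overlaps(a, b, k=2.0):
    """Rough check: do the mean +/- k*stdev intervals of two configurations overlap?"""
    lo_a, hi_a = a["mean"] - k * a["stdev"], a["mean"] + k * a["stdev"]
    lo_b, hi_b = b["mean"] - k * b["stdev"], b["mean"] + k * b["stdev"]
    return lo_a <= hi_b and lo_b <= hi_a
```

If the intervals for prefetch on and off overlap, the difference is probably within run-to-run noise and needs more repetitions before drawing conclusions.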

regards

-- 
Tomas Vondra
