Re: pgstattuple: Use streaming read API in pgstatindex functions - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: pgstattuple: Use streaming read API in pgstatindex functions
Date
Msg-id CABPTF7V8kVJcEsYKA94h4xBQAvY6shKwW98c=Wa2CuRnNYU8ow@mail.gmail.com
Whole thread Raw
In response to pgstattuple: Use streaming read API in pgstatindex functions  (Xuneng Zhou <xunengzhou@gmail.com>)
Responses Re: pgstattuple: Use streaming read API in pgstatindex functions
List pgsql-hackers
Hi,

On Mon, Oct 13, 2025 at 10:07 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi hackers,
>
> While reading the code related to streaming reads and their current
> use cases, I noticed that pgstatindex could potentially benefit from
> adopting the streaming read API. The required change is relatively
> simple—similar to what has already been implemented in the pg_warm and
> pg_visibility extensions. I also ran some performance tests on an
> experimental patch to validate the improvement.
>
> Summary
> Cold cache performance (the typical use case for diagnostic tools):
>   - Medium indexes (~21MB): 1.21x - 1.79x faster (20-44% speedup)
>   - Large indexes (~214MB): 1.50x - 1.90x faster (30-47% speedup)
>   - Xlarge indexes (~1351MB):1.4x–1.9x gains. (29–47% speedup)
>
> Hardware: AX162-R from hetzner
>
> Test matrix:
>   - Index types: Primary key, timestamp, float, composite (3 columns)
>   - Sizes: Medium (1M rows, ~21MB), Large (10M rows, ~214MB), XLarge
> (50M rows, ~ 1351MB))
>   - Layouts: Unfragmented (sequential) and Fragmented (random insert order)
>   - Cache states: Cold (dropped OS cache) and Warm (pg_prewarm)
>
> Xlarge fragmented example:
> ==> Creating secondary indexes on test_xlarge
>      Created 3 secondary indexes: created_at, score, composite
>      Created test_xlarge_pkey: 1351 MB
>      Fragmentation stats (random insert order):
>  leaf_frag_pct | avg_density_pct | leaf_pages |  size
> ---------------+-----------------+------------+---------
>           49.9 |            71.5 |     172272 | 1351 MB
> (1 row)
>
> configuration:
>   - shared_buffers = 16GB
>   - effective_io_concurrency = 500
>   - io_combine_limit = 16
>   - autovacuum = off
>   - checkpoint_timeout = 1h
>   - bgwriter_delay = 10000ms (minimize background writes)
>   - jit = off
>   - max_parallel_workers_per_gather = 0
>
> Unfragmented Indexes (Cold Cache)
>
> Index Type    Size  Baseline  Patched  Speedup
> Primary Key    Medium  31.5 ms  19.6 ms  1.58×
> Primary Key    Large  184.0 ms  119.0 ms  1.54×
> Timestamp     Medium  13.4 ms  10.5 ms  1.28×
> Timestamp     Large  85.0 ms  45.6 ms  1.86×
> Float (score)   Medium  13.7 ms  11.4 ms  1.21×
> Float (score)   Large  84.0 ms  45.0 ms  1.86×
> Composite (3 col) Medium  26.7 ms  17.2 ms  1.56×
> Composite (3 col) Large  89.8 ms  51.2 ms  1.75×
>
> ⸻
>
> Fragmented Indexes (Cold Cache)
>
> To address concerns about filesystem fragmentation, I tested indexes built
> with random inserts (ORDER BY random()) to trigger page splits and create
> fragmented indexes:
>
> Index Type    Size  Baseline  Patched  Speedup
> Primary Key    Medium  41.9 ms  23.5 ms  1.79×
> Primary Key    Large  236.0 ms  148.0 ms  1.58×
> Primary Key    XLarge  953.4 ms  663.1 ms  1.43×
> Timestamp     Medium  32.1 ms  18.8 ms  1.70×
> Timestamp     Large  188.0 ms  117.0 ms  1.59×
> Timestamp     XLarge  493.0 ms  518.6 ms  0.95×
> Float (score)   Medium  14.0 ms  10.9 ms  1.28×
> Float (score)   Large  85.8 ms  45.2 ms  1.89×
> Float (score)   XLarge  263.2 ms  176.5 ms  1.49×
> Composite (3 col) Medium  42.3 ms  24.1 ms  1.75×
> Composite (3 col) Large  245.0 ms  162.0 ms  1.51×
> Composite (3 col) XLarge  1052.5 ms  716.5 ms  1.46×
>
> Summary: Fragmentation generally does not hurt streaming reads; most
> fragmented cases still see 1.4×–1.9× gains. One outlier (XLarge
> Timestamp) shows a slight regression (0.95×).
>
> ⸻
>
> Warm Cache Results
> When indexes are fully cached in shared_buffers:
> Unfragmented:  infrequent little regression for small to medium size
> index(single digit ms variance, barely noticeable); small gains for
> large size index
> Fragmented:   infrequent little regression for small to medium size
> index(single digit ms variance, barely noticeable); small gains for
> large size index
>

Fix indentation issue in v1.

Best,
Xuneng

Attachment

pgsql-hackers by date:

Previous
From: Xuneng Zhou
Date:
Subject: pgstattuple: Use streaming read API in pgstatindex functions
Next
From: Chao Li
Date:
Subject: Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward