On Tue, Mar 10, 2026 at 02:06:12PM +0800, Xuneng Zhou wrote:
> Here’s v5 of the patchset. The wal_logging_large patch has been
> removed, as no performance gains were observed in the benchmark runs.
Looking at the numbers you have posted, it is harder to get excited
about the hash, gin, bloom_vacuum and wal_logging cases. The worker
method seems more efficient, which may indicate that these results are
above the noise level. The results for pgstattuple and the bloom
scans are on a different level for all three methods.
That said, it is really nice that you have sent the benchmark. After
review, the measurement method (I/O stats, calculations) looks in line
with the goal here, and I have taken some time to run it to get an
idea of the difference for these five code paths, as follows (I
slightly edited the script for my own environment; the results are the
same):
./run_streaming_benchmark --baseline --io-method=io_uring/worker
I am not much interested in the sync case, so I have tested the two
other methods:
1) io_method=io_uring
bloom_scan_large    base=   725.3ms  patch=    99.9ms  7.26x ( 86.2%)  (reads=19676->1294, io_time=688.36->33.69ms)
bloom_vacuum_large  base=  7414.9ms  patch=  7455.2ms  0.99x ( -0.5%)  (reads=48361->11597, io_time=459.02->257.51ms)
pgstattuple_large   base= 12642.9ms  patch= 11873.5ms  1.06x (  6.1%)  (reads=206945->12983, io_time=6516.70->143.46ms)
gin_vacuum_large    base=  3546.8ms  patch=  2317.9ms  1.53x ( 34.6%)  (reads=20734->17735, io_time=3244.40->2021.53ms)
hash_vacuum_large   base= 12268.5ms  patch= 11751.1ms  1.04x (  4.2%)  (reads=76677->15606, io_time=1483.10->315.03ms)
wal_logging_large   base= 33713.0ms  patch= 32773.9ms  1.03x (  2.8%)  (reads=21641->21641, io_time=81.18->77.25ms)
2) io_method=worker, io_workers=3
bloom_scan_large    base=   725.0ms  patch=   465.7ms  1.56x ( 35.8%)  (reads=19676->1294, io_time=688.70->52.20ms)
bloom_vacuum_large  base=  7138.3ms  patch=  7156.0ms  1.00x ( -0.2%)  (reads=48361->11597, io_time=284.56->64.37ms)
pgstattuple_large   base= 12429.3ms  patch= 11916.8ms  1.04x (  4.1%)  (reads=206945->12983, io_time=6501.91->32.24ms)
gin_vacuum_large    base=  3769.4ms  patch=  3716.7ms  1.01x (  1.4%)  (reads=20775->17684, io_time=3562.21->3528.14ms)
hash_vacuum_large   base= 11750.1ms  patch= 11289.0ms  1.04x (  3.9%)  (reads=76677->15606, io_time=1296.03->98.72ms)
wal_logging_large   base= 32862.3ms  patch= 33179.7ms  0.99x ( -1.0%)  (reads=21641->21641, io_time=91.42->90.59ms)
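For readers skimming the tables: the speedup factor and percentage
columns appear to be base/patch and (base - patch)/base. A quick
sketch of that derivation, using the io_uring bloom_scan_large line as
an example (the formula is my reading of the output, not something
taken from the script itself):

```python
# Reproduce the "7.26x ( 86.2%)" columns from the runtimes,
# assuming factor = base/patch and gain = (base - patch)/base.
base_ms, patch_ms = 725.3, 99.9

speedup = base_ms / patch_ms                      # "7.26x" column
gain_pct = (base_ms - patch_ms) / base_ms * 100   # "86.2%" column

print(f"{speedup:.2f}x ({gain_pct:.1f}%)")        # prints "7.26x (86.2%)"
```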
The bloom scan case is a clear winner in runtime for both methods, and
in terms of I/O stats we get much better numbers for all of them.
These feel rather in line with what you have, except for pgstattuple's
runtime; still, its I/O numbers look good. All that to say that I'll
review these patches and try to get at least some of the pieces done
for this release.
--
Michael