Re: Streamify more code paths - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: Streamify more code paths
Msg-id CABPTF7XD51Qx2043p80npKmYEd67qMagK5AW=s6LNXyZt5s2nw@mail.gmail.com
In response to Re: Streamify more code paths  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi Michael,

On Tue, Mar 10, 2026 at 6:28 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Mar 10, 2026 at 02:06:12PM +0800, Xuneng Zhou wrote:
> > Here’s v5 of the patchset. The wal_logging_large patch has been
> > removed, as no performance gains were observed in the benchmark runs.
>
> Looking at the numbers you are posting, it is harder to get excited
> about the hash, gin, bloom_vacuum and wal_logging.  The worker method
> seems more efficient, may show that we are out of noise level.
> The results associated to pgstattuple and the bloom scans are on a
> different level for the three methods.
>
> Saying that, it is really nice that you have sent the benchmark.  The
> measurement method looks in line with the goal here after review (IO
> stats, calculations), and I have taken some time to run it to get an
> idea of the difference for these five code paths, as of (slightly
> edited the script for my own environment, result is the same):
> ./run_streaming_benchmark --baseline --io-method=io_uring/worker
>
> I am not much interested in the sync case, so I have tested the two
> other methods:
>
> 1) method=IO-uring
> bloom_scan_large           base=   725.3ms  patch=    99.9ms   7.26x
> ( 86.2%)  (reads=19676->1294, io_time=688.36->33.69ms)
> bloom_vacuum_large         base=  7414.9ms  patch=  7455.2ms   0.99x
> ( -0.5%)  (reads=48361->11597, io_time=459.02->257.51ms)
> pgstattuple_large          base= 12642.9ms  patch= 11873.5ms   1.06x
> (  6.1%)  (reads=206945->12983, io_time=6516.70->143.46ms)
> gin_vacuum_large           base=  3546.8ms  patch=  2317.9ms   1.53x
> ( 34.6%)  (reads=20734->17735, io_time=3244.40->2021.53ms)
> hash_vacuum_large          base= 12268.5ms  patch= 11751.1ms   1.04x
> (  4.2%)  (reads=76677->15606, io_time=1483.10->315.03ms)
> wal_logging_large          base= 33713.0ms  patch= 32773.9ms   1.03x
> (  2.8%)  (reads=21641->21641, io_time=81.18->77.25ms)
>
> 2) method=worker io-workers=3
> bloom_scan_large           base=   725.0ms  patch=   465.7ms   1.56x
> ( 35.8%)  (reads=19676->1294, io_time=688.70->52.20ms)
> bloom_vacuum_large         base=  7138.3ms  patch=  7156.0ms   1.00x
> ( -0.2%)  (reads=48361->11597, io_time=284.56->64.37ms)
> pgstattuple_large          base= 12429.3ms  patch= 11916.8ms   1.04x
> (  4.1%)  (reads=206945->12983, io_time=6501.91->32.24ms)
> gin_vacuum_large           base=  3769.4ms  patch=  3716.7ms   1.01x
> (  1.4%)  (reads=20775->17684, io_time=3562.21->3528.14ms)
> hash_vacuum_large          base= 11750.1ms  patch= 11289.0ms   1.04x
> (  3.9%)  (reads=76677->15606, io_time=1296.03->98.72ms)
> wal_logging_large          base= 32862.3ms  patch= 33179.7ms   0.99x
> ( -1.0%)  (reads=21641->21641, io_time=91.42->90.59ms)
>
> The bloom scan case is a winner in runtime for both cases, and in
> terms of stats we get much better numbers for all of them.  These feel
> rather in line with what you have, except for pgstattuple's runtime,
> still its IO numbers feel good.

Thanks for running the benchmarks! The performance gains for hash,
gin, bloom_vacuum, and wal_logging are insignificant, likely because
these workloads are not I/O-bound. The default of three I/O workers is
also fairly conservative; when I ran the benchmark script with a
higher number of I/O workers, some runs showed improved performance.
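
For reference, these are the settings involved (assuming PostgreSQL
18's AIO GUCs; the exact worker count used per run is up to the
tester), set in postgresql.conf:

```
io_method = worker      # default in 18; alternatives: io_uring, sync
io_workers = 12         # size of the I/O worker pool, default is 3
```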

> pgstattuple_large          base= 12429.3ms  patch= 11916.8ms   1.04x
> (  4.1%)  (reads=206945->12983, io_time=6501.91->32.24ms)

> pgstattuple_large          base= 12642.9ms  patch= 11873.5ms   1.06x
> (  6.1%)  (reads=206945->12983, io_time=6516.70->143.46ms)

Yeah, this looks somewhat strange. The io_time is reduced
significantly, which should also lead to a substantial reduction in
runtime. I ran the benchmark for this test again with io_uring, and
the result is consistent with my previous runs:

method=io_uring
pgstattuple_large          base=  5551.5ms  patch=  3498.2ms   1.59x
( 37.0%)  (reads=206945→12983, io_time=2323.49→207.14ms)

I'm not sure what might be contributing to this behavior.
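
One rough way to frame the puzzle, using the io_uring numbers quoted
above: subtract io_time from total runtime and compare what remains.
This assumes io_time sits on the critical path, which asynchronous I/O
deliberately avoids, so it is only a back-of-the-envelope check:

```python
# Back-of-the-envelope decomposition of pgstattuple_large runtime
# (io_uring figures quoted earlier in the thread) into io_time and
# "everything else". Assumes io_time is on the critical path, which
# async I/O is designed to avoid, so treat the split with caution.
runs = {
    "base":  {"runtime_ms": 12642.9, "io_time_ms": 6516.70},
    "patch": {"runtime_ms": 11873.5, "io_time_ms": 143.46},
}
for name, r in runs.items():
    non_io_ms = r["runtime_ms"] - r["io_time_ms"]
    print(f"{name}: non-I/O portion ~= {non_io_ms:.1f} ms")

# If only the I/O component had changed, the patched runtime would be
# near 6126 + 143 ~= 6270 ms rather than ~11874 ms; either the non-I/O
# portion grows under the patch, or the I/O was already overlapped
# with CPU work in the base run.
```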

Another code path that showed a significant performance improvement is
pgstatindex [1]. I've incorporated that test into the script as well.
Here are the results from my testing:

method=worker io-workers=12
pgstatindex_large          base=   233.8ms  patch=    54.1ms   4.32x
( 76.8%)  (reads=27460→1757, io_time=213.94→6.31ms)

method=io_uring
pgstatindex_large          base=   224.2ms  patch=    56.4ms   3.98x
( 74.9%)  (reads=27460→1757, io_time=204.41→4.88ms)
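
In case it helps when reading these tables: the "x" column is
base/patch and the percentage is the runtime reduction
(base - patch)/base; small differences from the printed columns can
come from the timings being rounded to 0.1 ms for display. A minimal
sketch with the pgstatindex_large numbers above:

```python
# Recompute the pgstatindex_large summary columns from the raw
# timings: speedup = base/patch, reduction = (base - patch)/base.
cases = {
    "worker (io-workers=12)": (233.8, 54.1),
    "io_uring": (224.2, 56.4),
}
for name, (base_ms, patch_ms) in cases.items():
    speedup = base_ms / patch_ms
    reduction_pct = (base_ms - patch_ms) / base_ms * 100
    print(f"{name}: {speedup:.2f}x ({reduction_pct:.1f}% less runtime)")
```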

> That's just to say that I'll review
> them and try to do something about at least some of the pieces for
> this release.

Thanks for that.

[1] https://www.postgresql.org/message-id/flat/CABPTF7UeN2o-trr9r7K76rZExnO2M4SLfvTfbUY2CwQjCekgnQ%40mail.gmail.com

--
Best,
Xuneng

