Re: Streamify more code paths - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: Streamify more code paths
Msg-id CABPTF7U4CvFp55wVg51b8b24J1GQfjB62SzvfXY0ZgjaRWPYfA@mail.gmail.com
In response to Re: Streamify more code paths  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Sat, Mar 14, 2026 at 5:56 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Mar 13, 2026 at 10:39:52AM +0800, Xuneng Zhou wrote:
> > Thanks for fixing this and for taking the time to review and test
> > the patches.
>
> Looking at the rest, I have produced some numbers:
> pgstattuple_small (20k tuples, io_uring) base= 60839.9ms
> patch=10949.9ms 5.56x ( 82.0%) (reads=4139->260,
> io_time=49616.97->55.25ms)
> pgstattuple_small (20k tuples, worker=3) base= 60577.5ms
> patch=11470.0ms 5.28x ( 81.1%) (reads=4139->260,
> io_time=49359.79->69.60ms)
> hash_vacuum (1M tuples, io_uring)  base=199929.0ms patch=161747.0ms
> 1.24x ( 19.1%) (reads=4665->1615, io_time=47084.8->9925.77ms)
> hash_vacuum (1M tuples, worker=12) base=203417.0ms patch=161687.0ms
> 1.26x ( 20.5%) (reads=4665->1615, io_time=48356.3->9917.24ms)
>
> The hash vacuum numbers are less amazing here than yours.  Trying out
> various configurations does not change the results much (I was puzzled
> for a couple of hours that I did not see any performance impact but
> forgot the eviction of the index pages from the shared buffers, that
> influences the numbers to what I have here), but I'll take it anyway.

My guess is that the results are influenced by the write delay. Vacuum
operations can be write-intensive, so when both read and write delays
are set to 2 ~ 5 ms, a large portion of the runtime may be spent on
writes. According to Amdahl’s Law, the overall performance improvement
from optimizing a single component is limited by the fraction of time
that component actually contributes to the total execution time. In
this case, the speedup available from streaming the read path could
be masked by the time spent performing writes.
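As a rough sanity check, Amdahl's Law can be applied directly to the
figures quoted above. A minimal Python sketch (the `amdahl` helper and
variable names are mine, not from the benchmark script):

```python
# Amdahl's Law: if a fraction p of the runtime is sped up by a
# factor s, the overall speedup is S = 1 / ((1 - p) + p / s).
def amdahl(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# Plugging in the io_uring hash_vacuum figures quoted above:
base_total = 199929.0           # ms, total base runtime
base_io    = 47084.8            # ms of read I/O in the base run
io_speedup = 47084.8 / 9925.77  # ~4.74x on the read component alone

p = base_io / base_total        # reads are only ~24% of the runtime
print(round(amdahl(p, io_speedup), 2))    # ~1.23, near the observed 1.24x
print(round(amdahl(p, float("inf")), 2))  # ~1.31: the bound even with free reads
```

So with reads at only about a quarter of the runtime, the ~1.24x
overall result is close to the best this workload can show, no matter
how effective the streaming is.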

To investigate this, I added a new option, write-delay. When it is set
to zero, the benchmark simulates a system with a fast write device and
a slow read device, reducing the proportion of time spent on writes.
Admittedly, this setup is somewhat artificial—we would not normally
expect such a large discrepancy between read and write performance in
real systems.

-- worker 12, write-delay 2 ms
hash_vacuum_medium         base= 33743.2ms  patch= 27371.3ms   1.23x
( 18.9%)  (reads=4662→1612, read_time=8242.51→1725.03ms,
writes=12689→12651, write_time=25144.87→25041.75ms)

-- worker 12, write-delay 0 ms
hash_vacuum_medium         base=  8601.1ms  patch=  2234.0ms   3.85x
( 74.0%)  (reads=4662→1612, read_time=8021.65→1637.87ms,
writes=12689→12651, write_time=337.38→288.15ms)

To better understand this behavior, the latest version of the script
reports read time and write time separately instead of a single I/O
time. This lets us directly observe their respective contributions
and how they change across runs. A further improvement would be to
report the speedup for the read and write components separately,
making it easier to see where, and by how much, the gains actually
occur.
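That per-component reporting could look something like the following
sketch (the dictionary keys and `report` helper are hypothetical; the
actual script's field names may differ):

```python
# Sketch of per-component speedup reporting (hypothetical field
# names, not the script's actual variables).
def report(base, patch):
    for key in ("read_time", "write_time"):
        b, p = base[key], patch[key]
        print(f"{key}: {b:9.2f} -> {p:9.2f} ms  ({b / p:.2f}x)")

# worker=12, write-delay=0 hash_vacuum_medium run from above:
report({"read_time": 8021.65, "write_time": 337.38},
       {"read_time": 1637.87, "write_time": 288.15})
# read_time ~4.90x vs write_time ~1.17x: nearly all the gain is on reads
```

Applied to the write-delay=0 run, this makes the split explicit: the
read component improves by about 4.9x while writes barely move, which
is consistent with the streaming change touching only the read path.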

> One thing that I was wondering for the pgstattuple patch is if we
> should have "scanned" put outside the private data of the callback as
> we get back to the main loop once we know that the page is not
> all-visible, so we could increment the counter in the main loop
> instead of the callback.  Now I get that you have done that as it
> feels cleaner for the "default" return path of the callback, while the
> logic remains the same, so I have kept it as-is at the end, tweaked a
> few things, and applied this one.

Thanks for the review and for applying it. My reasoning for putting
scanned inside the callback was to keep all per-block accounting in
one place — the callback is already the point where the skip-vs-read
decision is made, so it seemed natural to count reads there as well.
But I agree the main loop would also be a clean spot for it.

> I have not been able to review yet the patch for the hash VACUUM
> proposal, which would be the last one.
> --
> Michael



--
Best,
Xuneng

