Re: Streamify more code paths - Mailing list pgsql-hackers
| From | Xuneng Zhou |
|---|---|
| Subject | Re: Streamify more code paths |
| Date | |
| Msg-id | CABPTF7U4CvFp55wVg51b8b24J1GQfjB62SzvfXY0ZgjaRWPYfA@mail.gmail.com |
| In response to | Re: Streamify more code paths (Michael Paquier <michael@paquier.xyz>) |
| Responses | Re: Streamify more code paths |
| List | pgsql-hackers |
On Sat, Mar 14, 2026 at 5:56 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Mar 13, 2026 at 10:39:52AM +0800, Xuneng Zhou wrote:
> > Thanks for fixing this and for taking the time to review and test
> > the patches.
>
> Looking at the rest, I have produced some numbers:
> pgstattuple_small (20k tuples, io_uring)  base= 60839.9ms  patch= 10949.9ms  5.56x ( 82.0%)  (reads=4139->260, io_time=49616.97->55.25ms)
> pgstattuple_small (20k tuples, worker=3)  base= 60577.5ms  patch= 11470.0ms  5.28x ( 81.1%)  (reads=4139->260, io_time=49359.79->69.60ms)
> hash_vacuum (1M tuples, io_uring)         base=199929.0ms  patch=161747.0ms  1.24x ( 19.1%)  (reads=4665->1615, io_time=47084.8->9925.77ms)
> hash_vacuum (1M tuples, worker=12)        base=203417.0ms  patch=161687.0ms  1.26x ( 20.5%)  (reads=4665->1615, io_time=48356.3->9917.24ms)
>
> The hash vacuum numbers are less amazing here than yours.  Trying out
> various configurations does not change the results much (I was puzzled
> for a couple of hours that I did not see any performance impact but
> forgot the eviction of the index pages from the shared buffers, that
> influences the numbers to what I have here), but I'll take it anyway.

My guess is that the results are influenced by the write delay. Vacuum
operations can be write-intensive, so when both the read and write delays
are set to 2-5 ms, a large portion of the runtime may be spent on writes.
According to Amdahl's Law, the overall speedup from optimizing a single
component is limited by the fraction of the total execution time that the
component actually contributes. In this case, the potential speedup from
streaming the read path could be masked by the time spent performing
writes.

To investigate this, I added a new option, write-delay. When it is set to
zero, the benchmark simulates a system with a fast write device and a slow
read device, reducing the proportion of time spent on writes.
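To make the Amdahl's Law point concrete, here is a small sketch (plain
Python, not part of the benchmark script; the numbers are copied from the
two write-delay runs reported below) that predicts the end-to-end speedup
from the fraction of the base runtime spent on reads:

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the runtime is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

# write-delay=2ms run: reads are only ~24% of the 33743.2ms base runtime
f = 8242.51 / 33743.2         # read fraction of the base runtime
s = 8242.51 / 1725.03         # read-path speedup achieved by the patch
print(f"{amdahl_speedup(f, s):.2f}x")    # ~1.24x, close to the observed 1.23x

# write-delay=0ms run: reads dominate (~93%), so the streaming gain shows through
f0 = 8021.65 / 8601.1
s0 = 8021.65 / 1637.87
print(f"{amdahl_speedup(f0, s0):.2f}x")  # ~3.88x, close to the observed 3.85x
```

The predicted numbers line up well with both measured runs, which supports
the write-delay explanation.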
Admittedly, this setup is somewhat artificial; we would not normally
expect such a large discrepancy between read and write performance in real
systems.

-- worker 12, write-delay 2 ms
hash_vacuum_medium  base= 33743.2ms  patch= 27371.3ms  1.23x ( 18.9%)
  (reads=4662->1612, read_time=8242.51->1725.03ms,
   writes=12689->12651, write_time=25144.87->25041.75ms)

-- worker 12, write-delay 0 ms
hash_vacuum_medium  base=  8601.1ms  patch=  2234.0ms  3.85x ( 74.0%)
  (reads=4662->1612, read_time=8021.65->1637.87ms,
   writes=12689->12651, write_time=337.38->288.15ms)

To better understand the behavior, the latest version of the script
separates the I/O time into read time and write time. This lets us
directly observe their respective contributions and how they change across
runs. A further improvement would be to report the speedup for the read
and write components separately, making it easier to see where, and by how
much, the performance gains actually occur.

> One thing that I was wondering for the pgstattuple patch is if we
> should have "scanned" put outside the private data of the callback as
> we get back to the main loop once we know that the page is not
> all-visible, so we could increment the counter in the main loop
> instead of the callback.  Now I get that you have done that as it
> feels cleaner for the "default" return path of the callback, while the
> logic remains the same, so I have kept it as-is at the end, tweaked a
> few things, and applied this one.

Thanks for the review and for applying it. My reasoning for putting
"scanned" inside the callback was to keep all per-block accounting in one
place: the callback is already the point where the skip-vs-read decision
is made, so it seemed natural to count reads there as well. But I agree
the main loop would also be a clean spot for it.

> I have not been able to review yet the patch for the hash VACUUM
> proposal, which would be the last one.
> --
> Michael

--
Best,
Xuneng
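P.S. A minimal sketch of the separate per-component speedup reporting
suggested above (plain Python; the tuple layout is a hypothetical
stand-in for whatever the script actually collects), fed with the
write-delay=2ms numbers from this mail:

```python
def component_speedups(base, patch):
    """Per-component speedups; base/patch are (total_ms, read_ms, write_ms)."""
    labels = ("overall", "read", "write")
    return {l: b / p for l, b, p in zip(labels, base, patch)}

# hash_vacuum_medium, worker=12, write-delay=2ms
sp = component_speedups((33743.2, 8242.51, 25144.87),
                        (27371.3, 1725.03, 25041.75))
for label, s in sp.items():
    print(f"{label:8s} {s:5.2f}x")
# overall ~1.23x, read ~4.78x, write ~1.00x: the read path improved a lot,
# but the essentially unchanged write time caps the end-to-end gain.
```

Reported this way, the 4.78x read-path gain is visible even though the
headline number is only 1.23x.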