Re: Should io_method=worker remain the default?
From:           Jeff Davis
Subject:        Re: Should io_method=worker remain the default?
Msg-id:         f34fb0bbacc1eb25a04e946880251cff87ab6291.camel@j-davis.com
In response to: Re: Should io_method=worker remain the default? (Andres Freund <andres@anarazel.de>)
Responses:      Re: Should io_method=worker remain the default?
                Re: Should io_method=worker remain the default?
List:           pgsql-hackers
On Wed, 2025-09-03 at 11:55 -0400, Andres Freund wrote:
> 32 parallel seq scans of a large relations, with default shared
> buffers, fully cached in the OS page cache, seems like a pretty
> absurd workload.

It's the default settings, and users often just keep going with the
defaults as long as they work, giving little thought to any kind of
tuning or optimization until they hit a wall. Fully cached data is
common, as are scan-heavy workloads. Calling it "absurd" is an
exaggeration.

> That's not to say we shouldn't spend some effort to avoid regressions
> for it, but it also doesn't seem to be worth focusing all that much
> on it.

Fair, but we should acknowledge the places where the new defaults do
better vs. worse, and provide some guidance on what to look for and how
to tune it. We should also not be in too much of a rush to get rid of
"sync" mode until we have a better idea about where the tradeoffs are.

> Or is there a real-world scenario this actually emulating?

This test was my first try at reproducing a smaller (but still
noticeable) regression seen on a more realistic benchmark. I'm not 100%
sure whether I reproduced the same effect or a different one, but I
don't think we should dismiss it so quickly.

> *If* we actually care about this workload, we can make
> pgaio_worker_submit_internal() acquire that lock conditionally, and
> perform the IOs synchronously instead.

I like the idea of some kind of fallback, for multiple reasons. I
noticed that if I set io_workers=1, and then I SIGSTOP that worker,
then sequential scans make no progress at all until I send SIGCONT. A
fallback to synchronous sounds more robust, and more similar to what we
do with walwriter and bgwriter. (That may be 19 material, though.) A
rough sketch of the shape I have in mind is at the end of this mail.

> But I'm really not sure doing > 30GB/s of repeated reads from the
> page cache is a particularly useful thing to optimize.

A long time ago, the expectation was that Postgres might be running on
a machine alongside other software, perhaps with many instances of
Postgres on the same machine. In that case, a low shared_buffers
compared with the overall system memory makes sense, which would cause
a lot of back-and-forth into shared buffers. That was also the era of
magnetic disks, where such memory copies seemed almost free by
comparison -- perhaps we just don't care about that case any more?

> If I instead just increase s_b, I get 2x the throughput...

Increase to what? I tried a number of settings. Obviously >32GB makes
it a non-issue because everything is cached. Values between 128MB and
32GB didn't seem to help, and were in some cases lower, but I haven't
looked into why yet. It might have something to do with crowding out
the page cache.

Regards,
	Jeff Davis
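
Sketch referenced above. This is only an illustration of "try to hand
the IO to a worker without blocking, otherwise do it synchronously in
the submitting backend"; apart from LWLockConditionalAcquire() and
LWLockRelease(), the identifiers here (the lock, the queue helpers,
the synchronous path) are placeholders, not the names actually used in
the tree:

	/*
	 * Hypothetical fallback: attempt a non-blocking handoff to an IO
	 * worker; if the submission queue lock can't be taken immediately
	 * (or the queue is full), execute the IO synchronously instead.
	 */
	static void
	pgaio_worker_submit_sketch(PgAioHandle *ioh)
	{
		bool		queued = false;

		/* Never block the submitting backend on the queue lock. */
		if (LWLockConditionalAcquire(AioWorkerSubmissionQueueLock,
									 LW_EXCLUSIVE))
		{
			/* Treat a full queue the same as a failed handoff. */
			queued = aio_worker_queue_try_push(ioh);	/* placeholder */
			LWLockRelease(AioWorkerSubmissionQueueLock);
		}

		if (queued)
			aio_worker_wake_one();			/* placeholder: wake a worker */
		else
			aio_perform_synchronously(ioh);	/* placeholder: sync fallback */
	}

The point being that the backend always makes progress even if every
io worker is stopped or saturated, which would also cover the SIGSTOP
scenario above.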