Re: AIO v2.5 - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: AIO v2.5
Msg-id: w6uiicyou7hzq47mbyejubtcyb2rngkkf45fk4q7inue5kfbeo@bbfad3qyubvs
In response to: Re: AIO v2.5 (Noah Misch <noah@leadboat.com>)
List: pgsql-hackers
Hi,

On 2025-03-11 20:57:43 -0700, Noah Misch wrote:
> > I think we'll really need to do something about this for BSD users
> > regardless of AIO. Or maybe those OSs should fix something, but somehow I
> > am not having high hopes for an OS that claims to have POSIX conforming
> > unnamed semaphores due to having a syscall that always returns EPERM...
> > [1].
>
> I won't mind a project making things better for non-root BSD users. I do
> think such a project should not block other projects making things better
> for everything else (like $SUBJECT).

Oh, I strongly agree. The main reason I would like it to be addressed is that
I'm pretty tired of having to think about open/netbsd whenever we update some
default setting.

> > > On Mon, Mar 10, 2025 at 02:23:12PM -0400, Andres Freund wrote:
> > > > Attached is v2.6 of the AIO patchset.
> > > >
> > > > - 0005, 0006 - io_uring support - close, but we need to do something
> > > >   about set_max_fds(), which errors out spuriously in some cases
> > >
> > > What do we know about those cases? I don't see a set_max_fds(); is that
> > > set_max_safe_fds(), or something else?
> >
> > Sorry, yes, set_max_safe_fds(). The problem basically is that with
> > io_uring we will have a large number of FDs already allocated by the time
> > set_max_safe_fds() is called. set_max_safe_fds() subtracts already_open
> > from max_files_per_process, allowing few, and even negative, IOs.
> >
> > I think we should redefine max_files_per_process to be about the number of
> > files each *backend* will additionally open. Jelte was working on related
> > patches, see [2]
>
> Got it. max_files_per_process is a quaint setting, documented as follows (I
> needed the reminder):
>
>     If the kernel is enforcing
>     a safe per-process limit, you don't need to worry about this setting.
>     But on some platforms (notably, most BSD systems), the kernel will
>     allow individual processes to open many more files than the system
>     can actually support if many processes all try to open
>     that many files. If you find yourself seeing <quote>Too many open
>     files</quote> failures, try reducing this setting.
>
> I could live with
> v6-0003-Reflect-the-value-of-max_safe_fds-in-max_files_pe.patch but would
> lean against it, since it feels unduly novel to have a setting where we use
> the postgresql.conf value to calculate a value that becomes the new
> SHOW-value of the same setting. I think we may update some other GUCs, but
> I'm not sure. Options I'd consider before that:
>
> - Like you say, "redefine max_files_per_process to be about the number of
>   files each *backend* will additionally open". It will become normal that
>   each backend's actual FD list length is max_files_per_process +
>   MaxBackends if io_method=io_uring. Outcome is not unlike
>   v6-0002-Bump-postmaster-soft-open-file-limit-RLIMIT_NOFIL.patch +
>   v6-0003-Reflect-the-value-of-max_safe_fds-in-max_files_pe.patch, but we
>   don't mutate max_files_per_process. Benchmark results should not change
>   beyond the inter-major-version noise level unless one sets
>   io_method=io_uring. I'm feeling best about this one, but I've not been
>   thinking about it long.

Yea, I think that's something probably worth doing separately from Jelte's
patch. I do think it'd be rather helpful to have Jelte's patch to increase
NOFILE in addition, though.

> > > > +static void
> > > > +maybe_adjust_io_workers(void)
> > >
> > > This also restarts workers that exit, so perhaps name it
> > > start_io_workers_if_missing().
> >
> > But it also stops IO workers if necessary?
>
> Good point. Maybe just add a comment like "start or stop IO workers to
> close the gap between the running count and the configured count intent".
It's now:

/*
 * Start or stop IO workers, to close the gap between the number of running
 * workers and the number of configured workers. Used to respond to change of
 * the io_workers GUC (by increasing and decreasing the number of workers), as
 * well as workers terminating in response to errors (by starting
 * "replacement" workers).
 */

> > > > +{
> > > ...
> > > > +		/* Try to launch one. */
> > > > +		child = StartChildProcess(B_IO_WORKER);
> > > > +		if (child != NULL)
> > > > +		{
> > > > +			io_worker_children[id] = child;
> > > > +			++io_worker_count;
> > > > +		}
> > > > +		else
> > > > +			break;		/* XXX try again soon? */
> > >
> > > Can LaunchMissingBackgroundProcesses() become the sole caller of this
> > > function, replacing the current mix of callers? That would be more
> > > conducive to promptly doing the right thing after launch failure.
> >
> > I'm not sure that'd be a good idea - right now IO workers are started
> > before the startup process, as the startup process might need to perform
> > IO. If we started it only later in ServerLoop() we'd potentially do a
> > fair bit of work, including starting checkpointer, bgwriter, bgworkers,
> > before we started IO workers. That shouldn't actively break anything, but
> > it would likely make things slower.
>
> I missed that. How about keeping the two calls associated with PM_STARTUP
> but replacing the assign_io_workers() and process_pm_child_exit() calls
> with one in LaunchMissingBackgroundProcesses()?

I think replacing the call in assign_io_workers() is a good idea; that way we
don't need assign_io_workers(). I'm less convinced it's a good idea to do the
same for process_pm_child_exit() - if IO workers errored out, we'll launch
backends etc. before we get to LaunchMissingBackgroundProcesses(). That's not
a fundamental problem, but it seems a bit odd.

I think LaunchMissingBackgroundProcesses() should be split into one function
that starts aux processes and one that starts bgworkers.
The one maintaining aux processes should be called before we start backends,
the latter not.

> In the event of a launch failure, I think that would retry the launch
> quickly, as opposed to maybe-never.

That's a fair point.

> > > > +			/*
> > > > +			 * It's very unlikely, but possible, that reopen fails. E.g.
> > > > +			 * due to memory allocations failing or file permissions
> > > > +			 * changing or such. In that case we need to fail the IO.
> > > > +			 *
> > > > +			 * There's not really a good errno we can report here.
> > > > +			 */
> > > > +			error_errno = ENOENT;
> > >
> > > Agreed there's not a good errno, but let's use a fake errno that we're
> > > mighty unlikely to confuse with an actual case of libc returning that
> > > errno. Like one of EBADF or EOWNERDEAD.
> >
> > Can we rely on that to be present on all platforms, including windows?
>
> I expect EBADF is universal. EBADF would be fine.

Hm, that's actually an error that could happen for other reasons, and IMO
would be more confusing than ENOENT. The latter describes the issue to a
reasonable extent; EBADF seems like it would be more confusing.

I'm not sure it's worth investing time in this - it really shouldn't happen,
and we probably have bigger problems than the error code if it does. But if
we do want to do something, I think I can see a way to report a dedicated
error message for this.

> EOWNERDEAD is from 2006.
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/errno-constants?view=msvc-140
> says VS2015 had EOWNERDEAD (the page doesn't have links for older Visual
> Studio versions, so I consider them unknown).

Oh, that's a larger list than I'd have thought.

> https://github.com/coreutils/gnulib/blob/master/doc/posix-headers/errno.texi
> lists some OSs not having it, the newest of which looks like NetBSD 9.3
> (2022). We could use it and add a #define for platforms lacking it.

What would we define it as? I guess we could just pick a high value, but...

Greetings,

Andres Freund