Thread: EXEC_BACKEND vs bgworkers without BGWORKER_SHMEM_ACCESS
Hi, While looking at a patch I noticed that SubPostmasterMain() for bgworkers unconditionally does /* Need a PGPROC to run CreateSharedMemoryAndSemaphores */ InitProcess(); which presents a problem, because StartBackgroundWorker() then does /* * If we're not supposed to have shared memory access, then detach from * shared memory. If we didn't request shared memory access, the * postmaster won't force a cluster-wide restart if we exit unexpectedly, * so we'd better make sure that we don't mess anything up that would * require that sort of cleanup. */ if ((worker->bgw_flags & BGWORKER_SHMEM_ACCESS) == 0) { ShutdownLatchSupport(); dsm_detach_all(); PGSharedMemoryDetach(); } which presents a problem: We've initialized all kind of references to shared memory, own a PGPROC, but have detached from shared memory. In practice this will lead pretty quickly to a segfault, because process exit will run proc_exit callbacks, which in turn will try to do a ProcKill(). Or logging dereferences MyProc, or ... It seems the above code block would need to at least do shmem_exit() before the PGSharedMemoryDetach()? This code has been introduced in commit 4d155d8b08fe08c1a1649fdbad61c6dcf4a8671f Author: Robert Haas <rhaas@postgresql.org> Date: 2014-05-07 14:54:43 -0400 Detach shared memory from bgworkers without shmem access. Since the postmaster won't perform a crash-and-restart sequence for background workers which don't request shared memory access, we'd better make sure that they can't corrupt shared memory. Patch by me, review by Tom Lane. but before that things were just slightly differently broken... I tested both the crash and that a shmem_exit() fixes it with an ugly hack in regress.c. I don't really know how to write a good testcase for this, given that the communication facilities of a unconnected bgworker are quite limited... I guess the bgworker could create files or something :(. Greetings, Andres Freund
Hi, On 2021-08-01 23:51:16 -0700, Andres Freund wrote: > In practice this will lead pretty quickly to a segfault, because process exit > will run proc_exit callbacks, which in turn will try to do a ProcKill(). Or > logging dereferences MyProc, or ... > > It seems the above code block would need to at least do shmem_exit() before > the PGSharedMemoryDetach()? > I tested both the crash and that a shmem_exit() fixes it with an ugly hack in > regress.c. I don't really know how to write a good testcase for this, given > that the communication facilities of a unconnected bgworker are quite > limited... I guess the bgworker could create files or something :(. Hm. There may be another way: Because BackgroundWorkerEntry() does not actually need a lock, we could transport BackgroundWorkerData via backend_save_variables(). To make the connected case work, we'd of course continue to call CreateSharedMemoryAndSemaphores (and thus InitProcess()) for those. But that doesn't really seem better? Sure, it's nice to not need a pgproc entry unnecessarily, but it's not like unconnected bgworkers are commonly used and it does seem like it'd end up with more complicated code? Greetings, Andres Freund
On Mon, Aug 2, 2021 at 2:51 AM Andres Freund <andres@anarazel.de> wrote: > which presents a problem: We've initialized all kind of references to shared > memory, own a PGPROC, but have detached from shared memory. > > In practice this will lead pretty quickly to a segfault, because process exit > will run proc_exit callbacks, which in turn will try to do a ProcKill(). Or > logging dereferences MyProc, or ... > > It seems the above code block would need to at least do shmem_exit() before > the PGSharedMemoryDetach()? > > This code has been introduced in > > commit 4d155d8b08fe08c1a1649fdbad61c6dcf4a8671f > Author: Robert Haas <rhaas@postgresql.org> > Date: 2014-05-07 14:54:43 -0400 > > Detach shared memory from bgworkers without shmem access. > > Since the postmaster won't perform a crash-and-restart sequence > for background workers which don't request shared memory access, > we'd better make sure that they can't corrupt shared memory. > > Patch by me, review by Tom Lane. > > but before that things were just slightly differently broken... If you're saying that this code has been 100% broken for 7 years and nobody's noticed until now, then that suggests that nobody actually uses non-shmem-connected bgworkers. I sort of hate to give up on that concept but if we've really gone that many years without anyone noticing obvious breakage then maybe we should. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > If you're saying that this code has been 100% broken for 7 years and > nobody's noticed until now, then that suggests that nobody actually > uses non-shmem-connected bgworkers. I sort of hate to give up on that > concept but if we've really gone that many years without anyone > noticing obvious breakage then maybe we should. Well, the problem only exists on Windows so maybe this indeed escaped notice. Still, this is good evidence that the case isn't used *much*, and TBH I don't see many applications for it. I can't say I'm excited about putting effort into fixing it. regards, tom lane
Hi, On 2021-08-02 11:00:49 -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > If you're saying that this code has been 100% broken for 7 years and > > nobody's noticed until now, then that suggests that nobody actually > > uses non-shmem-connected bgworkers. I sort of hate to give up on that > > concept but if we've really gone that many years without anyone > > noticing obvious breakage then maybe we should. > > Well, the problem only exists on Windows so maybe this indeed > escaped notice. Right. I did briefly look around and I didn't find bgworkers without shmem attachement... > Still, this is good evidence that the case isn't used *much*, and TBH > I don't see many applications for it. I can't say I'm excited about > putting effort into fixing it. Yea, I don't think it adds that much - without e.g. sharing a file descriptor with the unconnected bgworker one can't implement something like syslogger. Greetings, Andres Freund
On 2021-Aug-02, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > If you're saying that this code has been 100% broken for 7 years and > > nobody's noticed until now, then that suggests that nobody actually > > uses non-shmem-connected bgworkers. I sort of hate to give up on that > > concept but if we've really gone that many years without anyone > > noticing obvious breakage then maybe we should. > > Well, the problem only exists on Windows so maybe this indeed > escaped notice. Still, this is good evidence that the case isn't > used *much*, and TBH I don't see many applications for it. > I can't say I'm excited about putting effort into fixing it. When I included this case I was thinking in tasks which would just run stuff not directly connected to data. Something like a sub-daemon: say a connection pooler, which is a bgworker just so that it starts and stops together with postmaster, and share facilities like GUC configuration and SIGHUP handling, etc. It doesn't look like anybody has had an interest in developing such a thing, so if this is obstructing your work, I don't object to removing the no-shmem case. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
Hi, On Mon, Aug 2, 2021, at 11:08, Alvaro Herrera wrote: > On 2021-Aug-02, Tom Lane wrote: > > > Robert Haas <robertmhaas@gmail.com> writes: > > > If you're saying that this code has been 100% broken for 7 years and > > > nobody's noticed until now, then that suggests that nobody actually > > > uses non-shmem-connected bgworkers. I sort of hate to give up on that > > > concept but if we've really gone that many years without anyone > > > noticing obvious breakage then maybe we should. > > > > Well, the problem only exists on Windows so maybe this indeed > > escaped notice. Still, this is good evidence that the case isn't > > used *much*, and TBH I don't see many applications for it. > > I can't say I'm excited about putting effort into fixing it. > > When I included this case I was thinking in tasks which would just run > stuff not directly connected to data. Something like a sub-daemon: say > a connection pooler, which is a bgworker just so that it starts and > stops together with postmaster, and share facilities like GUC > configuration and SIGHUP handling, etc. I think nearly all such cases are going to want some monitoring from within the database - which then needs shared memory. And even if not - it's not really that useful to avoid a crash-restart if your worker dies with a segfault. There's no reallylegitimate reasons for completely unhandled errors even if not connected to shmem. > It doesn't look like anybody > has had an interest in developing such a thing, so if this is > obstructing your work, I don't object to removing the no-shmem case. It's not obstructing me right now. I noticed it'd crash on EXEC_BACKEND when I tried to understand some code (see the nearbythread about straightening out process startup). I do think there's some potential gains in simplicity and robustness that are made mildly harder by a subprocess that firstattaches and detaches from shm (it's the only case where we can't easily unify the place InitProcess() is called betweenEB and ! EB right now). There's several ways that could be tackled. Removing the need to have that if obviously oneof them. Regards, Andres
On 2021-Aug-02, Andres Freund wrote: > > When I included this case I was thinking in tasks which would just run > > stuff not directly connected to data. Something like a sub-daemon: say > > a connection pooler, which is a bgworker just so that it starts and > > stops together with postmaster, and share facilities like GUC > > configuration and SIGHUP handling, etc. > > I think nearly all such cases are going to want some monitoring from > within the database - which then needs shared memory. True. Observability for such things is critical (pgbouncer goes quite some trouble to offer SQL-queryable views into its metrics), which kills the argument. > I do think there's some potential gains in simplicity and robustness > that are made mildly harder by a subprocess that first attaches and > detaches from shm (it's the only case where we can't easily unify the > place InitProcess() is called between EB and ! EB right now). There's > several ways that could be tackled. Removing the need to have that if > obviously one of them. Hmm, I don't remember that an shmem-unconnected bgworker first connected to it and then let go. It seems weird to do it that way. My intention, as far as I recall, is that they would just never connect to shmem, period. -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/ "I think my standards have lowered enough that now I think 'good design' is when the page doesn't irritate the living f*ck out of me." (JWZ)
Hi, On Mon, Aug 2, 2021, at 12:12, Alvaro Herrera wrote: > On 2021-Aug-02, Andres Freund wrote: > > I do think there's some potential gains in simplicity and robustness > > that are made mildly harder by a subprocess that first attaches and > > detaches from shm (it's the only case where we can't easily unify the > > place InitProcess() is called between EB and ! EB right now). There's > > several ways that could be tackled. Removing the need to have that if > > obviously one of them. > > Hmm, I don't remember that an shmem-unconnected bgworker first connected > to it and then let go. It seems weird to do it that way. My intention, > as far as I recall, is that they would just never connect to shmem, > period. They currently do for EXEC_BACKEND. See SubPostmasterMain(). There the definition of the worker is passed via shared memory.So it does the full reattach thing, which requires lwlock, which requires PGPROC. We could get away without that bypassing more through the variables file (either the worker definition or the address of the bgworker shmem piece). Greetings, Andres
On 2021-Aug-02, Andres Freund wrote: > On Mon, Aug 2, 2021, at 12:12, Alvaro Herrera wrote: > > Hmm, I don't remember that an shmem-unconnected bgworker first connected > > to it and then let go. It seems weird to do it that way. My intention, > > as far as I recall, is that they would just never connect to shmem, > > period. > > They currently do for EXEC_BACKEND. See SubPostmasterMain(). There the > definition of the worker is passed via shared memory. So it does the > full reattach thing, which requires lwlock, which requires PGPROC. We > could get away without that by passing more through the variables file > (either the worker definition or the address of the bgworker shmem > piece). Ah, that makes sense. That doesn't sound super fragile, but it is odd and it's probably a good argument for removing the feature, particularly since nobody seems to be using it. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)
Hi, On 2021-08-02 15:34:07 -0400, Alvaro Herrera wrote: > Ah, that makes sense. That doesn't sound super fragile, but it is odd > and it's probably a good argument for removing the feature, particularly > since nobody seems to be using it. ISTM we concluded that we should remove unconnected workers. Writing a patch to do so left me with two questions: First, what do we want to do with BGWORKER_SHMEM_ACCESS? I'm inclined to treat it as a required flag going forward. That way we don't silently start being attached to shared memory in case somebody actually has a unattached worker. And if we ever wanted to add the ability to have unattached workers back, it'll also be easier this way. Perhaps it also has a small amount of signalling value reminding people that they need to be careful... The second question is what we want to do in the backbranches. I think the reasonable options are to do nothing, or to make !BGWORKER_SHMEM_ACCESS an error in SanityCheckBackgroundWorker() if EXEC_BACKEND is used. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > First, what do we want to do with BGWORKER_SHMEM_ACCESS? I'm inclined to treat > it as a required flag going forward. +1 > The second question is what we want to do in the backbranches. I think the > reasonable options are to do nothing, or to make !BGWORKER_SHMEM_ACCESS an > error in SanityCheckBackgroundWorker() if EXEC_BACKEND is used. I think doing nothing is fine. Given the lack of complaints, we're more likely to break something than fix anything useful. regards, tom lane
Hi, On 2021-08-05 20:02:02 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > First, what do we want to do with BGWORKER_SHMEM_ACCESS? I'm inclined to treat > > it as a required flag going forward. > > +1 > > > The second question is what we want to do in the backbranches. I think the > > reasonable options are to do nothing, or to make !BGWORKER_SHMEM_ACCESS an > > error in SanityCheckBackgroundWorker() if EXEC_BACKEND is used. > > I think doing nothing is fine. Given the lack of complaints, we're > more likely to break something than fix anything useful. Done in the attached patch. I don't think we need to add more to the docs than the flag being required? Greetings, Andres Freund
Attachment
On Thu, Aug 5, 2021 at 8:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think doing nothing is fine. Given the lack of complaints, we're > more likely to break something than fix anything useful. +1. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Aug 09, 2021 at 11:07:14AM -0400, Robert Haas wrote: > On Thu, Aug 5, 2021 at 8:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think doing nothing is fine. Given the lack of complaints, we're >> more likely to break something than fix anything useful. > > +1. FWIW, the only interesting case I have in my plugin box for a background worker that does not attach to shared memory is a template of worker able to catch signals, to be used as a base for simple actions. So that's not really interesting. Making the SHMEM flag be something mandatory on HEAD while doing nothing in the back-branches sounds good to me, so +1. -- Michael
Attachment
On 2021-08-05 19:56:49 -0700, Andres Freund wrote: > Done in the attached patch. I don't think we need to add more to the docs than > the flag being required? Pushed that patch now. If we want further additions to the docs we can do so separately.