Thread: putting a bgworker to rest
Hi all, I noticed the need to simply stop a bgworker after its work is done but still have it restart in unusual circumstances like a crash. Obviously I can just have it enter a loop where it checks its latch and such, but that seems a bit pointless. Would it make sense to add an extra return value or such for that? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote: > Hi all, > > I noticed the need to simply stop a bgworker after its work is done but > still have it restart in unusual circumstances like a crash. > Obviously I can just have it enter a loop where it checks its latch and > such, but that seems a bit pointless. > > Would it make sense to add an extra return value or such for that? KaiGai also requested some more flexibility in the stop timing and shutdown sequence. I understand that the current design, in which workers are always on, can be a bit annoying. How would postmaster know when to restart a worker that stopped? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2013-04-23 11:59:43 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > Hi all, > > > > I noticed the need to simply stop a bgworker after its work is done but > > still have it restart in unusual circumstances like a crash. > > Obviously I can just have it enter a loop where it checks its latch and > > such, but that seems a bit pointless. > > > > Would it make sense to add an extra return value or such for that? > > KaiGai also requested some more flexibility in the stop timing and > shutdown sequence. I understand the current design that workers are > always on can be a bit annoying. > > How would postmaster know when to restart a worker that stopped? I had imagined we would assign some return codes special meaning. Currently 0 basically means "restart immediately", 1 means "crashed, wait for some time", everything else results in a postmaster restart. It seems we can just assign returncode 2 as "done", probably with some enum or such hiding the numbers. Greetings, Andres Freund
Andres Freund wrote: > On 2013-04-23 11:59:43 -0300, Alvaro Herrera wrote: > > Andres Freund wrote: > > > Hi all, > > > > > > I noticed the need to simply stop a bgworker after its work is done but > > > still have it restart in unusual circumstances like a crash. > > > Obviously I can just have it enter a loop where it checks its latch and > > > such, but that seems a bit pointless. > > > > > > Would it make sense to add an extra return value or such for that? > > > > KaiGai also requested some more flexibility in the stop timing and > > shutdown sequence. I understand the current design that workers are > > always on can be a bit annoying. > > > > How would postmaster know when to restart a worker that stopped? > > I had imagined we would assign some return codes special > meaning. Currently 0 basically means "restart immediately", 1 means > "crashed, wait for some time", everything else results in a postmaster > restart. It seems we can just assign returncode 2 as "done", probably > with some enum or such hiding the numbers. So a "done" worker would never be restarted, until postmaster sees a crash or is itself restarted? I guess that'd be useful for workers running during recovery, which terminate when recovery completes. Is that your use case?
On 2013-04-23 14:11:26 -0300, Alvaro Herrera wrote: > Andres Freund wrote: > > On 2013-04-23 11:59:43 -0300, Alvaro Herrera wrote: > > > Andres Freund wrote: > > > > Hi all, > > > > > > > > I noticed the need to simply stop a bgworker after its work is done but > > > > still have it restart in unusual circumstances like a crash. > > > > Obviously I can just have it enter a loop where it checks its latch and > > > > such, but that seems a bit pointless. > > > > > > > > Would it make sense to add an extra return value or such for that? > > > > > > KaiGai also requested some more flexibility in the stop timing and > > > shutdown sequence. I understand the current design that workers are > > > always on can be a bit annoying. > > > > > > How would postmaster know when to restart a worker that stopped? > > > > I had imagined we would assign some return codes special > > meaning. Currently 0 basically means "restart immediately", 1 means > > "crashed, wait for some time", everything else results in a postmaster > > restart. It seems we can just assign returncode 2 as "done", probably > > with some enum or such hiding the numbers. > > So a "done" worker would never be restarted, until postmaster sees a > crash or is itself restarted? I guess that'd be useful for workers > running during recovery, which terminate when recovery completes. Is > that your use case? Well, it's not actual postgres recovery, but something similar in the context of logical replication. Greetings, Andres Freund
Andres Freund <andres@2ndquadrant.com> writes: >> How would postmaster know when to restart a worker that stopped? > > I had imagined we would assign some return codes special > meaning. Currently 0 basically means "restart immediately", 1 means > "crashed, wait for some time", everything else results in a postmaster > restart. It seems we can just assign returncode 2 as "done", probably > with some enum or such hiding the numbers. In Erlang, the lib that cares about such things is called OTP, and that proposes a model of supervisor that knows when to restart a worker. The specs for the restart behaviour are: Restart = permanent | transient | temporary Restart defines when a terminated child process should be restarted. - A permanent child process is always restarted. - A temporary child process is never restarted (not even when the supervisor's restart strategy is rest_for_one or one_for_all and a sibling's death causes the temporary process to be terminated). - A transient child process is restarted only if it terminates abnormally, i.e. with another exit reason than normal, shutdown or {shutdown,Term}. Then about restart frequency, what they have is: The supervisors have a built-in mechanism to limit the number of restarts which can occur in a given time interval. This is determined by the values of the two parameters MaxR and MaxT in the start specification returned by the callback function [ ... ] If more than MaxR number of restarts occur in the last MaxT seconds, then the supervisor terminates all the child processes and then itself. You can read the whole thing here: http://www.erlang.org/doc/design_principles/sup_princ.html#id71215 I think we should get some inspiration from them here. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Tue, Apr 23, 2013 at 1:22 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> So a "done" worker would never be restarted, until postmaster sees a >> crash or is itself restarted? I guess that'd be useful for workers >> running during recovery, which terminate when recovery completes. Is >> that your use case? > > Well, it's not actual postgres recovery, but something similar in the > context of logical replication. It's probably too late to be twiddling this very much more, but another thing I think would be useful is for backends to have the ability to request that the postmaster start a worker of type xyz, rather than having the server start it automatically at startup time. That's what you'd need for parallel query, and there might be some replication-related use cases for such things as well. The general usage pattern would be:
1. regular backend realizes that it needs help
2. kicks postmaster to start a helper process
3. helper process runs for a while, doing work
4. helper process finishes work, maybe waits around for some period of time to see if any new work arrives, and then exits
5. eventually go back to step 1
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 4/24/13 12:30 PM, Dimitri Fontaine wrote: > In Erlang, the lib that cares about such things is called OTP, and that > proposes a model of supervisor that knows when to restart a worker. The > specs for the restart behaviour are: > > Restart = permanent | transient | temporary There is also supervisord; see configuration settings "autorestart" and "exitcodes" here: http://supervisord.org/configuration.html#program-x-section-settings Yes, the feature creep is in full progress!
Peter Eisentraut wrote: > On 4/24/13 12:30 PM, Dimitri Fontaine wrote: > > In Erlang, the lib that cares about such things is called OTP, and that > > proposes a model of supervisor that knows when to restart a worker. The > > specs for the restart behaviour are: > > > > Restart = permanent | transient | temporary > > There is also supervisord; see configuration settings "autorestart" and > "exitcodes" here: > > http://supervisord.org/configuration.html#program-x-section-settings > > Yes, the feature creep is in full progress! The main missing feature before this can be sensibly implemented, in my view, is some way to start workers that have stopped, assuming no intervening postmaster crash. I suppose we could write a SQL-callable function so that a backend can signal postmaster to launch a worker. For this to work, I think we need an SQL-accessible way to list existing registered workers, along with whether they are running or not, and some identifier. However, the list of registered workers and their statuses currently only exists in postmaster local memory; exporting that might be problematic. (Maybe a simple file with a list of registered workers, but not the status, is good enough. Postmaster could write it after registration is done.)