Thread: putting a bgworker to rest

putting a bgworker to rest

From

Andres Freund

Date:

23 April 2013, 16:48:48

Hi all,

I noticed the need to simply stop a bgworker after its work is done but
still have it restart in unusual circumstances like a crash.
Obviously I can just have it enter a loop where it checks its latch and
such, but that seems a bit pointless.

Would it make sense to add an extra return value or such for that?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: putting a bgworker to rest

From

Alvaro Herrera

Date:

23 April 2013, 18:47:44

Andres Freund wrote:
> Hi all,
>
> I noticed the need to simply stop a bgworker after its work is done but
> still have it restart in unusual circumstances like a crash.
> Obviously I can just have it enter a loop where it checks its latch and
> such, but that seems a bit pointless.
>
> Would it make sense to add an extra return value or such for that?

KaiGai also requested some more flexibility in the stop timing and
shutdown sequence.  I understand the current design that workers are
always on can be a bit annoying.

How would postmaster know when to restart a worker that stopped?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: putting a bgworker to rest

From

Andres Freund

Date:

23 April 2013, 19:08:06

On 2013-04-23 11:59:43 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi all,
> > 
> > I noticed the need to simply stop a bgworker after its work is done but
> > still have it restart in unusual circumstances like a crash.
> > Obviously I can just have it enter a loop where it checks its latch and
> > such, but that seems a bit pointless.
> > 
> > Would it make sense to add an extra return value or such for that?
> 
> KaiGai also requested some more flexibility in the stop timing and
> shutdown sequence.  I understand the current design that workers are
> always on can be a bit annoying.
> 
> How would postmaster know when to restart a worker that stopped?

I had imagined we would assign some return codes special
meaning. Currently 0 basically means "restart immediately", 1 means
"crashed, wait for some time", everything else results in a postmaster
restart. It seems we can just assign returncode 2 as "done", probably
with some enum or such hiding the numbers.
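
A minimal sketch of what such an enum and the postmaster-side decision
might look like. All names here (WorkerExitCode, PostmasterAction,
action_for_exit_code) are hypothetical and not existing PostgreSQL
symbols; only the meanings of codes 0, 1 and 2 come from the text above.

```c
/* Hypothetical exit codes for a bgworker; the names are illustrative only. */
typedef enum WorkerExitCode
{
    WORKER_EXIT_RESTART_NOW = 0,    /* current: restart immediately */
    WORKER_EXIT_CRASHED = 1,        /* current: crashed, wait for some time */
    WORKER_EXIT_DONE = 2            /* proposed: work finished, stay stopped */
} WorkerExitCode;

/* Hypothetical reactions of the postmaster to a worker's exit. */
typedef enum PostmasterAction
{
    PM_RESTART_WORKER_NOW,
    PM_RESTART_WORKER_LATER,
    PM_LEAVE_WORKER_STOPPED,
    PM_RESTART_POSTMASTER
} PostmasterAction;

/* Map a worker's exit code to the action described in the text. */
PostmasterAction
action_for_exit_code(int exit_code)
{
    switch (exit_code)
    {
        case WORKER_EXIT_RESTART_NOW:
            return PM_RESTART_WORKER_NOW;
        case WORKER_EXIT_CRASHED:
            return PM_RESTART_WORKER_LATER;
        case WORKER_EXIT_DONE:
            return PM_LEAVE_WORKER_STOPPED;
        default:
            /* everything else results in a postmaster restart */
            return PM_RESTART_POSTMASTER;
    }
}
```

A worker that is "done" would then exit with WORKER_EXIT_DONE instead of
looping on its latch.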

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: putting a bgworker to rest

From

Alvaro Herrera

Date:

23 April 2013, 20:11:35

Andres Freund wrote:
> On 2013-04-23 11:59:43 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > Hi all,
> > >
> > > I noticed the need to simply stop a bgworker after its work is done but
> > > still have it restart in unusual circumstances like a crash.
> > > Obviously I can just have it enter a loop where it checks its latch and
> > > such, but that seems a bit pointless.
> > >
> > > Would it make sense to add an extra return value or such for that?
> >
> > KaiGai also requested some more flexibility in the stop timing and
> > shutdown sequence.  I understand the current design that workers are
> > always on can be a bit annoying.
> >
> > How would postmaster know when to restart a worker that stopped?
>
> I had imagined we would assign some return codes special
> meaning. Currently 0 basically means "restart immediately", 1 means
> "crashed, wait for some time", everything else results in a postmaster
> restart. It seems we can just assign returncode 2 as "done", probably
> with some enum or such hiding the numbers.

So a "done" worker would never be restarted, until postmaster sees a
crash or is itself restarted?  I guess that'd be useful for workers
running during recovery, which terminate when recovery completes.  Is
that your use case?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: putting a bgworker to rest

From

Andres Freund

Date:

23 April 2013, 20:22:18

On 2013-04-23 14:11:26 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2013-04-23 11:59:43 -0300, Alvaro Herrera wrote:
> > > Andres Freund wrote:
> > > > Hi all,
> > > > 
> > > > I noticed the need to simply stop a bgworker after its work is done but
> > > > still have it restart in unusual circumstances like a crash.
> > > > Obviously I can just have it enter a loop where it checks its latch and
> > > > such, but that seems a bit pointless.
> > > > 
> > > > Would it make sense to add an extra return value or such for that?
> > > 
> > > KaiGai also requested some more flexibility in the stop timing and
> > > shutdown sequence.  I understand the current design that workers are
> > > always on can be a bit annoying.
> > > 
> > > How would postmaster know when to restart a worker that stopped?
> > 
> > I had imagined we would assign some return codes special
> > meaning. Currently 0 basically means "restart immediately", 1 means
> > "crashed, wait for some time", everything else results in a postmaster
> > restart. It seems we can just assign returncode 2 as "done", probably
> > with some enum or such hiding the numbers.
> 
> So a "done" worker would never be restarted, until postmaster sees a
> crash or is itself restarted?  I guess that'd be useful for workers
> running during recovery, which terminate when recovery completes.  Is
> that your use case?

Well, it's not actual postgres recovery, but something similar in the
context of logical replication.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: putting a bgworker to rest

From

Dimitri Fontaine

Date:

24 April 2013, 19:31:07

Andres Freund <andres@2ndquadrant.com> writes:
>> How would postmaster know when to restart a worker that stopped?
>
> I had imagined we would assign some return codes special
> meaning. Currently 0 basically means "restart immediately", 1 means
> "crashed, wait for some time", everything else results in a postmaster
> restart. It seems we can just assign returncode 2 as "done", probably
> with some enum or such hiding the numbers.

In Erlang, the lib that cares about such things is called OTP, and that
proposes a model of supervisor that knows when to restart a worker. The
specs for the restart behaviour are:
 Restart = permanent | transient | temporary

Restart defines when a terminated child process should be restarted.
 - A permanent child process is always restarted.
 - A temporary child process is never restarted (not even when the
   supervisor's restart strategy is rest_for_one or one_for_all and a
   sibling's death causes the temporary process to be terminated).
 - A transient child process is restarted only if it terminates
   abnormally, i.e. with another exit reason than normal, shutdown or
   {shutdown,Term}.
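
The three restart types above can be sketched in C roughly as follows.
This is only an illustration of the OTP rule as quoted; RestartType and
should_restart are hypothetical names, and a single "exited normally"
flag stands in for OTP's normal, shutdown and {shutdown,Term} exit
reasons.

```c
#include <stdbool.h>

/* Hypothetical C rendering of OTP's Restart = permanent | transient | temporary. */
typedef enum RestartType
{
    RESTART_PERMANENT,      /* always restarted */
    RESTART_TRANSIENT,      /* restarted only after an abnormal exit */
    RESTART_TEMPORARY       /* never restarted */
} RestartType;

/*
 * Decide whether a terminated child should be restarted.
 * exited_normally is an assumption of this sketch: it is true when the
 * child's exit reason counts as normal in the sense quoted above.
 */
bool
should_restart(RestartType type, bool exited_normally)
{
    switch (type)
    {
        case RESTART_PERMANENT:
            return true;
        case RESTART_TRANSIENT:
            return !exited_normally;
        case RESTART_TEMPORARY:
            return false;
    }
    return false;               /* unknown type: be conservative */
}
```

In these terms, a bgworker that stops when its work is done but restarts
after a crash would behave like a transient child.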

Then about restart frequency, what they have is:
   The supervisors have a built-in mechanism to limit the number of
   restarts which can occur in a given time interval. This is
   determined by the values of the two parameters MaxR and MaxT in the
   start specification returned by the callback function [ ... ]

   If more than MaxR number of restarts occur in the last MaxT seconds,
   then the supervisor terminates all the child processes and then
   itself.
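
The MaxR/MaxT rule quoted above amounts to a sliding-window counter. A
minimal sketch in C, assuming timestamps in whole seconds supplied by
the caller; RestartLimiter, restart_limiter_init, restart_limiter_record
and RESTART_HISTORY_CAP are hypothetical names, and this is not how OTP
itself implements the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define RESTART_HISTORY_CAP 16      /* arbitrary bound for this sketch */

typedef struct RestartLimiter
{
    int  max_restarts;                  /* MaxR */
    long max_seconds;                   /* MaxT */
    long history[RESTART_HISTORY_CAP];  /* timestamps of recent restarts */
    int  count;                         /* valid entries in history */
} RestartLimiter;

void
restart_limiter_init(RestartLimiter *rl, int max_restarts, long max_seconds)
{
    assert(max_restarts >= 0 && max_restarts < RESTART_HISTORY_CAP);
    assert(max_seconds > 0);
    memset(rl, 0, sizeof(*rl));
    rl->max_restarts = max_restarts;
    rl->max_seconds = max_seconds;
}

/*
 * Record a restart happening at time "now" (seconds).  Returns true while
 * the limit holds, false once more than MaxR restarts fall within the
 * last MaxT seconds, at which point the supervisor would give up.
 */
bool
restart_limiter_record(RestartLimiter *rl, long now)
{
    int kept = 0;

    /* Forget restarts that fell out of the last max_seconds window. */
    for (int i = 0; i < rl->count; i++)
    {
        if (now - rl->history[i] < rl->max_seconds)
            rl->history[kept++] = rl->history[i];
    }

    /* max_restarts + 1 entries suffice to detect an excess; drop the oldest. */
    if (kept > rl->max_restarts)
    {
        memmove(rl->history, rl->history + 1,
                (size_t) (kept - 1) * sizeof(long));
        kept--;
    }

    rl->history[kept++] = now;
    rl->count = kept;

    return kept <= rl->max_restarts;
}
```

With MaxR = 2 and MaxT = 10, a third restart within ten seconds makes
restart_limiter_record return false; once the window has passed,
restarts are accepted again.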

You can read the whole thing here:
   http://www.erlang.org/doc/design_principles/sup_princ.html#id71215

I think we should get some inspiration from them here.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: putting a bgworker to rest

From

Robert Haas

Date:

24 April 2013, 20:01:22

On Tue, Apr 23, 2013 at 1:22 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> So a "done" worker would never be restarted, until postmaster sees a
>> crash or is itself restarted?  I guess that'd be useful for workers
>> running during recovery, which terminate when recovery completes.  Is
>> that your use case?
>
> Well, it's not actual postgres recovery, but something similar in the
> context of logical replication.

It's probably too late to be twiddling this very much more, but
another thing I think would be useful is for backends to have the
ability to request that the postmaster start a worker of type xyz,
rather than having the server start it automatically at startup time.
That's what you'd need for parallel query, and there might be some
replication-related use cases for such things as well.  The general
usage pattern would be:

- regular backend realizes that it needs help
- kicks postmaster to start a helper process
- helper process runs for a while, doing work
- helper process finishes work, maybe waits around for some period of
time to see if any new work arrives, and then exits
- eventually go back to step 1
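
The steps above can be sketched as a small state machine for the helper
process. This is only an illustration of the lifecycle; HelperState,
HelperEvent and helper_next_state are hypothetical names, and nothing
here reflects an existing PostgreSQL API.

```c
/* Hypothetical lifecycle of an on-demand helper process. */
typedef enum HelperState
{
    HELPER_STOPPED,         /* no helper process exists */
    HELPER_WORKING,         /* helper is running and doing work */
    HELPER_IDLE             /* helper finished, waiting for new work */
} HelperState;

typedef enum HelperEvent
{
    EV_BACKEND_NEEDS_HELP,  /* backend kicks postmaster / hands over work */
    EV_WORK_FINISHED,       /* helper completed its current work */
    EV_IDLE_TIMEOUT         /* no new work arrived within the wait period */
} HelperEvent;

/* Return the state that follows "state" when "event" occurs. */
HelperState
helper_next_state(HelperState state, HelperEvent event)
{
    switch (state)
    {
        case HELPER_STOPPED:
            /* postmaster starts a helper on request */
            if (event == EV_BACKEND_NEEDS_HELP)
                return HELPER_WORKING;
            break;
        case HELPER_WORKING:
            if (event == EV_WORK_FINISHED)
                return HELPER_IDLE;
            break;
        case HELPER_IDLE:
            /* new work arrived while waiting around */
            if (event == EV_BACKEND_NEEDS_HELP)
                return HELPER_WORKING;
            /* nothing arrived: exit; a later request starts a new helper */
            if (event == EV_IDLE_TIMEOUT)
                return HELPER_STOPPED;
            break;
    }
    return state;           /* events that do not apply are ignored */
}
```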

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: putting a bgworker to rest

From

Peter Eisentraut

Date:

25 April 2013, 18:20:05

On 4/24/13 12:30 PM, Dimitri Fontaine wrote:
> In Erlang, the lib that cares about such things is called OTP, and that
> proposes a model of supervisor that knows when to restart a worker. The
> specs for the restart behaviour are:
> 
>   Restart = permanent | transient | temporary

There is also supervisord; see configuration settings "autorestart" and
"exitcodes" here:

http://supervisord.org/configuration.html#program-x-section-settings

Yes, the feature creep is in full progress!

Re: putting a bgworker to rest

From

Alvaro Herrera

Date:

25 April 2013, 19:06:07

Peter Eisentraut wrote:
> On 4/24/13 12:30 PM, Dimitri Fontaine wrote:
> > In Erlang, the lib that cares about such things is called OTP, and that
> > proposes a model of supervisor that knows when to restart a worker. The
> > specs for the restart behaviour are:
> >
> >   Restart = permanent | transient | temporary
>
> There is also supervisord; see configuration settings "autorestart" and
> "exitcodes" here:
>
> http://supervisord.org/configuration.html#program-x-section-settings
>
> Yes, the feature creep is in full progress!

The main missing feature before this can be sensibly implemented, in my
view, is some way to make workers start when they are stopped, assuming
no intervening postmaster crash.  I suppose we could write a
SQL-callable function so that a backend can signal postmaster to launch
a worker.  For this to work, I think we need an SQL-accessible way to
list existing registered workers, along with whether they are running or
not, and some identifier.  However, the list of registered workers and
their statuses currently only exists in postmaster local memory;
exporting that might be problematic.  (Maybe a simple file with a list
of registered workers, but not the status, is good enough. Postmaster
could write it after registration is done.)

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services