Thread: Inconsistent bgworker behaviour

Inconsistent bgworker behaviour

From
Beena Emerson
Date:

Hello,

I have been working on a module that launches background workers for a list of databases provided by a configuration parameter (say m_databases). This configuration parameter can be edited and reloaded.
It has a launcher that manages all the workers launched by the module. The bgw_notify_pid of each worker is set to the launcher's PID.

The number of background workers that can be launched is restricted by max_worker_processes.

Consider the following scenario:
max_worker_processes = 3
m_databases='db1, db2'

The server is started.

m_databases is then updated to
m_databases='db3, db2'

$ pg_ctl reload

The expected behavior is that the db1 worker should be terminated and the db3 worker should be launched. However, I found that this behavior is not consistent. In a few runs, when the databases parameter is edited and reloaded, the new worker is launched before the old one is terminated, causing an error.

I have used the following code in the launcher to ensure that old, unneeded background workers are terminated before new background workers are launched for the newly added databases.
for (i = 0; i < workers_launched; i++)
{
    if (!is_inlist(new_dblist, workers[i]->dbname))
    {
        /*
         * Ensure that the background worker is terminated before registering
         * new workers, to avoid exceeding max_worker_processes.
         */
        ResetLatch(&MyProc->procLatch);
        TerminateBackgroundWorker(workers[i]->handle);
        WaitLatch(&MyProc->procLatch, WL_LATCH_SET, 0);
    }
}
.
.
. (launch new workers)
.

The latch is set when the SIGUSR1 signal is received. IIUC, the launcher gets SIGUSR1 when the bgworker process has exited. No new worker is launched or terminated in between, yet the code does not work as expected in every run.

Any help will be appreciated.

Thank you,

Beena

Re: Inconsistent bgworker behaviour

From
Craig Ringer
Date:
On 01/07/2015 11:54 AM, Beena Emerson wrote:
>
>             ResetLatch(&MyProc->procLatch);
>             TerminateBackgroundWorker(workers[i]->handle);
>             WaitLatch(&MyProc->procLatch, WL_LATCH_SET, 0);

This doesn't guarantee that the worker of interest has terminated, just
that your latch got set.

You should make sure the worker of interest is actually dead, and you
didn't get a SIGUSR1 for some other reason.

We could probably use a WaitForBackgroundWorkerTermination(...) to
correspond to WaitForBackgroundWorkerStartup(...) .

I think you'll probably want to GetBackgroundWorkerPid(...) and examine
the returned BgwHandleStatus to see if it's BGWH_STOPPED . If not, keep
waiting. You might want a timeout to re-check.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
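
The re-check loop suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: the enum, the handle struct, and GetBackgroundWorkerPid() here are stand-ins that mimic the shape of the real declarations in postmaster/bgworker.h so that the loop compiles outside a PostgreSQL tree, and wait_for_wakeup(), wait_for_worker_stop(), and max_polls are hypothetical names. In a real module the stand-ins would be dropped and the wakeup would be a WaitLatch() call with a timeout followed by ResetLatch().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Stand-in for the enum declared in postmaster/bgworker.h. */
typedef enum
{
    BGWH_STARTED,
    BGWH_NOT_YET_STARTED,
    BGWH_STOPPED,
    BGWH_POSTMASTER_DIED
} BgwHandleStatus;

/* Hypothetical mock handle: reports "started" for polls_until_stop polls. */
typedef struct BackgroundWorkerHandle
{
    int polls_until_stop;
} BackgroundWorkerHandle;

/* Mock with the same shape as the real GetBackgroundWorkerPid(). */
static BgwHandleStatus
GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
{
    if (handle->polls_until_stop > 0)
    {
        handle->polls_until_stop--;
        *pidp = 8629;           /* arbitrary PID for the mock */
        return BGWH_STARTED;
    }
    return BGWH_STOPPED;
}

/* Placeholder for WaitLatch(..., WL_LATCH_SET | WL_TIMEOUT, ...) + ResetLatch(). */
static void
wait_for_wakeup(void)
{
}

/*
 * Wait until the worker behind "handle" is reported as stopped. A latch
 * wakeup alone proves nothing, so the status is re-checked after every
 * wakeup. Returns false if the postmaster died or max_polls ran out.
 */
static bool
wait_for_worker_stop(BackgroundWorkerHandle *handle, int max_polls)
{
    pid_t pid = 0;

    assert(handle != NULL);
    for (int i = 0; i < max_polls; i++)
    {
        BgwHandleStatus status = GetBackgroundWorkerPid(handle, &pid);

        if (status == BGWH_STOPPED)
            return true;
        if (status == BGWH_POSTMASTER_DIED)
            return false;
        wait_for_wakeup();      /* SIGUSR1 or timeout, then look again */
    }
    return false;
}
```

The launcher would call something like wait_for_worker_stop(workers[i]->handle, ...) after TerminateBackgroundWorker() instead of relying on a single WaitLatch().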


Re: Inconsistent bgworker behaviour

From
Beena Emerson
Date:

Hello,

Thank you for your reply.

> We could probably use a WaitForBackgroundWorkerTermination(...) to
> correspond to WaitForBackgroundWorkerStartup(...) .
>
> I think you'll probably want to GetBackgroundWorkerPid(...) and examine
> the returned BgwHandleStatus to see if it's BGWH_STOPPED . If not, keep
> waiting. You might want a timeout to re-check.

I have tried checking the status, but I found that BGWH_STOPPED does not imply that the background worker has been unregistered.

I updated the code as suggested to check the handle status in a loop and added LOG messages to understand the flow.

This is the log output. The old m.databases value was "db1, db4".

LOG:  received SIGHUP, reloading configuration files
LOG:  parameter "m.databases" changed to "db1, db2"
LOG:  WaitLatch Ends
LOG:  The bgworker status is BGWH_STOPPED
WARNING:  could not register background process for database "db2"
HINT:  You may need to increase configuration parameter "max_worker_processes".
LOG:  worker process: m db4 (PID 8629) exited with exit code 0
LOG:  unregistering background worker "m db4"
server signaled

As seen, even though the status is BGWH_STOPPED, the slot is not yet free, and hence the new worker cannot be launched.

Is there any way to check if the bgworker has unregistered and freed a slot?

Thanks,

Beena

Re: Inconsistent bgworker behaviour

From
Craig Ringer
Date:
On 01/09/2015 12:53 AM, Beena Emerson wrote:
> Is there any way to check if the bgworker has unregistered and freed a
> slot?

Not that I'm aware of. Anyone else have suggestions?

This is related to some other discussion we've had about needing a way
to enumerate bgworkers. The specific issue of finding out when a
bgworker is unregistered would fit under that.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
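
Since the thread identifies no way to observe the slot being freed, one possible workaround (not something proposed by either poster) is to treat a failed registration as temporary and retry a bounded number of times. The sketch below only illustrates that idea: the structs and RegisterDynamicBackgroundWorker() are mocks that imitate the real function's boolean result so the code compiles on its own, and attempts_until_slot_free, pause_before_retry(), register_with_retry(), and max_attempts are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the structs in postmaster/bgworker.h. */
typedef struct BackgroundWorker
{
    const char *bgw_name;
} BackgroundWorker;

typedef struct BackgroundWorkerHandle
{
    int slot;
} BackgroundWorkerHandle;

/* Mock state: failed attempts remaining before a slot becomes free. */
static int attempts_until_slot_free = 2;
static BackgroundWorkerHandle mock_handle;

/* Mock: like the real function, returns false while no slot is available. */
static bool
RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
                                BackgroundWorkerHandle **handle)
{
    (void) worker;
    if (attempts_until_slot_free > 0)
    {
        attempts_until_slot_free--;
        return false;
    }
    *handle = &mock_handle;
    return true;
}

/* Placeholder for a WaitLatch() call with WL_TIMEOUT in a real module. */
static void
pause_before_retry(void)
{
}

/*
 * Try to register the worker up to max_attempts times. Returns the attempt
 * number that succeeded, or 0 if every attempt failed.
 */
static int
register_with_retry(BackgroundWorker *worker,
                    BackgroundWorkerHandle **handle, int max_attempts)
{
    assert(worker != NULL && handle != NULL);
    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        if (RegisterDynamicBackgroundWorker(worker, handle))
            return attempt;
        pause_before_retry();   /* give the old slot time to be released */
    }
    return 0;
}
```

Bounding the retries keeps the launcher from spinning forever when max_worker_processes really is too low, in which case the existing WARNING and HINT remain the right outcome.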