On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
>
> 1. Put an Assert(0) in ParallelQueryMain(), start the server and execute
> any parallel query.
> In LaunchParallelWorkers, you can see
> nworkers = n, nworkers_launched = n (n > 0)
> But all the workers will crash because of the assert statement.
> 2. The server restarts automatically and initializes
> BackgroundWorkerData->parallel_register_count and
> BackgroundWorkerData->parallel_terminate_count in shared memory.
> After that, it calls ForgetBackgroundWorker, which increments
> parallel_terminate_count. In LaunchParallelWorkers, we have the
> following condition:
> if ((BackgroundWorkerData->parallel_register_count -
>      BackgroundWorkerData->parallel_terminate_count) >=
>     max_parallel_workers)
>     DO NOT launch any parallel worker.
> Hence, nworkers = n, nworkers_launched = 0.
>
> parallel_register_count and parallel_terminate_count are both unsigned
> integers. So whenever the difference would be negative, it wraps around
> to a well-defined unsigned value that is certainly much larger than
> max_parallel_workers. Hence, no workers will be launched. I've attached
> a patch to fix this.
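To make the wraparound concrete, here is a minimal C sketch that replays step 2 with two uint32_t counters. The names DemoWorkerCounters, demo_active_workers, demo_at_worker_limit and demo_state_after_restart are hypothetical stand-ins, not PostgreSQL code; only the unsigned counter type and the shape of the quoted check come from the message above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the two shared-memory counters discussed
 * above; only their unsigned 32-bit type matters for this demonstration.
 */
typedef struct DemoWorkerCounters
{
    uint32_t    parallel_register_count;
    uint32_t    parallel_terminate_count;
} DemoWorkerCounters;

/* "Active" workers = registered minus terminated, in unsigned arithmetic. */
static uint32_t
demo_active_workers(const DemoWorkerCounters *c)
{
    /* Unsigned subtraction never goes negative: it wraps modulo 2^32. */
    return c->parallel_register_count - c->parallel_terminate_count;
}

/* Same shape as the quoted check: nonzero means "do not launch". */
static int
demo_at_worker_limit(const DemoWorkerCounters *c, int max_parallel_workers)
{
    assert(max_parallel_workers >= 0);
    return demo_active_workers(c) >= (uint32_t) max_parallel_workers;
}

/* Replays step 2: both counters reset to zero, then one termination. */
static DemoWorkerCounters
demo_state_after_restart(void)
{
    DemoWorkerCounters c = {0, 0};

    c.parallel_terminate_count++;   /* the increment done by ForgetBackgroundWorker */
    return c;
}
```

With both counters reset to zero and one termination recorded, demo_active_workers() returns UINT32_MAX (4294967295), so the check reports the limit as reached for any realistic max_parallel_workers setting.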
The current comment explaining the number of active parallel workers is:
* The active
* number of parallel workers is the number of registered workers minus the
* terminated ones.
In situations like the one you described above, this formula can give a
negative number of active parallel workers. However, a negative number of
active parallel workers does not make any sense.
I feel it would be better to explain in the code comment in what situations
the formula can produce a negative result, and what that means.
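As an illustration only, a comment along those lines, together with one possible way to handle the "negative" case, might look like the sketch below. It is not the attached patch and not PostgreSQL code: the function name demo_active_parallel_workers is hypothetical, and the sketch assumes the counters are uint32 (as the quoted message states) and that a genuine active-worker count never exceeds INT32_MAX.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not PostgreSQL code.
 *
 * The number of active parallel workers is the number of registered
 * workers minus the terminated ones.  Normally the result is small and
 * non-negative.  However, after a crash and automatic restart, both
 * counters are re-initialized to zero and ForgetBackgroundWorker may
 * still increment the terminate count for workers registered before the
 * crash.  The mathematical difference is then negative; in unsigned
 * arithmetic it wraps to a huge value, which would block every new
 * parallel worker.
 */
static uint32_t
demo_active_parallel_workers(uint32_t register_count,
                             uint32_t terminate_count)
{
    /* Modular difference: still correct if both counters have wrapped. */
    uint32_t    diff = register_count - terminate_count;

    /*
     * Assumption of this sketch: a "difference" above INT32_MAX is taken
     * to mean the true value is negative, and is reported as zero active
     * workers.
     */
    if (diff > (uint32_t) INT32_MAX)
        return 0;
    return diff;
}
```

Comparing the modular difference against INT32_MAX, rather than testing terminate_count > register_count directly, keeps the result correct when the counters legitimately wrap around on a long-running server.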