Review: Extra Daemons / bgworker - Mailing list pgsql-hackers

From Markus Wanner
Subject Review: Extra Daemons / bgworker
Date
Msg-id 50B61846.8050201@bluegap.ch
Whole thread Raw
Responses Re: Review: Extra Daemons / bgworker
List pgsql-hackers
Hello Álvaro,

first of all, thank you for bringing this up again and providing a
patch. My first attempt on that was more than two years ago [1]. As the
author of a former bgworker patch, I'd like to provide an additional
review - KaiGai was simply faster to sing up as a reviewer on the
commitfest app...

I started my review is based on rev 6d7e51.. from your bgworker branch
on github, only to figure out later that it wasn't quite up to date. I
upgraded to bgworker-7.patch. I hope that's the most recent version.


# Concept

I appreciate the ability to run daemons that are not connected to
Postgres shared memory. It satisfies a user request that came up several
times when I talked about the bgworkers in Postgres-R.

Another good point is the flexible interface via extensions, even
allowing different starting points for such background workers.

One thing I'd miss (for use of that framework in Postgres-R) is the
ability to start a registered bgworker only upon request, way after the
system started. So that the bgworker process doesn't even exist until it
is really needed. I realize that RegisterBackgroundWorker() is still
necessary in advance to reserve the slots in BackgroundWorkerList and
shared memory.

As a use case independent of Postgres-R, think of something akin to a
worker_spi, but wanting that to perform a task every 24h on hundreds of
databases. You don't want to keep hundreds of processes occupying PGPROC
slots just perform a measly task every 24h.

(We've discussed the ability to let bgworkers re-connect to another
database back then. For one, you'd still have currently unneeded worker
processes around all the time. And second, I think that's hard to get
right - after all, a terminated process is guaranteed to not leak any
stale data into a newly started one, no matter what.)

From my point of view, autovacuum is the very first example of a
background worker process. And I'm a bit puzzled about it not being
properly integrated into this framework. Once you think of autovacuum as
a background job which needs access to Postgres shared memory and a
database, but no client connection, it looks like a bit of code
duplication (and not using code we already have). I realize this kind of
needs the above feature being able to request the (re)start of bgworkers
at arbitrary points in time. However, it would also be a nice test case
for the bgworker infrastructure.

I'd be happy to help with extending the current patch into that
direction, if you agree it's generally useful. Or adapt my bgworker code
accordingly.


# Minor technical issues or questions

In assign_maxconnections, et al, GetNumRegisteredBackgroundWorkers() is
used in relation to MAX_BACKENDS or to calculate MaxBackends. That seems
to "leak" the bgworkers that registered with neither
BGWORKER_SHMEM_ACCESS nor BGWORKER_BACKEND_DATABASE_CONNECTION set. Or
is there any reason to discount such extra daemon processes?

The additional contrib modules auth_counter and worker_spi are missing
from the contrib/Makefile. If that's intentional, they should probably
be listed under "Missing".

The auth_counter module leaves worker.bgw_restart_time uninitialized.

Being an example, it might make sense for auth_counter to provide a
signal that just calls SetLatch() to interrupt WaitLatch() and make
auth_counter emit its log line upon request, i.e. show how to use SIGUSR1.

The bgw_restart_time doesn't always work (certainly not the way I'm
expecting it to). For example, if you forget to pass the
BGWORKER_SHMEM_ACCESS flag, trying to LWLockAcquire() leads to the
worker being restarted immediately and repeatedly - independent of the
bgw_restart_time setting. The same holds true if the bgworker exits with
status 0 or in case it segfaults. Not when exiting with code 1, though.
Why is that? Especially in case of a segfault or equally "hard" errors
that can be expected to occur repeatedly, I don't want the worker to be
restarted that frequently.


# Documentation

There are two working examples in contrib. The auth_counter could use a
header comment similar to worker_spi, quickly describing what it does.
There's no example of a plain extra daemon, without shmem access.

Coding guidelines for bgworker / extra daemon writers are missing. I
read these must not use sleep(), with an explanation in both examples.
Other questions that come to mind: what about signal handlers? fork()?
threads? Can/should it use PG_TRY/PG_CATCH or setjmp()/longjmp(). How to
best load other 3rd party libraries, i.e. for OpenGL processing?

Especially for extra daemons (i.e. bgw_flags = 0), differences to
running an external daemon should be documented. I myself am unclear
about all of the implications that running as a child of the postmaster
has (OOM and cgroups come to mind - there certainly are other aspects).
(I myself still have a hard time finding a proper use case for extra
daemons. I don't want the postmaster to turn into a general purpose
watchdog for everything and the kitchen sink.)


# Tests

No additional automated tests are included - hard to see how that could
be tested automatically, given the lack of a proper test harness.


I hope this review provides useful input.

Regards

Markus Wanner


[1]: That was during commit fest 2010-09. Back then, neither extensions,
latches nor the walsender existed. The patches had a dependency on
imessages, which was not accepted. Other than that, it's a working,
pooled bgworker implementation on top of which autovacuum runs just
fine. It lacks extensibility and the extra daemons feature, though. For
those who missed it:

https://commitfest.postgresql.org/action/patch_view?id=339
http://archives.postgresql.org/message-id/4C3C789C.1040409@bluegap.ch
http://git.postgres-r.org/?p=bgworker;a=summary



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Enabling frontend-only xlog "desc" routines
Next
From: Bruce Momjian
Date:
Subject: Re: Further pg_upgrade analysis for many tables