dynamic background workers, round two - Mailing list pgsql-hackers
From | Robert Haas
Subject | dynamic background workers, round two
Date |
Msg-id | CA+Tgmobk_FW1sN55+sXaeN+c+vBBd6z9ijkUHjVwyRm_48NmgA@mail.gmail.com
Responses | Re: dynamic background workers, round two
List | pgsql-hackers
The dynamic background workers patch that I submitted for CF1 was generally well-received, but several people commented on a significant limitation: there's currently no way for a backend that requests a new background worker to know whether that background worker was successfully started. If you're using the background worker mechanism to run daemon processes of some sort, this is probably not a huge problem. You most likely don't start and stop those processes very frequently, and when you do manually start them, you can always examine the server log to see whether everything worked as planned. Maybe not ideal, but not unbearably awful, either.

However, the goal I'm working towards here is parallel query, and for that application, and some others, things don't look so good. In that case, you are probably launching a background worker flagged as BGW_NEVER_RESTART, and so the postmaster is going to try to start it just once, and if it doesn't work, you really need some way of knowing that. Of course, if the worker is launched successfully, you can have it notify the process that started it via any mechanism you choose: creating a sentinel file, inserting data into a table, setting the process latch. Sky's the limit. However, if the worker isn't launched successfully (fork fails, or the process crashes before it reaches your code) you have no way of knowing. If you don't receive the agreed-upon notification from the child, it means EITHER that the process was never started in the first place OR that the postmaster just hasn't gotten around to starting it yet. Of course, you could wait for a long time (5s?) and then give up, but there's no good way to set the wait time. If you make it long, then it takes a long time to report errors to the client even when those errors happen quickly. If you make it short, you may time out spuriously on a busy system.

The attached patch attempts to remedy this problem.
When you register a background worker, you can obtain a "handle" that can subsequently be used to query for the worker's PID. If you additionally initialize bgw_notify_pid = MyProcPid, then the postmaster will send you SIGUSR1 when worker startup has been attempted (successfully or not). You can wait for this signal by passing your handle to WaitForBackgroundWorkerStartup(), which will return only when either (1) an attempt has been made to start the worker or (2) the postmaster is determined to have died. This interface seems adequate for something like worker_spi, where it's useful to know whether the child was started or not (and to return the PID if so), but that's about all that's really needed.

More complex notification interfaces can also be coded using the primitives introduced by this patch. Setting bgw_notify_pid will cause the postmaster to send SIGUSR1 every time it starts the worker and every time the worker dies. Every time you receive that signal, you can call GetBackgroundWorkerPid() for each background worker to find out which ones have started or terminated. The only slight difficulty is in waiting for receipt of that signal, which I implemented by adding a new global called set_latch_on_sigusr1. WaitForBackgroundWorkerStartup() uses that flag internally, but there's nothing to prevent a caller from using it as part of their own event loop. I find the set_latch_on_sigusr1 flag to be slightly ugly, but it seems better and far safer than having the postmaster try to actually set the latch itself - for example, it'd obviously be unacceptable to pass a Latch * to the postmaster and have the postmaster assume that pointer is valid. I'm hopeful that this is about as much fiddling with the background worker mechanism per se as will be needed for parallel query.
Once we have this, I think the next hurdle will be designing suitable IPC mechanisms, so that there's a relatively easy way for the "parent" process to pass information to its background workers and vice versa. I expect the main tool for that to be a dynamic shared memory facility, but I'm hoping that facility won't be closely tied to background workers, though they may be heavy users of it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company