Thread: autovacuum starvation
Hi,

The recently discovered autovacuum bug made me notice something that is
possibly critical.  The current autovacuum code makes an effort not to
leave workers in a "starting" state for too long, lest it fail to tend
all databases needing vacuum in a timely manner.

This is how the launching of workers works:
1) the launcher puts a pointer to a WorkerInfo entry in shared memory,
   called "the starting worker" pointer
2) the launcher sends a signal to the postmaster
3) the postmaster forks a worker
4) the new worker checks the starting worker pointer
5) the new worker resets the starting worker pointer
6) the new worker connects to the given database and vacuums it

The problem is this: I originally added some code in the autovacuum
launcher to check that a worker does not take "too long" to start,
where "too long" is autovacuum_naptime seconds.  If this happens, the
launcher resets the starting worker pointer, which means that the newly
starting worker will not see anything that needs to be done and will
exit quickly.

The problem with this is that on a heavily loaded machine, for example
lionfish during buildfarm runs, this causes autovacuum starvation for as
long as the high load is sustained.  This could prove dangerous.

The underlying issue is that things like fork() failure cannot be
communicated back to the launcher.  So when the postmaster tries to
start a process and it fails for some reason (failure to fork, or out of
memory), we need a way to re-initiate the worker that failed.  The
current code resets the starting worker pointer, which leaves the slot
free for another worker, perhaps in another database, to start.

I recently added code to resend the postmaster signal when the launcher
sees that the starting worker pointer is still set -- step 2 above.
I think this is fine, but:

1) we should remove the logic that resets the starting worker pointer.
   It is not needed, because database-local failures will be handled by
   subsequent checks

2) we should keep the logic to resend the postmaster signal, but we
   should make an effort to avoid sending it too frequently

Opinions?

If I haven't stated the problem clearly, please let me know and I'll try
to rephrase.

--
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On May 2, 2007, at 5:39 PM, Alvaro Herrera wrote:
> The recently discovered autovacuum bug made me notice something that is
> possibly critical.  The current autovacuum code makes an effort not to
> leave workers in a "starting" state for too long, lest there be failure
> to timely tend all databases needing vacuum.
>
> The problem is that things like fork() failure cannot be communicated
> back to the launcher.  So when the postmaster tries to start a process
> and it fails for some reason (failure to fork, or out of memory) we need
> a way to re-initiate the worker that failed.
>
> If I haven't stated the problem clearly please let me know and I'll try
> to rephrase.

Isn't there some way to get the postmaster to signal the launcher?
Perhaps stick an error code in shared memory and send it a signal?

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
Jim Nasby wrote:
> On May 2, 2007, at 5:39 PM, Alvaro Herrera wrote:
> > The recently discovered autovacuum bug made me notice something that is
> > possibly critical.  The current autovacuum code makes an effort not to
> > leave workers in a "starting" state for too long, lest there be failure
> > to timely tend all databases needing vacuum.
>
> Isn't there some way to get the postmaster to signal the launcher?
> Perhaps stick an error code in shared memory and send it a signal?

We could have the postmaster signal the launcher, but the signal cannot
carry much useful information, because the postmaster generally does not
want to write in shared memory.  Perhaps we could have the postmaster
send a SIGUSR2 signal, which would mean "couldn't start the worker"
without any other info.  Anything else the launcher needs can be deduced
from shmem state anyway.

--
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> We could have the postmaster signal the launcher, but the signal cannot
> carry much useful information, because the postmaster generally does not
> want to write in shared memory.

The postmaster already touches/writes shared memory in pmsignal.c.  The
trick here is that whatever it does must be sufficiently constrained
that arbitrarily bad corruption of shared memory can't crash or freeze
the postmaster.  If you can meet that restriction, feel free to
introduce some more signaling knowledge.

			regards, tom lane