Thread: autovacuum multiworkers, patch 5
Hi,

Here is the autovacuum patch I am currently working with.  This is
basically the same as the previous patch; I have tweaked the database
list management so that after a change in databases (say a new database
is created or a database is dropped), the list is recomputed to account
for the change, keeping the ordering of the previous list.

Modulo two low-probability failure scenarios, I feel this patch is ready
to be applied; I will do so on Friday unless there are objections.

The failure scenarios are detailed in the comment pasted below.  I
intend to attack these problems next, but as the first one should be
fairly low probability, I don't think it should bar the current patch
from being applied.  (The second problem, which seems to me to be the
most serious, should be easily fixable by checking launch times and
"aborting" processes that took longer than autovacuum_naptime to
start.)

/*
 * Main loop for the autovacuum launcher process.
 *
 * The signalling between launcher and worker is as follows:
 *
 * When the worker has finished starting up, it stores its PID in
 * wi_workerpid and sends a SIGUSR1 signal to the launcher.  The launcher
 * then knows that the postmaster is ready to start a new worker.  We do
 * it this way because otherwise we risk calling SendPostmasterSignal()
 * when the postmaster hasn't yet processed the last one, in which case
 * the second signal would be lost.  This is only useful when two workers
 * need to be started close to one another, which should be rare but is
 * possible.
 *
 * Additionally, when the worker is finished with the vacuum work, it
 * sets the wi_finished flag and sends a SIGUSR1 signal to the launcher.
 * Upon receipt of this signal, the launcher clears the entry for future
 * use and may start another worker right away, if need be.
 *
 * There is at least one race condition here: the workers are all busy,
 * a database needs immediate attention, and a worker finishes just
 * after the launcher started a worker and sent the signal to
 * postmaster, but before postmaster processes the signal; at this
 * point, the launcher receives a signal from the finishing process,
 * sees the empty slot, and sends the signal to postmaster again to
 * start another worker.  But the postmaster SendPostmasterSignal() flag
 * was already set, so the signal is lost.  To avoid this problem, the
 * launcher should not try to start a new worker until all WorkerInfo
 * entries that have the wi_dboid field set have a PID assigned.  FIXME
 * someday.  The problem is that if we have workers failing to start for
 * some reason, holding back the start of new workers will worsen the
 * starvation, by disabling the start of a new worker as soon as one
 * worker fails to start.  So it's important to be able to distinguish a
 * worker that has failed to start from a worker that is just taking its
 * little bit of time to do so.
 *
 * There is another potential problem if, for some reason, a worker
 * starts and is not able to finish correctly.  It will not be able to
 * set its finished flag, so the launcher will believe that it's still
 * starting up.  To prevent this problem, we should check the PGPROCs of
 * worker processes, and clean them up if we find they are not actually
 * running (or they correspond to processes that are not autovacuum
 * workers).  FIXME someday.
 */

--
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
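[Editor's note: the guard proposed in the comment above (don't ask the
postmaster for a new worker while any slot with wi_dboid set still lacks
a PID) can be sketched as below.  The simplified WorkerInfo struct and
the function name are hypothetical, chosen to mirror the field names in
the comment; the patch's actual definitions differ.]

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;

/* Hypothetical, trimmed-down worker slot; field names follow the comment. */
typedef struct WorkerInfo
{
    Oid  wi_dboid;      /* database assigned to this worker; 0 = slot free */
    int  wi_workerpid;  /* PID once startup has finished; 0 while starting */
    bool wi_finished;   /* set by the worker when its vacuum run is done */
} WorkerInfo;

/*
 * The proposed guard: only signal the postmaster for a new worker when no
 * slot is still in the window between SendPostmasterSignal() and startup
 * completion, i.e. every slot with a database assigned also has a PID.
 */
bool
launcher_can_start_worker(const WorkerInfo *workers, int nworkers)
{
    int i;

    for (i = 0; i < nworkers; i++)
    {
        if (workers[i].wi_dboid != 0 && workers[i].wi_workerpid == 0)
            return false;       /* a worker is still starting up */
    }
    return true;
}
```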
Alvaro Herrera wrote:
> Hi,
>
> Here is the autovacuum patch I am currently working with.

Obviously I forgot to attach the patch, sorry.

--
Alvaro Herrera                         Developer, http://www.PostgreSQL.org/
"Para tener más hay que desear menos" ("To have more, desire less")
Attachment
Alvaro Herrera wrote:
> Hi,

uhmmm patch?

> Here is the autovacuum patch I am currently working with.  This is
> basically the same as the previous patch; I have tweaked the database
> list management so that after a change in databases (say a new database
> is created or a database is dropped), the list is recomputed to account
> for the change, keeping the ordering of the previous list.
>
> Modulo two low probability failure scenarios, I feel this patch is ready
> to be applied; I will do so on Friday unless there are objections.
>
> The failure scenarios are detailed in the comment pasted below.  I
> intend to attack these problems next, but as the first one should be
> fairly low probability, I don't think it should bar the current patch
> from being applied.  (The second problem, which seems to me to be the
> most serious, should be easily fixable by checking launch times and

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> Here is the autovacuum patch I am currently working with.  This is
> basically the same as the previous patch; I have tweaked the database
> list management so that after a change in databases (say a new database
> is created or a database is dropped), the list is recomputed to account
> for the change, keeping the ordering of the previous list.

I'm interested in your multiworkers autovacuum proposal.

I'm researching the impact of multiple workers with
autovacuum_vacuum_cost_limit.  Autovacuum will consume server resources
up to autovacuum_max_workers times as many as before.  I think we might
need to change the semantics of autovacuum_vacuum_cost_limit when we
have multiple workers.

BTW, I found an unwitting mistake in the foreach_worker() macro.
These two operations are the same in C:

  - worker + 1
  - (WorkerInfo *) (((char *) worker) + sizeof(WorkerInfo))

#define foreach_worker(_i, _worker) \
	_worker = (WorkerInfo *) (AutoVacuumShmem + \
		offsetof(AutoVacuumShmemStruct, av_workers)); \
	for (_i = 0; _i < autovacuum_max_workers; _i++, _worker += sizeof(WorkerInfo))

should be:

#define foreach_worker(_worker) \
	for ((_worker) = AutoVacuumShmem->av_workers; \
		 (_worker) < AutoVacuumShmem->av_workers + autovacuum_max_workers; \
		 (_worker)++)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
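[Editor's note: for readers unfamiliar with why the original macro
overruns the array: C pointer arithmetic already scales by the pointee
size, so `_worker += sizeof(WorkerInfo)` advances by sizeof(WorkerInfo)
*elements*, not bytes.  The stand-in two-field struct below is
hypothetical; the bug is the same for the real WorkerInfo.]

```c
#include <stddef.h>

/* A stand-in struct; the real WorkerInfo has more fields, but any size works. */
typedef struct WorkerInfo
{
    int wi_dboid;
    int wi_workerpid;
} WorkerInfo;

/* byte distance covered by the correct "w + 1" step */
ptrdiff_t
step_correct(WorkerInfo *w)
{
    return (char *) (w + 1) - (char *) w;
}

/* byte distance covered by the buggy "w + sizeof(WorkerInfo)" step */
ptrdiff_t
step_buggy(WorkerInfo *w)
{
    return (char *) (w + sizeof(WorkerInfo)) - (char *) w;
}
```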
ITAGAKI Takahiro wrote:
> Alvaro Herrera <alvherre@commandprompt.com> wrote:
>
>> Here is the autovacuum patch I am currently working with.  This is
>> basically the same as the previous patch; I have tweaked the database
>> list management so that after a change in databases (say a new database
>> is created or a database is dropped), the list is recomputed to account
>> for the change, keeping the ordering of the previous list.
>
> I'm interested in your multiworkers autovacuum proposal.
>
> I'm researching the impact of multiworkers with autovacuum_vacuum_cost_limit.
> Autovacuum will consume server resources up to autovacuum_max_workers times
> as many as before.  I think we might need to change the semantics of
> autovacuum_vacuum_cost_limit when we have multiworkers.

Yes, that's correct.  Per previous discussion, what I actually wanted to
do was to create a GUC setting to simplify the whole thing, something
like "autovacuum_max_mb_per_second" or "autovacuum_max_io_per_second".
Then, have each worker use up to (max_per_second / active workers) as
much IO resources.  This way, the maximum use of IO resources by vacuum
can be easily determined and limited by the DBA; certainly much simpler
than the vacuum cost limiting feature.

> BTW, I found an unwitting mistake in the foreach_worker() macro.
> These two operations are the same in C:
>   - worker + 1
>   - (WorkerInfo *) (((char *) worker) + sizeof(WorkerInfo))

Ah, thanks.  I had originally coded the macro the way you suggest, but
during development I needed to use the "i" variable as well, so I added
it.  Apparently I later removed that usage; I see that there are no such
uses left in the current code.  The "+ sizeof(WorkerInfo)" part is just
stupidity on my part, sorry about that.

--
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
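[Editor's note: the proposed division of the global I/O budget amounts
to the one-liner below.  "autovacuum_max_io_per_second" is only a
proposed GUC name from the message above, and the function is a sketch,
not code from any patch.]

```c
/*
 * Sketch: split a single global I/O budget evenly among the active
 * workers, so total autovacuum I/O stays bounded no matter how many
 * workers are running at once.
 */
double
per_worker_io_budget(double autovacuum_max_io_per_second, int active_workers)
{
    if (active_workers <= 1)
        return autovacuum_max_io_per_second;
    return autovacuum_max_io_per_second / active_workers;
}
```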
Alvaro Herrera wrote:
> ITAGAKI Takahiro wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> wrote:
>>
>>> Here is the autovacuum patch I am currently working with.  This is
>>> basically the same as the previous patch; I have tweaked the database
>>> list management so that after a change in databases (say a new database
>>> is created or a database is dropped), the list is recomputed to account
>>> for the change, keeping the ordering of the previous list.
>>
>> I'm interested in your multiworkers autovacuum proposal.
>>
>> I'm researching the impact of multiworkers with autovacuum_vacuum_cost_limit.
>> Autovacuum will consume server resources up to autovacuum_max_workers times
>> as many as before.  I think we might need to change the semantics of
>> autovacuum_vacuum_cost_limit when we have multiworkers.
>
> Yes, that's correct.  Per previous discussion, what I actually wanted to
> do was to create a GUC setting to simplify the whole thing, something
> like "autovacuum_max_mb_per_second" or "autovacuum_max_io_per_second".
> Then, have each worker use up to (max_per_second/active workers) as much
> IO resources.  This way, the maximum use of IO resources by vacuum can
> be easily determined and limited by the DBA; certainly much simpler than
> the vacuum cost limiting feature.

+1

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Joshua D. Drake wrote:
> Alvaro Herrera wrote:
>> ITAGAKI Takahiro wrote:
>>> I'm interested in your multiworkers autovacuum proposal.
>>>
>>> I'm researching the impact of multiworkers with
>>> autovacuum_vacuum_cost_limit.  Autovacuum will consume server
>>> resources up to autovacuum_max_workers times as many as before.  I
>>> think we might need to change the semantics of
>>> autovacuum_vacuum_cost_limit when we have multiworkers.
>>
>> Yes, that's correct.  Per previous discussion, what I actually wanted to
>> do was to create a GUC setting to simplify the whole thing, something
>> like "autovacuum_max_mb_per_second" or "autovacuum_max_io_per_second".
>> Then, have each worker use up to (max_per_second/active workers) as much
>> IO resources.  This way, the maximum use of IO resources by vacuum can
>> be easily determined and limited by the DBA; certainly much simpler than
>> the vacuum cost limiting feature.
>
> +1

One thing I forgot to mention is that this is unlikely to be implemented
in 8.3.

--
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Another problem seems to be that I'm not checking anywhere that a
regular connection (not autovac) is not using an autovac-reserved
PGPROC slot :-(  I think I should tweak the logic that deals with
ReservedBackends, but it doesn't look entirely trivial.

--
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>>> Yes, that's correct.  Per previous discussion, what I actually wanted to
>>> do was to create a GUC setting to simplify the whole thing, something
>>> like "autovacuum_max_mb_per_second" or "autovacuum_max_io_per_second".
>>> Then, have each worker use up to (max_per_second/active workers) as much
>>> IO resources.
>
> One thing I forgot to mention is that this is unlikely to be implemented
> in 8.3.

This is a WIP cost balancing patch built on autovacuum-multiworkers-5.patch.
The total cost of the workers is adjusted to autovacuum_vacuum_cost_delay.

I added a copy of each worker's cost parameters to the shared WorkerInfo
array.  The launcher and each worker read and write the copied
parameters when a worker starts a vacuum job or exits.  Workers reload
their local VacuumCostDelay from the shared value at every sleep in
vacuum_delay_point().

I agree that "mb_per_second" or "io_per_second" would be easier to use
than the present cost delay parameters, but we need more research before
moving to them.  I think it is better to keep "cost_limit" and
"cost_delay" as of 8.3, but we need cost-balanced multiple workers at
any rate.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
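[Editor's note: one way to read "the total cost of the workers is
adjusted": each worker's effective cost delay is stretched by the number
of active workers, so the aggregate cost spent per second stays near the
configured single-worker rate.  The function below is an illustrative
sketch of that balancing rule, not the patch's actual arithmetic.]

```c
/*
 * Sketch of a balancing rule: with N active workers each spending
 * vacuum_cost_limit points per base_cost_delay_ms milliseconds,
 * stretching every worker's delay by N keeps aggregate cost per second
 * at the single-worker rate.  (The patch instead keeps per-worker
 * copies of the cost parameters in shared memory and rebalances there.)
 */
int
balanced_cost_delay(int base_cost_delay_ms, int active_workers)
{
    if (active_workers <= 1)
        return base_cost_delay_ms;
    return base_cost_delay_ms * active_workers;
}
```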
Attachment
ITAGAKI Takahiro wrote:
>>>> Yes, that's correct.  Per previous discussion, what I actually wanted to
>>>> do was to create a GUC setting to simplify the whole thing, something
>>>> like "autovacuum_max_mb_per_second" or "autovacuum_max_io_per_second".
>>>> Then, have each worker use up to (max_per_second/active workers) as much
>>>> IO resources.
>>
>> One thing I forgot to mention is that this is unlikely to be implemented
>> in 8.3.
>
> This is a WIP cost balancing patch built on autovacuum-multiworkers-5.patch.
> The total cost of the workers is adjusted to autovacuum_vacuum_cost_delay.
>
> I added a copy of each worker's cost parameters to the shared WorkerInfo
> array.  The launcher and each worker read and write the copied
> parameters when a worker starts a vacuum job or exits.  Workers reload
> their local VacuumCostDelay from the shared value at every sleep in
> vacuum_delay_point().

Thanks!  I had already incorporated the foreach_worker changes into my
code, and later realized that there's an important bug regarding the
PGPROC of the workers, so I've reworked the patch, which meant that the
foreach_worker() macro went away completely.

I'll fold your changes into my current WIP patch; if you do any further
work on it, please send it to me so I can include it in the latest
version.

--
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
ITAGAKI Takahiro wrote:
>>>> Yes, that's correct.  Per previous discussion, what I actually wanted to
>>>> do was to create a GUC setting to simplify the whole thing, something
>>>> like "autovacuum_max_mb_per_second" or "autovacuum_max_io_per_second".
>>>> Then, have each worker use up to (max_per_second/active workers) as much
>>>> IO resources.
>>
>> One thing I forgot to mention is that this is unlikely to be implemented
>> in 8.3.
>
> This is a WIP cost balancing patch built on autovacuum-multiworkers-5.patch.
> The total cost of the workers is adjusted to autovacuum_vacuum_cost_delay.

I manually merged your patch on top of my own.  This is the result.
Please have a look at whether the new code is correct and behaves sanely
(I haven't tested it).

--
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> I manually merged your patch on top of my own.  This is the result.
> Please have a look at whether the new code is correct and behaves sanely
> (I haven't tested it).

The patch seems to be broken -- the latter half is lost.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
ITAGAKI Takahiro wrote:
> Alvaro Herrera <alvherre@commandprompt.com> wrote:
>
>> I manually merged your patch on top of my own.  This is the result.
>> Please have a look at whether the new code is correct and behaves sanely
>> (I haven't tested it).
>
> The patch seems to be broken -- the latter half is lost.

Huh, you are right, it is broken, even in my outgoing mailbox -- I don't
know what happened, as the file I have on disk is complete.  Here it is,
attached again.

--
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Attachment
Alvaro Herrera <alvherre@commandprompt.com> wrote:
>>> I manually merged your patch on top of my own.  This is the result.
>>> Please have a look at whether the new code is correct and behaves sanely
>>> (I haven't tested it).
>
> Huh, you are right, it is broken, even in my outgoing mailbox -- I don't
> know what happened, as the file I have on disk is complete.  Here it is,
> attached again.

I tested your patch on Linux and Windows.  It works well on Linux, where
we use fork(), but crashes with a segfault on Windows, where we use
exec().  Maybe you forgot to initialize the shared memory stuff.  (I
haven't found out where it needs to be fixed, sorry.)

Multiworker operation and balancing seem to work well after the workers
successfully start up.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
ITAGAKI Takahiro wrote:
> Alvaro Herrera <alvherre@commandprompt.com> wrote:
>>>> I manually merged your patch on top of my own.  This is the result.
>>>> Please have a look at whether the new code is correct and behaves sanely
>>>> (I haven't tested it).
>>
>> Huh, you are right, it is broken, even in my outgoing mailbox -- I don't
>> know what happened, as the file I have on disk is complete.  Here it is,
>> attached again.
>
> I tested your patch on Linux and Windows.  It works well on Linux,
> where we use fork(), but crashes with a segfault on Windows, where we
> use exec().  Maybe you forgot to initialize the shared memory stuff.
> (I haven't found out where it needs to be fixed, sorry.)

Ok, thanks, this confirms that I have to try the EXEC_BACKEND code path.

> Multiworker operation and balancing seem to work well after the workers
> successfully start up.

Great.

--
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
> ITAGAKI Takahiro wrote:
>> I tested your patch on Linux and Windows.  It works well on Linux,
>> where we use fork(), but crashes with a segfault on Windows, where we
>> use exec().  Maybe you forgot to initialize the shared memory stuff.
>> (I haven't found out where it needs to be fixed, sorry.)
>
> Ok, thanks, this confirms that I have to try the EXEC_BACKEND code path.

Oh, uh, the problem is that CreateSharedMemoryAndSemaphores wants to
have access to the PGPROC already, but to obtain the PGPROC we need
access to autovac shared memory (per AutoVacuumGetFreeProc).  So this
wasn't too bright a choice :-(

--
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
> Oh, uh, the problem is that CreateSharedMemoryAndSemaphores wants to
> have access to the PGPROC already, but to obtain the PGPROC we need
> access to autovac shared memory (per AutoVacuumGetFreeProc).  So this
> wasn't too bright a choice :-(

It seems like I'll have to decouple the autovacuum PGPROCs from
autovacuum's own shared memory.  The most sensible way to do this seems
to be to store them in ProcGlobal, along with the regular backends'
PGPROCs.  Is everyone OK with this plan?

Note that this will mean that those PGPROCs will be protected by the
same spinlock that protects the other PGPROCs.  I can't think of any
reason why this would be a problem, but if you think otherwise please
speak up.

--
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
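[Editor's note: the plan amounts to keeping a second free list inside
ProcGlobal.  A minimal sketch follows; the struct and function names are
hypothetical, and locking is elided -- the point is only that both free
lists would sit behind the same spinlock.]

```c
#include <stddef.h>

/* Hypothetical, heavily trimmed PGPROC and ProcGlobal */
typedef struct PGPROC
{
    struct PGPROC *links;   /* next free PGPROC on the list */
    int            pid;
} PGPROC;

typedef struct ProcGlobalData
{
    PGPROC *freeProcs;         /* free list for regular backends */
    PGPROC *autovacFreeProcs;  /* free list for autovacuum workers;
                                * both lists share one spinlock */
} ProcGlobalData;

/* pop a PGPROC from the autovacuum free list (caller holds the spinlock) */
PGPROC *
autovac_get_proc(ProcGlobalData *pg)
{
    PGPROC *proc = pg->autovacFreeProcs;

    if (proc != NULL)
        pg->autovacFreeProcs = proc->links;
    return proc;
}

/* push a PGPROC back onto the autovacuum free list */
void
autovac_release_proc(ProcGlobalData *pg, PGPROC *proc)
{
    proc->links = pg->autovacFreeProcs;
    pg->autovacFreeProcs = proc;
}
```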
Alvaro Herrera <alvherre@commandprompt.com> writes:
> It seems like I'll have to decouple the autovacuum PGPROCs from
> autovacuum's own shared memory.  The most sensible way to do this seems
> to be to store them in ProcGlobal, along with the regular backends'
> PGPROCs.  Is everyone OK with this plan?
>
> Note that this will mean that those PGPROCs will be protected by the
> same spinlock that protects the other PGPROCs.  I can't think of any
> reason why this would be a problem, but if you think otherwise please
> speak up.

I thought the separate pool of PGPROCs was a bit weird.  If you're going
back to a common pool, I'm all for it.

			regards, tom lane
Ok, here is a version which works on EXEC_BACKEND too.

The main change in this patch is that the PGPROC list has been moved
into ProcGlobal, as discussed.  The entries are still separate from the
main PGPROCs, but they are protected by the same spinlock, and thus a
bit of code dealing with that goes away.  Also, there is no longer a
separate ProcKill, and the changes to InitProcess are less intrusive.
The code dealing with clearing the WorkerInfo has been moved into its
own on_shmem_exit hook.

Also, a new signal is sent to the postmaster when the worker which is
supposed to be starting has not started.  This should not make much of a
difference in general, since the postmaster is not supposed to lose
signals, but it gives an extra guarantee that things are not going to
linger too long.  The request to start is removed when
autovacuum_naptime seconds have elapsed anyway.

Oh, I also changed the default autovacuum_max_workers from 10 to 3.

Some other minor infelicities have been corrected, like resetting the
av_rebalance flag when the rebalancing takes place (doh); some comments
have been expanded a bit.

--
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.