Re: autovacuum next steps, take 3 - Mailing list pgsql-hackers
From | Matthew T. O'Connor |
---|---|
Subject | Re: autovacuum next steps, take 3 |
Date | |
Msg-id | 45F1F316.7020905@zeut.net |
In response to | autovacuum next steps, take 3 (Alvaro Herrera <alvherre@commandprompt.com>) |
Responses |
Re: autovacuum next steps, take 3
|
List | pgsql-hackers |
My initial reaction is that this looks good to me, but I still have a few
comments below.

Alvaro Herrera wrote:
> Here is a low-level, very detailed description of the implementation of
> the autovacuum ideas we have so far.
>
> launcher's dealing with databases
> ---------------------------------

[ Snip ]

> launcher and worker interactions

[ Snip ]

> worker to-do list
> -----------------
> When each worker starts, it determines which tables to process in the
> usual fashion: get pg_autovacuum and pgstat data and compute the
> equations.
>
> The worker then takes a "snapshot" of what's currently going on in the
> database, by storing worker PIDs, the corresponding table OID that's
> being currently worked, and the to-do list for each worker.

Does a new worker really care about the PIDs of the other workers, or
which table each of them is currently working on?

> It removes from its to-do list the tables being processed. Finally, it
> writes the list to disk.

Just to be clear: the new worker removes from its to-do list all the
tables mentioned in the to-do lists of all the other workers?

> The table list will be written to a file in
> PGDATA/vacuum/<database-oid>/todo.<worker-pid>
> The file will consist of table OIDs, in the order in which they are
> going to be vacuumed.
>
> At this point, vacuuming can begin.

This all sounds good to me so far.

> Before processing each table, it scans the WorkerInfos to see if there's
> a new worker, in which case it reads its to-do list to memory.

It's not clear to me why a worker cares that there is a new worker,
since the new worker is going to ignore all the tables that are already
claimed in the other workers' to-do lists.

> Then it again fetches the tables being processed by other workers in the
> same database, and for each other worker, removes from its own in-memory
> to-do all those tables mentioned in the other lists that appear earlier
> than the current table being processed (inclusive). Then it picks the
> next non-removed table in the list. All of this must be done with the
> Autovacuum LWLock grabbed in exclusive mode, so that no other worker can
> pick the same table (no IO takes places here, because the whole lists
> were saved in memory at the start.)

Again, it's not clear to me what this gains us. It seems to me that once
a worker has written out its to-do list at startup, it should just work
through that list; I don't see the value in workers constantly updating
their to-do lists. Maybe I'm just missing something; can you enlighten
me?

> other things to consider
> ------------------------
>
> This proposal doesn't deal with the hot tables stuff at all, but that is
> very easy to bolt on later: just change the first phase, where the
> initial to-do list is determined, to exclude "cold" tables. That way,
> the vacuuming will be fast. Determining what is a cold table is still
> an exercise to the reader ...

I think we can make this algorithm naturally favor small / hot tables
with one small change: have workers remove the tables they have just
vacuumed from their to-do lists and rewrite those lists to disk.
Assuming the to-do lists are ordered by size ascending, smaller tables
will be made available for inspection by newer workers sooner rather
than later.
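To make that last suggestion concrete, here is a rough sketch in Python rather than backend C. It is only an illustration of the idea, not proposed code: the Worker class, build_todo() and finish_table() are made-up names, the in-memory list stands in for the todo.<worker-pid> file, and table sizes are passed in as a plain dict instead of coming from pgstat.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Worker:
    """Hypothetical stand-in for a worker and its todo.<worker-pid> file."""
    pid: int
    todo: List[int] = field(default_factory=list)  # table OIDs, smallest first


def build_todo(candidates: Iterable[int],
               table_size: Dict[int, int],
               other_workers: Iterable[Worker]) -> List[int]:
    """Build a new worker's to-do list.

    `candidates` are the tables the usual equations say need vacuuming.
    Tables present in any other worker's list are skipped, and the rest
    are ordered by size ascending (the assumption made in the mail).
    """
    claimed = {oid for w in other_workers for oid in w.todo}
    unclaimed = [oid for oid in candidates if oid not in claimed]
    return sorted(unclaimed, key=lambda oid: table_size[oid])


def finish_table(worker: Worker, oid: int) -> None:
    """Drop a just-vacuumed table from the list.

    In the real design this is where the list would be rewritten to disk.
    """
    worker.todo.remove(oid)


if __name__ == "__main__":
    sizes = {16384: 10, 16385: 5000, 16386: 200}  # made-up OIDs and sizes

    a = Worker(pid=101)
    a.todo = build_todo(sizes, sizes, [])
    print(a.todo)

    # Worker A finishes its smallest table and rewrites its list.
    finish_table(a, a.todo[0])

    # A worker started later no longer sees that table as claimed.
    c = Worker(pid=103)
    c.todo = build_todo(sizes, sizes, [a])
    print(c.todo)
```

Because each list is sorted by size, the small tables are the first ones a worker finishes and releases, so a worker started later can pick them up again while the older worker is still busy with its large tables. No worker has to re-read anyone else's list between tables for that to happen.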