Re: autovacuum next steps, take 3 - Mailing list pgsql-hackers

From Matthew T. O'Connor
Subject Re: autovacuum next steps, take 3
Date
Msg-id 45F200A7.2060206@zeut.net
In response to Re: autovacuum next steps, take 3  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> "Matthew T. O'Connor" <matthew@zeut.net> writes:
>> It's not clear to me why a worker cares that there is a new worker, 
>> since the new worker is going to ignore all the tables that are already 
>> claimed by all worker todo lists.
> 
> That seems wrong to me, since it means that new workers will ignore
> tables that are scheduled for processing by an existing worker, no
> matter how far in the future that schedule extends.  As an example,
> suppose you have half a dozen large tables in need of vacuuming.
> The first worker in will queue them all up, and subsequent workers
> will do nothing useful, at least not till the first worker is done
> with the first table.  Having the first worker update its todo
> list file after each table allows the earlier tables to be exposed
> for reconsideration, but that's expensive and it does nothing for
> later tables.

Well, the big problem we have is not that large tables are being 
starved, so this doesn't bother me too much.  Besides, there is only so 
much I/O to go around, so one worker working sequentially through the 
big tables seems OK to me.

> I suggest that maybe we don't need exposed TODO lists at all.  Rather
> the workers could have internal TODO lists that are priority-sorted
> in some way, and expose only their current table OID in shared memory.
> Then the algorithm for processing each table in your list is
> 
>     1. Grab the AutovacSchedule LWLock exclusively.
>     2. Check to see if another worker is currently processing
>        that table; if so drop LWLock and go to next list entry.
>     3. Recompute whether table needs vacuuming; if not,
>        drop LWLock and go to next entry.  (This test covers the
>        case where someone vacuumed the table since you made your
>        list.)
>     4. Put table OID into shared memory, drop LWLock, then
>        vacuum table.
>     5. Clear current-table OID from shared memory, then
>        repeat for next list entry.
> 
> This creates a behavior of "whoever gets to it first" rather than
> allowing workers to claim tables that they actually won't be able
> to service any time soon.

Right, but you could wind up with as many workers running concurrently 
as there are tables in the database, which doesn't seem like a good idea 
either.  One thing I like about the todo-list setup Alvaro had is that 
new workers are assigned fewer tables to work on and hence exit sooner.  
We are going to fire off a new worker every autovac_naptime, so 
availability of new workers isn't going to be a problem.
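For what it's worth, here is a rough standalone sketch of the claim-by-OID 
loop Tom outlines above.  This is not backend code: a pthread mutex stands 
in for the AutovacSchedule LWLock, threads stand in for worker processes, 
and current_table[], needs_vacuum(), vacuum_table() and the OID list are 
all invented for illustration.

/*
 * Sketch of the "whoever gets to it first" protocol: each worker keeps
 * its todo list private and exposes only the OID it is currently working
 * on, claiming tables one at a time under a short-lived lock.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>

#define NUM_WORKERS 3
#define NUM_TABLES  6
#define InvalidOid  0

typedef unsigned int Oid;

/* Stand-in for the shared-memory array of per-worker "current table" slots */
static Oid current_table[NUM_WORKERS];

/* Stand-in for the AutovacSchedule LWLock */
static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical todo list; in the proposal each worker builds its own
 * priority-sorted list, but here every worker shares one fixed list. */
static const Oid todo[NUM_TABLES] = {1001, 1002, 1003, 1004, 1005, 1006};

/* Step 3 stand-in: pretend every table still needs vacuuming. */
static bool needs_vacuum(Oid relid)
{
    (void) relid;
    return true;
}

static void vacuum_table(int worker, Oid relid)
{
    /* Real code would run VACUUM here. */
    printf("worker %d: vacuuming table %u\n", worker, relid);
}

static void *worker_main(void *arg)
{
    int me = (int) (long) arg;

    for (int i = 0; i < NUM_TABLES; i++)
    {
        Oid  relid = todo[i];
        bool claimed = false;
        bool busy = false;

        /* 1. Grab the schedule lock exclusively. */
        pthread_mutex_lock(&sched_lock);

        /* 2. Skip the table if another worker is already on it. */
        for (int w = 0; w < NUM_WORKERS; w++)
            if (w != me && current_table[w] == relid)
                busy = true;

        /* 3. Recheck that the table still needs vacuuming. */
        if (!busy && needs_vacuum(relid))
        {
            /* 4. Advertise our claim in "shared memory". */
            current_table[me] = relid;
            claimed = true;
        }
        pthread_mutex_unlock(&sched_lock);

        if (!claimed)
            continue;           /* someone beat us to it, or no work needed */

        vacuum_table(me, relid);

        /* 5. Clear our current-table slot and move to the next entry. */
        pthread_mutex_lock(&sched_lock);
        current_table[me] = InvalidOid;
        pthread_mutex_unlock(&sched_lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_WORKERS];

    for (long w = 0; w < NUM_WORKERS; w++)
        pthread_create(&threads[w], NULL, worker_main, (void *) w);
    for (int w = 0; w < NUM_WORKERS; w++)
        pthread_join(threads[w], NULL);
    return 0;
}

The point of step 4 is that a worker advertises only the one table it is 
actually working on, so nothing is reserved ahead of time and any idle 
table goes to whichever worker reaches it first.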


