Re: autovacuum next steps, take 2 - Mailing list pgsql-hackers

From Matthew T. O'Connor
Subject Re: autovacuum next steps, take 2
Date
Msg-id 45E3991A.3030605@zeut.net
In response to Re: autovacuum next steps, take 2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Matthew T. O'Connor wrote:
>>> I'm not sure it's a good idea to tie this to the vacuum cost delay
>>> settings either, so let me ask you this: how is this better than just
>>> allowing the admin to set a new GUC variable like
>>> autovacuum_hot_table_size_threshold (or something shorter), to which we
>>> can assign a decent default of, say, 8MB?
> 
>> Yeah, maybe that's better -- it's certainly simpler.
> 
> I'm not liking any of these very much, as they seem critically dependent
> on impossible-to-tune parameters.  I think it'd be better to design this
> around having the first worker explicitly expose its state (list of
> tables to process, in order) and having subsequent workers key off that
> info.  The shared memory state could include the OID of the table each
> worker is currently working on, and we could keep the to-do list in some
> simple flat file for instance (since we don't care about crash safety).

So far we are only talking about one parameter, the
hot_table_size_threshold, which I agree would be a guess by an admin.
If we went in this direction, though, I would also advocate adding a
column to the pg_autovacuum table that lets an admin explicitly mark a
table as hot or not.
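
To make the combination of threshold and per-table override concrete, here
is a minimal sketch in Python. Everything in it is an assumption for
illustration: the name is_hot, the explicit_hot argument (standing in for
the proposed pg_autovacuum column), and the reading that a table at or
below the threshold counts as hot. Only the 8MB default comes from the
discussion above.

```python
from typing import Optional

# Default suggested in the thread; the constant name is hypothetical.
HOT_TABLE_SIZE_THRESHOLD = 8 * 1024 * 1024  # 8MB, in bytes


def is_hot(table_size_bytes: int,
           explicit_hot: Optional[bool] = None,
           threshold: int = HOT_TABLE_SIZE_THRESHOLD) -> bool:
    """Decide whether a table is treated as hot.

    explicit_hot models the proposed per-table column: True or False
    overrides the size test, None means "no opinion, use the threshold".
    Treating small tables as hot is an assumption of this sketch.
    """
    if explicit_hot is not None:
        return explicit_hot
    return table_size_bytes <= threshold
```

With this shape, the threshold remains a guess, but the admin can correct
it table by table without retuning the global value.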

Also, I think each worker should be mostly independent. The only caveat
is that, assuming each worker processes tables in size order, a worker
exits if it catches up to an older worker (that is, if it reaches the
table the older worker is currently working on). Personally I think this
is all we need, but others felt the additional threshold was needed.
What do you think? Or what do you think might be better?
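
The "catch up and exit" rule can be sketched as follows. This is Python
pseudocode-made-runnable, not backend code: run_worker, the (oid, size)
tuples, the older_workers_current callback, and the smallest-first
ordering are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Set, Tuple


def run_worker(tables: Iterable[Tuple[int, int]],
               older_workers_current: Callable[[], Set[int]],
               vacuum: Callable[[int], None]) -> List[int]:
    """Process tables in size order; exit on catching up to an older worker.

    tables                -- (table_oid, size_bytes) pairs for this database
    older_workers_current -- returns the OIDs older workers are vacuuming now
    vacuum                -- performs the actual work on one table

    Returns the OIDs this worker processed before exiting.
    """
    processed = []
    # Smallest-first ordering is an assumption of this sketch.
    for oid, _size in sorted(tables, key=lambda t: t[1]):
        if oid in older_workers_current():
            # We have caught up to an older worker: stop entirely.
            break
        vacuum(oid)
        processed.append(oid)
    return processed
```

Note that the worker stops rather than skipping ahead, which is what keeps
the number of workers active in one database from growing without bound.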

> I'm not certain exactly what "key off" needs to mean; perhaps each
> worker should make its own to-do list and then discard items that are
> either in-progress or recently done by another worker when it gets to
> them.

My initial design didn't have any threshold at all, but others felt this
could result in too many workers running concurrently in the same DB.

> I think an absolute minimum requirement for a sane design is that no two
> workers ever try to vacuum the same table concurrently, and I don't see
> where that behavior will emerge from your proposal; whereas it's fairly
> easy to make it happen if non-first workers pay attention to what other
> workers are doing.

Maybe we never made that clear. I was always working on the assumption
that two workers would never try to work on the same table at the same time.
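
One way to picture that invariant, with each worker publishing the OID of
the table it is working on, is the sketch below. It uses a Python lock as
a stand-in for shared memory; the class WorkerState and the methods
try_claim and release are hypothetical names, not anything that exists in
the backend.

```python
import threading
from typing import Dict


class WorkerState:
    """Tracks which table each worker is currently vacuuming.

    A stand-in for the shared-memory state discussed above: at most one
    worker may hold a given table OID at any time.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current: Dict[int, int] = {}  # worker_id -> table_oid

    def try_claim(self, worker_id: int, table_oid: int) -> bool:
        """Claim a table; return False if any worker already holds it."""
        with self._lock:
            if table_oid in self._current.values():
                return False
            self._current[worker_id] = table_oid
            return True

    def release(self, worker_id: int) -> None:
        """Drop this worker's claim once it finishes the table."""
        with self._lock:
            self._current.pop(worker_id, None)
```

A worker would call try_claim before vacuuming and release afterwards; a
False result means another worker is already on that table.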

> BTW, it's probably necessary to treat shared catalogs specially ...

Certainly.

