Re: autovacuum: change priority of the vacuumed tables - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: autovacuum: change priority of the vacuumed tables
Date
Msg-id b10ed5eb-f73f-a262-6991-c8692572787e@2ndquadrant.com
In response to Re: autovacuum: change priority of the vacuumed tables  (Jim Nasby <jim.nasby@openscg.com>)
Responses Re: autovacuum: change priority of the vacuumed tables  (Jim Nasby <jim.nasby@openscg.com>)
List pgsql-hackers
On 03/03/2018 08:32 PM, Jim Nasby wrote:
> On 2/19/18 10:00 AM, Tomas Vondra wrote:
>> So I don't think this is a very promising approach, unfortunately.
>>
>> What I think might work is having a separate pool of autovac workers,
>> dedicated to these high-priority tables. That still would not guarantee
>> the high-priority tables are vacuumed immediately, but at least that
>> they are not stuck in the worker queue behind low-priority ones.
>>
>> I wonder if we could detect tables with high update/delete activity and
>> promote them to high-priority automatically. The reasoning is that
>> delaying the cleanup for those tables would result in significantly more
>> bloat than for those with low update/delete activity.
> 
> I've looked at this stuff in the past, and I think that the first step
> in trying to improve autovacuum needs to be allowing for a much more
> granular means of controlling worker table selection, and exposing that
> ability. There are simply too many different scenarios to try and
> account for to try and make a single policy that will satisfy everyone.
> Just as a simple example, OLTP databases (especially with small queue
> tables) have very different vacuum needs than data warehouses.
> 

That largely depends on what knobs would be exposed. I'm against adding
low-level knobs that perhaps 1% of the users will know how to tune,
and the rest will set incorrectly. Some high-level options that
specify the workload type might work, but I have no idea about the details.

> One fairly simple option would be to simply replace the logic that
> currently builds a worker's table list with running a query via SPI.
> That would allow for prioritizing important tables. It could also reduce
> the problem of workers getting "stuck" on a ton of large tables by
> taking into consideration the total number of pages/tuples a list contains.
> 

I don't see why SPI would be needed to do that, i.e. why we couldn't
implement such prioritization with the current approach. Another thing
is I really doubt prioritizing "important tables" is a good solution,
as it does not really guarantee anything.
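
To illustrate the first point, here is a minimal sketch in Python (not
PostgreSQL code) of ordering an in-memory candidate list by priority and
capping it by total pages, with no SQL involved. The Candidate type, the
priority field, and the page_budget parameter are all hypothetical names
made up for this example; nothing like them exists in autovacuum today.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    priority: int  # hypothetical per-table setting; higher means more urgent
    pages: int     # table size in pages


def build_worklist(candidates, page_budget):
    """Return candidates ordered by priority, capped by a total page budget.

    Ties on priority are broken by size (smaller tables first). The first
    table is always accepted so a worker never ends up with an empty list.
    """
    ordered = sorted(candidates, key=lambda c: (-c.priority, c.pages))
    worklist, used = [], 0
    for c in ordered:
        if worklist and used + c.pages > page_budget:
            continue  # would exceed the budget; leave it for another worker
        worklist.append(c)
        used += c.pages
    return worklist


tables = [
    Candidate("big_fact", priority=0, pages=1_000_000),
    Candidate("queue", priority=10, pages=50),
    Candidate("orders", priority=5, pages=20_000),
]
names = [c.name for c in build_worklist(tables, page_budget=50_000)]
print(names)
```

Note that this only changes the order within one worker's list. It still
says nothing about when the high-priority table actually gets vacuumed,
which is the "does not guarantee anything" problem.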

> A more fine-grained approach would be to have workers make a new
> selection after every vacuum they complete. That would provide the
> ultimate in control, since you'd be able to see exactly what all the
> other workers are doing.

That was proposed earlier in this thread, and the issue is that it may starve
all the other tables when the "important" tables need cleanup all the time.
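
The starvation effect can be shown with a toy simulation in Python. This
is only a sketch under simplifying assumptions: a single worker, one
vacuum per round, and each table needing cleanup again a fixed number of
rounds ("refill", a made-up parameter) after its last vacuum. None of
this models real autovacuum thresholds.

```python
def simulate(tables, rounds):
    """Count vacuums per table when the worker re-selects every round.

    tables maps name -> (priority, refill), where refill is the number of
    rounds after a vacuum until the table needs cleanup again.
    """
    # Start with every table already needing cleanup.
    last_vacuum = {name: -10**9 for name in tables}
    counts = {name: 0 for name in tables}
    for now in range(rounds):
        needy = [
            name
            for name, (_prio, refill) in tables.items()
            if now - last_vacuum[name] >= refill
        ]
        if not needy:
            continue  # nothing to do this round
        # Always pick the highest-priority table that needs cleanup.
        pick = max(needy, key=lambda name: tables[name][0])
        last_vacuum[pick] = now
        counts[pick] += 1
    return counts


# "queue" needs cleanup every round, so "archive" is never selected.
starved = simulate({"queue": (10, 1), "archive": (0, 5)}, rounds=100)
# If "queue" needs cleanup only every other round, "archive" gets a turn.
relaxed = simulate({"queue": (10, 2), "archive": (0, 5)}, rounds=100)
print(starved, relaxed)
```

In the first run the low-priority table is never vacuumed at all, no
matter how long the simulation runs.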

regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

