Re: new autovacuum criterion for visible pages - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: new autovacuum criterion for visible pages
Date
Msg-id CAB7nPqToeywaFN84xRkZAoPZRfmbo2-dO0vm0UfyLpwQs5uTiQ@mail.gmail.com
Whole thread Raw
In response to Re: new autovacuum criterion for visible pages  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: new autovacuum criterion for visible pages  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Thu, Aug 11, 2016 at 3:29 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Aug 11, 2016 at 5:39 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I wanted to create a new relopt named something like
>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
>> vacuum a table once less than a certain fraction of the relation's
>> pages are marked allvisible.
>
> Interesting idea.
>
>> 1) One issue is that pg_class.relpages and pg_class.relallvisible are
>> themselves only updated by vacuum/analyze.  In the absence of manual
>> vacuum or analyze, this means that if the new criterion uses those
>> field, it could only kick in after an autoanalyze has already been
>> done, which means that autovacuum_vacuum_pagevisible_factor could not
>> meaningfully be set lower than autovacuum_analyze_scale_factor.
>>
>> Should relallvisible be moved/copied from pg_class to
>> pg_stat_all_tables, so that it is maintained by the stats collector?
>> Or should the autovacuum worker just walk the vm of every table with a
>> defined autovacuum_vacuum_pagevisible_factor each time it is launched
>> to get an up-to-date count that way?
>
> relation_needs_vacanalyze has access to Form_pg_class, so it is not a
> problem to use the value of relallvisible there to decide if a
> vacuum/analyze should be run.

Doh. I missed your point. One idea perhaps would be to have an
additional field that updates the number of pages having their VM bits
cleared, or just decrement relallvisible when that happens, and use
that in relation_needs_vacanalyze to do the decision-making. But that
would require updating stats each time there is a VM cleared in heap
operations, which would be really costly...

The optimizer does not depend directly on pgstat when fetching the
estimation information it needs, so it may be wiser to not add this
dependency, and one can disable pgstat_track_counts so moving this
information out of pg_class is not a good idea.
-- 
Michael



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Assertion failure in REL9_5_STABLE
Next
From: Shay Rojansky
Date:
Subject: Re: Slowness of extended protocol