Thread: new autovacuum criterion for visible pages

new autovacuum criterion for visible pages

From
Jeff Janes
Date:
I wanted to create a new relopt named something like
autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
vacuum a table once less than a certain fraction of the relation's
pages are marked allvisible.

I wanted some feedback on some things.

1) One issue is that pg_class.relpages and pg_class.relallvisible are
themselves only updated by vacuum/analyze.  In the absence of manual
vacuum or analyze, this means that if the new criterion uses those
field, it could only kick in after an autoanalyze has already been
done, which means that autovacuum_vacuum_pagevisible_factor could not
meaningfully be set lower than autovacuum_analyze_scale_factor.

Should relallvisible be moved/copied from pg_class to
pg_stat_all_tables, so that it is maintained by the stats collector?
Or should the autovacuum worker just walk the vm of every table with a
defined autovacuum_vacuum_pagevisible_factor each time it is launched
to get an up-to-date count that way?

2) Should there be a guc in addition to the relopt?  I can't think of
a reason why I would want to set this globally, so I'm happy with just
a relopt.  If it were set globally, it would sure increase the cost
for scanning the vm for each table once each naptime.

3) Should there be a autovacuum_vacuum_pagevisible_threshold?  The
other settings have both a factor and a threshold.  I've never
understand what the point of the threshold settings is, but presumably
there is a point to them.  Does that reason also apply to keeping vm
tuned up?

Cheers,

Jeff



Re: new autovacuum criterion for visible pages

From
Michael Paquier
Date:
On Thu, Aug 11, 2016 at 5:39 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I wanted to create a new relopt named something like
> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
> vacuum a table once less than a certain fraction of the relation's
> pages are marked allvisible.

Interesting idea.

> 1) One issue is that pg_class.relpages and pg_class.relallvisible are
> themselves only updated by vacuum/analyze.  In the absence of manual
> vacuum or analyze, this means that if the new criterion uses those
> field, it could only kick in after an autoanalyze has already been
> done, which means that autovacuum_vacuum_pagevisible_factor could not
> meaningfully be set lower than autovacuum_analyze_scale_factor.
>
> Should relallvisible be moved/copied from pg_class to
> pg_stat_all_tables, so that it is maintained by the stats collector?
> Or should the autovacuum worker just walk the vm of every table with a
> defined autovacuum_vacuum_pagevisible_factor each time it is launched
> to get an up-to-date count that way?

relation_needs_vacanalyze has access to Form_pg_class, so it is not a
problem to use the value of relallvisible there to decide if a
vacuum/analyze should be run.

> 2) Should there be a guc in addition to the relopt?  I can't think of
> a reason why I would want to set this globally, so I'm happy with just
> a relopt.  If it were set globally, it would sure increase the cost
> for scanning the vm for each table once each naptime.

Having a GUC is useful to enforce the default behavior of tables that
do not have this parameter directly set with ALTER TABLE.

> 3) Should there be a autovacuum_vacuum_pagevisible_threshold?  The
> other settings have both a factor and a threshold.  I've never
> understand what the point of the threshold settings is, but presumably
> there is a point to them.  Does that reason also apply to keeping vm
> tuned up?

Having both a threshold and a scale would make the most sense to me.
It may be difficult for the lambda user to tune those parameters using
a number of relation pages. An alternative would be to define those
values in kB, like 32MB worth of pages are marked all visible for
example.
-- 
Michael



Re: new autovacuum criterion for visible pages

From
Michael Paquier
Date:
On Thu, Aug 11, 2016 at 3:29 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Aug 11, 2016 at 5:39 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I wanted to create a new relopt named something like
>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
>> vacuum a table once less than a certain fraction of the relation's
>> pages are marked allvisible.
>
> Interesting idea.
>
>> 1) One issue is that pg_class.relpages and pg_class.relallvisible are
>> themselves only updated by vacuum/analyze.  In the absence of manual
>> vacuum or analyze, this means that if the new criterion uses those
>> field, it could only kick in after an autoanalyze has already been
>> done, which means that autovacuum_vacuum_pagevisible_factor could not
>> meaningfully be set lower than autovacuum_analyze_scale_factor.
>>
>> Should relallvisible be moved/copied from pg_class to
>> pg_stat_all_tables, so that it is maintained by the stats collector?
>> Or should the autovacuum worker just walk the vm of every table with a
>> defined autovacuum_vacuum_pagevisible_factor each time it is launched
>> to get an up-to-date count that way?
>
> relation_needs_vacanalyze has access to Form_pg_class, so it is not a
> problem to use the value of relallvisible there to decide if a
> vacuum/analyze should be run.

Doh. I missed your point. One idea perhaps would be to have an
additional field that updates the number of pages having their VM bits
cleared, or just decrement relallvisible when that happens, and use
that in relation_needs_vacanalyze to do the decision-making. But that
would require updating stats each time there is a VM cleared in heap
operations, which would be really costly...

The optimizer does not depend directly on pgstat when fetching the
estimation information it needs, so it may be wiser to not add this
dependency, and one can disable pgstat_track_counts so moving this
information out of pg_class is not a good idea.
-- 
Michael



Re: new autovacuum criterion for visible pages

From
Amit Kapila
Date:
On Thu, Aug 11, 2016 at 2:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I wanted to create a new relopt named something like
> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
> vacuum a table once less than a certain fraction of the relation's
> pages are marked allvisible.
>

Why would it more convenient for a user to set such a parameter as
compare to existing parameters (autovacuum_vacuum_threshold +
autovacuum_vacuum_scale_factor)?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: new autovacuum criterion for visible pages

From
Jeff Janes
Date:
On Thu, Aug 11, 2016 at 8:32 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Aug 11, 2016 at 2:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I wanted to create a new relopt named something like
>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
>> vacuum a table once less than a certain fraction of the relation's
>> pages are marked allvisible.
>>
>
> Why would it more convenient for a user to set such a parameter as
> compare to existing parameters (autovacuum_vacuum_threshold +
> autovacuum_vacuum_scale_factor)?

Insertions and HOT-updates clear vm bits but don't increment the
counters that those existing parameters are compared to.

Also, the relationship between number of updated/deleted rows and the
number of hint-bits cleared can be hard to predict due to possible
clustering of the updates into the same blocks.  So it would be hard
to know what to set the values to.

Cheers,

Jeff



Re: new autovacuum criterion for visible pages

From
Michael Paquier
Date:
On Thu, Aug 11, 2016 at 4:21 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Aug 11, 2016 at 3:29 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> On Thu, Aug 11, 2016 at 5:39 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> I wanted to create a new relopt named something like
>>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
>>> vacuum a table once less than a certain fraction of the relation's
>>> pages are marked allvisible.
>>
>> Interesting idea.
>>
>>> 1) One issue is that pg_class.relpages and pg_class.relallvisible are
>>> themselves only updated by vacuum/analyze.  In the absence of manual
>>> vacuum or analyze, this means that if the new criterion uses those
>>> field, it could only kick in after an autoanalyze has already been
>>> done, which means that autovacuum_vacuum_pagevisible_factor could not
>>> meaningfully be set lower than autovacuum_analyze_scale_factor.
>>>
>>> Should relallvisible be moved/copied from pg_class to
>>> pg_stat_all_tables, so that it is maintained by the stats collector?
>>> Or should the autovacuum worker just walk the vm of every table with a
>>> defined autovacuum_vacuum_pagevisible_factor each time it is launched
>>> to get an up-to-date count that way?
>>
>> relation_needs_vacanalyze has access to Form_pg_class, so it is not a
>> problem to use the value of relallvisible there to decide if a
>> vacuum/analyze should be run.
>
> Doh. I missed your point. One idea perhaps would be to have an
> additional field that updates the number of pages having their VM bits
> cleared, or just decrement relallvisible when that happens, and use
> that in relation_needs_vacanalyze to do the decision-making. But that
> would require updating stats each time there is a VM cleared in heap
> operations, which would be really costly...
>
> The optimizer does not depend directly on pgstat when fetching the
> estimation information it needs, so it may be wiser to not add this
> dependency, and one can disable pgstat_track_counts so moving this
> information out of pg_class is not a good idea.

With a somewhat fresher mind...

The main issue regarding this proposal can be summarized as that: as
track_counts can be disabled by users so moving relallvisible into
pgstat cannot be done except if I am missing something. The VM bits
cleared need to be tracked either by decrementing
pg_class.relallvisible, with a different counter in pg_class, or with
a completely different mechanism. Still I am scared of overall
performance impact because as the VM bit clearing is quite spread so
pg_class or the new relation where this is tracked would become really
bloated.

In short, autovacuum will need to scan by itself the VM of each
relation and decide based on that. I would not expect much performance
impact, but disabling that by default would have no impact on existing
deployments.
-- 
Michael



Re: new autovacuum criterion for visible pages

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> In short, autovacuum will need to scan by itself the VM of each
> relation and decide based on that.

That seems like a worthwhile approach to pursue.  The VM is supposed to be
small, and if you're worried it isn't, you could sample a few pages of it.
I do not think any of the ideas proposed so far for tracking the
visibility percentage on-the-fly are very tenable.
        regards, tom lane



Re: new autovacuum criterion for visible pages

From
Jim Nasby
Date:
On 8/11/16 10:59 AM, Jeff Janes wrote:
> On Thu, Aug 11, 2016 at 8:32 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Aug 11, 2016 at 2:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> I wanted to create a new relopt named something like
>>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
>>> vacuum a table once less than a certain fraction of the relation's
>>> pages are marked allvisible.
>>>
>>
>> Why would it more convenient for a user to set such a parameter as
>> compare to existing parameters (autovacuum_vacuum_threshold +
>> autovacuum_vacuum_scale_factor)?
>
> Insertions and HOT-updates clear vm bits but don't increment the
> counters that those existing parameters are compared to.
>
> Also, the relationship between number of updated/deleted rows and the
> number of hint-bits cleared can be hard to predict due to possible
> clustering of the updates into the same blocks.  So it would be hard
> to know what to set the values to.

I'm wondering if also creating the same options for all-frozen pages 
would be worthwhile. I don't see an obvious use for that, but maybe 
someone else does (and adding both at once would presumably be the least 
amount of work...)
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: new autovacuum criterion for visible pages

From
Masahiko Sawada
Date:
On Fri, Aug 12, 2016 at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> In short, autovacuum will need to scan by itself the VM of each
>> relation and decide based on that.
>
> That seems like a worthwhile approach to pursue.  The VM is supposed to be
> small, and if you're worried it isn't, you could sample a few pages of it.
> I do not think any of the ideas proposed so far for tracking the
> visibility percentage on-the-fly are very tenable.
>

The one visibility map page can store the information of 32672 heap
pages (255MB), but it would be cost if autovacuum scan whole
visibility map for all tables.
So I think that it's better to provide
autovacuum_vacuum_pagevisible_factor as a relopts.
And the autovacuum scans or samples the visibility map of table that
autovacuum_vacuum_pagevisible_factor is set.

Regards,

--
Masahiko Sawada



Re: new autovacuum criterion for visible pages

From
Peter Eisentraut
Date:
On 8/11/16 11:59 AM, Jeff Janes wrote:
> Insertions and HOT-updates clear vm bits but don't increment the
> counters that those existing parameters are compared to.
> 
> Also, the relationship between number of updated/deleted rows and the
> number of hint-bits cleared can be hard to predict due to possible
> clustering of the updates into the same blocks.  So it would be hard
> to know what to set the values to.

Well, the current threshold formulas aren't an exact science either.
They just trigger autovacuum often enough relative to table size and
activity.  Just fudging in the insert and HOT update counters times some
factor might be enough, and it would get this functionality out to all
users without more effort.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: new autovacuum criterion for visible pages

From
Vik Fearing
Date:
On 12/08/16 15:15, Peter Eisentraut wrote:
> On 8/11/16 11:59 AM, Jeff Janes wrote:
>> Insertions and HOT-updates clear vm bits but don't increment the
>> counters that those existing parameters are compared to.
>>
>> Also, the relationship between number of updated/deleted rows and the
>> number of hint-bits cleared can be hard to predict due to possible
>> clustering of the updates into the same blocks.  So it would be hard
>> to know what to set the values to.
> 
> Well, the current threshold formulas aren't an exact science either.
> They just trigger autovacuum often enough relative to table size and
> activity.  Just fudging in the insert and HOT update counters times some
> factor might be enough, and it would get this functionality out to all
> users without more effort.

I have a patch I wrote a while ago that does this.  I haven't submitted
it yet because the documentation is lacking.  I will post it over the
weekend (I had planned to do it before the commitfest anyway).
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: new autovacuum criterion for visible pages

From
Amit Kapila
Date:
On Thu, Aug 11, 2016 at 9:29 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Aug 11, 2016 at 8:32 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Aug 11, 2016 at 2:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> I wanted to create a new relopt named something like
>>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
>>> vacuum a table once less than a certain fraction of the relation's
>>> pages are marked allvisible.
>>>
>>
>> Why would it more convenient for a user to set such a parameter as
>> compare to existing parameters (autovacuum_vacuum_threshold +
>> autovacuum_vacuum_scale_factor)?
>
> Insertions and HOT-updates clear vm bits but don't increment the
> counters that those existing parameters are compared to.
>
> Also, the relationship between number of updated/deleted rows and the
> number of hint-bits cleared can be hard to predict due to possible
> clustering of the updates into the same blocks.  So it would be hard
> to know what to set the values to.
>

Okay.  What I was slightly worried about was that how many users can
understand *pagevisible_* parameters as compare to what we have now
(number of updated/deleted tuples).  However if we have some mechanism
where autovacuum can be triggered automatically based on
pagevisibility, then I think that would be quite beneficial (not sure,
if such a mechanism can be feasible).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: new autovacuum criterion for visible pages

From
Simon Riggs
Date:
On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> In short, autovacuum will need to scan by itself the VM of each
>> relation and decide based on that.
>
> That seems like a worthwhile approach to pursue.  The VM is supposed to be
> small, and if you're worried it isn't, you could sample a few pages of it.
> I do not think any of the ideas proposed so far for tracking the
> visibility percentage on-the-fly are very tenable.

Sounds good, but we can't scan the VM for every table, every minute.
We need to record something that will tell us how many VM bits have
been cleared, which will then allow autovac to do a simple SELECT to
decide what needs vacuuming.

Vik's proposal to keep track of the rows inserted seems like the best
approach to this issue.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] new autovacuum criterion for visible pages

From
Stephen Frost
Date:
All,

* Simon Riggs (simon@2ndquadrant.com) wrote:
> On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Michael Paquier <michael.paquier@gmail.com> writes:
> >> In short, autovacuum will need to scan by itself the VM of each
> >> relation and decide based on that.
> >
> > That seems like a worthwhile approach to pursue.  The VM is supposed to be
> > small, and if you're worried it isn't, you could sample a few pages of it.
> > I do not think any of the ideas proposed so far for tracking the
> > visibility percentage on-the-fly are very tenable.
>
> Sounds good, but we can't scan the VM for every table, every minute.
> We need to record something that will tell us how many VM bits have
> been cleared, which will then allow autovac to do a simple SELECT to
> decide what needs vacuuming.
>
> Vik's proposal to keep track of the rows inserted seems like the best
> approach to this issue.

I tend to agree with Simon on this.  I'm also worried that an approach
which was based off of a metric like "% of table not all-visible" might
result in VACUUM running over and over on a table because it isn't able
to actually make any progress towards improving that percentage.  We'd
have to have some kind of "cool-off" period or something.

Tracking INSERTs and then kicking off a VACUUM based on them seems to
address that in a natural way and also seems like something that users
would generally understand as it's very similar to what we do for
UPDATEs and DELETEs.

Tracking the INSERTs as a reason to VACUUM is also very natural when you
consider the need to update BRIN indexes.  I am a bit worried that if we
focus just on if the VM needs to be updated or not that we might miss
out on cases where we need to VACUUM because the BRIN indexes are out of
date.

Thanks!

Stephen

Re: [HACKERS] new autovacuum criterion for visible pages

From
Amit Kapila
Date:
On Sun, Jan 22, 2017 at 3:27 AM, Stephen Frost <sfrost@snowman.net> wrote:
> All,
>
> * Simon Riggs (simon@2ndquadrant.com) wrote:
>> On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Michael Paquier <michael.paquier@gmail.com> writes:
>> >> In short, autovacuum will need to scan by itself the VM of each
>> >> relation and decide based on that.
>> >
>> > That seems like a worthwhile approach to pursue.  The VM is supposed to be
>> > small, and if you're worried it isn't, you could sample a few pages of it.
>> > I do not think any of the ideas proposed so far for tracking the
>> > visibility percentage on-the-fly are very tenable.
>>
>> Sounds good, but we can't scan the VM for every table, every minute.
>> We need to record something that will tell us how many VM bits have
>> been cleared, which will then allow autovac to do a simple SELECT to
>> decide what needs vacuuming.
>>
>> Vik's proposal to keep track of the rows inserted seems like the best
>> approach to this issue.
>
> I tend to agree with Simon on this.  I'm also worried that an approach
> which was based off of a metric like "% of table not all-visible" might
> result in VACUUM running over and over on a table because it isn't able
> to actually make any progress towards improving that percentage.  We'd
> have to have some kind of "cool-off" period or something.
>
> Tracking INSERTs and then kicking off a VACUUM based on them seems to
> address that in a natural way and also seems like something that users
> would generally understand as it's very similar to what we do for
> UPDATEs and DELETEs.
>
> Tracking the INSERTs as a reason to VACUUM is also very natural when you
> consider the need to update BRIN indexes.
>

Another possible advantage of tracking INSERTs is for hash indexes
where after split we need to remove tuples from buckets that underwent
split recently.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] new autovacuum criterion for visible pages

From
Stephen Frost
Date:
Amit,

* Amit Kapila (amit.kapila16@gmail.com) wrote:
> On Sun, Jan 22, 2017 at 3:27 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Simon Riggs (simon@2ndquadrant.com) wrote:
> >> On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> > Michael Paquier <michael.paquier@gmail.com> writes:
> >> >> In short, autovacuum will need to scan by itself the VM of each
> >> >> relation and decide based on that.
> >> >
> >> > That seems like a worthwhile approach to pursue.  The VM is supposed to be
> >> > small, and if you're worried it isn't, you could sample a few pages of it.
> >> > I do not think any of the ideas proposed so far for tracking the
> >> > visibility percentage on-the-fly are very tenable.
> >>
> >> Sounds good, but we can't scan the VM for every table, every minute.
> >> We need to record something that will tell us how many VM bits have
> >> been cleared, which will then allow autovac to do a simple SELECT to
> >> decide what needs vacuuming.
> >>
> >> Vik's proposal to keep track of the rows inserted seems like the best
> >> approach to this issue.
> >
> > I tend to agree with Simon on this.  I'm also worried that an approach
> > which was based off of a metric like "% of table not all-visible" might
> > result in VACUUM running over and over on a table because it isn't able
> > to actually make any progress towards improving that percentage.  We'd
> > have to have some kind of "cool-off" period or something.
> >
> > Tracking INSERTs and then kicking off a VACUUM based on them seems to
> > address that in a natural way and also seems like something that users
> > would generally understand as it's very similar to what we do for
> > UPDATEs and DELETEs.
> >
> > Tracking the INSERTs as a reason to VACUUM is also very natural when you
> > consider the need to update BRIN indexes.
>
> Another possible advantage of tracking INSERTs is for hash indexes
> where after split we need to remove tuples from buckets that underwent
> split recently.

That's a good point also, and for clearing GIN pending lists too, now
that I think about it.

We really need to get this fixed for PG10.

Thanks!

Stephen

Re: [HACKERS] new autovacuum criterion for visible pages

From
Vik Fearing
Date:
On Sun, Jan 22, 2017 at 4:45 PM, Stephen Frost <sfrost@snowman.net> wrote:
Amit,

* Amit Kapila (amit.kapila16@gmail.com) wrote:
> On Sun, Jan 22, 2017 at 3:27 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Simon Riggs (simon@2ndquadrant.com) wrote:
> >> On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> > Michael Paquier <michael.paquier@gmail.com> writes:
> >> >> In short, autovacuum will need to scan by itself the VM of each
> >> >> relation and decide based on that.
> >> >
> >> > That seems like a worthwhile approach to pursue.  The VM is supposed to be
> >> > small, and if you're worried it isn't, you could sample a few pages of it.
> >> > I do not think any of the ideas proposed so far for tracking the
> >> > visibility percentage on-the-fly are very tenable.
> >>
> >> Sounds good, but we can't scan the VM for every table, every minute.
> >> We need to record something that will tell us how many VM bits have
> >> been cleared, which will then allow autovac to do a simple SELECT to
> >> decide what needs vacuuming.
> >>
> >> Vik's proposal to keep track of the rows inserted seems like the best
> >> approach to this issue.
> >
> > I tend to agree with Simon on this.  I'm also worried that an approach
> > which was based off of a metric like "% of table not all-visible" might
> > result in VACUUM running over and over on a table because it isn't able
> > to actually make any progress towards improving that percentage.  We'd
> > have to have some kind of "cool-off" period or something.
> >
> > Tracking INSERTs and then kicking off a VACUUM based on them seems to
> > address that in a natural way and also seems like something that users
> > would generally understand as it's very similar to what we do for
> > UPDATEs and DELETEs.
> >
> > Tracking the INSERTs as a reason to VACUUM is also very natural when you
> > consider the need to update BRIN indexes.
>
> Another possible advantage of tracking INSERTs is for hash indexes
> where after split we need to remove tuples from buckets that underwent
> split recently.

That's a good point also, and for clearing GIN pending lists too, now
that I think about it.

As much as I want my patch to go in, this is not an argument for it. The GIN pending lists will be handled by autoanalyze.
 
We really need to get this fixed for PG10.

I'm not sure what more I'm supposed to be doing, if anything.
--
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: [HACKERS] new autovacuum criterion for visible pages

From
Jeff Janes
Date:
On Sat, Jan 21, 2017 at 1:57 PM, Stephen Frost <sfrost@snowman.net> wrote:
All,

* Simon Riggs (simon@2ndquadrant.com) wrote:
> On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Michael Paquier <michael.paquier@gmail.com> writes:
> >> In short, autovacuum will need to scan by itself the VM of each
> >> relation and decide based on that.
> >
> > That seems like a worthwhile approach to pursue.  The VM is supposed to be
> > small, and if you're worried it isn't, you could sample a few pages of it.
> > I do not think any of the ideas proposed so far for tracking the
> > visibility percentage on-the-fly are very tenable.
>
> Sounds good, but we can't scan the VM for every table, every minute.
> We need to record something that will tell us how many VM bits have
> been cleared, which will then allow autovac to do a simple SELECT to
> decide what needs vacuuming.
>
> Vik's proposal to keep track of the rows inserted seems like the best
> approach to this issue.

I tend to agree with Simon on this.  I'm also worried that an approach
which was based off of a metric like "% of table not all-visible" might
result in VACUUM running over and over on a table because it isn't able
to actually make any progress towards improving that percentage.  We'd
have to have some kind of "cool-off" period or something.

This is why I didn't want a global guc for it but only a relopt.  I figured I would set it only for tables which I have good reason to know would benefit, because I know that they are both candidates for beneficial IOS, and because it is possible to keep them mostly all-visible with a reasonable amount of vacuum work.  I think that this is a small but important subset of tables (currently, I personally have zero such tables, but that could increase if covering indexes get implemented).  It would be nice to have settings that users of all experience level would understand (or no settings at all), but I don't think that it is feasible to have that without compromising the basic functionality.



Tracking INSERTs and then kicking off a VACUUM based on them seems to
address that in a natural way and also seems like something that users
would generally understand as it's very similar to what we do for
UPDATEs and DELETEs.

If we do an INSERT based count with an extra knob to control how that gets weighted when added to the vacuum threshold function, then I could use that knob to micromanage to my satisfaction.  If there is no knob added, then I think that I and many other people are going to see their vacuum workload skyrocket for no benefit and with no recourse, other than disabling autovac.

Cheers,

Jeff

Re: [HACKERS] new autovacuum criterion for visible pages

From
Alvaro Herrera
Date:
Stephen Frost wrote:

> Tracking the INSERTs as a reason to VACUUM is also very natural when you
> consider the need to update BRIN indexes.  I am a bit worried that if we
> focus just on if the VM needs to be updated or not that we might miss
> out on cases where we need to VACUUM because the BRIN indexes are out of
> date.

Actually, I was thinking in a different approach for BRIN -- namely that
brininsert, on its fast path out where it sees an insert on a
non-summarized range, record on the index's metapage what the last
inserted page is, as well as what's the latest summarized pages.  When a
range is complete, it saves the index OID in a DSA so that the next
autovacuum worker knows to run (the equivalent of) brinsummarize on the
index.(*)

That's efficient because this only occurs when it is known that a range
can be usefully summarized, instead of using a heuristic based on tuples
inserted, which may or may not correspond to a certain number of filled
pages.

(*) I think it's better if this only summarizes complete ranges, leaving
incomplete ones alone (which is what it does now).  That way, if there's
a bulk insert going, we leave it to run at full speed the whole time.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services