Thread: new autovacuum criterion for visible pages
I wanted to create a new relopt named something like autovacuum_vacuum_pagevisible_factor which would cause autovacuum to vacuum a table once less than a certain fraction of the relation's pages are marked allvisible. I wanted some feedback on some things. 1) One issue is that pg_class.relpages and pg_class.relallvisible are themselves only updated by vacuum/analyze. In the absence of manual vacuum or analyze, this means that if the new criterion uses those field, it could only kick in after an autoanalyze has already been done, which means that autovacuum_vacuum_pagevisible_factor could not meaningfully be set lower than autovacuum_analyze_scale_factor. Should relallvisible be moved/copied from pg_class to pg_stat_all_tables, so that it is maintained by the stats collector? Or should the autovacuum worker just walk the vm of every table with a defined autovacuum_vacuum_pagevisible_factor each time it is launched to get an up-to-date count that way? 2) Should there be a guc in addition to the relopt? I can't think of a reason why I would want to set this globally, so I'm happy with just a relopt. If it were set globally, it would sure increase the cost for scanning the vm for each table once each naptime. 3) Should there be a autovacuum_vacuum_pagevisible_threshold? The other settings have both a factor and a threshold. I've never understand what the point of the threshold settings is, but presumably there is a point to them. Does that reason also apply to keeping vm tuned up? Cheers, Jeff
On Thu, Aug 11, 2016 at 5:39 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > I wanted to create a new relopt named something like > autovacuum_vacuum_pagevisible_factor which would cause autovacuum to > vacuum a table once less than a certain fraction of the relation's > pages are marked allvisible. Interesting idea. > 1) One issue is that pg_class.relpages and pg_class.relallvisible are > themselves only updated by vacuum/analyze. In the absence of manual > vacuum or analyze, this means that if the new criterion uses those > field, it could only kick in after an autoanalyze has already been > done, which means that autovacuum_vacuum_pagevisible_factor could not > meaningfully be set lower than autovacuum_analyze_scale_factor. > > Should relallvisible be moved/copied from pg_class to > pg_stat_all_tables, so that it is maintained by the stats collector? > Or should the autovacuum worker just walk the vm of every table with a > defined autovacuum_vacuum_pagevisible_factor each time it is launched > to get an up-to-date count that way? relation_needs_vacanalyze has access to Form_pg_class, so it is not a problem to use the value of relallvisible there to decide if a vacuum/analyze should be run. > 2) Should there be a guc in addition to the relopt? I can't think of > a reason why I would want to set this globally, so I'm happy with just > a relopt. If it were set globally, it would sure increase the cost > for scanning the vm for each table once each naptime. Having a GUC is useful to enforce the default behavior of tables that do not have this parameter directly set with ALTER TABLE. > 3) Should there be a autovacuum_vacuum_pagevisible_threshold? The > other settings have both a factor and a threshold. I've never > understand what the point of the threshold settings is, but presumably > there is a point to them. Does that reason also apply to keeping vm > tuned up? Having both a threshold and a scale would make the most sense to me. It may be difficult for the lambda user to tune those parameters using a number of relation pages. An alternative would be to define those values in kB, like 32MB worth of pages are marked all visible for example. -- Michael
On Thu, Aug 11, 2016 at 3:29 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Aug 11, 2016 at 5:39 AM, Jeff Janes <jeff.janes@gmail.com> wrote: >> I wanted to create a new relopt named something like >> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to >> vacuum a table once less than a certain fraction of the relation's >> pages are marked allvisible. > > Interesting idea. > >> 1) One issue is that pg_class.relpages and pg_class.relallvisible are >> themselves only updated by vacuum/analyze. In the absence of manual >> vacuum or analyze, this means that if the new criterion uses those >> field, it could only kick in after an autoanalyze has already been >> done, which means that autovacuum_vacuum_pagevisible_factor could not >> meaningfully be set lower than autovacuum_analyze_scale_factor. >> >> Should relallvisible be moved/copied from pg_class to >> pg_stat_all_tables, so that it is maintained by the stats collector? >> Or should the autovacuum worker just walk the vm of every table with a >> defined autovacuum_vacuum_pagevisible_factor each time it is launched >> to get an up-to-date count that way? > > relation_needs_vacanalyze has access to Form_pg_class, so it is not a > problem to use the value of relallvisible there to decide if a > vacuum/analyze should be run. Doh. I missed your point. One idea perhaps would be to have an additional field that updates the number of pages having their VM bits cleared, or just decrement relallvisible when that happens, and use that in relation_needs_vacanalyze to do the decision-making. But that would require updating stats each time there is a VM cleared in heap operations, which would be really costly... The optimizer does not depend directly on pgstat when fetching the estimation information it needs, so it may be wiser to not add this dependency, and one can disable pgstat_track_counts so moving this information out of pg_class is not a good idea. -- Michael
On Thu, Aug 11, 2016 at 2:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > I wanted to create a new relopt named something like > autovacuum_vacuum_pagevisible_factor which would cause autovacuum to > vacuum a table once less than a certain fraction of the relation's > pages are marked allvisible. > Why would it more convenient for a user to set such a parameter as compare to existing parameters (autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor)? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Aug 11, 2016 at 8:32 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Aug 11, 2016 at 2:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote: >> I wanted to create a new relopt named something like >> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to >> vacuum a table once less than a certain fraction of the relation's >> pages are marked allvisible. >> > > Why would it more convenient for a user to set such a parameter as > compare to existing parameters (autovacuum_vacuum_threshold + > autovacuum_vacuum_scale_factor)? Insertions and HOT-updates clear vm bits but don't increment the counters that those existing parameters are compared to. Also, the relationship between number of updated/deleted rows and the number of hint-bits cleared can be hard to predict due to possible clustering of the updates into the same blocks. So it would be hard to know what to set the values to. Cheers, Jeff
On Thu, Aug 11, 2016 at 4:21 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Aug 11, 2016 at 3:29 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> On Thu, Aug 11, 2016 at 5:39 AM, Jeff Janes <jeff.janes@gmail.com> wrote: >>> I wanted to create a new relopt named something like >>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to >>> vacuum a table once less than a certain fraction of the relation's >>> pages are marked allvisible. >> >> Interesting idea. >> >>> 1) One issue is that pg_class.relpages and pg_class.relallvisible are >>> themselves only updated by vacuum/analyze. In the absence of manual >>> vacuum or analyze, this means that if the new criterion uses those >>> field, it could only kick in after an autoanalyze has already been >>> done, which means that autovacuum_vacuum_pagevisible_factor could not >>> meaningfully be set lower than autovacuum_analyze_scale_factor. >>> >>> Should relallvisible be moved/copied from pg_class to >>> pg_stat_all_tables, so that it is maintained by the stats collector? >>> Or should the autovacuum worker just walk the vm of every table with a >>> defined autovacuum_vacuum_pagevisible_factor each time it is launched >>> to get an up-to-date count that way? >> >> relation_needs_vacanalyze has access to Form_pg_class, so it is not a >> problem to use the value of relallvisible there to decide if a >> vacuum/analyze should be run. > > Doh. I missed your point. One idea perhaps would be to have an > additional field that updates the number of pages having their VM bits > cleared, or just decrement relallvisible when that happens, and use > that in relation_needs_vacanalyze to do the decision-making. But that > would require updating stats each time there is a VM cleared in heap > operations, which would be really costly... > > The optimizer does not depend directly on pgstat when fetching the > estimation information it needs, so it may be wiser to not add this > dependency, and one can disable pgstat_track_counts so moving this > information out of pg_class is not a good idea. With a somewhat fresher mind... The main issue regarding this proposal can be summarized as that: as track_counts can be disabled by users so moving relallvisible into pgstat cannot be done except if I am missing something. The VM bits cleared need to be tracked either by decrementing pg_class.relallvisible, with a different counter in pg_class, or with a completely different mechanism. Still I am scared of overall performance impact because as the VM bit clearing is quite spread so pg_class or the new relation where this is tracked would become really bloated. In short, autovacuum will need to scan by itself the VM of each relation and decide based on that. I would not expect much performance impact, but disabling that by default would have no impact on existing deployments. -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > In short, autovacuum will need to scan by itself the VM of each > relation and decide based on that. That seems like a worthwhile approach to pursue. The VM is supposed to be small, and if you're worried it isn't, you could sample a few pages of it. I do not think any of the ideas proposed so far for tracking the visibility percentage on-the-fly are very tenable. regards, tom lane
On 8/11/16 10:59 AM, Jeff Janes wrote: > On Thu, Aug 11, 2016 at 8:32 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Thu, Aug 11, 2016 at 2:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote: >>> I wanted to create a new relopt named something like >>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to >>> vacuum a table once less than a certain fraction of the relation's >>> pages are marked allvisible. >>> >> >> Why would it more convenient for a user to set such a parameter as >> compare to existing parameters (autovacuum_vacuum_threshold + >> autovacuum_vacuum_scale_factor)? > > Insertions and HOT-updates clear vm bits but don't increment the > counters that those existing parameters are compared to. > > Also, the relationship between number of updated/deleted rows and the > number of hint-bits cleared can be hard to predict due to possible > clustering of the updates into the same blocks. So it would be hard > to know what to set the values to. I'm wondering if also creating the same options for all-frozen pages would be worthwhile. I don't see an obvious use for that, but maybe someone else does (and adding both at once would presumably be the least amount of work...) -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
On Fri, Aug 12, 2016 at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> In short, autovacuum will need to scan by itself the VM of each >> relation and decide based on that. > > That seems like a worthwhile approach to pursue. The VM is supposed to be > small, and if you're worried it isn't, you could sample a few pages of it. > I do not think any of the ideas proposed so far for tracking the > visibility percentage on-the-fly are very tenable. > The one visibility map page can store the information of 32672 heap pages (255MB), but it would be cost if autovacuum scan whole visibility map for all tables. So I think that it's better to provide autovacuum_vacuum_pagevisible_factor as a relopts. And the autovacuum scans or samples the visibility map of table that autovacuum_vacuum_pagevisible_factor is set. Regards, -- Masahiko Sawada
On 8/11/16 11:59 AM, Jeff Janes wrote: > Insertions and HOT-updates clear vm bits but don't increment the > counters that those existing parameters are compared to. > > Also, the relationship between number of updated/deleted rows and the > number of hint-bits cleared can be hard to predict due to possible > clustering of the updates into the same blocks. So it would be hard > to know what to set the values to. Well, the current threshold formulas aren't an exact science either. They just trigger autovacuum often enough relative to table size and activity. Just fudging in the insert and HOT update counters times some factor might be enough, and it would get this functionality out to all users without more effort. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/08/16 15:15, Peter Eisentraut wrote: > On 8/11/16 11:59 AM, Jeff Janes wrote: >> Insertions and HOT-updates clear vm bits but don't increment the >> counters that those existing parameters are compared to. >> >> Also, the relationship between number of updated/deleted rows and the >> number of hint-bits cleared can be hard to predict due to possible >> clustering of the updates into the same blocks. So it would be hard >> to know what to set the values to. > > Well, the current threshold formulas aren't an exact science either. > They just trigger autovacuum often enough relative to table size and > activity. Just fudging in the insert and HOT update counters times some > factor might be enough, and it would get this functionality out to all > users without more effort. I have a patch I wrote a while ago that does this. I haven't submitted it yet because the documentation is lacking. I will post it over the weekend (I had planned to do it before the commitfest anyway). -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Thu, Aug 11, 2016 at 9:29 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Thu, Aug 11, 2016 at 8:32 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Thu, Aug 11, 2016 at 2:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote: >>> I wanted to create a new relopt named something like >>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to >>> vacuum a table once less than a certain fraction of the relation's >>> pages are marked allvisible. >>> >> >> Why would it more convenient for a user to set such a parameter as >> compare to existing parameters (autovacuum_vacuum_threshold + >> autovacuum_vacuum_scale_factor)? > > Insertions and HOT-updates clear vm bits but don't increment the > counters that those existing parameters are compared to. > > Also, the relationship between number of updated/deleted rows and the > number of hint-bits cleared can be hard to predict due to possible > clustering of the updates into the same blocks. So it would be hard > to know what to set the values to. > Okay. What I was slightly worried about was that how many users can understand *pagevisible_* parameters as compare to what we have now (number of updated/deleted tuples). However if we have some mechanism where autovacuum can be triggered automatically based on pagevisibility, then I think that would be quite beneficial (not sure, if such a mechanism can be feasible). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> In short, autovacuum will need to scan by itself the VM of each >> relation and decide based on that. > > That seems like a worthwhile approach to pursue. The VM is supposed to be > small, and if you're worried it isn't, you could sample a few pages of it. > I do not think any of the ideas proposed so far for tracking the > visibility percentage on-the-fly are very tenable. Sounds good, but we can't scan the VM for every table, every minute. We need to record something that will tell us how many VM bits have been cleared, which will then allow autovac to do a simple SELECT to decide what needs vacuuming. Vik's proposal to keep track of the rows inserted seems like the best approach to this issue. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
All, * Simon Riggs (simon@2ndquadrant.com) wrote: > On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Michael Paquier <michael.paquier@gmail.com> writes: > >> In short, autovacuum will need to scan by itself the VM of each > >> relation and decide based on that. > > > > That seems like a worthwhile approach to pursue. The VM is supposed to be > > small, and if you're worried it isn't, you could sample a few pages of it. > > I do not think any of the ideas proposed so far for tracking the > > visibility percentage on-the-fly are very tenable. > > Sounds good, but we can't scan the VM for every table, every minute. > We need to record something that will tell us how many VM bits have > been cleared, which will then allow autovac to do a simple SELECT to > decide what needs vacuuming. > > Vik's proposal to keep track of the rows inserted seems like the best > approach to this issue. I tend to agree with Simon on this. I'm also worried that an approach which was based off of a metric like "% of table not all-visible" might result in VACUUM running over and over on a table because it isn't able to actually make any progress towards improving that percentage. We'd have to have some kind of "cool-off" period or something. Tracking INSERTs and then kicking off a VACUUM based on them seems to address that in a natural way and also seems like something that users would generally understand as it's very similar to what we do for UPDATEs and DELETEs. Tracking the INSERTs as a reason to VACUUM is also very natural when you consider the need to update BRIN indexes. I am a bit worried that if we focus just on if the VM needs to be updated or not that we might miss out on cases where we need to VACUUM because the BRIN indexes are out of date. Thanks! Stephen
On Sun, Jan 22, 2017 at 3:27 AM, Stephen Frost <sfrost@snowman.net> wrote: > All, > > * Simon Riggs (simon@2ndquadrant.com) wrote: >> On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > Michael Paquier <michael.paquier@gmail.com> writes: >> >> In short, autovacuum will need to scan by itself the VM of each >> >> relation and decide based on that. >> > >> > That seems like a worthwhile approach to pursue. The VM is supposed to be >> > small, and if you're worried it isn't, you could sample a few pages of it. >> > I do not think any of the ideas proposed so far for tracking the >> > visibility percentage on-the-fly are very tenable. >> >> Sounds good, but we can't scan the VM for every table, every minute. >> We need to record something that will tell us how many VM bits have >> been cleared, which will then allow autovac to do a simple SELECT to >> decide what needs vacuuming. >> >> Vik's proposal to keep track of the rows inserted seems like the best >> approach to this issue. > > I tend to agree with Simon on this. I'm also worried that an approach > which was based off of a metric like "% of table not all-visible" might > result in VACUUM running over and over on a table because it isn't able > to actually make any progress towards improving that percentage. We'd > have to have some kind of "cool-off" period or something. > > Tracking INSERTs and then kicking off a VACUUM based on them seems to > address that in a natural way and also seems like something that users > would generally understand as it's very similar to what we do for > UPDATEs and DELETEs. > > Tracking the INSERTs as a reason to VACUUM is also very natural when you > consider the need to update BRIN indexes. > Another possible advantage of tracking INSERTs is for hash indexes where after split we need to remove tuples from buckets that underwent split recently. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit, * Amit Kapila (amit.kapila16@gmail.com) wrote: > On Sun, Jan 22, 2017 at 3:27 AM, Stephen Frost <sfrost@snowman.net> wrote: > > * Simon Riggs (simon@2ndquadrant.com) wrote: > >> On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> > Michael Paquier <michael.paquier@gmail.com> writes: > >> >> In short, autovacuum will need to scan by itself the VM of each > >> >> relation and decide based on that. > >> > > >> > That seems like a worthwhile approach to pursue. The VM is supposed to be > >> > small, and if you're worried it isn't, you could sample a few pages of it. > >> > I do not think any of the ideas proposed so far for tracking the > >> > visibility percentage on-the-fly are very tenable. > >> > >> Sounds good, but we can't scan the VM for every table, every minute. > >> We need to record something that will tell us how many VM bits have > >> been cleared, which will then allow autovac to do a simple SELECT to > >> decide what needs vacuuming. > >> > >> Vik's proposal to keep track of the rows inserted seems like the best > >> approach to this issue. > > > > I tend to agree with Simon on this. I'm also worried that an approach > > which was based off of a metric like "% of table not all-visible" might > > result in VACUUM running over and over on a table because it isn't able > > to actually make any progress towards improving that percentage. We'd > > have to have some kind of "cool-off" period or something. > > > > Tracking INSERTs and then kicking off a VACUUM based on them seems to > > address that in a natural way and also seems like something that users > > would generally understand as it's very similar to what we do for > > UPDATEs and DELETEs. > > > > Tracking the INSERTs as a reason to VACUUM is also very natural when you > > consider the need to update BRIN indexes. > > Another possible advantage of tracking INSERTs is for hash indexes > where after split we need to remove tuples from buckets that underwent > split recently. That's a good point also, and for clearing GIN pending lists too, now that I think about it. We really need to get this fixed for PG10. Thanks! Stephen
On Sun, Jan 22, 2017 at 4:45 PM, Stephen Frost <sfrost@snowman.net> wrote:
-- Amit,
* Amit Kapila (amit.kapila16@gmail.com) wrote:
> On Sun, Jan 22, 2017 at 3:27 AM, Stephen Frost <sfrost@snowman.net> wrote:That's a good point also, and for clearing GIN pending lists too, now> > * Simon Riggs (simon@2ndquadrant.com) wrote:
> >> On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> > Michael Paquier <michael.paquier@gmail.com> writes:
> >> >> In short, autovacuum will need to scan by itself the VM of each
> >> >> relation and decide based on that.
> >> >
> >> > That seems like a worthwhile approach to pursue. The VM is supposed to be
> >> > small, and if you're worried it isn't, you could sample a few pages of it.
> >> > I do not think any of the ideas proposed so far for tracking the
> >> > visibility percentage on-the-fly are very tenable.
> >>
> >> Sounds good, but we can't scan the VM for every table, every minute.
> >> We need to record something that will tell us how many VM bits have
> >> been cleared, which will then allow autovac to do a simple SELECT to
> >> decide what needs vacuuming.
> >>
> >> Vik's proposal to keep track of the rows inserted seems like the best
> >> approach to this issue.
> >
> > I tend to agree with Simon on this. I'm also worried that an approach
> > which was based off of a metric like "% of table not all-visible" might
> > result in VACUUM running over and over on a table because it isn't able
> > to actually make any progress towards improving that percentage. We'd
> > have to have some kind of "cool-off" period or something.
> >
> > Tracking INSERTs and then kicking off a VACUUM based on them seems to
> > address that in a natural way and also seems like something that users
> > would generally understand as it's very similar to what we do for
> > UPDATEs and DELETEs.
> >
> > Tracking the INSERTs as a reason to VACUUM is also very natural when you
> > consider the need to update BRIN indexes.
>
> Another possible advantage of tracking INSERTs is for hash indexes
> where after split we need to remove tuples from buckets that underwent
> split recently.
that I think about it.
As much as I want my patch to go in, this is not an argument for it. The GIN pending lists will be handled by autoanalyze.
We really need to get this fixed for PG10.
I'm not sure what more I'm supposed to be doing, if anything.
Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sat, Jan 21, 2017 at 1:57 PM, Stephen Frost <sfrost@snowman.net> wrote:
All,
* Simon Riggs (simon@2ndquadrant.com) wrote:
> On 12 August 2016 at 01:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Michael Paquier <michael.paquier@gmail.com> writes:
> >> In short, autovacuum will need to scan by itself the VM of each
> >> relation and decide based on that.
> >
> > That seems like a worthwhile approach to pursue. The VM is supposed to be
> > small, and if you're worried it isn't, you could sample a few pages of it.
> > I do not think any of the ideas proposed so far for tracking the
> > visibility percentage on-the-fly are very tenable.
>
> Sounds good, but we can't scan the VM for every table, every minute.
> We need to record something that will tell us how many VM bits have
> been cleared, which will then allow autovac to do a simple SELECT to
> decide what needs vacuuming.
>
> Vik's proposal to keep track of the rows inserted seems like the best
> approach to this issue.
I tend to agree with Simon on this. I'm also worried that an approach
which was based off of a metric like "% of table not all-visible" might
result in VACUUM running over and over on a table because it isn't able
to actually make any progress towards improving that percentage. We'd
have to have some kind of "cool-off" period or something.
This is why I didn't want a global guc for it but only a relopt. I figured I would set it only for tables which I have good reason to know would benefit, because I know that they are both candidates for beneficial IOS, and because it is possible to keep them mostly all-visible with a reasonable amount of vacuum work. I think that this is a small but important subset of tables (currently, I personally have zero such tables, but that could increase if covering indexes get implemented). It would be nice to have settings that users of all experience level would understand (or no settings at all), but I don't think that it is feasible to have that without compromising the basic functionality.
Tracking INSERTs and then kicking off a VACUUM based on them seems to
address that in a natural way and also seems like something that users
would generally understand as it's very similar to what we do for
UPDATEs and DELETEs.
If we do an INSERT based count with an extra knob to control how that gets weighted when added to the vacuum threshold function, then I could use that knob to micromanage to my satisfaction. If there is no knob added, then I think that I and many other people are going to see their vacuum workload skyrocket for no benefit and with no recourse, other than disabling autovac.
Cheers,
Jeff
Stephen Frost wrote: > Tracking the INSERTs as a reason to VACUUM is also very natural when you > consider the need to update BRIN indexes. I am a bit worried that if we > focus just on if the VM needs to be updated or not that we might miss > out on cases where we need to VACUUM because the BRIN indexes are out of > date. Actually, I was thinking in a different approach for BRIN -- namely that brininsert, on its fast path out where it sees an insert on a non-summarized range, record on the index's metapage what the last inserted page is, as well as what's the latest summarized pages. When a range is complete, it saves the index OID in a DSA so that the next autovacuum worker knows to run (the equivalent of) brinsummarize on the index.(*) That's efficient because this only occurs when it is known that a range can be usefully summarized, instead of using a heuristic based on tuples inserted, which may or may not correspond to a certain number of filled pages. (*) I think it's better if this only summarizes complete ranges, leaving incomplete ones alone (which is what it does now). That way, if there's a bulk insert going, we leave it to run at full speed the whole time. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services