Thread: Per table autovacuum vacuum cost limit behaviour strange
A while back we were discussing rapid space bloat of tables under certain circumstances. One further case I am examining is a highly volatile single table, and how to tame its space blowout.

I've got a nice simple example (attached). Making use of pgbench to run it as usual:

$ createdb cache
$ psql cache < schema.sql
$ pgbench -n -c8 -T300 -f volatile0.sql cache

...causes the table (imaginatively named 'cache0') to grow several GB with default autovacuum parameters. Some minimal changes will rein in the growth to about 100MB:

$ grep -e naptime -e autovacuum_vacuum_cost_limit postgresql.conf
autovacuum_naptime = 5s
autovacuum_vacuum_cost_limit = 10000

However the cost_limit setting is likely to be way too aggressive globally. No problem, I figured: I'd leave it at the default (200) and use ALTER TABLE to change it for *just* the 'cache0' table:

cache=# ALTER TABLE cache0 SET (autovacuum_vacuum_cost_limit=10000);

However, re-running the pgbench test results in several GB worth of space used by this table. Hmmm - it looks like setting this parameter per table does not work how I expected. Looking at src/backend/postmaster/autovacuum.c I see some balancing calculations in autovac_balance_cost() and AutoVacuumUpdateDelay(), the effect of which seems to be (after adding some debugging elogs) to reset the actual effective cost_limit back to 200 for this table, viz (rel 16387 is cache0):

LOG: autovac_balance_cost(pid=24058 db=16384, rel=16387, cost_limit=200, cost_limit_base=10000, cost_delay=20)
LOG: autovac_update_delay(pid=24058 db=16384, rel=16387, cost_limit=200, cost_delay=20)

Is this working as intended? I did wonder if it was an artifact of only having 1 table (creating another one made no difference)...or perhaps only 1 active worker... I found I had to lobotomize the balancing calc by doing:

cache=# ALTER TABLE cache0 SET (autovacuum_vacuum_cost_delay=0);

before I got the same effect as just setting the cost_limit globally. I'm now a bit confused about whether I understand how setting cost_limit and cost_delay via ALTER TABLE works (or in fact whether it is working properly, for that matter).

Regards

Mark
Attachment
On 13/02/14 17:13, Haribabu Kommi wrote:
> When I go through the code for checking the same, I got the following
> behavior.
>
> The default values of vacuum parameters - cost_limit - 200 and cost_delay - 0
> The default values of auto vacuum parameters - cost_limit - (-1) and cost_delay - 20ms.
>
> 1. User is not provided any vacuum parameters to the table, so the vacuum
> options for the table are cost_limit - 200 and cost_delay - 20
> 2. User is provided cost_limit as 1000 to the table, so the vacuum options
> for the table are cost_limit - 1000 and cost_delay - 20
>
> For the above two cases, the "autovac_balance_cost" function sets the cost
> parameters as cost_limit - 200 and cost_delay - 20.
>
> 3. User is provided cost_limit as 1000 and cost_delay as 10 to the table,
> so the vacuum options for the table are cost_limit - 1000 and cost_delay - 10
>
> This case the cost_limit - 100 and cost_delay - 10.
>
> 4. User is provided cost_limit as 1000 and cost_delay as 100 to the table,
> so the vacuum options for the table are cost_limit - 1000 and cost_delay - 100
>
> This case the cost_limit - 1000 and cost_delay - 100
>
> From the above observations, The cost parameters of vacuum are not working
> as they specified.
> please correct me if anything wrong in my observation.

FWIW - I can confirm these calculations in 9.4devel. I found the attached patch handy for logging what the balanced limit and delay was.

Regards

Mark
Attachment
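For reference, all of the numbers above (including the cost_limit=200 in the log lines from the first message) are consistent with a simple proportional rebalance: each worker's share of the global budget autovacuum_vacuum_cost_limit/autovacuum_vacuum_cost_delay is pro-rated by its base cost limit and then clamped back to that base. The following standalone program is only a sketch of that arithmetic (not the autovacuum.c source), assuming a single active worker in each case:

/*
 * Standalone sketch (not PostgreSQL source) of a proportional
 * autovacuum cost rebalance, assuming ONE active worker per case.
 *
 * budget = global cost_limit / global cost_delay
 * slot   = budget * worker_base_limit / sum(base_limit / delay)
 * result = clamp(slot, 1, worker_base_limit)
 */
#include <stdio.h>

static int
rebalance_one(int global_limit, int global_delay,
              int table_limit, int table_delay)
{
    double budget = (double) global_limit / global_delay;
    double cost_total = (double) table_limit / table_delay; /* single worker */
    int    limit = (int) (budget * table_limit / cost_total);

    if (limit > table_limit)
        limit = table_limit;
    if (limit < 1)
        limit = 1;
    return limit;
}

int
main(void)
{
    /* Global (GUC) settings: autovacuum_vacuum_cost_limit=200, delay=20ms */
    const int g_limit = 200;
    const int g_delay = 20;

    struct
    {
        const char *desc;
        int         limit;
        int         delay;
    } cases[] = {
        {"case 1: table defaults (200, 20ms)", 200, 20},
        {"case 2: table limit=1000, delay=20ms", 1000, 20},
        {"case 3: table limit=1000, delay=10ms", 1000, 10},
        {"case 4: table limit=1000, delay=100ms", 1000, 100},
        {"cache0: table limit=10000, delay=20ms", 10000, 20},
    };

    for (int i = 0; i < 5; i++)
        printf("%-40s -> effective cost_limit=%d, cost_delay=%d\n",
               cases[i].desc,
               rebalance_one(g_limit, g_delay, cases[i].limit, cases[i].delay),
               cases[i].delay);
    return 0;
}

Under this arithmetic the per-table limit only survives the rebalance when the per-table cost_delay is raised as well (case 4), and even then the longer sleep keeps the net I/O rate at the global budget - which lines up with Mark's observation that setting cost_delay=0 was needed to defeat the balancing.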
I changed the balance cost calculations a little bit to give priority to the user provided per table autovacuum parameters.
Attachment
I hadn't noticed this thread. I will give this a look. Thanks for providing a patch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Haribabu Kommi escribió: > I changed the balance cost calculations a little bit to give priority to > the user provided per table autovacuum parameters. > If any user specified per table vacuum parameters exists and those are > different with guc vacuum parameters then the > balance cost calculations will not include that worker in calculation. Only > the cost is distributed between other workers > with specified guc vacuum cost parameter. > > The problem in this calculation is if the user provides same guc values to > the per table values also then it doesn't consider them in calculation. I think this is a strange approach to the problem, because if you configure the backends just so, they are completely ignored instead of being adjusted. And this would have action-at-a-distance consequences because if you change the defaults in postgresql.conf you end up with completely different behavior on the tables for which you have carefully tuned the delay so that they are ignored in rebalance calculations. I think that rather than ignoring some backends completely, we should be looking at how to "weight" the balancing calculations among all the backends in some smart way that doesn't mean they end up with the default values of limit, which AFAIU is what happens now -- which is stupid. Not real sure how to do that, perhaps base it on the globally-configured ratio of delay/limit vs. the table-specific ratio. What I mean is that perhaps the current approach is all wrong and we need to find a better algorithm to suit this case and more generally. Of course, I don't mean to say that it should behave completely differently than now in normal cases, only that it shouldn't give completely stupid results in corner cases such as this one. As an example, suppose that global limit=200 and global delay=20 (the defaults). Then we have a global ratio of 5. If all three tables being vacuumed currently are using the default values, then they all have ratio=5 and therefore all should have the same limit and delay settings applied after rebalance. Now, if two tables have ratio=5 and one table has been configured to have a very fast vacuum, that is limit=10000, then ratio for that table is 10000/20=500. Therefore that table should be configured, after rebalance, to have a limit and delay that are 100 times faster than the settings for the other two tables. (And there is a further constraint that the total delay per "limit unit" should be so-and-so to accomodate getting the correct total delay per limit unit.) I haven't thought about how to code that, but I don't think it should be too difficult. Want to give it a try? I think it makes sense to modify both the running delay and the running limit to achieve whatever ratio we come up with, except that delay should probably not go below 10ms because, apparently, some platforms have that much of sleep granularity and it wouldn't really work to have a smaller delay. Am I making sense? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Feb 15, 2014 9:19 AM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
>
> I think this is a strange approach to the problem, because if you
> configure the backends just so, they are completely ignored instead of
> being adjusted. And this would have action-at-a-distance consequences
> because if you change the defaults in postgresql.conf you end up with
> completely different behavior on the tables for which you have carefully
> tuned the delay so that they are ignored in rebalance calculations.
>
> I think that rather than ignoring some backends completely, we should be
> looking at how to "weight" the balancing calculations among all the
> backends in some smart way that doesn't mean they end up with the
> default values of limit, which AFAIU is what happens now -- which is
> stupid. Not real sure how to do that, perhaps base it on the
> globally-configured ratio of delay/limit vs. the table-specific ratio.
>
> What I mean is that perhaps the current approach is all wrong and we
> need to find a better algorithm to suit this case and more generally.
> Of course, I don't mean to say that it should behave completely
> differently than now in normal cases, only that it shouldn't give
> completely stupid results in corner cases such as this one.
>
> As an example, suppose that global limit=200 and global delay=20 (the
> defaults). Then we have a global ratio of 5. If all three tables being
> vacuumed currently are using the default values, then they all have
> ratio=5 and therefore all should have the same limit and delay settings
> applied after rebalance. Now, if two tables have ratio=5 and one table
> has been configured to have a very fast vacuum, that is limit=10000,
> then ratio for that table is 10000/20=500. Therefore that table should
> be configured, after rebalance, to have a limit and delay that are 100
> times faster than the settings for the other two tables. (And there is
> a further constraint that the total delay per "limit unit" should be
> so-and-so to accomodate getting the correct total delay per limit unit.)
>
> I haven't thought about how to code that, but I don't think it should be
> too difficult. Want to give it a try? I think it makes sense to modify
> both the running delay and the running limit to achieve whatever ratio
> we come up with, except that delay should probably not go below 10ms
> because, apparently, some platforms have that much of sleep granularity
> and it wouldn't really work to have a smaller delay.
>
> Am I making sense?

Yes, that makes sense, and it's a good approach not to leave the delay parameter as is. Thanks, I will give it a try.

Regards,
Hari Babu
Fujitsu Australia
Attachment
Haribabu Kommi wrote: > I modified the "autovac_balance_cost" function to balance the costs using > the number of running workers, instead > of default vacuum cost parameters. Just as a heads-up, this patch wasn't part of the commitfest, but I intend to review it and possibly commit for 9.4. Not immediately but at some point. Arguably this is a bug fix, since the autovac rebalance code behaves horribly in cases such as the one described here, so I should consider a backpatch right away. However I don't think it's a good idea to do that without more field testing. Perhaps we can backpatch later if the new code demonstrates its sanity. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Feb 17, 2014 at 7:38 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > I modified the "autovac_balance_cost" function to balance the costs using > the number of running workers, instead > of default vacuum cost parameters. > > Lets assume there are 4 workers running currently with default cost values > of limit 200 and delay 20ms. > The cost will be distributed as 50 and 10ms each. > > Suppose if one worker is having a different cost limit value as 1000, which > is 5 times more than default value. > The cost will be distributed as 50 and 10ms each for other 3 workers and 250 > and 10ms for the worker having > cost limit value other than default. By this way also it still ensures the > cost limit value is 5 times more than other workers. Won't this change break the basic idea of autovacuum_vacuum_cost_limit which is as follows: "Note that the value is distributed proportionally among the running autovacuum workers, if there is more than one, so that the sum of the limits of each worker never exceeds the limit on this variable.". Basically with proposed change, the sum of limits of each worker will be more than autovacuum_vacuum_cost_limit and I think main reason for same is that the new calculation doesn't consider autovacuum_vacuum_cost_limit or other similar parameters. I think current calculation gives appropriate consideration for table level vacuum settings when autovacuum_vacuum_cost_limit is configured with more care (i.e it is more than table level settings). As an example consider the below case: autovacuum_vacuum_cost_limit = 10000 t1 (autovacuum_vacuum_cost_limit = 1000) t2 (default) t3 (default) t4 (default) Consider other settings as Default. Now cost_limit after autovac_balance_cost is as follows: Worker-1 for t1 = 322 Worker-2 for t2 = 3225 Worker-3 for t3 = 3225 Worker-4 for t3 = 3225 So in this way proper consideration has been given to table level vacuum settings and guc configured for autovacuum_vacuum_cost_limit with current code. Now it might be the case that we want to improve current calculation for cases where it doesn't work well, but I think it has to be better than current behaviour and it is better to consider both guc's and table level settings with some better formula. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, May 5, 2014 at 1:09 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Mon, Feb 17, 2014 at 7:38 AM, Haribabu Kommi > <kommi.haribabu@gmail.com> wrote: >> I modified the "autovac_balance_cost" function to balance the costs using >> the number of running workers, instead >> of default vacuum cost parameters. >> >> Lets assume there are 4 workers running currently with default cost values >> of limit 200 and delay 20ms. >> The cost will be distributed as 50 and 10ms each. >> >> Suppose if one worker is having a different cost limit value as 1000, which >> is 5 times more than default value. >> The cost will be distributed as 50 and 10ms each for other 3 workers and 250 >> and 10ms for the worker having >> cost limit value other than default. By this way also it still ensures the >> cost limit value is 5 times more than other workers. > > Won't this change break the basic idea of autovacuum_vacuum_cost_limit > which is as follows: > "Note that the value is distributed proportionally among the running autovacuum > workers, if there is more than one, so that the sum of the limits of each worker > never exceeds the limit on this variable.". It is not breaking the behavior. This setting can be overridden for individual tables by changing storage parameters. Still the cost values for the default tables are under the guc limit. > Basically with proposed change, the sum of limits of each worker will be more > than autovacuum_vacuum_cost_limit and I think main reason for same is that > the new calculation doesn't consider autovacuum_vacuum_cost_limit or other > similar parameters. If user doesn't provide any table specific value then autovacuum_vacuum_cost_limit guc value is set to the table. So the same is used in the calculation. > I think current calculation gives appropriate consideration for table level > vacuum settings when autovacuum_vacuum_cost_limit is configured > with more care (i.e it is more than table level settings). As an example > consider the below case: > > autovacuum_vacuum_cost_limit = 10000 > t1 (autovacuum_vacuum_cost_limit = 1000) > t2 (default) > t3 (default) > t4 (default) > > Consider other settings as Default. > > Now cost_limit after autovac_balance_cost is as follows: > Worker-1 for t1 = 322 > Worker-2 for t2 = 3225 > Worker-3 for t3 = 3225 > Worker-4 for t3 = 3225 > > So in this way proper consideration has been given to table level > vacuum settings and guc configured for autovacuum_vacuum_cost_limit > with current code. It works for the case where the table specific values less than the default cost limit. The same logic doesn't work with higher values. Usually the table specific values are more than default values to the tables where the faster vacuuming is expected. > Now it might be the case that we want to improve current calculation for > cases where it doesn't work well, but I think it has to be better than current > behaviour and it is better to consider both guc's and table level settings with > some better formula. With the proposed change, it works for both fine whether the table specific value is higher or lower to the default value. It works on the factor of the difference between the default value to the table specific value. default autovacuum_vacuum_cost_limit = 10000 t1 - 1000, t2 - default, t3 - default, t4 - default --> balance costs t1 - 250, t2 - 2500, t3 - 2500, t4 - 2500. t1 - 20000, t2 - default, t3 - default, t4 - default --> balance costs t1 - 5000, t2 - 2500, t3 - 2500, t4 - 2500. Regards, Hari Babu Fujitsu Australia
On Mon, May 5, 2014 at 6:35 AM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > On Mon, May 5, 2014 at 1:09 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Mon, Feb 17, 2014 at 7:38 AM, Haribabu Kommi >> <kommi.haribabu@gmail.com> wrote: >>> I modified the "autovac_balance_cost" function to balance the costs using >>> the number of running workers, instead >>> of default vacuum cost parameters. >>> >>> Lets assume there are 4 workers running currently with default cost values >>> of limit 200 and delay 20ms. >>> The cost will be distributed as 50 and 10ms each. >>> >>> Suppose if one worker is having a different cost limit value as 1000, which >>> is 5 times more than default value. >>> The cost will be distributed as 50 and 10ms each for other 3 workers and 250 >>> and 10ms for the worker having >>> cost limit value other than default. By this way also it still ensures the >>> cost limit value is 5 times more than other workers. >> >> Won't this change break the basic idea of autovacuum_vacuum_cost_limit >> which is as follows: >> "Note that the value is distributed proportionally among the running autovacuum >> workers, if there is more than one, so that the sum of the limits of each worker >> never exceeds the limit on this variable.". > > It is not breaking the behavior. This setting can be overridden for > individual tables by > changing storage parameters. Still the cost values for the default > tables are under the guc limit. Could you think of a case where in current calculation it doesn't follow what I mentioned above ("the sum of the limits of each worker never exceeds the limit on this variable.")? Here what I could understand is that sum of cost_limit for all autovacuum workers should never exceed the value of autovacuum_vacuum_cost_limit which seems to be always the case in current code but same is not true for proposed patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 05/05/14 15:22, Amit Kapila wrote: > Here what I could understand is that sum of cost_limit for all > autovacuum workers should never exceed the value of > autovacuum_vacuum_cost_limit which seems to be always the > case in current code but same is not true for proposed patch. > Right, but have a look at the 1st message in this thread - the current behavior (and to a large extent the above condition) means that setting cost limits per table does not work in any remotely intuitive way. ITSM that the whole purpose of a per table setting in this context is to override the behavior of auto vacuum throttling - and currently this does not happen unless you get real brutal (i.e setting the cost delay to zero in addition to setting cost limit...making the whole cost limit a bit pointless). regards Mark
On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: > On 05/05/14 15:22, Amit Kapila wrote: > Right, but have a look at the 1st message in this thread - the current > behavior (and to a large extent the above condition) means that setting > cost limits per table does not work in any remotely intuitive way. > > ITSM that the whole purpose of a per table setting in this context is to > override the behavior of auto vacuum throttling - and currently this does > not happen unless you get real brutal (i.e setting the cost delay to zero in > addition to setting cost limit...making the whole cost limit a bit > pointless). I think meaning of per table setting is just that it overrides the default value of autovacuum_vacuum_cost_limit for that table and the rest of calculation or concept remains same. This is what currently code does and the same is mentioned in docs as far as I can understand. As per current behaviour the per-table cost_limit is also adjusted based on the setting of GUC autovacuum_vacuum_cost_limit and right now it follows a simple principle that the total cost limit for all workers should be equal to autovacuum_vacuum_cost_limit. Even code has below comment: /* * Adjust cost limit of each active worker to balance the total of cost * limit to autovacuum_vacuum_cost_limit. */ Now If you want to change for the case where user specifies value per table which is more than autovacuum_vacuum_cost_limit or otherwise, then I think the new definition should be bit more clear and it is better not to impact current calculation for default values. I could think of 2 ways to change this: a. if user has specified cost_limit value for table, then it just uses it rather than rebalancing based on value of system-wideguc variable autovacuum_vacuum_cost_limit b. another could be to restrict setting per-table value to be lesser than system-wide value? The former is used for auto vacuum parameters like scale_factor and later is used for parameters like freeze_max_age. Thoughts? Alvaro, do you think above options makes sense to solve this problem? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 06/05/14 16:28, Amit Kapila wrote: > On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood > <mark.kirkwood@catalyst.net.nz> wrote: >> On 05/05/14 15:22, Amit Kapila wrote: >> Right, but have a look at the 1st message in this thread - the current >> behavior (and to a large extent the above condition) means that setting >> cost limits per table does not work in any remotely intuitive way. >> >> ITSM that the whole purpose of a per table setting in this context is to >> override the behavior of auto vacuum throttling - and currently this does >> not happen unless you get real brutal (i.e setting the cost delay to zero in >> addition to setting cost limit...making the whole cost limit a bit >> pointless). > I think meaning of per table setting is just that it overrides the default > value of autovacuum_vacuum_cost_limit for that table and the rest of > calculation or concept remains same. This is what currently code does > and the same is mentioned in docs as far as I can understand. > > As per current behaviour the per-table cost_limit is also adjusted based > on the setting of GUC autovacuum_vacuum_cost_limit and right now it > follows a simple principle that the total cost limit for all workers should be > equal to autovacuum_vacuum_cost_limit. Even code has below comment: > > /* > * Adjust cost limit of each active worker to balance the total of cost > * limit to autovacuum_vacuum_cost_limit. > */ > > Now If you want to change for the case where user specifies value per > table which is more than autovacuum_vacuum_cost_limit or otherwise, > then I think the new definition should be bit more clear and it is better > not to impact current calculation for default values. > > I could think of 2 ways to change this: > > a. if user has specified cost_limit value for table, then it just uses it > rather than rebalancing based on value of system-wide guc variable > autovacuum_vacuum_cost_limit > b. another could be to restrict setting per-table value to be lesser than > system-wide value? > > The former is used for auto vacuum parameters like scale_factor and > later is used for parameters like freeze_max_age. > > Thoughts? > > Alvaro, do you think above options makes sense to solve this problem? Yes indeed - the code currently working differently from what one would expect. However the usual reason for handing knobs to the user for individual object is so that special configurations can be applied to them. The current method of operation of the per table knobs does not do this (not without clubbing 'em on the head) The (ahem) sensible way that one would expect (perhaps even need) autovacuum throttling to work is: - set sensible defaults for all the usual (well behaved) tables - set a few really aggressive overrides for a handful of the naughty ones Runaway free space bloat is one of the things that can really mangle a postgres system (I've been called in to rescue a few in my time)... there needs to be a way to control those few badly behaved tables ... without removing the usefulness of throttling the others. Regards Mark
Mark Kirkwood wrote:
> On 06/05/14 16:28, Amit Kapila wrote:
> > On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood
> > <mark.kirkwood@catalyst.net.nz> wrote:
> > I could think of 2 ways to change this:
> >
> > a. if user has specified cost_limit value for table, then it just uses it
> >    rather than rebalancing based on value of system-wide guc variable
> >    autovacuum_vacuum_cost_limit
> > b. another could be to restrict setting per-table value to be lesser than
> >    system-wide value?
> >
> > The former is used for auto vacuum parameters like scale_factor and
> > later is used for parameters like freeze_max_age.
> >
> > Thoughts?
> >
> > Alvaro, do you think above options makes sense to solve this problem?

I've been giving some thought to this. Really, there is no way to handle this sensibly while at the same time keeping the documented behavior -- or in other words, what we have documented is not useful behavior. Your option (b) above is an easy solution to the problem, however it means that the user will have serious trouble configuring the system in scenarios such as volatile tables, as Mark says -- essentially that will foreclose the option of using autovacuum for them.

I'm not sure I like your (a) proposal much better. One problem there is that if you set the values for a table to be exactly the same values as in postgresql.conf, it will behave completely differently because it will not participate in balancing. To me this seems to violate POLA.

I checked Haribabu's latest patch in this thread, and didn't like it much. If you set up a table to have cost_delay=1000, it runs at that speed when vacuumed alone; but if there are two workers, it goes at half the speed even if the other one is configured with a very small cost_delay (in essence, "waste" the allocated I/O bandwidth). Three workers, it goes at a third of the speed -- again, even if the other tables are configured to go much slower than the volatile one. This seems too simplistic. It might be okay when you have only one or two very large or high-churn tables, and small numbers of workers, but it's not unreasonable to think that you might have lots more workers if your DB has many high-churn tables.

So my proposal is a bit more complicated. First we introduce the notion of a single number, to enable sorting and computations: the "delay equivalent", which is the cost_limit divided by cost_delay. The higher the value is for any table, the faster it is vacuumed. (It makes sense in physical terms: a higher cost_limit makes it faster, because vacuum sleeps less often; and a higher cost_delay makes it go slower, because vacuum sleeps for longer.) Now, the critical issue is to notice that not all tables are equal; they can be split in two groups, those that go faster than the global delay equivalent (i.e. the effective values of the GUC variables autovacuum_vacuum_cost_limit/autovacuum_vacuum_cost_delay), and those that go equal or slower. For the latter group, the rebalancing algorithm "distributes" the I/O allocated by the global vars, in a pro-rated manner. For the former group (tables vacuumed faster than the global delay equiv), to rebalance we don't consider the global delay equiv but the delay equiv of the fastest table currently being vacuumed.

Suppose we have two tables, delay_equiv=10 each (which is the default value). If they are both vacuumed in parallel, then we distribute a delay_equiv of 5 to each (so set cost_limit=100, cost_delay=20). As soon as one of them finishes, the remaining one is allowed to upgrade to delay_equiv=10 (cost_limit=200, cost_delay=20).

Now add a third table, delay_equiv=500 (cost_limit=10000, cost_delay=20; this is Mark's volatile table). If it's being vacuumed on its own, just assign cost_limit=10000 cost_delay=20, as normal. If one of the other two tables is being vacuumed, that one will use delay_equiv=10, as per above. To balance the volatile table, we take the delay_equiv of this one and subtract the already handed-out delay_equiv of 10; so we set the volatile table to delay_equiv=490 (cost_limit=9800, cost_delay=20).

If we do it this way, the whole system is running at the full speed enabled by the fastest table for which we have set the per-table options, but also we have scaled things so that the slow tables go slow and the fast tables go fast.

As a more elaborate example, add a fourth table with delay_equiv=50 (cost_limit=1000, cost_delay=20). This is also faster than the global vars, so we put it in the first group. If all four tables are being vacuumed in parallel, we have the two slow tables going at delay_equiv=5 each (cost_limit=100, cost_delay=20); then there are delay_equiv=490 to distribute among the remaining ones; pro-rating this we have delay_equiv=445 (cost_limit=8900, cost_delay=20) for the volatile table and delay_equiv=45 (cost_limit=900, cost_delay=20) for the other one.

If one of the slowest tables finishes vacuuming, the other one will speed up to delay_equiv=10, and the two fastest ones will go on unchanged. If both finish and the fast tables keep going, the faster one will go at delay_equiv=454 and the other one at delay_equiv=45. Note that the volatile table will go a bit faster while the other one is barely affected.

Essentially, if you configure a table with a delay-equiv that's greater than the system-configured values, you're giving permission for vacuum to use more I/O, but each table has its own limit to how fast it can go.

> The (ahem) sensible way that one would expect (perhaps even need)
> autovacuum throttling to work is:
>
> - set sensible defaults for all the usual (well behaved) tables
> - set a few really aggressive overrides for a handful of the naughty ones

Does my proposal above satisfy your concerns?

> Runaway free space bloat is one of the things that can really mangle
> a postgres system (I've been called in to rescue a few in my
> time)... there needs to be a way to control those few badly behaved
> tables ... without removing the usefulness of throttling the others.

Agreed.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
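To make the worked examples above easier to follow, here is a standalone sketch of the two-group rebalance as described in this mail (an illustration of the description, not the patch posted later): tables at or below the global delay equivalent share the global budget pro rata, while faster tables share the fastest table's delay equivalent minus whatever was handed to the slow group.

/*
 * Sketch of the "delay equivalent" rebalance described in this mail
 * (an illustration of the description, not the actual patch).
 *
 * delay_equiv = cost_limit / cost_delay.  Tables at or below the global
 * delay_equiv share the global budget pro rata; faster tables share
 * (fastest table's delay_equiv - total handed to the slow group),
 * pro rata by their own delay_equivs.
 */
#include <stdio.h>

#define MAXTABLES 8

static void
rebalance(const char *label, const double *equiv, int n)
{
    const double global_equiv = 200.0 / 20.0;   /* limit=200, delay=20ms */
    double slow_sum = 0.0, fast_sum = 0.0, fast_max = 0.0, slow_given = 0.0;
    double out[MAXTABLES];

    for (int i = 0; i < n; i++)
    {
        if (equiv[i] <= global_equiv)
            slow_sum += equiv[i];
        else
        {
            fast_sum += equiv[i];
            if (equiv[i] > fast_max)
                fast_max = equiv[i];
        }
    }

    /* slow group: pro-rate the global budget */
    for (int i = 0; i < n; i++)
        if (equiv[i] <= global_equiv)
        {
            out[i] = global_equiv * equiv[i] / slow_sum;
            slow_given += out[i];
        }

    /* fast group: pro-rate what is left of the fastest table's budget */
    for (int i = 0; i < n; i++)
        if (equiv[i] > global_equiv)
            out[i] = (fast_max - slow_given) * equiv[i] / fast_sum;

    printf("%s\n", label);
    for (int i = 0; i < n; i++)
        printf("  table %d: delay_equiv %.0f -> %.1f (cost_limit ~%.0f at 20ms)\n",
               i + 1, equiv[i], out[i], out[i] * 20.0);
}

int
main(void)
{
    double two_defaults[] = {10, 10};
    double with_volatile[] = {10, 10, 500};
    double four_tables[] = {10, 10, 500, 50};
    double fast_only[] = {500, 50};

    rebalance("two default tables", two_defaults, 2);
    rebalance("two defaults + volatile (equiv 500)", with_volatile, 3);
    rebalance("two defaults + volatile + equiv-50 table", four_tables, 4);
    rebalance("only the two fast tables", fast_only, 2);
    return 0;
}

This reproduces the figures in the mail: 5 and 5 for the two default tables, 490 for the volatile table, roughly 445 and 45 when the equiv-50 table joins in (the mail rounds the pro-rated 445.5 and 44.5), and 454/45 once the slow tables drop out.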
Alvaro Herrera wrote:

> So my proposal is a bit more complicated. First we introduce the notion
> of a single number, to enable sorting and computations: the "delay
> equivalent", which is the cost_limit divided by cost_delay.

Here's a patch that implements this idea. As you see, this is quite a bit more complicated than Haribabu's proposal.

There are two holes in this:

1. If you ALTER DATABASE to change vacuum delay for a database, those values are not considered in the global equiv delay. I don't think this is very important and anyway we haven't considered this very much, so it's okay if we don't handle it.

2. If you have a "fast worker" that's only slightly faster than regular workers, it will become slower in some cases. This is explained in a FIXME comment in the patch.

I don't really have any more time to invest in this, but I would like to see it in 9.4. Mark, would you test this? Haribabu, how open are you to fixing point (2) above?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 27/08/14 10:27, Alvaro Herrera wrote: > Alvaro Herrera wrote: > >> So my proposal is a bit more complicated. First we introduce the notion >> of a single number, to enable sorting and computations: the "delay >> equivalent", which is the cost_limit divided by cost_delay. > > Here's a patch that implements this idea. As you see this is quite a > bit more complicated that Haribabu's proposal. > > There are two holes in this: > > 1. if you ALTER DATABASE to change vacuum delay for a database, those > values are not considered in the global equiv delay. I don't think this > is very important and anyway we haven't considered this very much, so > it's okay if we don't handle it. > > 2. If you have a "fast worker" that's only slightly faster than regular > workers, it will become slower in some cases. This is explained in a > FIXME comment in the patch. > > I don't really have any more time to invest in this, but I would like to > see it in 9.4. Mark, would you test this? Haribabu, how open are you > to fixing point (2) above? > Thanks Alvaro - I will take a look. regards Mark
On Wed, Aug 27, 2014 at 8:27 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Alvaro Herrera wrote: > >> So my proposal is a bit more complicated. First we introduce the notion >> of a single number, to enable sorting and computations: the "delay >> equivalent", which is the cost_limit divided by cost_delay. > > Here's a patch that implements this idea. As you see this is quite a > bit more complicated that Haribabu's proposal. > > There are two holes in this: > > 1. if you ALTER DATABASE to change vacuum delay for a database, those > values are not considered in the global equiv delay. I don't think this > is very important and anyway we haven't considered this very much, so > it's okay if we don't handle it. > > 2. If you have a "fast worker" that's only slightly faster than regular > workers, it will become slower in some cases. This is explained in a > FIXME comment in the patch. > > I don't really have any more time to invest in this, but I would like to > see it in 9.4. Mark, would you test this? Haribabu, how open are you > to fixing point (2) above? Thanks Alvaro. I will check the point(2). Regards, Hari Babu Fujitsu Australia
On 27/08/14 10:27, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>
>> So my proposal is a bit more complicated. First we introduce the notion
>> of a single number, to enable sorting and computations: the "delay
>> equivalent", which is the cost_limit divided by cost_delay.
>
> Here's a patch that implements this idea. As you see this is quite a
> bit more complicated that Haribabu's proposal.
>
> There are two holes in this:
>
> 1. if you ALTER DATABASE to change vacuum delay for a database, those
> values are not considered in the global equiv delay. I don't think this
> is very important and anyway we haven't considered this very much, so
> it's okay if we don't handle it.
>
> 2. If you have a "fast worker" that's only slightly faster than regular
> workers, it will become slower in some cases. This is explained in a
> FIXME comment in the patch.
>
> I don't really have any more time to invest in this, but I would like to
> see it in 9.4. Mark, would you test this? Haribabu, how open are you
> to fixing point (2) above?

I did some testing with this patch applied. Minimally tweaking autovacuum (naptime of 5s) with a single table 'cache0' created with a cost limit setting of 10000, running:

$ pgbench -n -c8 -T300 -f volatile0.sql cache

and monitoring the size of the 'cache0' table showed a steady state of:

cache=# SELECT pg_relation_size('cache0')/(1024*1024) AS mb;
 mb
------
 85

So far so good. Adding another table 'cache1' similar to the previous but lacking any per table autovacuum settings, and running 2 pgbench sessions:

$ pgbench -n -c8 -T300 -f volatile0.sql cache
$ pgbench -n -c8 -T300 -f volatile1.sql cache

(volatile1.sql just uses table 'cache1' instead of 'cache0') shows after a few minutes:

cache=# SELECT relname,pg_relation_size(oid)/(1024*1024) AS mb FROM pg_class WHERE relname like 'cache_';
 relname |  mb
---------+--------
 cache0  |    664
 cache1  |   1900

So we are definitely seeing the 'fast' worker being slowed down. Also, the growth of 'cache1' was only a bit faster than 'cache0' - so the 'slow' worker was getting a speed boost as well.

So looks like good progress, but yeah - point (2) is obviously rearing its head in this test.

Cheers

Mark
Attachment
On Tue, Aug 26, 2014 at 12:19 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> >On Mon, May 5, 2014 at 11:57 AM, Mark Kirkwood >> ><mark.kirkwood@catalyst.net.nz> wrote: >> >I could think of 2 ways to change this: >> > >> >a. if user has specified cost_limit value for table, then it just uses it >> > rather than rebalancing based on value of system-wide guc variable >> > autovacuum_vacuum_cost_limit >> >b. another could be to restrict setting per-table value to be lesser than >> > system-wide value? >> > >> >The former is used for auto vacuum parameters like scale_factor and >> >later is used for parameters like freeze_max_age. >> > > I've been giving some thought to this. Really, there is no way to > handle this sensibly while at the same time keeping the documented > behavior -- or in other words, what we have documented is not useful > behavior. Your option (b) above is an easy solution to the problem, > however it means that the user will have serious trouble configuring the > system in scenarios such as volatile tables, as Mark says -- essentially > that will foreclose the option of using autovacuum for them. > > I'm not sure I like your (a) proposal much better. One problem there is > that if you set the values for a table to be exactly the same values as > in postgresql.conf, it will behave completely different because it will > not participate in balancing. To me this seems to violate POLA. > > So my proposal is a bit more complicated. First we introduce the notion > of a single number, to enable sorting and computations: the "delay > equivalent", ... I favor option (a). There's something to be said for your proposal in terms of logical consistency with what we have now, but to be honest I'm not sure it's the behavior anyone wants (I would welcome more feedback on what people actually want). I think we should view an attempt to set a limit for a particular table as a way to control the rate at which that table is vacuumed - period. At least in situations that I've encountered, it's typical to be able to determine the frequency with which a given table needs to be vacuumed to avoid runaway bloat, and from that you can work backwards to figure out how fast you must process it in MB/s, and from there you can work backwards to figure out what cost delay will achieve that effect. But if the system tinkers with the cost delay under the hood, then you're vacuuming at a different (slower) rate and, of course, the table bloats. Now, in the case where you are setting an overall limit, there is at least an argument to be made that you can determine the overall rate of autovacuum-induced I/O activity that the system can tolerate, and set your limits to stay within that budget, and then let the system decide how to divide that I/O up between workers. But if you're overriding a per-table limit, I don't really see how that holds any water. The system I/O budget doesn't go up just because one particular table is being vacuumed rather than any other. The only plausible use case for setting a per-table rate that I can see is when you actually want the system to use that exact rate for that particular table. I might be missing something, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
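Robert's "work backwards from MB/s" step can be made concrete with the usual cost accounting: vacuum charges (by default) 1 for a buffer hit, 10 for a page read and 20 for dirtying a page, and sleeps for cost_delay once cost_limit is used up. The following is only a rough standalone calculator under those default assumptions and 8kB pages:

/*
 * Back-of-the-envelope throughput ceilings implied by a cost limit and
 * delay, assuming the default page costs (hit=1, miss=10, dirty=20)
 * and 8kB pages.  Illustrative only.
 */
#include <stdio.h>

static void
report(const char *label, int cost_limit, int cost_delay_ms)
{
    double credits_per_sec = (double) cost_limit * (1000.0 / cost_delay_ms);
    double page_kb = 8.0;

    printf("%s (limit=%d, delay=%dms):\n", label, cost_limit, cost_delay_ms);
    printf("  all pages read from disk : %6.1f MB/s\n",
           credits_per_sec / 10.0 * page_kb / 1024.0);
    printf("  all pages dirtied        : %6.1f MB/s\n",
           credits_per_sec / 20.0 * page_kb / 1024.0);
}

int
main(void)
{
    report("autovacuum defaults", 200, 20);
    report("per-table limit 10000", 10000, 20);
    return 0;
}

Under those assumptions the stock autovacuum settings cap a worker at roughly 8MB/s of reads and 4MB/s of page dirtying, which is why a per-table cost_limit that silently snaps back to 200 can leave a volatile table growing faster than it is cleaned.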
Robert Haas wrote: > Now, in the case where you are setting an overall limit, there is at > least an argument to be made that you can determine the overall rate > of autovacuum-induced I/O activity that the system can tolerate, and > set your limits to stay within that budget, and then let the system > decide how to divide that I/O up between workers. But if you're > overriding a per-table limit, I don't really see how that holds any > water. The system I/O budget doesn't go up just because one > particular table is being vacuumed rather than any other. The only > plausible use case for setting a per-table rate that I can see is when > you actually want the system to use that exact rate for that > particular table. Yeah, this makes sense to me too -- at least as long as you only have one such table. But if you happen to have more than one, and due to some bad luck they happen to be vacuumed concurrently, they will eat a larger share of your I/O bandwidth budget than you anticipated, which you might not like. Thus what I am saying is that those should be scaled down too to avoid peaks. Now, my proposal above mentioned subtracting the speed of tables under the limit, from the speed of those above the limit; maybe we can just rip that part out. Then we end up with the behavior you want, that is to have the fast table vacuum as fast as it is configured when it's the only fast table being vacuumed; and also with what I say, which is that if you have two of them, the two balance the I/O consumption (but only among themselves, not with the slow ones.) Since figuring out this subtraction is the only thing missing from the patch I posted, ISTM we could have something committable with very little extra effort if we agree on this. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
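One way to read "rip that part out" is that the slow group still shares the global budget as before, while the fast group simply splits the fastest configured delay equivalent among themselves, with no deduction for the slow group. A tiny sketch of that variant (again a reading of the mail, not a patch):

/*
 * Sketch of the simplified variant discussed here (my reading): fast
 * tables split the fastest table's delay_equiv among themselves, with
 * no deduction for the slow group.
 */
#include <stdio.h>

static void
split_fast(const char *label, const double *equiv, int n)
{
    double fast_max = 0.0, fast_sum = 0.0;

    for (int i = 0; i < n; i++)
    {
        fast_sum += equiv[i];
        if (equiv[i] > fast_max)
            fast_max = equiv[i];
    }
    printf("%s\n", label);
    for (int i = 0; i < n; i++)
        printf("  fast table %d: equiv %.0f -> %.1f\n",
               i + 1, equiv[i], fast_max * equiv[i] / fast_sum);
}

int
main(void)
{
    double alone[] = {500};
    double two_equal[] = {500, 500};
    double two_unequal[] = {500, 50};

    split_fast("one fast table vacuumed alone", alone, 1);
    split_fast("two fast tables, both equiv 500", two_equal, 2);
    split_fast("fast tables equiv 500 and 50", two_unequal, 2);
    return 0;
}

A single fast table then runs at exactly its configured speed, and two concurrently-vacuumed fast tables split that budget between themselves, which is the behaviour described above.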
On Thu, Aug 28, 2014 at 1:36 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas wrote: >> Now, in the case where you are setting an overall limit, there is at >> least an argument to be made that you can determine the overall rate >> of autovacuum-induced I/O activity that the system can tolerate, and >> set your limits to stay within that budget, and then let the system >> decide how to divide that I/O up between workers. But if you're >> overriding a per-table limit, I don't really see how that holds any >> water. The system I/O budget doesn't go up just because one >> particular table is being vacuumed rather than any other. The only >> plausible use case for setting a per-table rate that I can see is when >> you actually want the system to use that exact rate for that >> particular table. > > Yeah, this makes sense to me too -- at least as long as you only have > one such table. But if you happen to have more than one, and due to > some bad luck they happen to be vacuumed concurrently, they will eat a > larger share of your I/O bandwidth budget than you anticipated, which > you might not like. I agree that you might not like that. But you might not like having the table vacuumed slower than the configured rate, either. My impression is that the time between vacuums isn't really all that negotiable for some people. I had one customer who had horrible bloat issues on a table that was vacuumed every minute; when we changed the configuration so that it was vacuumed every 15 seconds, those problems went away. Now that is obviously more a matter for autovacuum_naptime than this option, but the point seems general to me: if you need the table vacuumed every N seconds, minutes, or hours, and it only gets vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there are other autovacuum workers running, the table is going to bloat. That *might* be better than busting your I/O budget, but it might also be (and I think frequently is) much worse. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > I agree that you might not like that. But you might not like having > the table vacuumed slower than the configured rate, either. My > impression is that the time between vacuums isn't really all that > negotiable for some people. I had one customer who had horrible bloat > issues on a table that was vacuumed every minute; when we changed the > configuration so that it was vacuumed every 15 seconds, those problems > went away. Wow, that's extreme. For that case you can set autovacuum_vacuum_cost_limit to 0, which disables the whole thing and lets vacuum run at full speed -- no throttling at all. Would that satisfy the concern? > Now that is obviously more a matter for autovacuum_naptime > than this option, but the point seems general to me: if you need the > table vacuumed every N seconds, minutes, or hours, and it only gets > vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there > are other autovacuum workers running, the table is going to bloat. There might be another problem here which is that if you have all your workers busy because they are vacuuming large tables that don't have churn high enough to warrant disrupting the whole environment (thus low cost_limit), then the table will bloat no matter what you set its cost limit to. So there's not only a matter of a low enough naptime (which is a bad thing for the rest of the system, also) but also one of something similar to priority inversion; should you speed up the vacuuming of those large tables so that one worker is freed soon enough to get to the high-churn table? Was the solution for that customer to set an external tool running vacuum on that table? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Aug 28, 2014 at 4:56 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas wrote: >> I agree that you might not like that. But you might not like having >> the table vacuumed slower than the configured rate, either. My >> impression is that the time between vacuums isn't really all that >> negotiable for some people. I had one customer who had horrible bloat >> issues on a table that was vacuumed every minute; when we changed the >> configuration so that it was vacuumed every 15 seconds, those problems >> went away. > > Wow, that's extreme. For that case you can set > autovacuum_vacuum_cost_limit to 0, which disables the whole thing and > lets vacuum run at full speed -- no throttling at all. Would that > satisfy the concern? Well, maybe, if you want to run completely unthrottled. But I have no evidence that's a common desire. >> Now that is obviously more a matter for autovacuum_naptime >> than this option, but the point seems general to me: if you need the >> table vacuumed every N seconds, minutes, or hours, and it only gets >> vacuumed every 2N or 3N or 5N seconds, minutes, or hours because there >> are other autovacuum workers running, the table is going to bloat. > > There might be another problem here which is that if you have all your > workers busy because they are vacuuming large tables that don't have > churn high enough to warrant disrupting the whole environment (thus low > cost_limit), then the table will bloat no matter what you set its cost > limit to. So there's not only a matter of a low enough naptime (which > is a bad thing for the rest of the system, also) but also one of > something similar to priority inversion; should you speed up the > vacuuming of those large tables so that one worker is freed soon enough > to get to the high-churn table? I don't think so. I continue to believe that the we need to provide the user with the tools to be certain that table X will get vacuumed at least every Y seconds/minutes/hours. To me, allowing the user to set a rate that the system will not adjust or manipulate in any way makes this a lot easier than anything else we might do. > Was the solution for that customer to set an external tool running > vacuum on that table? Nope, we just changed settings. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 29/08/14 08:56, Alvaro Herrera wrote:
> Robert Haas wrote:
>
>> I agree that you might not like that. But you might not like having
>> the table vacuumed slower than the configured rate, either. My
>> impression is that the time between vacuums isn't really all that
>> negotiable for some people. I had one customer who had horrible bloat
>> issues on a table that was vacuumed every minute; when we changed the
>> configuration so that it was vacuumed every 15 seconds, those problems
>> went away.
>
> Wow, that's extreme. For that case you can set
> autovacuum_vacuum_cost_limit to 0, which disables the whole thing and
> lets vacuum run at full speed -- no throttling at all. Would that
> satisfy the concern?

Well no - you might have a whole lot of big tables that you want vacuum to not get too aggressive on, but a few small tables that are highly volatile. So you want *them* vacuumed really fast to prevent them becoming huge tables with only a few rows therein, but your system might not be able to handle *all* your tables being vacuumed at full speed.

This is a fairly common scenario for (several) web CMS systems that tend to want to have session and/or cache tables that are small and extremely volatile, plus the rest of the (real) data that is bigger and vastly less volatile. While there is a valid objection along the lines of "don't use a database instead of memcache", it does seem reasonable that Postgres should be able to cope with this type of workload.

Cheers

Mark
Mark Kirkwood wrote:
> On 29/08/14 08:56, Alvaro Herrera wrote:
>> Robert Haas wrote:
>>
>>> I agree that you might not like that. But you might not like having the table vacuumed slower than the configured rate, either. My impression is that the time between vacuums isn't really all that negotiable for some people. I had one customer who had horrible bloat issues on a table that was vacuumed every minute; when we changed the configuration so that it was vacuumed every 15 seconds, those problems went away.
>>
>> Wow, that's extreme. For that case you can set autovacuum_vacuum_cost_limit to 0, which disables the whole thing and lets vacuum run at full speed -- no throttling at all. Would that satisfy the concern?
>
> Well, no - you might have a whole lot of big tables that you don't want vacuum to get too aggressive on, but a few small tables that are highly volatile. You want *them* vacuumed really fast to prevent them becoming huge tables with only a few rows in them, but your system might not be able to handle *all* your tables being vacuumed at full speed.

I meant setting the cost limit to 0 *for those tables* only, not for all of them.

Anyway, it seems to me there may be room for a new table storage parameter, say autovacuum_do_balance, which means to participate in the balancing program or not.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
>
> So my proposal is a bit more complicated. First we introduce the notion
> of a single number, to enable sorting and computations: the "delay
> equivalent", which is the cost_limit divided by cost_delay. The highest
> the value is for any table, the fastest it is vacuumed. (It makes sense
> in physical terms: a higher cost_limit makes it faster, because vacuum
> sleeps less often; and a higher cost_delay makes it go slower, because
> vacuum sleeps for longer.) Now, the critical issue is to notice that
> not all tables are equal; they can be split into two groups: those that go
> faster than the global delay equivalent
> (i.e. the effective values of GUC variables
> autovacuum_vacuum_cost_limit/autovacuum_vacuum_cost_delay), and those
> that go equal or slower. For the latter group, the rebalancing
> algorithm "distributes" the allocated I/O by the global vars, in a
> pro-rated manner. For the former group (tables vacuumed faster than
> global delay equiv), to rebalance we don't consider the global delay
> equiv but the delay equiv of the fastest table currently being vacuumed.
>
> Suppose we have two tables, delay_equiv=10 each (which is the default
> value). If they are both vacuumed in parallel, then we distribute a
> delay_equiv of 5 to each (so set cost_limit=100, cost_delay=20). As
> soon as one of them finishes, the remaining one is allowed to upgrade to
> delay_equiv=10 (cost_limit=200, cost_delay=20).
>
> Now add a third table, delay_equiv=500 (cost_limit=10000, cost_delay=20;
> this is Mark's volatile table). If it's being vacuumed on its own, just
> assign cost_limit=10000, cost_delay=20, as normal. If one of the other
> two tables is being vacuumed, that one will use delay_equiv=10, as per
> above. To balance the volatile table, we take the delay_equiv of this
> one and subtract the already handed-out delay_equiv of 10; so we set the
> volatile table to delay_equiv=490 (cost_limit=9800, cost_delay=20).
>
> If we do it this way, the whole system is running at the full speed
> enabled by the fastest table for which we have set per-table options,
> but also we have scaled things so that the slow tables go slow and the fast
> tables go fast.
>
> As a more elaborate example, add a fourth table with delay_equiv=50
> (cost_limit=1000, cost_delay=20). This is also faster than the global
> vars, so we put it in the first group. If all four tables are being
> vacuumed in parallel, we have the two slow tables going at delay_equiv=5
> each (cost_limit=100, cost_delay=20); then there are delay_equiv=490 to
> distribute among the remaining ones; pro-rating this we have
> delay_equiv=445 (cost_limit=8900, cost_delay=20) for the volatile table
> and delay_equiv=45 (cost_limit=900, cost_delay=20) for the other one.
How will this calculation behave if the third table has delay_equiv = 30
and the fourth table has delay_equiv = 20, which are both greater than
the global delay_equiv of 10?
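
To make the worked examples in the quoted proposal (and the question above) easy to replay, here is a minimal standalone sketch of the arithmetic as described there. The names are invented for illustration and this is not code from any posted patch; only the slow/fast classification and the pro-rating follow the quoted text. With the values {10, 10, 500, 50} it prints the 5/5/445/45 split from the example; changing the last two entries to 30 and 20 prints 12 and 8.

    /* Sketch of the "delay equivalent" rebalancing described above.
     * Assumptions: global settings 200/20ms, cost_delay left at 20ms for
     * every table, and simple pro-rating by each table's configured
     * delay equivalent. Compile with: cc -o balance balance.c */
    #include <stdio.h>

    #define GLOBAL_COST_LIMIT 200
    #define GLOBAL_COST_DELAY 20            /* milliseconds */

    int
    main(void)
    {
        /* delay equivalents (cost_limit / cost_delay) of the tables being
         * vacuumed in parallel; edit this to try other scenarios */
        double      equiv[] = {10, 10, 500, 50};
        int         ntables = (int) (sizeof(equiv) / sizeof(equiv[0]));
        double      global_equiv = (double) GLOBAL_COST_LIMIT / GLOBAL_COST_DELAY;
        double      sum_slow = 0,
                    sum_fast = 0,
                    max_fast = 0,
                    slow_budget,
                    fast_budget;
        int         i;

        for (i = 0; i < ntables; i++)
        {
            if (equiv[i] <= global_equiv)
                sum_slow += equiv[i];
            else
            {
                sum_fast += equiv[i];
                if (equiv[i] > max_fast)
                    max_fast = equiv[i];
            }
        }

        /* the slow group shares the global delay equivalent; the fast group
         * shares whatever the fastest table allows minus the slow group's cut */
        slow_budget = (sum_slow > 0) ? global_equiv : 0;
        fast_budget = max_fast - slow_budget;

        for (i = 0; i < ntables; i++)
        {
            double      share = (equiv[i] <= global_equiv)
                ? slow_budget * equiv[i] / sum_slow
                : fast_budget * equiv[i] / sum_fast;

            printf("table %d: equiv %.0f -> %.0f (cost_limit %.0f, cost_delay %d)\n",
                   i, equiv[i], share, share * GLOBAL_COST_DELAY, GLOBAL_COST_DELAY);
        }
        return 0;
    }
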
>
> Robert Haas wrote:
>
> > Now, in the case where you are setting an overall limit, there is at
> > least an argument to be made that you can determine the overall rate
> > of autovacuum-induced I/O activity that the system can tolerate, and
> > set your limits to stay within that budget, and then let the system
> > decide how to divide that I/O up between workers. But if you're
> > overriding a per-table limit, I don't really see how that holds any
> > water. The system I/O budget doesn't go up just because one
> > particular table is being vacuumed rather than any other. The only
> > plausible use case for setting a per-table rate that I can see is when
> > you actually want the system to use that exact rate for that
> > particular table.
>
> Yeah, this makes sense to me too -- at least as long as you only have
> one such table. But if you happen to have more than one, and due to
> some bad luck they happen to be vacuumed concurrently, they will eat a
> larger share of your I/O bandwidth budget than you anticipated, which
> you might not like.
I think that, to control I/O bandwidth, there should be a separate mechanism.
On Wed, Aug 27, 2014 at 8:27 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Alvaro Herrera wrote:
>
>> So my proposal is a bit more complicated. First we introduce the notion of a single number, to enable sorting and computations: the "delay equivalent", which is the cost_limit divided by cost_delay.
>
> Here's a patch that implements this idea. As you see, this is quite a bit more complicated than Haribabu's proposal.
>
> There are two holes in this:
>
> 1. If you ALTER DATABASE to change vacuum delay for a database, those values are not considered in the global equiv delay. I don't think this is very important and anyway we haven't considered this very much, so it's okay if we don't handle it.

In the attached patch, I changed the balance_cost function to get the global cost values. With this change the above problem can be addressed.

> 2. If you have a "fast worker" that's only slightly faster than regular workers, it will become slower in some cases. This is explained in a FIXME comment in the patch.

I changed the patch as follows to handle these scenarios. If a fast worker's equiv_delay is more than double the global equiv_delay, then the global equiv_delay is subtracted from the fast workers' equiv_delay and the remainder is distributed among the fast workers. If the difference between a fast worker's equiv_delay and the global equiv_delay is less than the global equiv_delay, the fast workers' equiv_delay and the global equiv_delay are adjusted as below:

fast_equiv_delay -= (diff_equiv_delay) / num_fast_workers
global_equiv_delay -= (diff_equiv_delay) / num_regular_workers

Sorry for the late reply. Please let me know your comments.

Regards,
Hari Babu
Fujitsu Australia
Attachment
On 8/28/14, 12:18 PM, Robert Haas wrote:
> At least in situations that I've encountered, it's typical to be able to determine the frequency with which a given table needs to be vacuumed to avoid runaway bloat, and from that you can work backwards to figure out how fast you must process it in MB/s, and from there you can work backwards to figure out what cost delay will achieve that effect. But if the system tinkers with the cost delay under the hood, then you're vacuuming at a different (slower) rate and, of course, the table bloats.

The last time I took a whack at this, I worked toward making all of the parameters operate in terms of target MB/s, for exactly this style of thinking and goal. Those converted into the same old mechanism under the hood, and I got the math right to give the same behavior for the simple cases, but that could have been simplified eventually. I consider that line of thinking to be the only useful one here.

The answer I like to these values that don't inherit as expected in the GUC tree is to nuke that style of interface altogether in favor of a simpler, bandwidth-measured one, then perhaps add multiple QoS levels. Certainly no interest in treating the overly complicated innards of cost computation as a bug and fixing them with even more complicated behavior.

The part of this I was trying hard to find time to do myself by the next CF was a better bloat measurement tool, needed to actually see the problem better. With that in hand, and some nasty test cases, I wanted to come back to simplified MB/s vacuum parameters with easier-to-understand sharing rules again. If other people are hot to go on that topic, I don't care if I actually do the work; I just have a pretty clear view of what I think people want.

> The only plausible use case for setting a per-table rate that I can see is when you actually want the system to use that exact rate for that particular table.

That's the main one, for these must-run-on-schedule-or-else jobs. Yes.

On 8/29/14, 9:45 AM, Alvaro Herrera wrote:
> Anyway it seems to me maybe there is room for a new table storage parameter, say autovacuum_do_balance which means to participate in the balancing program or not.

If that eliminates some of the hairy edge cases, sure.

A useful concept to consider is having a soft limit that most things work against, along with a total hard limit for the server. When one of these tight-schedule jobs with !autovacuum_do_balance starts, it must run at its designed speed with no concern for anyone else. Which means:

a) Its bandwidth gets pulled out of the regular, soft-limit numbers until it's done. Last time I had one of these jobs, once the big important boys were running, everyone else in the regular shared set was capped at vacuum_cost_limit=5 worth of work. Just enough to keep up with system catalog things and, over the course of many hours, process small tables.

b) If you try to submit multiple locked-rate jobs at once, and the total goes over the hard limit, they have to just be aborted. If the rush of users comes back at 8AM, and you can clean the table up by then if you give it 10MB/s, what you cannot do is let some other user decrease your rate such that you're unfinished at 8AM. Then you'll have aggressive AV competing against the user load you were trying to prepare for. It's better to just throw a serious error that forces someone to look at the hard limit budget and adjust the schedule instead.

The systems with this sort of problem are getting cleaned up every single day, almost continuously; missing a day is not bad as long as it's noted and fixed again before the next cleanup window.
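
For anyone wanting to do the "work backwards from MB/s" arithmetic mentioned above, here is a standalone back-of-the-envelope sketch. It assumes the stock cost-model defaults of this era (vacuum_cost_page_hit=1, _miss=10, _dirty=20, 8kB blocks, autovacuum_vacuum_cost_limit=200, autovacuum_vacuum_cost_delay=20ms) and pretends every page is of a single kind, so it yields rough worst-case bounds rather than predictions; it is not part of any patch in this thread. Run as-is it prints roughly 3.9 and 7.8 MB/s for the defaults, and a cost_limit of about 512 for a 10 MB/s dirty-page target.

    /* Rough conversion between cost settings and MB/s, assuming the
     * default vacuum cost model; purely illustrative. */
    #include <stdio.h>

    #define BLCKSZ          8192
    #define PAGE_HIT_COST   1
    #define PAGE_MISS_COST  10
    #define PAGE_DIRTY_COST 20

    /* bytes per second if every page processed costs 'page_cost' */
    static double
    rate_bytes_per_sec(int cost_limit, int cost_delay_ms, int page_cost)
    {
        double      rounds_per_sec = 1000.0 / cost_delay_ms;
        double      pages_per_round = (double) cost_limit / page_cost;

        return rounds_per_sec * pages_per_round * BLCKSZ;
    }

    /* cost_limit needed to sustain 'mb_per_sec' of pages costing 'page_cost' */
    static int
    cost_limit_for_rate(double mb_per_sec, int cost_delay_ms, int page_cost)
    {
        double      bytes = mb_per_sec * 1024 * 1024;

        return (int) (bytes * page_cost * cost_delay_ms / (1000.0 * BLCKSZ) + 0.5);
    }

    int
    main(void)
    {
        printf("defaults (200, 20ms), all pages dirtied: %.1f MB/s\n",
               rate_bytes_per_sec(200, 20, PAGE_DIRTY_COST) / (1024.0 * 1024));
        printf("defaults (200, 20ms), all pages missed:  %.1f MB/s\n",
               rate_bytes_per_sec(200, 20, PAGE_MISS_COST) / (1024.0 * 1024));
        printf("cost_limit for 10 MB/s of dirtied pages at 20ms delay: %d\n",
               cost_limit_for_rate(10.0, 20, PAGE_DIRTY_COST));
        return 0;
    }
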
Robert Haas wrote:
> I favor option (a). There's something to be said for your proposal in terms of logical consistency with what we have now, but to be honest I'm not sure it's the behavior anyone wants (I would welcome more feedback on what people actually want). I think we should view an attempt to set a limit for a particular table as a way to control the rate at which that table is vacuumed - period.

After re-reading this whole thread one more time, I think I have come to agree with you and Amit here, because not only is it simpler to implement, but it is also simpler to document. Per Greg Smith's opinion elsewhere in the thread, it seems that for end users it doesn't make sense to make the already complicated mechanism even more complicated.

So in essence what we're going to do is that the balance mechanism considers only tables that don't have per-table configuration options; for those that do, we will use the values configured there without any changes.

I'll see about implementing this and making sure it finds its way to 9.4beta3.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 30, 2014 at 6:16 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> I favor option (a). There's something to be said for your proposal in terms of logical consistency with what we have now, but to be honest I'm not sure it's the behavior anyone wants (I would welcome more feedback on what people actually want). I think we should view an attempt to set a limit for a particular table as a way to control the rate at which that table is vacuumed - period.
>
> After re-reading this whole thread one more time, I think I have come to agree with you and Amit here, because not only is it simpler to implement, but it is also simpler to document. Per Greg Smith's opinion elsewhere in the thread, it seems that for end users it doesn't make sense to make the already complicated mechanism even more complicated.
>
> So in essence what we're going to do is that the balance mechanism considers only tables that don't have per-table configuration options; for those that do, we will use the values configured there without any changes.
>
> I'll see about implementing this and making sure it finds its way to 9.4beta3.

Cool!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Alvaro Herrera wrote:
> So in essence what we're going to do is that the balance mechanism considers only tables that don't have per-table configuration options; for those that do, we will use the values configured there without any changes.
>
> I'll see about implementing this and making sure it finds its way to 9.4beta3.

Here's a patch that makes it work as proposed.

How do people feel about back-patching this? On one hand it seems there's a lot of fear of changing autovacuum behavior in back branches, because for many production systems it has carefully been tuned; on the other hand, it seems hard to believe that anyone has tuned the system to work sanely given how insanely per-table options behave in the current code.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Thu, Oct 2, 2014 at 9:54 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Alvaro Herrera wrote:
>> So in essence what we're going to do is that the balance mechanism considers only tables that don't have per-table configuration options; for those that do, we will use the values configured there without any changes.
>>
>> I'll see about implementing this and making sure it finds its way to 9.4beta3.
>
> Here's a patch that makes it work as proposed.
>
> How do people feel about back-patching this? On one hand it seems there's a lot of fear of changing autovacuum behavior in back branches, because for many production systems it has carefully been tuned; on the other hand, it seems hard to believe that anyone has tuned the system to work sanely given how insanely per-table options behave in the current code.

I agree with both of those arguments. I have run into very few customers who have used the autovacuum settings to customize behavior for particular tables, and anyone who hasn't should see no change (right?), so my guess is that the practical impact of the change will be pretty limited. On the other hand, it's a clear behavior change. Someone could have set the per-table limit to something enormous and never suffered from that setting because it has basically no effect as things stand today; and that person might get an unpleasant surprise when they update.

I would at least back-patch it to 9.4. I could go either way on whether to back-patch into older branches. I lean mildly in favor of it at the moment, but with considerable trepidation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Oct 2, 2014 at 9:54 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Alvaro Herrera wrote:
>>> So in essence what we're going to do is that the balance mechanism considers only tables that don't have per-table configuration options; for those that do, we will use the values configured there without any changes.
>>>
>>> I'll see about implementing this and making sure it finds its way to 9.4beta3.
>>
>> Here's a patch that makes it work as proposed.
>>
>> How do people feel about back-patching this? On one hand it seems there's a lot of fear of changing autovacuum behavior in back branches, because for many production systems it has carefully been tuned; on the other hand, it seems hard to believe that anyone has tuned the system to work sanely given how insanely per-table options behave in the current code.
>
> I agree with both of those arguments. I have run into very few customers who have used the autovacuum settings to customize behavior for particular tables, and anyone who hasn't should see no change (right?), so my guess is that the practical impact of the change will be pretty limited. On the other hand, it's a clear behavior change. Someone could have set the per-table limit to something enormous and never suffered from that setting because it has basically no effect as things stand today; and that person might get an unpleasant surprise when they update.
>
> I would at least back-patch it to 9.4. I could go either way on whether to back-patch into older branches. I lean mildly in favor of it at the moment, but with considerable trepidation.

I'm fine with putting it into 9.4. I'm not sure that I see the value in changing the back-branches and then having to deal with the "well, if you're on 9.3.5 then X, but if you're on 9.3.6 then Y" or having to figure out how to deal with the documentation for this.

Has there been any thought as to what pg_upgrade should do..?

Thanks,

Stephen
Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> I agree with both of those arguments. I have run into very few customers who have used the autovacuum settings to customize behavior for particular tables, and anyone who hasn't should see no change (right?), so my guess is that the practical impact of the change will be pretty limited. On the other hand, it's a clear behavior change. Someone could have set the per-table limit to something enormous and never suffered from that setting because it has basically no effect as things stand today; and that person might get an unpleasant surprise when they update.
>>
>> I would at least back-patch it to 9.4. I could go either way on whether to back-patch into older branches. I lean mildly in favor of it at the moment, but with considerable trepidation.
>
> I'm fine with putting it into 9.4. I'm not sure that I see the value in changing the back-branches and then having to deal with the "well, if you're on 9.3.5 then X, but if you're on 9.3.6 then Y" or having to figure out how to deal with the documentation for this.

Well, the value obviously is that we would fix the bug that Mark Kirkwood reported that started this thread. Basically, if you are on 9.3.5 or earlier, any per-table options for autovacuum cost delay will misbehave (meaning: any such table will be processed with settings flattened according to balancing of the standard options, _not_ the configured ones). If you are on 9.3.6 or newer they will behave as described in the docs.

> Has there been any thought as to what pg_upgrade should do..?

Yes, I'm thinking there's nothing it should do.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera wrote:
> Basically, if you are on 9.3.5 or earlier, any per-table options for autovacuum cost delay will misbehave (meaning: any such table will be processed with settings flattened according to balancing of the standard options, _not_ the configured ones). If you are on 9.3.6 or newer they will behave as described in the docs.

Another thing to note is that if you have configured a table to have a cost_limit *less* than the default (say 150 instead of the default 200), the balance system will again break that and process the table at 200 instead; in other words, the balancing system has completely broken the ability to tweak the cost system for individual tables in autovacuum. With the current code you can change the *ratios* of cost_limit that would be applied: by setting one table to 5000 and another to 150, when they are vacuumed in parallel you will get them at 194 and 5 instead of the values you set, but the total limit allowed will never be exceeded no matter what you do.

With the v5 patch, the example tables above will be vacuumed at exactly 5000 and 150 instead. The more complex patch I produced earlier would have them vacuumed at something like 4900 and 100 instead, so you wouldn't exceed the total of 5000. I think there is some value to that idea, but it seems the complexity of managing this is too high.

The ill effect of the patch is that it will be possible to have autovacuum use a much larger fraction of the I/O bandwidth. This was also possible before, but only by setting the cost delay to 0 (in effect disabling throttling for the table). If somebody is thinking that you could mount an attack on a server by having tables that use all the I/O bandwidth, well -- it's not a new problem, so I don't think we need to solve it with this patch.

I am rather surprised that nobody has reported this problem before. I am now of the mind that this is clearly a bug that should be fixed all the way back.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
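
To make the quoted figures concrete, here is a tiny illustrative calculation of the before/after behaviour for two tables whose reloptions ask for cost_limit 5000 and 150 under the default limit of 200. Only the pro-rating arithmetic is meant to reproduce the 194 and 5 mentioned above; this is not code from autovacuum.c or from the patch.

    /* Before/after arithmetic for two tables with per-table cost_limit
     * reloptions of 5000 and 150; illustrative only. */
    #include <stdio.h>

    int
    main(void)
    {
        int         global_limit = 200;     /* autovacuum_vacuum_cost_limit */
        int         configured[] = {5000, 150};
        int         total = configured[0] + configured[1];
        int         i;

        /* old behaviour: the global limit is pro-rated across the workers,
         * so the per-table settings only act as ratios */
        for (i = 0; i < 2; i++)
            printf("old: table %d runs at cost_limit %d\n",
                   i, global_limit * configured[i] / total);

        /* new behaviour: tables with their own reloptions are excluded from
         * balancing and simply use what was configured */
        for (i = 0; i < 2; i++)
            printf("new: table %d runs at cost_limit %d\n",
                   i, configured[i]);
        return 0;
    }
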
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> Alvaro Herrera wrote:
>> Basically, if you are on 9.3.5 or earlier, any per-table options for autovacuum cost delay will misbehave (meaning: any such table will be processed with settings flattened according to balancing of the standard options, _not_ the configured ones). If you are on 9.3.6 or newer they will behave as described in the docs.
>
> Another thing to note is that if you have configured a table to have a cost_limit *less* than the default (say 150 instead of the default 200), the balance system will again break that and process the table at 200 instead; in other words, the balancing system has completely broken the ability to tweak the cost system for individual tables in autovacuum.

That's certainly pretty ugly.

> With the v5 patch, the example tables above will be vacuumed at exactly 5000 and 150 instead. The more complex patch I produced earlier would have them vacuumed at something like 4900 and 100 instead, so you wouldn't exceed the total of 5000. I think there is some value to that idea, but it seems the complexity of managing this is too high.

Agreed.

> I am rather surprised that nobody has reported this problem before. I am now of the mind that this is clearly a bug that should be fixed all the way back.

I'm coming around to that also; however, should we worry about users who set per-table settings and then simply forgot about them? I suppose that won't matter too much unless the table is really active, and if it is, they've probably already set it to zero.

Thanks,

Stephen
Stephen Frost wrote:
> * Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
>> I am rather surprised that nobody has reported this problem before. I am now of the mind that this is clearly a bug that should be fixed all the way back.
>
> I'm coming around to that also; however, should we worry about users who set per-table settings and then simply forgot about them? I suppose that won't matter too much unless the table is really active, and if it is, they've probably already set it to zero.

Right. For the cases where it's been set and forgotten, perhaps we can have an item in the release notes telling people to look for tables with these parameters set in pg_class.reloptions (to any value different from zero), and to watch for performance differences from previous versions where they are.

I have pushed this now, backpatched all the way back to 9.0.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services