Thread: Why do we let autovacuum give up?
Hello, I have run into yet another situation where there was an assumption that autovacuum was keeping up and it wasn't. It was caused by autovacuum quitting because another process requested a lock. In turn we received a ton of bloat on pg_attribute which caused all kinds of other issues (as can be expected). The more I run into it, the more it seems like autovacuum should behave like vacuum, in that it gets precedence when it is running. First come, first served, as they say. Thoughts? JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
"Joshua D. Drake" <jd@commandprompt.com> writes: > I have run into yet again another situation where there was an > assumption that autovacuum was keeping up and it wasn't. It was caused > by autovacuum quitting because another process requested a lock. > In turn we received a ton of bloat on pg_attribute which caused all > kinds of other issues (as can be expected). > The more I run into it, the more it seems like autovacuum should behave > like vacuum, in that it gets precedence when it is running. First come, > first serve as they say. 1. Back when it worked like that, things were worse. 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane
On 01/23/2014 12:34 PM, Joshua D. Drake wrote: > > Hello, > > I have run into yet again another situation where there was an > assumption that autovacuum was keeping up and it wasn't. It was caused > by autovacuum quitting because another process requested a lock. > > In turn we received a ton of bloat on pg_attribute which caused all > kinds of other issues (as can be expected). > > The more I run into it, the more it seems like autovacuum should behave > like vacuum, in that it gets precedence when it is running. First come, > first serve as they say. > > Thoughts? If we let autovacuum block user activity, a lot more people would turn it off. Now, if you were to argue that we should have some way to monitor the tables which autovac can never touch because of conflicts, I would agree with you. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Thu, Jan 23, 2014 at 12:53 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 01/23/2014 12:34 PM, Joshua D. Drake wrote: >> >> Hello, >> >> I have run into yet again another situation where there was an >> assumption that autovacuum was keeping up and it wasn't. It was caused >> by autovacuum quitting because another process requested a lock. >> >> In turn we received a ton of bloat on pg_attribute which caused all >> kinds of other issues (as can be expected). >> >> The more I run into it, the more it seems like autovacuum should behave >> like vacuum, in that it gets precedence when it is running. First come, >> first serve as they say. >> >> Thoughts? > > If we let autovacuum block user activity, a lot more people would turn > it off. > > Now, if you were to argue that we should have some way to monitor the > tables which autovac can never touch because of conflicts, I would agree > with you. Agree completely. Easy ways to monitor this would be great. Once you know there's a problem, tweaking autovacuum settings is very hard and misunderstood, and explaining how to be effective at it is a dark art too. -Harold
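(For reference, a rough version of the monitoring being asked for can already be pieced together from the stats views. A sketch, assuming the default trigger of autovacuum_vacuum_threshold = 50 plus autovacuum_vacuum_scale_factor = 0.2 * reltuples; tables past that threshold whose last autovacuum is old are the ones autovacuum is failing to service:)

    -- Tables whose dead-tuple count already exceeds the default autovacuum
    -- trigger; check last_autovacuum to see how long they have been waiting.
    SELECT s.schemaname, s.relname, s.n_dead_tup,
           (50 + 0.2 * c.reltuples)::bigint AS av_threshold,
           s.last_autovacuum, s.last_vacuum
    FROM pg_stat_all_tables s
    JOIN pg_class c ON c.oid = s.relid
    WHERE s.n_dead_tup > 50 + 0.2 * c.reltuples
    ORDER BY s.n_dead_tup DESC;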
On 24/01/14 09:49, Tom Lane wrote: > 2. What have you got that is requesting exclusive lock on > pg_attribute? That seems like a pretty unfriendly behavior in itself. > regards, tom lane I've seen this sort of problem where every db session was busily creating temporary tables. I never got to find out *why* they needed to make so many, but it seemed like a bad idea. Regards Mark
On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: > On 24/01/14 09:49, Tom Lane wrote: >> 2. What have you got that is requesting exclusive lock on pg_attribute? >> That seems like a pretty unfriendly behavior in itself. regards, tom lane > > I've seen this sort of problem where every db session was busily creating > temporary tables. I never got to find out *why* they needed to make so many, > but it seemed like a bad idea. But... how does that result in a vacuum-incompatible lock request against pg_attribute? I see that it'll insert lots of rows into pg_attribute, and maybe later delete them, but none of that blocks vacuum. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 01/23/2014 01:03 PM, Mark Kirkwood wrote: > > On 24/01/14 09:49, Tom Lane wrote: >> 2. What have you got that is requesting exclusive lock on >> pg_attribute? That seems like a pretty unfriendly behavior in itself. >> regards, tom lane > > I've seen this sort of problem where every db session was busily > creating temporary tables. I never got to the find *why* they needed to > make so many, but it seemed like a bad idea. Yep... that's the one. They are creating lots and lots of temp tables. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
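(This is easy to watch from another session while the temp-table churn is running; a minimal check against the standard stats views:)

    -- Dead-tuple accumulation and physical size of pg_attribute
    SELECT n_live_tup, n_dead_tup, last_autovacuum,
           pg_size_pretty(pg_relation_size('pg_catalog.pg_attribute')) AS size
    FROM pg_stat_sys_tables
    WHERE relname = 'pg_attribute';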
<p dir="ltr"><br /> Dne 23.1.2014 22:04 "Mark Kirkwood" <<a href="mailto:mark.kirkwood@catalyst.net.nz">mark.kirkwood@catalyst.net.nz</a>>napsal(a):<br /> ><br /> > On 24/01/1409:49, Tom Lane wrote:<br /> >><br /> >> 2. What have you got that is requesting exclusive lock on pg_attribute?That seems like a pretty unfriendly behavior in itself. regards, tom lane <br /> ><br /> ><br /> >I've seen this sort of problem where every db session was busily creating temporary tables. I never got to the find *why*they needed to make so many, but it seemed like a bad idea.<br /> ><p dir="ltr">Our customer had same problem with temp tables by intensively plpgsql functions. For higher load a temp tables are performance and stability killer. Vacuumof pg attrib has very ugly impacts :(<p dir="ltr">Regars<p dir="ltr">Pavel<p dir="ltr">After redesign - without tmptables - his applications works well.<p dir="ltr">We needs a global temp tables<br /><p dir="ltr">> Regards<br /> ><br/> > Mark<br /> ><br /> ><br /> ><br /> ><br /> > -- <br /> > Sent via pgsql-hackers mailinglist (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br /> > To make changes toyour subscription:<br /> > <a href="http://www.postgresql.org/mailpref/pgsql-hackers">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/>
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes: > On 24/01/14 09:49, Tom Lane wrote: >> 2. What have you got that is requesting exclusive lock on >> pg_attribute? That seems like a pretty unfriendly behavior in itself. > I've seen this sort of problem where every db session was busily > creating temporary tables. I never got to the find *why* they needed to > make so many, but it seemed like a bad idea. That shouldn't result in any table-level exclusive locks on system catalogs, though. [ thinks... ] It's possible that what you saw is not the kick-out-autovacuum-entirely behavior, but the behavior added in commit bbb6e559c, whereby vacuum (auto or regular) will skip over pages that it can't immediately get an exclusive buffer lock on. On a heavily used table, we might skip the same page repeatedly, so that dead tuples don't get cleaned for a long time. To add insult to injury, despite having done that, vacuum would reset the pgstats dead-tuple count to zero, thus postponing the next autovacuum. I think commit 115f41412 may have improved the situation, but I'd want to see some testing of this theory before I'd propose back-patching it. regards, tom lane
On 24/01/14 10:09, Robert Haas wrote: > On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood > <mark.kirkwood@catalyst.net.nz> wrote: >> On 24/01/14 09:49, Tom Lane wrote: >>> 2. What have you got that is requesting exclusive lock on pg_attribute? >>> That seems like a pretty unfriendly behavior in itself. regards, tom lane >> I've seen this sort of problem where every db session was busily creating >> temporary tables. I never got to the find *why* they needed to make so many, >> but it seemed like a bad idea. > But... how does that result on a vacuum-incompatible lock request > against pg_attribute? > > I see that it'll insert lots of rows into pg_attribute, and maybe > later delete them, but none of that blocks vacuum. > That was my thought too - if I see it happening again here (was a year or so ago that I saw some serious pg_attribute bloat) I'll dig deeper. regards Mark
On 24/01/14 10:16, Mark Kirkwood wrote: > On 24/01/14 10:09, Robert Haas wrote: >> On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood >> <mark.kirkwood@catalyst.net.nz> wrote: >>> On 24/01/14 09:49, Tom Lane wrote: >>>> 2. What have you got that is requesting exclusive lock on >>>> pg_attribute? >>>> That seems like a pretty unfriendly behavior in itself. regards, >>>> tom lane >>> I've seen this sort of problem where every db session was busily >>> creating >>> temporary tables. I never got to the find *why* they needed to make >>> so many, >>> but it seemed like a bad idea. >> But... how does that result on a vacuum-incompatible lock request >> against pg_attribute? >> >> I see that it'll insert lots of rows into pg_attribute, and maybe >> later delete them, but none of that blocks vacuum. >> > > That was my thought too - if I see it happening again here (was a year > or so ago that I saw some serious pg_attribute bloat) I'll dig deeper. > > Actually not much digging required. Running the attached script via pgbench (8 sessions) against a default configured postgres 8.4 sees pg_attribute get to 1G after about 15 minutes.
[Attachment: pgbench test script]
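(The attachment itself is not reproduced here. A guess at the kind of script involved - each pgbench session repeatedly creating, filling, and dropping a temp table, so every transaction inserts and then deletes a batch of pg_attribute rows - might look like this; names and row counts are illustrative:)

    -- temp_tables.sql: churn the catalogs; one statement per line for pgbench
    CREATE TEMP TABLE churn (id int, a text, b text, c text, d text);
    INSERT INTO churn SELECT g, 'a', 'b', 'c', 'd' FROM generate_series(1, 100) g;
    DROP TABLE churn;

driven with something like: pgbench -n -f temp_tables.sql -c 8 -T 900 testdb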
On Thu, Jan 23, 2014 at 10:00 PM, Harold Giménez <harold@heroku.com> wrote: > On Thu, Jan 23, 2014 at 12:53 PM, Josh Berkus <josh@agliodbs.com> wrote: >> On 01/23/2014 12:34 PM, Joshua D. Drake wrote: >>> Hello, I have run into yet another situation where there was an assumption that autovacuum was keeping up and it wasn't. It was caused by autovacuum quitting because another process requested a lock. In turn we received a ton of bloat on pg_attribute which caused all kinds of other issues (as can be expected). The more I run into it, the more it seems like autovacuum should behave like vacuum, in that it gets precedence when it is running. First come, first served, as they say. Thoughts? >> If we let autovacuum block user activity, a lot more people would turn it off. Now, if you were to argue that we should have some way to monitor the tables which autovac can never touch because of conflicts, I would agree with you. > Agree completely. Easy ways to monitor this would be great. Once you know there's a problem, tweaking autovacuum settings is very hard and misunderstood, and explaining how to be effective at it is a dark art too. FWIW, I have a patch around somewhere that I never cleaned up properly for submissions that simply added a counter to pg_stat_user_tables indicating how many times vacuum had aborted on that specific table. If that's enough info (it was for my case) to cover this case, I can try to dig it out again and clean it up... Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On 01/23/2014 02:17 PM, Magnus Hagander wrote: > FWIW, I have a patch around somewhere that I never cleaned up properly for > submissions that simply added a counter to pg_stat_user_tables indicating > how many times vacuum had aborted on that specific table. If that's enough > info (it was for my case) to cover this case, I can try to dig it out > again and clean it up... It would be 100% more information than we currently have. How much more difficult would it be to count completed autovacuums as well? It's really the ratio of the two which matters ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 01/23/2014 02:55 PM, Josh Berkus wrote: > On 01/23/2014 02:17 PM, Magnus Hagander wrote: >> FWIW, I have a patch around somewhere that I never cleaned up properly for >> submissions that simply added a counter to pg_stat_user_tables indicating >> how many times vacuum had aborted on that specific table. If that's enough >> info (it was for my case) to cover this case, I can try to dig it out >> again and clean it up... > > It would be 100% more information than we currently have. How much more > difficult would it be to count completed autovacuums as well? It's > really the ratio of the two which matters ... Actually, now that I think about it, the ratio of the two doesn't matter as much as whether the most recent autovacuum aborted or not. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
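(For what it's worth, the completed side already exists: pg_stat_user_tables has carried vacuum_count and autovacuum_count since 9.1, so Magnus's counter would presumably sit alongside them. The commented-out column below is hypothetical:)

    SELECT relname,
           vacuum_count,          -- completed manual vacuums (9.1+)
           autovacuum_count,      -- completed autovacuums (9.1+)
           last_vacuum, last_autovacuum
           -- , autovacuum_abort_count  -- hypothetical counter from the patch
    FROM pg_stat_user_tables
    ORDER BY autovacuum_count DESC;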
On Thu, Jan 23, 2014 at 1:41 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: > On 24/01/14 10:16, Mark Kirkwood wrote: >> On 24/01/14 10:09, Robert Haas wrote: >>> On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: >>>> On 24/01/14 09:49, Tom Lane wrote: >>>>> 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane >>>> I've seen this sort of problem where every db session was busily creating temporary tables. I never got to find out *why* they needed to make so many, but it seemed like a bad idea. >>> But... how does that result in a vacuum-incompatible lock request against pg_attribute? I see that it'll insert lots of rows into pg_attribute, and maybe later delete them, but none of that blocks vacuum. >> That was my thought too - if I see it happening again here (was a year or so ago that I saw some serious pg_attribute bloat) I'll dig deeper. > Actually not much digging required. Running the attached script via pgbench (8 sessions) against a default configured postgres 8.4 sees pg_attribute get to 1G after about 15 minutes. At that rate, with default throttling, it will be a close race whether autovac can vacuum pages as fast as they are being added. Even if it never gets cancelled, it might not ever finish. Cheers, Jeff
On 24/01/14 12:13, Jeff Janes wrote: > On Thu, Jan 23, 2014 at 1:41 PM, Mark Kirkwood < > mark.kirkwood@catalyst.net.nz> wrote: > >> On 24/01/14 10:16, Mark Kirkwood wrote: >> >>> On 24/01/14 10:09, Robert Haas wrote: >>> >>>> On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood >>>> <mark.kirkwood@catalyst.net.nz> wrote: >>>> >>>>> On 24/01/14 09:49, Tom Lane wrote: >>>>> >>>>>> 2. What have you got that is requesting exclusive lock on pg_attribute? >>>>>> That seems like a pretty unfriendly behavior in itself. regards, tom >>>>>> lane >>>>>> >>>>> I've seen this sort of problem where every db session was busily >>>>> creating >>>>> temporary tables. I never got to the find *why* they needed to make so >>>>> many, >>>>> but it seemed like a bad idea. >>>>> >>>> But... how does that result on a vacuum-incompatible lock request >>>> against pg_attribute? >>>> >>>> I see that it'll insert lots of rows into pg_attribute, and maybe >>>> later delete them, but none of that blocks vacuum. >>>> >>>> >>> That was my thought too - if I see it happening again here (was a year or >>> so ago that I saw some serious pg_attribute bloat) I'll dig deeper. >>> >>> >>> >> Actually not much digging required. Running the attached script via >> pgbench (8 sessions) against a default configured postgres 8.4 sees >> pg_attribute get to 1G after about 15 minutes. >> > At that rate, with default throttling, it will be a close race whether > autovac can vacuum pages as fast as they are being added. Even if it never > gets cancelled, it might not ever finish. > Yes - I should have set the cost delay to 0 first (checking that now).
On 24/01/14 12:28, Mark Kirkwood wrote: > On 24/01/14 12:13, Jeff Janes wrote: >> On Thu, Jan 23, 2014 at 1:41 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: >>> On 24/01/14 10:16, Mark Kirkwood wrote: >>>> On 24/01/14 10:09, Robert Haas wrote: >>>>> On Thu, Jan 23, 2014 at 4:03 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: >>>>>> On 24/01/14 09:49, Tom Lane wrote: >>>>>>> 2. What have you got that is requesting exclusive lock on pg_attribute? That seems like a pretty unfriendly behavior in itself. regards, tom lane >>>>>> I've seen this sort of problem where every db session was busily creating temporary tables. I never got to find out *why* they needed to make so many, but it seemed like a bad idea. >>>>> But... how does that result in a vacuum-incompatible lock request against pg_attribute? I see that it'll insert lots of rows into pg_attribute, and maybe later delete them, but none of that blocks vacuum. >>>> That was my thought too - if I see it happening again here (was a year or so ago that I saw some serious pg_attribute bloat) I'll dig deeper. >>> Actually not much digging required. Running the attached script via pgbench (8 sessions) against a default configured postgres 8.4 sees pg_attribute get to 1G after about 15 minutes. >> At that rate, with default throttling, it will be a close race whether autovac can vacuum pages as fast as they are being added. Even if it never gets cancelled, it might not ever finish. > Yes - I should have set the cost delay to 0 first (checking that now). Doing that (and a few other autovac tweaks):

autovacuum_max_workers = 4
autovacuum_naptime = 10s
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_cost_delay = 0ms

Stops excessive bloat - clearly autovacuum *is* able to vacuum pg_attribute in this case. Back to drawing board for a test case. Regards Mark
On 2014-01-24 12:49:57 +1300, Mark Kirkwood wrote: > autovacuum_max_workers = 4 > autovacuum_naptime = 10s > autovacuum_vacuum_scale_factor = 0.1 > autovacuum_analyze_scale_factor = 0.1 > autovacuum_vacuum_cost_delay = 0ms > > Stops excessive bloat - clearly autovacuum *is* able to vacuum pg_attribute > in this case. Back to drawing board for a test case. Well, I think quite many people don't realize it might be necessary to tune autovac on busy workloads. As it very well might be the case in Josh's case. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-01-23 16:15:50 -0500, Tom Lane wrote: > [ thinks... ] It's possible that what you saw is not the > kick-out-autovacuum-entirely behavior, but the behavior added in commit > bbb6e559c, whereby vacuum (auto or regular) will skip over pages that it > can't immediately get an exclusive buffer lock on. On a heavily used > table, we might skip the same page repeatedly, so that dead tuples don't > get cleaned for a long time. I don't think it's too likely as an explanation here. Such workloads are likely to fill a page with only dead tuples, right? Once all tuples are safely dead they will be killed from the btree which should cause the page not to be visited anymore and thus safely vacuumable. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-01-23 16:15:50 -0500, Tom Lane wrote: >> [ thinks... ] It's possible that what you saw is not the >> kick-out-autovacuum-entirely behavior, but the behavior added in commit >> bbb6e559c, whereby vacuum (auto or regular) will skip over pages that it >> can't immediately get an exclusive buffer lock on. On a heavily used >> table, we might skip the same page repeatedly, so that dead tuples don't >> get cleaned for a long time. > I don't think it's too likely as an explanation here. Such workloads are > likely to fill a page with only dead tuples, right? Once all tuples are > safely dead they will be killed from the btree which should cause the > page not to be visited anymore and thus safely vacuumable. I added some instrumentation to vacuumlazy.c to count the number of pages skipped in this way. You're right, it seems to be negligible, at least with Mark's test case. I saw at most two pages skipped per vacuum, and usually none; so there's no way that a whole lot of tuples are going unvacuumed because of this. (I wonder though if we ought to add such counting as a permanent feature ...) I concur with the other reports that the main problem in this test case is just that the default cost delay settings throttle autovacuum so hard that it has no chance of keeping up. If I reduce autovacuum_vacuum_cost_delay from the default 20ms to 2ms, it seems to keep up quite nicely, on my machine anyway. Probably other combinations of changes would do it too. Perhaps we need to back off the default cost delay settings a bit? We've certainly heard more than enough reports of table bloat in heavily-updated tables. A system that wasn't hitting the updates as hard as it could might not need this, but on the other hand it probably wouldn't miss the I/O cycles from a more aggressive autovacuum, either. regards, tom lane
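(As a postgresql.conf sketch of the experiment described above - the values are illustrative, not a recommendation:)

    # defaults: ~2.67MB/s worst case, per the arithmetic later in the thread
    #autovacuum_vacuum_cost_delay = 20ms
    #autovacuum_vacuum_cost_limit = -1   # -1 means use vacuum_cost_limit (200)

    # the change tested above: throttle roughly 10x less aggressively
    autovacuum_vacuum_cost_delay = 2ms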
On 2014-01-23 19:29:23 -0500, Tom Lane wrote: > I saw at most two pages skipped per vacuum, and > usually none; so there's no way that a whole lot of tuples are going > unvacuumed because of this. (I wonder though if we ought to add such > counting as a permanent feature ...) I generally think we need to work a bit on the data reported back by vacuum. Adding data and also making the data output when using autovacuum more consistent with what VACUUM VERBOSE reports. The latter curiously often has less detail than autovacuum. I had hoped to get to that for 9.4, but it doesn't look like it. > I concur with the other reports that the main problem in this test case is > just that the default cost delay settings throttle autovacuum so hard that > it has no chance of keeping up. If I reduce autovacuum_vacuum_cost_delay > from the default 20ms to 2ms, it seems to keep up quite nicely, on my > machine anyway. Probably other combinations of changes would do it too. > Perhaps we need to back off the default cost delay settings a bit? > We've certainly heard more than enough reports of table bloat in > heavily-updated tables. A system that wasn't hitting the updates as hard > as it could might not need this, but on the other hand it probably > wouldn't miss the I/O cycles from a more aggressive autovacuum, either. Yes, I think adjusting the default makes sense, most setups that have enough activity that costing plays a role have to greatly increase the values. I'd rather increase the cost limit than reduce cost delay so drastically though, but that's admittedly just gut feeling. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-01-23 19:29:23 -0500, Tom Lane wrote: >> I concur with the other reports that the main problem in this test case is >> just that the default cost delay settings throttle autovacuum so hard that >> it has no chance of keeping up. If I reduce autovacuum_vacuum_cost_delay >> from the default 20ms to 2ms, it seems to keep up quite nicely, on my >> machine anyway. Probably other combinations of changes would do it too. >> Perhaps we need to back off the default cost delay settings a bit? >> We've certainly heard more than enough reports of table bloat in >> heavily-updated tables. A system that wasn't hitting the updates as hard >> as it could might not need this, but on the other hand it probably >> wouldn't miss the I/O cycles from a more aggressive autovacuum, either. > Yes, I think adjusting the default makes sense, most setups that have > enough activity that costing plays a role have to greatly increase the > values. I'd rather increase the cost limit than reduce cost delay so > drastically though, but that's admittedly just gut feeling. Well, I didn't experiment with intermediate values, I was just trying to test the theory that autovac could keep up given less-extreme throttling. I'm not taking any position on just where we need to set the values, only that what we've got is probably too extreme. regards, tom lane
On 01/24/2014 07:52 AM, Andres Freund wrote: > On 2014-01-24 12:49:57 +1300, Mark Kirkwood wrote: >> autovacuum_max_workers = 4 >> autovacuum_naptime = 10s >> autovacuum_vacuum_scale_factor = 0.1 >> autovacuum_analyze_scale_factor = 0.1 >> autovacuum_vacuum_cost_delay = 0ms >> >> Stops excessive bloat - clearly autovacuum *is* able to vacuum pg_attribute >> in this case. Back to drawing board for a test case. > > Well, I think quite many people don't realize it might be necessary to > tune autovac on busy workloads. As it very well might be the case in > Josh's case. Oh, lots of people realise it's a good idea to tune autovac on busy workloads. They just do it in the wrong direction, making it run less often and less aggressively, causing more bloat, and making their problem worse. I've seen this enough times that I'm starting to think the autovacuum tuning knobs need a child safety lock ;-) More seriously, people don't understand autovacuum, how it works, or why they need it. They notice it when things are already bad, see that it's doing lots of work and doing lots of I/O that competes with queries, and turn it off to "solve" the problem. I'm not sure how to tackle that. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 23, 2014 at 10:38 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >>> >>> Stops excessive bloat - clearly autovacuum *is* able to vacuum pg_attribute >>> in this case. Back to drawing board for a test case. >> >> Well, I think quite many people don't realize it might be necessary to >> tune autovac on busy workloads. As it very well might be the case in >> Josh's case. > > Oh, lots of people realise it's a good idea to tune autovac on busy > workloads. > > They just do it in the wrong direction, making it run less often and > less aggressively, causing more bloat, and making their problem worse. > > I've seen this enough times that I'm starting to think the autovauum > tuning knobs need a child safety lock ;-) > > More seriously, people don't understand autovacuum, how it works, or why > they need it. They notice it when things are already bad, see that it's > doing lots of work and doing lots of I/O that competes with queries, and > turn it off to "solve" the problem. AFAIK, tuning down autovacuum is common advice **when compounded with manually scheduled vacuuming**. The problem of autovacuum is that it always picks the wrong time to work. That is, when the DB is the most active. Because statistically that's when the thresholds are passed. If you ask me, I'd like autovac to know when not to run (or rather wait a bit, not forever), perhaps by checking load factors or some other tell-tale of an already-saturated I/O system.
Claudio Freire escribió: > If you ask me, I'd like autovac to know when not to run (or rather > wait a bit, not forever), perhaps by checking load factors or some > other tell-tale of an already-saturated I/O system. We had a proposed design to tell autovac when not to run (or rather, when to switch settings very high so that in practice it'd never run). At some point somebody said "but we can just change autovacuum=off in postgresql.conf via crontab when the high load period starts, and turn it back on afterwards" --- and that was the end of it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Claudio Freire escribió: >> If you ask me, I'd like autovac to know when not to run (or rather >> wait a bit, not forever), perhaps by checking load factors or some >> other tell-tale of an already-saturated I/O system. > We had a proposed design to tell autovac when not to run (or rather, > when to switch settings very high so that in practice it'd never run). > At some point somebody said "but we can just change autovacuum=off in > postgresql.conf via crontab when the high load period starts, and turn > it back on afterwards" --- and that was the end of it. The hard part of this is that shutting down autovacuum during heavy load may be exactly the wrong thing to do. regards, tom lane
On 01/24/2014 11:32 AM, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> Claudio Freire escribió: >>> If you ask me, I'd like autovac to know when not to run (or rather >>> wait a bit, not forever), perhaps by checking load factors or some >>> other tell-tale of an already-saturated I/O system. > >> We had a proposed design to tell autovac when not to run (or rather, >> when to switch settings very high so that in practice it'd never run). >> At some point somebody said "but we can just change autovacuum=off in >> postgresql.conf via crontab when the high load period starts, and turn >> it back on afterwards" --- and that was the end of it. > > The hard part of this is that shutting down autovacuum during heavy > load may be exactly the wrong thing to do. Yep. In fact, it may be appropriate to limit or stop autovacuum's work on some big tables, while pushing its activity even higher for small, high churn tables. If you stop autovacuum on a message-queue system when load gets high, you'll get a giant messy bloat explosion. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer escribió: > On 01/24/2014 11:32 AM, Tom Lane wrote: > > The hard part of this is that shutting down autovacuum during heavy > > load may be exactly the wrong thing to do. > > Yep. In fact, it may be appropriate to limit or stop autovacuum's work > on some big tables, while pushing its activity even higher for small, > high churn tables. > > If you stop autovacuum on a message-queue system when load gets high, > you'll get a giant messy bloat explosion. The design we had was to have table groups, each with their own set of custom parameters, and they would change depending on schedule. You could keep the queue tables in one group which would not change parameters, and only change the rest. But as I said, it was never fully implemented. (We had a partial patch from a GSoC project, IIRC.) I don't have the cycles to implement it now, anyway. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 24, 2014 at 12:33 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > On 01/24/2014 11:32 AM, Tom Lane wrote: >> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>> Claudio Freire escribió: >>>> If you ask me, I'd like autovac to know when not to run (or rather >>>> wait a bit, not forever), perhaps by checking load factors or some >>>> other tell-tale of an already-saturated I/O system. >> >>> We had a proposed design to tell autovac when not to run (or rather, >>> when to switch settings very high so that in practice it'd never run). >>> At some point somebody said "but we can just change autovacuum=off in >>> postgresql.conf via crontab when the high load period starts, and turn >>> it back on afterwards" --- and that was the end of it. >> >> The hard part of this is that shutting down autovacuum during heavy >> load may be exactly the wrong thing to do. > > Yep. In fact, it may be appropriate to limit or stop autovacuum's work > on some big tables, while pushing its activity even higher for small, > high churn tables. > > If you stop autovacuum on a message-queue system when load gets high, > you'll get a giant messy bloat explosion. A message queue has a steady state and needs way more than autovacuum. A table used as a message queue would need a wholly dedicated autovacuum worker to be constantly vacuuming. It's certainly an extreme example. But normal tables are much bigger than their active set, so vacuuming, which walks all those cold gigabytes, tends to wreak havoc with I/O performance. Doing it in peak hours, which is autovacuum's preferred time, is terrible. Delaying autovacuum for a while doesn't sound like such a disastrous thing. In essence, I'm talking about two thresholds. A "vacuum in the background" threshold, and an "omfg this table is a mess vacuum now now now" threshold. The background part is not quite straightforward, though. As in, what is background?
On 01/23/2014 07:22 PM, Alvaro Herrera wrote: >> If you ask me, I'd like autovac to know when not to run (or rather >> > wait a bit, not forever), perhaps by checking load factors or some >> > other tell-tale of an already-saturated I/O system. > We had a proposed design to tell autovac when not to run (or rather, > when to switch settings very high so that in practice it'd never run). > At some point somebody said "but we can just change autovacuum=off in > postgresql.conf via crontab when the high load period starts, and turn > it back on afterwards" --- and that was the end of it. Anything which depends on a timing-based feedback loop is going to be hopeless. Saying "autovac shouldn't run if load is high" sounds like a simple statement, until you actually try to implement it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Jan 24, 2014 at 2:44 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 01/23/2014 07:22 PM, Alvaro Herrera wrote: >>> If you ask me, I'd like autovac to know when not to run (or rather >>> > wait a bit, not forever), perhaps by checking load factors or some >>> > other tell-tale of an already-saturated I/O system. >> We had a proposed design to tell autovac when not to run (or rather, >> when to switch settings very high so that in practice it'd never run). >> At some point somebody said "but we can just change autovacuum=off in >> postgresql.conf via crontab when the high load period starts, and turn >> it back on afterwards" --- and that was the end of it. > > Anything which depends on a timing-based feedback loop is going to be > hopeless. Saying "autovac shouldn't run if load is high" sounds like a > simple statement, until you actually try to implement it. Exactly. But people tuning autovac down are doing exactly that: trying to tune autovac to background-only work. They *must* then launch foreground vacuums, at times they deem sensible, when doing that. So, problem is not of people tuning down autovacuum, but of them forgetting to vacuum explicitly after doing so.
On Thu, Jan 23, 2014 at 7:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@2ndquadrant.com> writes: >> On 2014-01-23 19:29:23 -0500, Tom Lane wrote: >>> I concur with the other reports that the main problem in this test case is >>> just that the default cost delay settings throttle autovacuum so hard that >>> it has no chance of keeping up. If I reduce autovacuum_vacuum_cost_delay >>> from the default 20ms to 2ms, it seems to keep up quite nicely, on my >>> machine anyway. Probably other combinations of changes would do it too. > >>> Perhaps we need to back off the default cost delay settings a bit? >>> We've certainly heard more than enough reports of table bloat in >>> heavily-updated tables. A system that wasn't hitting the updates as hard >>> as it could might not need this, but on the other hand it probably >>> wouldn't miss the I/O cycles from a more aggressive autovacuum, either. > >> Yes, I think adjusting the default makes sense, most setups that have >> enough activity that costing plays a role have to greatly increase the >> values. I'd rather increase the cost limit than reduce cost delay so >> drastically though, but that's admittedly just gut feeling. > > Well, I didn't experiment with intermediate values, I was just trying > to test the theory that autovac could keep up given less-extreme > throttling. I'm not taking any position on just where we need to set > the values, only that what we've got is probably too extreme. So, Greg Smith proposed what I think is a very useful methodology for assessing settings in this area: figure out what it works out to in MB/s. If we assume we're going to read and dirty every page we vacuum, and that this will take negligible time of itself so that the work is dominated by the sleeps, the default settings work out to 200/(10 + 20) pages every 20ms, or 2.67MB/s. Obviously, the rate will be 3x higher if the pages don't need to be dirtied, and higher still if they're all in cache, but considering the way the visibility map works, it seems like a good bet that we WILL need to dirty most of the pages that we look at - either they've got dead tuples and need clean-up, or they don't and need to be marked all-visible. A corollary of this is that if you're dirtying heap pages faster than a few megabytes per second, autovacuum, at least with default settings, is not going to keep up. And if you assume that each write transaction dirties at least one heap page, any volume of write transactions in excess of a few hundred per second will meet that criterion. Which is really not that much; a single core can do over 1000 tps with synchronous_commit=off, or if there's a BBWC that can absorb it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
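(Robert's arithmetic as a quick sanity check, using the defaults he assumes: vacuum_cost_limit = 200, vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20, 8kB pages, and a 20ms sleep per cycle:)

    -- pages per sleep cycle * cycles per second * kB per page, in MB/s
    SELECT round((200.0 / (10 + 20)) * (1000 / 20) * 8 / 1000, 2) AS mb_per_sec;
    -- => 2.67, assuming every page vacuumed is read from disk and dirtied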