Thread: stats collector suddenly causing lots of IO
I have a lot of CentOS servers which are running postgres. Postgres isn't used that heavily on any of them, but lately the stats collector process keeps causing tons of IO load. It seems to happen only on servers with CentOS 5. The versions of postgres that are running are:

8.1.18
8.2.6
8.3.1
8.3.5
8.3.6
8.3.7
8.3.8
8.3.9
8.4.2
8.4.3

I've tried turning off everything under RUNTIME STATISTICS in postgresql.conf except track_counts (since autovacuum says it needs it), but it seems to have little effect on the IO caused by the stats collector.

Has anyone else noticed this? Have there been recent kernel changes that could cause this that anyone knows about? Since we haven't touched postgres on these boxes since they were set up initially, I'm a bit baffled as to what might be causing the problem, and why I can't make it go away short of kill -STOP.

Any suggestions would be much appreciated!
2010/4/13 Chris <lists@deksai.com>:
> I have a lot of centos servers which are running postgres. Postgres isn't used
> that heavily on any of them, but lately, the stats collector process keeps
> causing tons of IO load. It seems to happen only on servers with centos 5.
> The versions of postgres that are running are:
>
> 8.1.18, 8.2.6, 8.3.1, 8.3.5, 8.3.6, 8.3.7, 8.3.8, 8.3.9, 8.4.2, 8.4.3
>
> I've tried turning off everything under RUNTIME STATISTICS in postgresql.conf
> except track_counts (since auto vacuum says it needs it), but it seems to have
> little affect on the IO caused by the stats collector.
>
> Has anyone else noticed this? Have there been recent kernel changes
> that could cause this that anyone knows about? Since we haven't touched
> postgres on these boxes since they were setup initially, I'm a bit baffled as
> to what might be causing the problem, and why I can't make it go away short of
> kill -STOP.
>
> Any suggestions would be much appreciated!

The stats file is written to disk every 500 ms (the interval can be changed when building Postgres), but this was improved in 8.4 and the file should now be written only when needed.

In 8.4 you can also change the directory where the stats file is written with the config parameter stats_temp_directory. Perhaps run a test and change the filesystem (you might want to try a ramdisk, or another FS -- ext3, XFS, ext4, depending on your kernel) and see if it changes anything in your IO load.

Anyway, it looks like this is CentOS 5 related, so what is your current running kernel? (And what FS?)

> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

--
Cédric Villemain
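[For reference, a minimal sketch of the ramdisk idea above, for an 8.4 server. The mount point, tmpfs size, and paths here are illustrative assumptions, not values from this thread:]

```
# /etc/fstab -- a small tmpfs for the stats temp file (size is a guess;
# it only needs to hold pgstat.stat plus some headroom)
tmpfs  /var/lib/pgsql/pg_stat_tmp  tmpfs  size=64M,uid=postgres,gid=postgres  0 0

# postgresql.conf (8.4+) -- point the stats collector at it
stats_temp_directory = '/var/lib/pgsql/pg_stat_tmp'
```

stats_temp_directory can be changed with a config reload; a permanent copy of the stats file is still written under $PGDATA/global/ at shutdown, so stats survive a restart.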
Chris wrote:
> I have a lot of centos servers which are running postgres. Postgres isn't used
> that heavily on any of them, but lately, the stats collector process keeps
> causing tons of IO load. It seems to happen only on servers with centos 5.

Does this correlate to an increase in size of the pgstat.stat file? Maybe you could try resetting stats, so that the file goes back to an initial size and is slowly repopulated. I'd suggest monitoring the size of the stats file, just in case there's something abnormal with it.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Chris <lists@deksai.com> writes:
> I have a lot of centos servers which are running postgres. Postgres isn't used
> that heavily on any of them, but lately, the stats collector process keeps
> causing tons of IO load. It seems to happen only on servers with centos 5.
> The versions of postgres that are running are:
> 8.1.18, 8.2.6, 8.3.1, 8.3.5, 8.3.6, 8.3.7, 8.3.8, 8.3.9, 8.4.2, 8.4.3

Do these different server versions really all show the problem to the same extent? I'd expect 8.4.x in particular to be cheaper than the older branches. Are their pgstat.stat files all of similar sizes? (Note that 8.4.x keeps pgstat.stat under $PGDATA/pg_stat_tmp/ whereas in earlier versions it was under $PGDATA/global/.)

If your applications create/use/drop a lot of tables (perhaps temp tables) then bloat of the pgstat.stat file is to be expected, but it should get cleaned up by vacuum (including autovacuum). What is your vacuuming policy on these servers ... do you use autovacuum?

			regards, tom lane
On Thu, Apr 15, 2010 at 6:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Chris <lists@deksai.com> writes:
>> I have a lot of centos servers which are running postgres. Postgres isn't used
>> that heavily on any of them, but lately, the stats collector process keeps
>> causing tons of IO load. It seems to happen only on servers with centos 5.
>
> Say, I just realized that both of you are complaining about stats
> collector overhead on centos 5 servers. I hadn't been thinking in terms
> of OS-specific causes, but maybe that is what we need to consider.
> Can you tell me the exact kernel versions you are seeing these problems
> with?

uname -a says "... 2.6.18-92.1.13.el5 #1 SMP ... x86_64", and it's CentOS 5.2.

I'm not sure whether this is related to the stats collector problems on this machine, but I noticed alarming table bloat in the catalog tables pg_attribute, pg_attrdef, pg_depend, and pg_type. Perhaps this has happened slowly over the past few months, but I discovered the bloat when I ran the query from:
http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html
on the most-active database on this server (OID 16389 from the pgstat.stat I sent in). See attached table_bloat.txt.

The autovacuum settings for this server haven't been tweaked from the default; they probably should have been, given the heavy bulk updates/inserts done. Maybe there's another cause for this extreme catalog bloat besides the weak autovacuum settings, though.

Table sizes, according to pg_size_pretty(pg_total_relation_size(...)):
* pg_attribute: 145 GB
* pg_attrdef: 85 GB
* pg_depend: 38 GB
* pg_type: 3465 MB

I'll try to send in strace outputs later today.

Josh
Josh Kupershmidt <schmiddy@gmail.com> writes:
> I'm not sure whether this is related to the stats collector problems
> on this machine, but I noticed alarming table bloat in the catalog
> tables pg_attribute, pg_attrdef, pg_depend, and pg_type.

Hmm. That makes me wonder if autovacuum is functioning properly at all. What does pg_stat_all_tables show for the last vacuum and analyze times of those tables? Try something like

select relname,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze from pg_stat_all_tables where schemaname = 'pg_catalog' order by 1;

			regards, tom lane
Chris <lists@deksai.com> writes:
> After the file was made larger and I stopped the vacuum process, I started
> seeing the problem. All other postgres processes were quiet, but the stats
> collector was constantly causing anywhere from 20-60% of the IO on the server.
> Since all the other postgres processes weren't really doing anything, and it is
> a busy web server which is predominately MySQL, I'm fairly curious as to what
> it is doing.

Yeah, the stats collector rewrites the stats file every half second, if there have been any changes since last time --- so the bigger the file, the more overhead. (8.4 is smarter about this, but that doesn't help you on 8.3.)

> I straced the stats collector process. I wasn't sure what else to trace as
> there wasn't a single other postgres process doing anything.

That strace doesn't really prove much; it's what I'd expect. Here's what to do: start a PG session, and strace that session's backend *and* the stats collector while you manually do VACUUM some-small-table. The VACUUM command should try to send some messages to the stats collector process. I'm wondering if those get dropped somehow.

			regards, tom lane
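[To put the "bigger file, more overhead" point in numbers, a back-of-envelope sketch. The 100 MB file size is a hypothetical figure for illustration, not one reported in this thread:]

```python
# Back-of-envelope I/O for the pre-8.4 stats collector, which rewrites the
# whole stats file up to every 500 ms whenever any counters have changed.
# The 100 MB file size below is a hypothetical figure, not from this thread.

def daily_write_gb(file_mb, writes_per_sec=2.0):
    """GB written per day if the file is rewritten at the worst-case rate."""
    return file_mb * writes_per_sec * 86_400 / 1024

print(f"{daily_write_gb(100):.0f} GB/day rewritten")  # 16875 GB/day
```

So a stats file that bloats into the tens of megabytes can plausibly turn an otherwise idle box into a constant writer, which matches the symptom described above.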
On Fri, Apr 16, 2010 at 11:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Kupershmidt <schmiddy@gmail.com> writes:
>> I'm not sure whether this is related to the stats collector problems
>> on this machine, but I noticed alarming table bloat in the catalog
>> tables pg_attribute, pg_attrdef, pg_depend, and pg_type.
>
> Hmm. That makes me wonder if autovacuum is functioning properly at all.
> What does pg_stat_all_tables show for the last vacuum and analyze times
> of those tables? Try something like
>
> select relname,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze from pg_stat_all_tables where schemaname = 'pg_catalog' order by 1;

Output attached. Note that I ran pg_stat_reset() a few days ago.

Josh
Josh Kupershmidt <schmiddy@gmail.com> writes:
> On Fri, Apr 16, 2010 at 11:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm. That makes me wonder if autovacuum is functioning properly at all.
>> What does pg_stat_all_tables show for the last vacuum and analyze times
>> of those tables? Try something like
>>
>> select relname,n_live_tup,n_dead_tup,last_vacuum,last_autovacuum,last_analyze,last_autoanalyze from pg_stat_all_tables where schemaname = 'pg_catalog' order by 1;

> Output attached. Note that I ran pg_stat_reset() a few days ago.

Wow. Well, we have a smoking gun here: for some reason, autovacuum isn't running, or isn't doing its job if it is. If it's not running at all, that would explain failure to prune the stats collector's file too.

Is there anything in the postmaster log that would suggest autovac difficulties?

			regards, tom lane
Josh Kupershmidt <schmiddy@gmail.com> writes:
> I made a small half-empty table like this:
> CREATE TABLE test_vacuum (i int PRIMARY KEY);
> INSERT INTO test_vacuum (i) SELECT a FROM generate_series(1,500000) AS a;
> DELETE FROM test_vacuum WHERE RANDOM() < 0.5;
> and then ran:
> VACUUM test_vacuum;
> while an strace of the stats collector process was running. Then after
> a few seconds, found the PID of the VACUUM process, and ran strace on
> it. I killed them after the VACUUM finished. Outputs attached.

Huh. The VACUUM strace clearly shows a boatload of TABPURGE messages being sent:

sendto(7, "\2\0\0\0\350\3\0\0\5@\0\0\366\0\0\0\324\206<\24\321uC\24\320\350)\24\225\345,\24"..., 1000, 0, NULL, 0) = 1000
sendto(7, "\2\0\0\0\350\3\0\0\5@\0\0\366\0\0\0C\274?\24\365\323?\24\241N@\24\217\0309\24"..., 1000, 0, NULL, 0) = 1000
sendto(7, "\2\0\0\0\350\3\0\0\5@\0\0\366\0\0\0\375Z2\24\211\f@\0241\3047\24\357mH\24"..., 1000, 0, NULL, 0) = 1000
sendto(7, "\2\0\0\0\350\3\0\0\5@\0\0\366\0\0\0\242\3529\24\234K\'\24\17\227)\24\300\22+\24"..., 1000, 0, NULL, 0) = 1000

and the stats collector is receiving them:

recvfrom(7, "\2\0\0\0\350\3\0\0\5@\0\0\366\0\0\0\324\206<\24\321uC\24\320\350)\24\225\345,\24"..., 1000, 0, NULL, NULL) = 1000
recvfrom(7, "\2\0\0\0\350\3\0\0\5@\0\0\366\0\0\0C\274?\24\365\323?\24\241N@\24\217\0309\24"..., 1000, 0, NULL, NULL) = 1000
recvfrom(7, "\2\0\0\0\350\3\0\0\5@\0\0\366\0\0\0\375Z2\24\211\f@\0241\3047\24\357mH\24"..., 1000, 0, NULL, NULL) = 1000

So this *should* have resulted in the stats file shrinking. Did you happen to notice if it did, after you did this?

			regards, tom lane
On Fri, Apr 16, 2010 at 11:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Wow. Well, we have a smoking gun here: for some reason, autovacuum
> isn't running, or isn't doing its job if it is. If it's not running
> at all, that would explain failure to prune the stats collector's file
> too.

Hrm, well autovacuum is at least trying to do work: it's currently stuck on those bloated pg_catalog tables, of course. Another developer killed an autovacuum of pg_attribute (or maybe it was pg_attrdef) after it had been running for two weeks. See current pg_stat_activity output attached, which shows the three autovacuum workers running plus two manual VACUUM ANALYZEs I started yesterday.

> Is there anything in the postmaster log that would suggest autovac
> difficulties?

Yup, there are logs from April 1st which I just grepped through. I attached the redacted output, and I see a few warnings about "[table] contains more than "max_fsm_pages" pages with useful free space", as well as "ERROR: canceling autovacuum task". Perhaps bumping up max_fsm_pages and making the autovacuum settings more aggressive will help me?

I was also planning to run a CLUSTER of those four bloated pg_catalog tables -- is this safe, particularly for tables like pg_attrdef which rely on OIDs?

Josh
I wrote:
> So this *should* have resulted in the stats file shrinking. Did you
> happen to notice if it did, after you did this?

Oh, never mind that --- I can see that it did shrink, just from counting the write() calls in the collector's strace. So what we have here is a demonstration that the tabpurge mechanism does work for you, when it's invoked. Which is further evidence that for some reason autovacuum is not running for you.

What I'd suggest at this point is cranking up log_min_messages to DEBUG2 or so in postgresql.conf, restarting the postmaster, and keeping an eye on the log to see if you can spot anything about why autovac isn't working.

			regards, tom lane
Josh Kupershmidt <schmiddy@gmail.com> writes:
> On Fri, Apr 16, 2010 at 11:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Wow. Well, we have a smoking gun here: for some reason, autovacuum
>> isn't running, or isn't doing its job if it is. If it's not running
>> at all, that would explain failure to prune the stats collector's file
>> too.

> Hrm, well autovacuum is at least trying to do work: it's currently
> stuck on those bloated pg_catalog tables, of course. Another developer
> killed an autovacuum of pg_attribute (or maybe it was pg_attrdef)
> after it had been running for two weeks. See current pg_stat_activity
> output attached, which shows the three autovacuum workers running plus
> two manual VACUUM ANALYZEs I started yesterday.

Two weeks? What have you got the autovacuum cost delays set to?

Once you're up to three AV workers, no new ones can get launched until one of those finishes or is killed. So that would explain failure to prune the stats collector's tables (the tabpurge code is only run during AV worker launch). So what we need to figure out is why it's taking so obscenely long to vacuum these tables ...

			regards, tom lane
On Apr 16, 2010, at 9:48 AM, Tom Lane wrote:
> Josh Kupershmidt <schmiddy@gmail.com> writes:
>> Hrm, well autovacuum is at least trying to do work: it's currently
>> stuck on those bloated pg_catalog tables, of course. Another developer
>> killed an autovacuum of pg_attribute (or maybe it was pg_attrdef)
>> after it had been running for two weeks.
>
> Two weeks? What have you got the autovacuum cost delays set to?
>
> Once you're up to three AV workers, no new ones can get launched until
> one of those finishes or is killed. So that would explain failure to
> prune the stats collector's tables (the tabpurge code is only run during
> AV worker launch). So what we need to figure out is why it's taking so
> obscenely long to vacuum these tables ...

On any large system with good I/O I have had to significantly increase the aggressiveness of autovacuum. Even with the settings below, it doesn't interfere with other activity (~2200 iops random, ~900 MB/sec sequential capable I/O).

My relevant autovacuum parameters are (from 'show *'):

 autovacuum                      | on        | Starts the autovacuum subprocess.
 autovacuum_analyze_scale_factor | 0.1       | Number of tuple inserts, updates or deletes prior to analyze as a fraction of reltuples.
 autovacuum_analyze_threshold    | 50        | Minimum number of tuple inserts, updates or deletes prior to analyze.
 autovacuum_freeze_max_age       | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound.
 autovacuum_max_workers          | 3         | Sets the maximum number of simultaneously running autovacuum worker processes.
 autovacuum_naptime              | 1min      | Time to sleep between autovacuum runs.
 autovacuum_vacuum_cost_delay    | 20ms      | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit    | 2000      | Vacuum cost amount available before napping, for autovacuum.
 autovacuum_vacuum_scale_factor  | 0.2       | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
 autovacuum_vacuum_threshold     | 50        |

For what it is worth, I just went onto one of my systems -- one with lots of partition tables and temp table creation/destruction -- and looked at the system tables in question there. Postgres 8.4, using \dt+ (result trimmed below to the interesting tables):

   Schema   |     Name      | Type  |  Owner   |  Size   | Description
------------+---------------+-------+----------+---------+-------------
 pg_catalog | pg_attrdef    | table | postgres | 195 MB  |
 pg_catalog | pg_attribute  | table | postgres | 1447 MB |
 pg_catalog | pg_class      | table | postgres | 1694 MB |
 pg_catalog | pg_constraint | table | postgres | 118 MB  |
 pg_catalog | pg_depend     | table | postgres | 195 MB  |
 pg_catalog | pg_statistic  | table | postgres | 2300 MB |
 pg_catalog | pg_type       | table | postgres | 181 MB  |

So, I did a VACUUM FULL; REINDEX TABLE; ANALYZE sequence on each of these. I wish I could just CLUSTER them but the above works.

Now the tables are:

   Schema   |     Name      | Type  |  Owner   |  Size   | Description
------------+---------------+-------+----------+---------+-------------
 pg_catalog | pg_attrdef    | table | postgres | 44 MB   |
 pg_catalog | pg_attribute  | table | postgres | 364 MB  |
 pg_catalog | pg_class      | table | postgres | 1694 MB |
 pg_catalog | pg_constraint | table | postgres | 118 MB  |
 pg_catalog | pg_depend     | table | postgres | 195 MB  |
 pg_catalog | pg_statistic  | table | postgres | 656 MB  |
 pg_catalog | pg_type       | table | postgres | 45 MB   |

I've learned to accept about 50% bloat (2x the compacted size) in postgres as just the way it usually is on a busy table, but the 3x and 4x bloat of pg_statistic, pg_attrdef, and pg_attribute have me wondering.

I have had some 'idle in transaction' connections hanging out from time to time that have caused issues on this machine that could explain the above perma-bloat. That is one thing that could affect the case reported here as well. The worst thing about those is you can't even force kill those connections from within postgres (pg_cancel_backend doesn't work on them, and killing them via the OS bounces postgres ...) so you have to hunt down the offending client.
> I have had some 'idle in transaction' connections hanging out from time to
> time that have caused issues on this machine that could explain the above
> perma-bloat. That is one thing that could affect the case reported here as
> well. The worst thing about those is you can't even force kill those
> connections from within postgres (pg_cancel_backend doesn't work on them,
> and killing them via the OS bounces postgres ...) so you have to hunt down
> the offending client.

Ooh, I just noticed pg_terminate_backend() ... maybe this will let me kill annoying idle in transaction clients. I guess this arrived in 8.4? Hopefully this won't cause the whole thing to bounce and close all other backends....
On Fri, Apr 16, 2010 at 12:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Kupershmidt <schmiddy@gmail.com> writes:
>> Hrm, well autovacuum is at least trying to do work: it's currently
>> stuck on those bloated pg_catalog tables, of course. Another developer
>> killed an autovacuum of pg_attribute (or maybe it was pg_attrdef)
>> after it had been running for two weeks. See current pg_stat_activity
>> output attached, which shows the three autovacuum workers running plus
>> two manual VACUUM ANALYZEs I started yesterday.
>
> Two weeks? What have you got the autovacuum cost delays set to?

SELECT name, current_setting(name), source FROM pg_settings WHERE
source != 'default' AND name ILIKE '%vacuum%';
         name         | current_setting |       source
----------------------+-----------------+--------------------
 vacuum_cost_delay    | 200ms           | configuration file
 vacuum_cost_limit    | 100             | configuration file
 vacuum_cost_page_hit | 6               | configuration file
(3 rows)

I'm guessing these values and the default autovacuum configuration values need to be cranked significantly to make vacuum much more aggressive :-(

> Once you're up to three AV workers, no new ones can get launched until
> one of those finishes or is killed. So that would explain failure to
> prune the stats collector's tables (the tabpurge code is only run during
> AV worker launch). So what we need to figure out is why it's taking so
> obscenely long to vacuum these tables ...

Hopefully changing those three vacuum_cost_* params will speed up the manual- and auto-vacuums.. it'll take me a few days to see any results, since I still need to do something about the bloat that's already there.

Josh
Josh Kupershmidt wrote:
> SELECT name, current_setting(name), source FROM pg_settings WHERE
> source != 'default' AND name ILIKE '%vacuum%';
>          name         | current_setting |       source
> ----------------------+-----------------+--------------------
>  vacuum_cost_delay    | 200ms           | configuration file
>  vacuum_cost_limit    | 100             | configuration file
>  vacuum_cost_page_hit | 6               | configuration file
>
> Hopefully changing those three vacuum_cost_* params will speed up the
> manual- and auto-vacuums..

Those only impact manual VACUUM statements. There's a different set with names like autovacuum_vacuum_cost_delay that control the daemon. You can set those to "-1" in order to match the regular VACUUM, but that's not the default.

You really need to sort out the max_fsm_pages setting too, because until that issue goes away these tables are unlikely to ever stop growing. And, no, you can't use CLUSTER on the system tables to clean those up.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
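[As a sketch of the direction Greg is pointing at, something like the following in postgresql.conf; the concrete numbers are illustrative assumptions, not tested recommendations for this box:]

```
# autovacuum has its own cost knobs; -1 means "inherit the manual-VACUUM value"
autovacuum_vacuum_cost_delay = 20ms   # short, frequent naps
autovacuum_vacuum_cost_limit = 200    # or leave at -1 to inherit vacuum_cost_limit

# 8.3 and earlier only: size this from the "max_fsm_pages" warnings in the logs
max_fsm_pages = 2000000
```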
On Fri, Apr 16, 2010 at 2:14 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Josh Kupershmidt wrote:
>> SELECT name, current_setting(name), source FROM pg_settings WHERE
>> source != 'default' AND name ILIKE '%vacuum%';
>>          name         | current_setting |       source
>> ----------------------+-----------------+--------------------
>>  vacuum_cost_delay    | 200ms           | configuration file
>>  vacuum_cost_limit    | 100             | configuration file
>>  vacuum_cost_page_hit | 6               | configuration file
>>
>> Hopefully changing those three vacuum_cost_* params will speed up the
>> manual- and auto-vacuums..
>
> Those only impact manual VACUUM statements. There's a different set with
> names like autovacuum_vacuum_cost_delay that control the daemon. You can
> set those to "-1" in order to match the regular VACUUM, but that's not the
> default.

It looks like I have the default autovacuum_vacuum_cost_limit = -1, which means it's inheriting the vacuum_cost_limit of 100 I had set. I'll try bumping vacuum_cost_limit up to 1000 or so.

> You really need to sort out the max_fsm_pages setting too, because until
> that issue goes away these tables are unlikely to ever stop growing. And,
> no, you can't use CLUSTER on the system tables to clean those up.

I have max_fsm_pages = 524288, but from the hints in the logfiles this obviously needs to go up much higher. And it seems the only way to compact the pg_catalog tables is VACUUM FULL + REINDEX on 8.3 -- I had tried the CLUSTER on my 9.0 machine and wrongly assumed it would work on 8.3, too.

Josh
Josh Kupershmidt wrote:
> And it seems the only way
> to compact the pg_catalog tables is VACUUM FULL + REINDEX on 8.3 -- I
> had tried the CLUSTER on my 9.0 machine and wrongly assumed it would
> work on 8.3, too.

Right; that just got implemented a couple of months ago. See the news from http://www.postgresql.org/community/weeklynews/pwn20100214 for a summary of how the code was gyrated around to support that.

This is a tough situation to get out of in <9.0 because VACUUM FULL is slow and takes an exclusive lock on the table. That tends to lead toward an unpredictable window for required downtime, which is never good.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
Josh Kupershmidt <schmiddy@gmail.com> writes:
>          name         | current_setting |       source
> ----------------------+-----------------+--------------------
>  vacuum_cost_delay    | 200ms           | configuration file
>  vacuum_cost_limit    | 100             | configuration file
>  vacuum_cost_page_hit | 6               | configuration file

> It looks like the default which I have of autovacuum_vacuum_cost_limit
> = -1, which means it's inheriting the vacuum_cost_limit of 100 I had
> set. I'll try bumping vacuum_cost_limit up to 1000 or so.

Actually I think the main problem is that cost_delay value, which is probably an order of magnitude too high. The way to limit vacuum's I/O impact on other stuff is to make it take frequent short delays, not have it run full speed and then sleep a long time. In any case, your current settings have got it sleeping way too much. Two WEEKS !!!??

			regards, tom lane
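[A rough duty-cycle model makes the point concrete. This sketch assumes the documented cost-delay semantics (vacuum accrues a per-page cost and naps once the accumulated cost reaches the limit), considers only the cheapest case of buffer hits, and ignores the time vacuum spends actually working, so the figures are optimistic upper bounds:]

```python
PAGE_BYTES = 8192  # size of a PostgreSQL heap page

def vacuum_mb_per_sec(cost_limit, cost_delay_ms, cost_page_hit):
    """Upper bound on vacuum throughput when every page is a buffer hit:
    vacuum accrues cost_page_hit per page and sleeps cost_delay_ms once
    the accumulated cost reaches cost_limit."""
    pages_per_round = cost_limit / cost_page_hit
    rounds_per_sec = 1000.0 / cost_delay_ms
    return pages_per_round * rounds_per_sec * PAGE_BYTES / 1e6

# Settings from this thread: limit 100, delay 200 ms, page_hit 6
print(round(vacuum_mb_per_sec(100, 200, 6), 2))  # ~0.68 MB/s at best
# Same limit with a 20 ms delay: ten times the throughput
print(round(vacuum_mb_per_sec(100, 20, 6), 2))   # ~6.83 MB/s
```

Cache misses (cost 10) and dirtied pages (cost 20) cost more per page, so real throughput on a huge, bloated table would be well below even the 0.68 MB/s ceiling, which is consistent with vacuums running for days or weeks.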
On Fri, Apr 16, 2010 at 3:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Actually I think the main problem is that cost_delay value, which is
> probably an order of magnitude too high. The way to limit vacuum's
> I/O impact on other stuff is to make it take frequent short delays,
> not have it run full speed and then sleep a long time. In any case,
> your current settings have got it sleeping way too much. Two WEEKS !!!??

Yup, I was going to turn vacuum_cost_delay down to 20. The two weeks was for the pg_catalog table which has bloated to 145 GB, I think. One of those manual VACUUMs I kicked off just finished, after 48 hours -- and that table was only 25 GB or so. I wasn't the one who set up this postgresql.conf, but I am stuck fixing things :/