Thread: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
daveg@sonic.net
Date:
The following bug has been logged on the website:

Bug reference: 13750
Logged by: David Gould
Email address: daveg@sonic.net
PostgreSQL version: 9.4.5
Operating system: Linux

Description:

With more than a few tens of thousands of tables in one database, autovacuuming slows down radically and becomes ineffective. Increasing the number of autovacuum workers makes the slowdown worse.

A client has an application that loads data from thousands of external feeds many times a day. They create a new table for each batch of new data, and after some months old tables are dropped. Typically the database has about 200,000 fairly small tables, each of which has a few indexes and a toast table. There are also a lot of temp tables that come and go. pg_class has over half a million rows. The hosts have 80 hardware threads, 1TB of memory and fusionIO storage.

They started seeing long-running autovacuum workers doing anti-wraparound vacuums. pg_stat_activity showed workers had been vacuuming a single small table (e.g. 10k rows) for several hours, yet in the query log the actual vacuum took less than a second.

When they updated to 9.4.4 they also increased the number of autovacuum workers in anticipation of the multixact fix causing extra vacuuming. Several days later they observed massive catalog bloat; for example, pg_attribute was over 200GB of mostly empty pages. This caused new connections to get stuck in startup as the catalogs no longer fit in the buffer cache.

I then tried experimenting with different settings of autovacuum workers and found:

    Autovacuum actions per hour

    Workers   Actions   per Worker
          1    2110.1       2110.1
          2    1760.8        880.4
          4     647.3        161.8
          8     386.2         48.3
         72      62.0          0.9

I have analyzed this and created reproduction scripts. I'll send those later today.
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Alvaro Herrera
Date:
daveg@sonic.net wrote: > With more than a few tens of thousands of tables in one database > autovacuuming slows down radically and becomes ineffective. Increasing the > number of autovacuum workers makes the slow down worse. Yeah, you need to decrease autovacuum_vacuum_cost_delay if you want to make them go faster. (As more workers are started, the existing ones slow down. The intent is that the I/O bandwidth allocation is kept constant regardless of how many workers there are.) > When they updated to 9.4.4 they also increased the number of autovacuum > workers in anticipation of the multixact fix causing extra vacuuming. > Several days later they observed massive catalog bloat, eg pg_attribute was > over 200GB of mostly empty pages. This caused new connections to get stuck > in startup as the catalogs no longer fit in the buffer cache. Oh crap. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Fri, 30 Oct 2015 10:46:46 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> daveg@sonic.net wrote:
>
> > With more than a few tens of thousands of tables in one database autovacuuming slows down radically and becomes ineffective. Increasing the number of autovacuum workers makes the slowdown worse.
>
> Yeah, you need to decrease autovacuum_vacuum_cost_delay if you want to make them go faster. (As more workers are started, the existing ones slow down. The intent is that the I/O bandwidth allocation is kept constant regardless of how many workers there are.)

The cost delays are all 0. We care about bloat, not bandwidth.

Anyway, they are not actually vacuuming. They are waiting on the VacuumScheduleLock and requesting fresh snapshots from the stats_collector. Basically there is a loop in do_autovacuum() that looks like:

    ... build list of all tables to vacuum ...
    for tab in tables_to_vacuum:
        lock(VacuumScheduleLock)
        for worker in autovacuum_workers:
            if worker.working_on == tab:
                skip = true
        if skip or very_expensive_check_to_see_if_already_vacuumed(tab):
            unlock(VacuumScheduleLock)
            continue
        unlock(VacuumScheduleLock)
        actually_vacuum(tab)

Since all the workers are working on the same list they all compete to vacuum the next item on the list.

-dg

-- 
David Gould   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Alvaro Herrera
Date:
David Gould wrote: > The cost delays are all 0. We care about bloat, not bandwidth. > > Anyway, they are not actually vacuuming. They are waiting on the > VacuumScheduleLock. And requesting freshs snapshots from the > stats_collector. Oh, I see. Interesting. Proposals welcome. I especially dislike the ("very_expensive") pgstat check. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > David Gould wrote: >> Anyway, they are not actually vacuuming. They are waiting on the >> VacuumScheduleLock. And requesting freshs snapshots from the >> stats_collector. > Oh, I see. Interesting. Proposals welcome. I especially dislike the > ("very_expensive") pgstat check. Couldn't we simply move that out of the locked stanza? That is, if no other worker is working on the table, claim it, and release the lock immediately. Then do the "very expensive" check. If that fails, we have to re-take the lock to un-claim the table, but that sounds OK. regards, tom lane
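A minimal sketch of the claim-then-check scheme Tom describes, for illustration only. AutovacuumScheduleLock and MyWorkerInfo->wi_tableoid are the names used in autovacuum.c; worker_already_claimed() and recheck_needs_vacuum() are hypothetical helpers standing in for the existing logic, and the fragment is assumed to sit inside the per-table loop of do_autovacuum():

    LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE);
    if (worker_already_claimed(relid))
    {
        LWLockRelease(AutovacuumScheduleLock);
        continue;                       /* another worker owns this table */
    }
    MyWorkerInfo->wi_tableoid = relid;  /* claim it */
    LWLockRelease(AutovacuumScheduleLock);

    /* The expensive pgstat recheck now runs without holding the lock. */
    if (!recheck_needs_vacuum(relid))
    {
        /* Re-take the lock only to un-claim, then move on. */
        LWLockAcquire(AutovacuumScheduleLock, LW_EXCLUSIVE);
        MyWorkerInfo->wi_tableoid = InvalidOid;
        LWLockRelease(AutovacuumScheduleLock);
        continue;
    }

    /* ... go on to vacuum/analyze relid ... */

The point is that only the cheap claim and un-claim steps serialize on the lock; the pgstat fetch happens concurrently in each worker.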
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Alvaro Herrera
Date:
Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > David Gould wrote: > >> Anyway, they are not actually vacuuming. They are waiting on the > >> VacuumScheduleLock. And requesting freshs snapshots from the > >> stats_collector. > > > Oh, I see. Interesting. Proposals welcome. I especially dislike the > > ("very_expensive") pgstat check. > > Couldn't we simply move that out of the locked stanza? That is, if no > other worker is working on the table, claim it, and release the lock > immediately. Then do the "very expensive" check. If that fails, we > have to re-take the lock to un-claim the table, but that sounds OK. Hmm, yeah, that would work. Of course, if we could avoid the pgstat check completely that would be even better. The amount of pgstat traffic that causes is ridiculous. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Of course, if we could avoid the pgstat check completely that would be > even better. The amount of pgstat traffic that causes is ridiculous. Good point ... shouldn't we have already checked the stats before ever deciding to try to claim the table? regards, tom lane
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Alvaro Herrera
Date:
Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Of course, if we could avoid the pgstat check completely that would be > > even better. The amount of pgstat traffic that causes is ridiculous. > > Good point ... shouldn't we have already checked the stats before ever > deciding to try to claim the table? The second check is there to allow for some other worker (or manual vacuum) having vacuumed it after we first checked, but which had finished before we check the array of current jobs. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Jeff Janes
Date:
On Fri, Oct 30, 2015 at 8:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Of course, if we could avoid the pgstat check completely that would be even better. The amount of pgstat traffic that causes is ridiculous.
>
> Good point ... shouldn't we have already checked the stats before ever deciding to try to claim the table?

It forces another pgstat fetch just before it claims the table, in case the table has just finished being vacuumed.

Cheers, Jeff
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Tom Lane wrote: >> Good point ... shouldn't we have already checked the stats before ever >> deciding to try to claim the table? > The second check is there to allow for some other worker (or manual > vacuum) having vacuumed it after we first checked, but which had > finished before we check the array of current jobs. I wonder whether that check costs more than it saves. regards, tom lane
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Fri, 30 Oct 2015 12:17:11 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > David Gould wrote: > > > The cost delays are all 0. We care about bloat, not bandwidth. > > > > Anyway, they are not actually vacuuming. They are waiting on the > > VacuumScheduleLock. And requesting freshs snapshots from the > > stats_collector. > > Oh, I see. Interesting. Proposals welcome. I especially dislike the > ("very_expensive") pgstat check. I have an effective patch for this issue that I think is small enough to be backpatched. I'm cleaning it up a bit and will send it this weekend. -dg -- David Gould 510 282 0869 daveg@sonic.net If simplicity worked, the world would be overrun with insects.
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Jeff Janes
Date:
On Fri, Oct 30, 2015 at 8:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> David Gould wrote: >>> Anyway, they are not actually vacuuming. They are waiting on the >>> VacuumScheduleLock. And requesting freshs snapshots from the >>> stats_collector. > >> Oh, I see. Interesting. Proposals welcome. I especially dislike the >> ("very_expensive") pgstat check. > > Couldn't we simply move that out of the locked stanza? That is, if no > other worker is working on the table, claim it, and release the lock > immediately. Then do the "very expensive" check. If that fails, we > have to re-take the lock to un-claim the table, but that sounds OK. The attached patch does that. In a system with 4 CPUs and that had 100,000 tables, with a big chunk of them in need of vacuuming, and with 30 worker processes, this increased the throughput by a factor of 40. Presumably it will do even better with more CPUs. It is still horribly inefficient, but 40 times less so. Cheers, Jeff
Attachment
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Jeff Janes
Date:
On Fri, Oct 30, 2015 at 9:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> Tom Lane wrote: >>> Good point ... shouldn't we have already checked the stats before ever >>> deciding to try to claim the table? > >> The second check is there to allow for some other worker (or manual >> vacuum) having vacuumed it after we first checked, but which had >> finished before we check the array of current jobs. > > I wonder whether that check costs more than it saves. A single autovacuum worker can run for hours or days. I don't think we should start vacuuming a TB size table because it needed vacuuming days ago, when the initial to-do list was built up, but no longer does. So some kind of recheck is needed. I thought of making the recheck first use whichever snapshot we currently have hanging around, and then only if it still needs vacuuming force a fresh snapshot and re-re-check. The problem with that is that any previous snapshot is aggressively destroyed at the end of each vacuum or analyze by the EOXact code. So we don't actually have a snapshot hanging around to use until we go to the work of re-parsing the database stats file. So you have to take special steps to exempt AutoVacuumWorker from EOXact code clearing out the stats, and I don't know what the repercussions of that might be. We could also relax the freshness requirements of even the final re-check, perhaps dynamically. No point in re-parsing a 40MB stats file to avoid unnecessary vacuuming a 16KB table. But parsing a 8KB stats files to avoid unnecessary vacuuming of a 1TB table is well worth it. But that runs into the same problem as above. Once you have destroyed your previous stats snapshot, you no longer have the ability to accept stale stats any longer. Cheers, Jeff
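One possible shape for the "relax the freshness requirement dynamically" idea above is a size-dependent staleness budget. This is only a sketch of a hypothetical heuristic, not existing code; the function name and the constants are invented:

    /*
     * Hypothetical heuristic: accept an older stats snapshot for small
     * tables, insist on a fresher one as the table gets larger.
     */
    static bool
    stats_fresh_enough(TimestampTz snapshot_ts, BlockNumber relpages)
    {
        /* roughly 60 s of tolerated staleness for a tiny table, shrinking
         * as the table grows; the scaling factor is arbitrary */
        int     allowed_ms = (int) (60000 / (1 + relpages / 128));

        return !TimestampDifferenceExceeds(snapshot_ts,
                                           GetCurrentTimestamp(),
                                           allowed_ms);
    }

As Jeff notes, something like this only helps if the previous snapshot survives long enough to be consulted, which the current EOXact cleanup prevents.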
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Fri, 30 Oct 2015 21:49:00 -0700 Jeff Janes <jeff.janes@gmail.com> wrote:

> On Fri, Oct 30, 2015 at 8:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > > David Gould wrote:
> > > > Anyway, they are not actually vacuuming. They are waiting on the VacuumScheduleLock and requesting fresh snapshots from the stats_collector.
> > >
> > > Oh, I see. Interesting. Proposals welcome. I especially dislike the ("very_expensive") pgstat check.
> >
> > Couldn't we simply move that out of the locked stanza? That is, if no other worker is working on the table, claim it, and release the lock immediately. Then do the "very expensive" check. If that fails, we have to re-take the lock to un-claim the table, but that sounds OK.
>
> The attached patch does that. In a system with 4 CPUs and that had 100,000 tables, with a big chunk of them in need of vacuuming, and with 30 worker processes, this increased the throughput by a factor of 40. Presumably it will do even better with more CPUs.
>
> It is still horribly inefficient, but 40 times less so.

That is a good result for such a small change.

The attached patch against REL9_5_STABLE goes a little further. It claims the table under the lock, but also addresses the problem of all the workers racing to redo the same table by enforcing an ordering on all the workers. No worker can claim a table with an oid smaller than the highest oid claimed by any worker. That is, instead of racing to the same table, workers leapfrog over each other.

In theory the recheck of the stats could be eliminated, although this patch does not do that. It does eliminate the special handling of stats snapshots for autovacuum workers, which cuts back on the excess rewriting of the stats file somewhat.

I'll send numbers shortly, but as I recall it is over 100 times better than the original.

-dg

-- 
David Gould   510 282 0869   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Attachment
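For readers following along, the leapfrog claim described above might look roughly like this. It is an illustrative sketch, not the attached patch: highest_claimed_oid() is a hypothetical helper standing in for a scan of the shared WorkerInfo array, and table_oids[] is assumed to be the worker's candidate list sorted ascending by OID.

    LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
    {
        Oid     highwater = highest_claimed_oid();  /* max wi_tableoid across workers */
        Oid     relid = InvalidOid;

        /* Skip every candidate at or below the shared high-water mark. */
        while (oid_idx < numoids && table_oids[oid_idx] <= highwater)
            oid_idx++;
        if (oid_idx < numoids)
            relid = table_oids[oid_idx++];

        /* Claim it (InvalidOid means this worker has nothing left to do). */
        MyWorkerInfo->wi_tableoid = relid;
    }
    LWLockRelease(AutovacuumLock);

Because every worker takes the list in the same ascending order and never goes below another worker's claim, two workers can no longer race to the same table.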
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Fri, 30 Oct 2015 12:51:43 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> > > Good point ... shouldn't we have already checked the stats before ever deciding to try to claim the table?
> >
> > The second check is there to allow for some other worker (or manual vacuum) having vacuumed it after we first checked, but which had finished before we check the array of current jobs.
>
> I wonder whether that check costs more than it saves.

It does indeed. It drives the stats collector wild. And of course if there are lots of tables and indexes the stats temp file gets very large, so it can take a long time (seconds) to rewrite it. This happens for each worker for each table that is a candidate for vacuuming.

Since it would not be convenient to provide a copy of the client's 8TB database, I have made a standalone reproduction. The attached files:

    build_test_instance.sh - create a test instance
    datagen.py             - used by the above to populate it with lots of tables
    logbyv.awk             - count auto analyze actions in the postgres log
    trace.sh               - strace the stats collector and autovacuum workers
    tracereport.sh         - list the top 50 calls in the strace output

The test process is to run the build_test_instance script to create an instance with one database containing a large number of tiny tables. During the setup autovacuuming is off. Then make a tarball of the instance for reuse. For each test case, untar the instance, set the number of workers and start it. After a short time autovacuum will start workers to analyze the new tables. Expect to see the stats collector doing lots of writing.

You may want to use tmpfs or a ramdisk for the data dir when building the test instance. The configuration is sized for a reasonable desktop: 8 to 16GB of memory and an SSD.

-dg

-- 
David Gould   510 282 0869   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Attachment
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Fri, 30 Oct 2015 22:16:04 -0700 Jeff Janes <jeff.janes@gmail.com> wrote:

> On Fri, Oct 30, 2015 at 9:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > > Tom Lane wrote:
> > > > Good point ... shouldn't we have already checked the stats before ever deciding to try to claim the table?
> > >
> > > The second check is there to allow for some other worker (or manual vacuum) having vacuumed it after we first checked, but which had finished before we check the array of current jobs.
> >
> > I wonder whether that check costs more than it saves.
>
> A single autovacuum worker can run for hours or days. I don't think we should start vacuuming a TB size table because it needed vacuuming days ago, when the initial to-do list was built up, but no longer does. So some kind of recheck is needed.
>
> I thought of making the recheck first use whichever snapshot we currently have hanging around, and then only if it still needs vacuuming force a fresh snapshot and re-re-check. The problem with that is that any previous snapshot is aggressively destroyed at the end of each vacuum or analyze by the EOXact code. So we don't actually have a snapshot hanging around to use until we go to the work of re-parsing the database stats file. So you have to take special steps to exempt AutoVacuumWorker from the EOXact code clearing out the stats, and I don't know what the repercussions of that might be.
>
> We could also relax the freshness requirements of even the final re-check, perhaps dynamically. No point in re-parsing a 40MB stats file to avoid unnecessary vacuuming of a 16KB table. But parsing an 8KB stats file to avoid unnecessary vacuuming of a 1TB table is well worth it. But that runs into the same problem as above. Once you have destroyed your previous stats snapshot, you no longer have the ability to accept stale stats any longer.

I've been thinking about this and frankly the whole scheme is a bit suspect. There are a couple of problems:

1. In an 8TB database with 200,000 tables it is not reasonable to build a list in advance of all the tables to vacuum. Inevitably some workers will be working off very stale lists.

2. Autovacuum workers do not need a statistics snapshot. They only need the current statistics for the table they are considering vacuuming.

I think the following might be too big to back patch, or maybe even for 9.5, but I think a better overall scheme would look something like:

- The stats collector accepts a new request message, "give me the stats for this: (dboid, table oid)", and handles it by replying to the sender (autovacuum worker) with the current statistics entry for that table.

- Instead of the worker building a big list of work up front, it just keeps track of a shared highwater mark of tables worked on and does an index scan for the next higher oid in pg_class. This is similar to my patch in that vacuuming is ordered so that workers never contend, but instead of working from a list they look up the next table in pg_class each time using the oid index. For each table that might need vacuuming, it would send the stats collector a request for that table's statistics and use the reply to determine what action to take.

This avoids the stale work list and the need for rechecking. Also, it eliminates all the flogging of the stats files (I suspect SSD owners and laptop users will rejoice!).

-dg

-- 
David Gould   510 282 0869   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
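A sketch of how the "next table above the shared high-water mark" lookup could use the pg_class OID index. This only illustrates the idea: it assumes the system index is usable (so rows come back in ascending OID order), reuses the hypothetical highest_claimed_oid() helper, and skips the filtering a real autovacuum pass would need beyond the relkind check.

    Relation    classRel = heap_open(RelationRelationId, AccessShareLock);
    SysScanDesc scan;
    ScanKeyData key;
    HeapTuple   tup;
    Oid         highwater = highest_claimed_oid();   /* hypothetical helper */
    Oid         next_relid = InvalidOid;

    /* pg_class rows with oid > highwater, in ascending OID order via the index */
    ScanKeyInit(&key,
                ObjectIdAttributeNumber,
                BTGreaterStrategyNumber, F_OIDGT,
                ObjectIdGetDatum(highwater));

    scan = systable_beginscan(classRel, ClassOidIndexId, true, NULL, 1, &key);
    while ((tup = systable_getnext(scan)) != NULL)
    {
        Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tup);

        if (classForm->relkind != RELKIND_RELATION)
            continue;                   /* plain tables only, for the sketch */
        next_relid = HeapTupleGetOid(tup);
        break;
    }
    systable_endscan(scan);
    heap_close(classRel, AccessShareLock);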
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Fri, 30 Oct 2015 23:19:52 -0700 David Gould <daveg@sonic.net> wrote:

> On Fri, 30 Oct 2015 21:49:00 -0700 Jeff Janes <jeff.janes@gmail.com> wrote:
> > The attached patch does that. In a system with 4 CPUs and that had 100,000 tables, with a big chunk of them in need of vacuuming, and with 30 worker processes, this increased the throughput by a factor of 40. Presumably it will do even better with more CPUs.
> >
> > It is still horribly inefficient, but 40 times less so.
>
> That is a good result for such a small change.
>
> The attached patch against REL9_5_STABLE goes a little further. It claims the table under the lock, but also addresses the problem of all the workers racing to redo the same table by enforcing an ordering on all the workers. No worker can claim a table with an oid smaller than the highest oid claimed by any worker. That is, instead of racing to the same table, workers leapfrog over each other.
>
> In theory the recheck of the stats could be eliminated, although this patch does not do that. It does eliminate the special handling of stats snapshots for autovacuum workers, which cuts back on the excess rewriting of the stats file somewhat.
>
> I'll send numbers shortly, but as I recall it is over 100 times better than the original.

As promised, here are numbers. The setup is a 2 core haswell i3 with a single SSD. The system is fanless, so it slows down after a few minutes of load. The database has 40,000 tiny tables, freshly created. Autovacuum will try to analyze them, but that is not much work per table, so the number of tables analyzed per minute is a pretty good measure of the recheck overhead and contention among the workers.

Unpatched postgresql 9.5beta1 (I let it run for over an hour but it did not get very far):

    seconds  elapsed  actions  chunk  sec/av  av/min
      430.1    430.1     1000   1000   0.430   139.5
     1181.2    751.1     2000   1000   0.751    79.9
     1954.0    772.7     3000   1000   0.773    77.6
     2618.5    664.5     4000   1000   0.664    90.3
     3305.7    687.2     5000   1000   0.687    87.3
     4010.1    704.4     6000   1000   0.704    85.2

A ps sample from partway through the run. Most of the cpu used is by the stats collector:

    $ ps xww | awk '/collector|autovacuum worker/ && !/awk/'
    30212 ?  Ss  0:00 postgres: autovacuum launcher process
    30213 ?  Ds  0:55 postgres: stats collector process
    30221 ?  Ss  0:23 postgres: autovacuum worker process avac
    30231 ?  Ss  0:12 postgres: autovacuum worker process avac
    30243 ?  Ss  0:11 postgres: autovacuum worker process avac
    30257 ?  Ss  0:10 postgres: autovacuum worker process avac

postgresql 9.5beta1 plus my ordered oids/high watermark autovacuum patch:

    seconds  elapsed  actions  chunk  sec/av  av/min
       13.4     13.4     1000   1000   0.013  4471.9
       22.9      9.5     2000   1000   0.010  6299.9
       31.9      8.9     3000   1000   0.009  6718.9
       40.2      8.3     4000   1000   0.008  7220.2
       52.2     12.1     5000   1000   0.012  4973.1
       59.5      7.2     6000   1000   0.007  8318.3
       69.4     10.0     7000   1000   0.010  6024.7
       78.9      9.5     8000   1000   0.010  6311.8
       93.5     14.6     9000   1000   0.015  4105.1
      104.3     10.7    10000   1000   0.011  5601.7
      114.4     10.2    11000   1000   0.010  5887.0
      127.5     13.1    12000   1000   0.013  4580.9
      140.1     12.6    13000   1000   0.013  4763.0
      153.8     13.7    14000   1000   0.014  4388.9
      166.7     12.9    15000   1000   0.013  4638.6
      181.6     14.8    16000   1000   0.015  4043.9
      200.9     19.3    17000   1000   0.019  3113.5
      217.5     16.7    18000   1000   0.017  3600.8
      231.5     14.0    19000   1000   0.014  4285.7
      245.5     14.0    20000   1000   0.014  4286.3
      259.0     13.5    21000   1000   0.013  4449.7
      274.5     15.5    22000   1000   0.015  3874.2
      292.5     18.0    23000   1000   0.018  3332.4
      311.3     18.8    24000   1000   0.019  3190.3
      326.1     14.8    25000   1000   0.015  4047.8
      345.1     19.0    26000   1000   0.019  3158.1
      363.5     18.3    27000   1000   0.018  3270.6
      382.4     18.9    28000   1000   0.019  3167.6
      403.4     21.0    29000   1000   0.021  2855.0
      419.6     16.2    30000   1000   0.016  3701.6

A ps sample from partway through the run. Most of the cpu used is by the workers, not the collector.

    $ ps xww | awk '/collector|autovacuum worker/ && !/awk/'
     872 ?  Ds  0:49 postgres: stats collector process
     882 ?  Ds  3:42 postgres: autovacuum worker process avac
     953 ?  Ds  3:21 postgres: autovacuum worker process avac
    1062 ?  Ds  2:56 postgres: autovacuum worker process avac
    1090 ?  Ds  2:34 postgres: autovacuum worker process avac

It seems to slow down a bit after a few minutes. I think this may be because of filling the OS page cache with dirty pages, as it is fully IO bound for most of the test duration. Or possibly cpu throttling. I'll see about retesting on better hardware.

-dg

-- 
David Gould   510 282 0869   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Sat, 31 Oct 2015 01:37:14 -0700 David Gould <daveg@sonic.net> wrote:

I've added numbers from the same test setup with Jeff Janes' patch.

postgresql-9.5beta1 with Jeff Janes vac_move_lock.patch:

    seconds  elapsed  actions  chunk  sec/av  av/min
      116.1    116.1     1000   1000   0.116   516.8
      224.0    107.9     2000   1000   0.108   556.1
      356.4    132.4     3000   1000   0.132   453.2
      489.8    133.4     4000   1000   0.133   449.8
      602.5    112.7     5000   1000   0.113   532.3
      706.6    104.1     6000   1000   0.104   576.3
      801.9     95.3     7000   1000   0.095   629.5
      886.3     84.4     8000   1000   0.084   710.9
      983.0     96.7     9000   1000   0.097   620.5
     1074.5     91.5    10000   1000   0.091   656.1
     1168.2     93.7    11000   1000   0.094   640.1
     1259.5     91.3    12000   1000   0.091   656.9
     1350.3     90.8    13000   1000   0.091   661.1
     1434.2     83.9    14000   1000   0.084   715.4
     1516.0     81.9    15000   1000   0.082   733.0

ps sample during the run. Note that the stats collector is very busy.

    3006 ?  Ds  2:43 postgres: stats collector process
    3014 ?  Ss  1:42 postgres: autovacuum worker process avac
    3044 ?  Ss  1:29 postgres: autovacuum worker process avac
    3053 ?  Ds  1:27 postgres: autovacuum worker process avac
    3060 ?  Ss  1:25 postgres: autovacuum worker process avac

> Unpatched postgresql 9.5beta1 (I let it run for over an hour but it did not get very far):
>
>     seconds  elapsed  actions  chunk  sec/av  av/min
>       430.1    430.1     1000   1000   0.430   139.5
>      1181.2    751.1     2000   1000   0.751    79.9
>      1954.0    772.7     3000   1000   0.773    77.6
>      2618.5    664.5     4000   1000   0.664    90.3
>      3305.7    687.2     5000   1000   0.687    87.3
>      4010.1    704.4     6000   1000   0.704    85.2
>
> A ps sample from partway through the run. Most of the cpu used is by the stats collector:
>
>     $ ps xww | awk '/collector|autovacuum worker/ && !/awk/'
>     30212 ?  Ss  0:00 postgres: autovacuum launcher process
>     30213 ?  Ds  0:55 postgres: stats collector process
>     30221 ?  Ss  0:23 postgres: autovacuum worker process avac
>     30231 ?  Ss  0:12 postgres: autovacuum worker process avac
>     30243 ?  Ss  0:11 postgres: autovacuum worker process avac
>     30257 ?  Ss  0:10 postgres: autovacuum worker process avac
>
> postgresql 9.5beta1 plus my ordered oids/high watermark autovacuum patch:
>
>     seconds  elapsed  actions  chunk  sec/av  av/min
>        13.4     13.4     1000   1000   0.013  4471.9
>        22.9      9.5     2000   1000   0.010  6299.9
>        31.9      8.9     3000   1000   0.009  6718.9
>        40.2      8.3     4000   1000   0.008  7220.2
>        52.2     12.1     5000   1000   0.012  4973.1
>        59.5      7.2     6000   1000   0.007  8318.3
>        69.4     10.0     7000   1000   0.010  6024.7
>        78.9      9.5     8000   1000   0.010  6311.8
>        93.5     14.6     9000   1000   0.015  4105.1
>       104.3     10.7    10000   1000   0.011  5601.7
>       114.4     10.2    11000   1000   0.010  5887.0
>       127.5     13.1    12000   1000   0.013  4580.9
>       140.1     12.6    13000   1000   0.013  4763.0
>       153.8     13.7    14000   1000   0.014  4388.9
>       166.7     12.9    15000   1000   0.013  4638.6
> ...
>
> A ps sample from partway through the run. Most of the cpu used is by the workers, not the collector.
>
>     $ ps xww | awk '/collector|autovacuum worker/ && !/awk/'
>      872 ?  Ds  0:49 postgres: stats collector process
>      882 ?  Ds  3:42 postgres: autovacuum worker process avac
>      953 ?  Ds  3:21 postgres: autovacuum worker process avac
>     1062 ?  Ds  2:56 postgres: autovacuum worker process avac
>     1090 ?  Ds  2:34 postgres: autovacuum worker process avac

-dg

-- 
David Gould   510 282 0869   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
I've collected strace output for a one minute interval with 4 autovacuum workers on my test setup for stock 9.5beta1, Jeff Janes' vac_mov_lock patch, and my ordered oids / high watermark patch. These results are normalized to times and number of calls per table analyzed by autovacuum. To refresh, on this setup:

    Patch Version    TPM
    none               85
    vac_mov_lock      600
    ordered oids     5225

Strace summary:

    postgresql 9.5beta1 cost per table @ 85 tables per minute
    -----------------------------------------------------------
    calls   msec     system call  [ 4 autovacuum workers ]
    ------  -------  -------------------
     19.46   196.09  select(0,           <<< Waiting for stats snapshot
      3.26  1040.46  semop(43188238,     <<< Waiting for AutovacuumScheduleLock
      2.05     0.83  sendto(8,           <<< Asking for stats snapshot

    calls   msec     system call  [ stats collector ]
    ------  -------  -------------------
      3.02     0.05  recvfrom(8,                         <<< Request for snapshot refresh
      1.55   248.64  rename("pg_stat_tmp/db_16385.tmp",  <<< Snapshot refresh

    9.5beta1 + Jeff Janes vac_mov_lock.patch cost per table @ 600 tpm
    -----------------------------------------------------------------
    calls   msec     system call  [ 4 autovacuum workers ]
    ------  -------  -------------------
     24.90   260.16  select(0,           <<< Waiting for stats snapshot
      1.41     0.02  sendto(8,           <<< Asking for stats snapshot

    calls   msec     system call  [ stats collector ]
    ------  -------  -------------------
      1.29     0.02  recvfrom(8,                         <<< Request for snapshot refresh
      0.59    62.40  rename("pg_stat_tmp/db_16385.tmp",  <<< Snapshot refresh

    9.5beta1 + David Gould autovacuum ordering patch cost per table @ 5225 tpm
    --------------------------------------------------------------------------
    calls   msec     system call  [ 4 autovacuum workers ]
    ------  -------  -------------------
      0.63     6.34  select(0,           <<< Waiting for stats snapshot
      0.21     0.01  sendto(8,           <<< Asking for stats snapshot
      0.07     0.00  semop(43712518,     <<< Waiting for AutovacuumLock

    calls   msec     system call  [ stats collector ]
    ------  -------  -------------------
      0.40     0.01  recvfrom(8,                         <<< Request for snapshot refresh
      0.04     6.75  rename("pg_stat_tmp/db_16385.tmp",  <<< Snapshot refresh

-dg

-- 
David Gould   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Fri, 30 Oct 2015 23:19:52 -0700 David Gould <daveg@sonic.net> wrote:

> The attached patch against REL9_5_STABLE goes a little further. It claims the table under the lock, but also addresses the problem of all the workers racing to redo the same table by enforcing an ordering on all the workers. No worker can claim a table with an oid smaller than the highest oid claimed by any worker. That is, instead of racing to the same table, workers leapfrog over each other.
>
> In theory the recheck of the stats could be eliminated, although this patch does not do that. It does eliminate the special handling of stats snapshots for autovacuum workers, which cuts back on the excess rewriting of the stats file somewhat.
>
> I'll send numbers shortly, but as I recall it is over 100 times better than the original.

I sent performance test data and a setup for reproducing it elsewhere in the thread. I also ran tests on a larger system (128GB mem, many cores, 2x SSD with battery-backed raid).

This is for an idle system with 100,000 new small tables to analyze. I ran all the tests for an hour or 5000 tables processed. "jj" refers to the patch from Jeff Janes, "dg" refers to the attached patch (same as previous).

    Autovacuum actions per minute

    workers   9.5b1     jj      dg
    -------   -----   ----   -----
          1     183    171     285
          4      45    212    1158
          8      23    462    1225

Could someone please take a look at the patch and comment? Thanks.

-dg

-- 
David Gould   510 282 0869   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Attachment
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Alvaro Herrera
Date:
David Gould wrote:

> This is for an idle system with 100,000 new small tables to analyze. I ran all the tests for an hour or 5000 tables processed. "jj" refers to the patch from Jeff Janes, "dg" refers to the attached patch (same as previous).
>
>     Autovacuum actions per minute
>
>     workers   9.5b1     jj      dg
>     -------   -----   ----   -----
>           1     183    171     285
>           4      45    212    1158
>           8      23    462    1225

Nice numbers.

> Could someone please take a look at the patch and comment? Thanks.

1. What's with all the FIXMEs?

2. I think you need more of an explanation of what your patch actually does.

3. Do we want to backpatch? Changes in behavior aren't acceptable on existing branches, because they might destabilize autovacuum behavior that's been carefully tuned in existing systems. So if we want something to backpatch, ideally it shouldn't change the ordering in which tables are vacuumed, and instead arrive at the same results faster. (I don't care about this restriction myself, but others do, and strongly so.)

4. In the master branch, behavior changes are acceptable, so we can do something more invasive.

5. Is it easier to use a binary heap rather than the OID list thing you have? (See src/include/lib/binaryheap.h.) I don't think so, but it's worth asking. Note that older branches don't have this, so a backpatchable version should not rely on it.

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Tue, 3 Nov 2015 19:24:25 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> David Gould wrote:
> > This is for an idle system with 100,000 new small tables to analyze. I ran all the tests for an hour or 5000 tables processed. "jj" refers to the patch from Jeff Janes, "dg" refers to the attached patch (same as previous).
> >
> >     Autovacuum actions per minute
> >
> >     workers   9.5b1     jj      dg
> >     -------   -----   ----   -----
> >           1     183    171     285
> >           4      45    212    1158
> >           8      23    462    1225
>
> Nice numbers.
>
> > Could someone please take a look at the patch and comment? Thanks.
>
> 1. What's with all the FIXMEs?

Those were a trail of breadcrumbs left to explain some of the issues. I have removed them and revised the comments.

> 2. I think you need more of an explanation of what your patch actually does.

I am writing a more complete description of the issues with the current code and what the patch does to address them.

> 3. Do we want to backpatch? Changes in behavior aren't acceptable on existing branches, because they might destabilize autovacuum behavior that's been carefully tuned in existing systems. So if we want something to backpatch, ideally it shouldn't change the ordering in which tables are vacuumed, and instead arrive at the same results faster. (I don't care about this restriction myself, but others do, and strongly so.)

The current order of autovacuum operations is the physical order of the rows in pg_class plus some jitter depending on which worker is able to grab a table first. It seems unlikely anything could depend on this particular order. The heart of this patch is that it establishes a consistent order for all the workers to take tables from so that they don't compete for work. The issue is that autovacuum is ineffective and costly with large numbers of tables. The recent multixact fixes will expose more users to this issue as they update to 9.3.9/9.4.4 and beyond.

> 4. In the master branch, behavior changes are acceptable, so we can do something more invasive.

I will make a version for master.

> 5. Is it easier to use a binary heap rather than the OID list thing you have? (See src/include/lib/binaryheap.h.) I don't think so, but it's worth asking. Note that older branches don't have this, so a backpatchable version should not rely on it.

Thanks for the suggestion, but the binary heap requires knowing the size in advance. The method of using an array and sorting is modeled on src/backend/catalog/pg_inherits.c:find_inheritance_children(), which has a similar need for a sorted list of indefinite size.

Let's move this discussion to hackers to get a bit more coverage, especially if we are considering a version against master. I will post an expanded problem description, the revised patch and test results there after the weekend.

-dg

-- 
David Gould   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
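For context, the array-and-sort pattern borrowed from find_inheritance_children() looks roughly like this. It is a sketch of the technique, not the patch itself; relid stands for each candidate found during the pg_class scan:

    Oid    *table_oids;
    int     numoids = 0;
    int     maxoids = 32;

    table_oids = (Oid *) palloc(maxoids * sizeof(Oid));

    /* ... inside the pg_class scan, for each relation worth considering ... */
    if (numoids >= maxoids)
    {
        maxoids *= 2;
        table_oids = (Oid *) repalloc(table_oids, maxoids * sizeof(Oid));
    }
    table_oids[numoids++] = relid;

    /* ... after the scan, impose one global ascending order ... */
    if (numoids > 1)
        qsort(table_oids, numoids, sizeof(Oid), oid_cmp);

Sorting by OID gives every worker the same total order to leapfrog along, without needing to know the list size in advance.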
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Josh Berkus
Date:
On 11/06/2015 04:46 PM, David Gould wrote: >> 3. Do we want to backpatch? Changes in behavior aren't acceptable on >> > existing branches, because it might destabilize autovacuum behavior >> > that's been carefully tuned in existing systems. So if we want >> > something to backpatch, ideally it shouldn't change the ordering in >> > which tables are vacuumed, and instead arrive at the same results >> > faster. (I don't care about this restriction myself, but others do and >> > strongly so.) > The current order of autovacuum operations is the physical order of the > rows in pg_class plus some jitter depending on which worker is able to grab > a table first. It seems unlikely anything could depend on this > particular order. I don't know anyone who depends on the ordering of autovacuum, because nobody knows what it is. It's not exactly documented. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Alvaro Herrera
Date:
Hi David, did you ever post an updated version of this patch? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Mon, 29 Feb 2016 18:33:50 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> Hi David, did you ever post an updated version of this patch?

No. Let me fix that now. I've attached my current revision of the patch based on master. This version is significantly better than the original version and resolves multiple issues:

- autovacuum workers no longer race each other
- autovacuum workers do not re-vacuum each other's tables
- autovacuum workers no longer thrash the stats collector, which saves a lot of IO when the stats file is large

Hopefully the earlier discussion and the comments in the patch are sufficient, but please ask questions if it is not clear.

The result is that on a freshly created 40,000-table database with tiny tables that all need an initial analyze, unpatched postgres saturates an SSD updating the stats and manages to process only about 85 tables per minute. With the attached patch it processes several thousand tables per minute.

The following is a summary of strace output for the autovacuum workers and the stats collector while the 40k-table test is running. The counts and times are the cost per table.

    postgresql 9.5: 85 tables per minute

    Operations per table
    calls   msec     system call  [ 4 autovacuum workers ]
    ------  -------  -------------------
     19.46   196.09  select(0,           <<< Waiting for stats snapshot
      3.26  1040.46  semop(43188238,     <<< Waiting for AutovacuumScheduleLock
      2.05     0.83  sendto(8,           <<< Asking for stats snapshot

    calls   msec     system call  [ stats collector ]
    ------  -------  -------------------
      3.02     0.05  recvfrom(8,                         <<< Request for snapshot refresh
      1.55   248.64  rename("pg_stat_tmp/db_16385.tmp",  <<< Snapshot refresh

    + autovacuum contention patch: 5225 tables per minute

    Operations per table
    calls   msec     system call  [ 4 autovacuum workers ]
    ------  -------  -------------------
      0.63     6.34  select(0,           <<< Waiting for stats snapshot
      0.21     0.01  sendto(8,           <<< Asking for stats snapshot
      0.07     0.00  semop(43712518,     <<< Waiting for AutovacuumLock

    calls   msec     system call  [ stats collector ]
    ------  -------  -------------------
      0.40     0.01  recvfrom(8,                         <<< Request for snapshot refresh
      0.04     6.75  rename("pg_stat_tmp/db_16385.tmp",  <<< Snapshot refresh

Regards,

-dg

-- 
David Gould   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Attachment
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Alvaro Herrera
Date:
David Gould wrote: > The following is a summary of strace output for the autovacuum workers > and the stats collector while the 40k table test is running. The counts and > times are the cost per table. > > postgresql 9.5: 85 tables per minute. > + autovacuum contention patch: 5225 tables per minute Great stuff. I intend to look at this as a backpatchable bug fix during this commitfest. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Joe Conway
Date:
On 03/02/2016 07:35 AM, Alvaro Herrera wrote:

> David Gould wrote:
> > The following is a summary of strace output for the autovacuum workers and the stats collector while the 40k-table test is running. The counts and times are the cost per table.
> >
> > postgresql 9.5: 85 tables per minute
> > + autovacuum contention patch: 5225 tables per minute
>
> Great stuff. I intend to look at this as a backpatchable bug fix during this commitfest.

+1!

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Alvaro Herrera
Date:
David Gould wrote:

> On Mon, 29 Feb 2016 18:33:50 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Hi David, did you ever post an updated version of this patch?
>
> No. Let me fix that now. I've attached my current revision of the patch based on master. This version is significantly better than the original version and resolves multiple issues:

Thanks for the patch. I spent some time studying it and testing it. I think there's a minor bug in this latest version: the "continue" when the "for" loop exits because of all relids being less than highest_oid_claimed should be a "break" instead. Otherwise the worker will uselessly loop over the whole list of OIDs only to find each time that it has nothing to do.

After seeing it work under pgbench inserting and deleting large numbers of tuples into a large number of tables, I figured that the algorithm you described doesn't actually work in certain corner cases: consider what happens if the table with oid INT_MAX is huge and very bloated. At some point one worker is going to be stuck working on it for a long time, and from that point until the big vacuum is finished, all workers will see that one worker is processing a table with maximal OID and decide they can skip all tables in their work lists.

I tested this by creating one table after all the other test tables were created (so it'd have a higher OID than the others -- but keep an eye on OID wraparound in your test install), setting a nonzero vacuum_cost_limit, then filling it with dead tuples in pgbench via "begin; insert into deadtups select * from generate_series(1, 100000); rollback". I can see that one autovacuum process is stuck with it for over a minute, and while this one runs there is no other autovacuum activity in the database.

We need to cater for this scenario, because it is going to bite somebody. I think it might work to still use the OID-sorted list of tables to process, but instead of pruning based just on the highest OID across all workers, we need to consider only the OID ranges that we've seen processed while the current worker has been running. Seems fiddly :-(

Another thing that makes me uneasy about this approach is that we've had complaints that some tables that have just enough dead tuples generated between each autovacuum run on a DB are vacuumed by all workers all the time, causing vacuum-starvation of other tables that accumulate more dead tuples. It seems to me we should prioritise tables to vacuum depending on how much they actually need vacuum. If we depend on an algorithm that sorts by OID, we will never be able to prioritise anything.

One seemingly trivial way to attack this problem is to forget the sorting idea and just keep track in shared memory of the last (say) 100 tables that have been vacuumed by each worker, along with a timestamp for each. Using that, each worker can determine whether each table on its own list has been vacuumed after it obtained its list, and skip it if so. Since there's no specific sort order required, we could in the future change the ordering in which tables are processed for other reasons.

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
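To make the last suggestion concrete, a shared ring buffer of recently finished tables could look roughly like this. The struct and function names are hypothetical, not existing PostgreSQL code, and locking is left to the caller:

    #define AV_RECENT_ENTRIES 100

    typedef struct AVRecentVacuum
    {
        Oid         relid;
        TimestampTz finished_at;
    } AVRecentVacuum;

    typedef struct AVRecentLog
    {
        int             next;                        /* ring-buffer cursor */
        AVRecentVacuum  entries[AV_RECENT_ENTRIES];
    } AVRecentLog;

    /* Called by a worker when it finishes a table (lock held by caller). */
    static void
    av_recent_record(AVRecentLog *log, Oid relid)
    {
        log->entries[log->next].relid = relid;
        log->entries[log->next].finished_at = GetCurrentTimestamp();
        log->next = (log->next + 1) % AV_RECENT_ENTRIES;
    }

    /* True if someone vacuumed relid after this worker built its work list. */
    static bool
    av_recently_done(AVRecentLog *log, Oid relid, TimestampTz list_built_at)
    {
        int     i;

        for (i = 0; i < AV_RECENT_ENTRIES; i++)
        {
            if (log->entries[i].relid == relid &&
                log->entries[i].finished_at > list_built_at)
                return true;
        }
        return false;
    }

Because the check only consults shared memory, it avoids the pgstat refresh entirely while still catching tables that another worker (though not a manual VACUUM) has just finished.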
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Tue, 15 Mar 2016 17:40:26 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> David Gould wrote:
> > On Mon, 29 Feb 2016 18:33:50 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > Hi David, did you ever post an updated version of this patch?
> >
> > No. Let me fix that now. I've attached my current revision of the patch based on master. This version is significantly better than the original version and resolves multiple issues:
>
> Thanks for the patch. I spent some time studying it and testing it. I think there's a minor bug in this latest version: the "continue" when the "for" loop exits because of all relids being less than highest_oid_claimed should be a "break" instead. Otherwise the worker will uselessly loop over the whole list of OIDs only to find each time that it has nothing to do.

Good catch. Thanks!

> After seeing it work under pgbench inserting and deleting large numbers of tuples into a large number of tables, I figured that the algorithm you described doesn't actually work in certain corner cases: consider what happens if the table with oid INT_MAX is huge and very bloated. At some point one worker is going to be stuck working on it for a long time, and from that point until the big vacuum is finished, all workers will see that one worker is processing a table with maximal OID and decide they can skip all tables in their work lists.
>
> I tested this by creating one table after all the other test tables were created (so it'd have a higher OID than the others -- but keep an eye on OID wraparound in your test install), setting a nonzero vacuum_cost_limit, then filling it with dead tuples in pgbench via "begin; insert into deadtups select * from generate_series(1, 100000); rollback". I can see that one autovacuum process is stuck with it for over a minute, and while this one runs there is no other autovacuum activity in the database.
>
> We need to cater for this scenario, because it is going to bite somebody. I think it might work to still use the OID-sorted list of tables to process, but instead of pruning based just on the highest OID across all workers, we need to consider only the OID ranges that we've seen processed while the current worker has been running. Seems fiddly :-(

This is a good observation. I was focused on the bad behavior when there are many small tables, not one large table at the end of the list. The problem with all this vacuum code is that everyone who worked on it seemed to focus on a particular use scenario and was blind to the other. Including me, apparently. However, I think I have a simple fix. I'll think about it a bit and post it once it has gelled a bit.

> Another thing that makes me uneasy about this approach is that we've had complaints that some tables that have just enough dead tuples generated between each autovacuum run on a DB are vacuumed by all workers all the time, causing vacuum-starvation of other tables that accumulate more dead tuples. It seems to me we should prioritise tables to vacuum depending on how much they actually need vacuum. If we depend on an algorithm that sorts by OID, we will never be able to prioritise anything.
>
> One seemingly trivial way to attack this problem is to forget the sorting idea and just keep track in shared memory of the last (say) 100 tables that have been vacuumed by each worker, along with a timestamp for each. Using that, each worker can determine whether each table on its own list has been vacuumed after it obtained its list, and skip it if so. Since there's no specific sort order required, we could in the future change the ordering in which tables are processed for other reasons.

Your discomfort is well founded. The more I learned about autovacuum scheduling the less it made sense. Really, there should be some sort of priority order for vacuuming based on some metric of need, and tables should be processed in that order. I'm willing to work on that, and some other autovacuum issues that I found too. But this patch is intended to be back-patchable. More later.

-dg

-- 
David Gould   510 282 0869   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Tue, 15 Mar 2016 14:28:16 -0700 David Gould <daveg@sonic.net> wrote:

> On Tue, 15 Mar 2016 17:40:26 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > David Gould wrote:
> > > On Mon, 29 Feb 2016 18:33:50 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > > Hi David, did you ever post an updated version of this patch?
> > >
> > > No. Let me fix that now. I've attached my current revision of the patch based on master. This version is significantly better than the original version and resolves multiple issues:
> >
> > Thanks for the patch. I spent some time studying it and testing it. I think there's a minor bug in this latest version: the "continue" when the "for" loop exits because of all relids being less than highest_oid_claimed should be a "break" instead. Otherwise the worker will uselessly loop over the whole list of OIDs only to find each time that it has nothing to do.
>
> Good catch. Thanks!

Actually, looking at this more closely, I don't see the problem. I'm assuming you meant the continue after "if (relid == InvalidOid)":

    /*
     * 2. While there are items that have not been visited...
     */
    for (oid_idx = 0; oid_idx < numoids; )
    {
        ...

        /*
         * 4a. Skip past the highest_oid_claimed to find the next oid to work on.
         */
        relid = InvalidOid;
        while (relid <= highest_oid_claimed && oid_idx < numoids)
            relid = table_oids[oid_idx++];
        if (relid <= highest_oid_claimed)
            relid = InvalidOid;

        /* 4b. Claim the chosen table by storing its oid in shared memory. */
        MyWorkerInfo->wi_tableoid = relid;
        LWLockRelease(AutovacuumLock);

        /* If we reached the end of the list there is nothing to do. */
        if (relid == InvalidOid)
            continue;

        ...
    }

At the continue, oid_idx will have been incremented by:

    while (relid <= highest_oid_claimed && oid_idx < numoids)
        relid = table_oids[oid_idx++];

to be equal to numoids, so the for loop will exit immediately as the loop condition is "oid_idx < numoids". "break" would work as well, but basically there is only one reason to exit the for loop, exhausting the list of oids, so continue seemed preferable. Or am I missing something?

-dg

-- 
David Gould   510 282 0869   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Jim Nasby
Date:
On 3/15/16 4:28 PM, David Gould wrote:

> The more I learned about autovacuum scheduling the less it made sense. Really, there should be some sort of priority order for vacuuming based on some metric of need, and tables should be processed in that order.

+1. What's there now is incredibly braindead.

I actually wonder whether, instead of doing all this the hard way in C, we should just use SPI for each worker to build its list of tables. The big advantage that would provide is the ability for users to customize the scheduling, but I suspect it'd make the code simpler too.

The same is also true for deciding what database needs to be vacuumed next.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
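A rough sketch of what the SPI route might look like from inside a worker. The query and threshold are deliberately simplified (they ignore per-table reloptions and analyze thresholds), and remember_candidate() is a hypothetical helper:

    #include "executor/spi.h"

    static void
    build_table_list_via_spi(void)
    {
        int     ret;
        uint64  i;

        SPI_connect();
        ret = SPI_execute("SELECT relid FROM pg_stat_user_tables"
                          " WHERE n_dead_tup > 50 + 0.2 * n_live_tup"
                          " ORDER BY n_dead_tup DESC",
                          true,     /* read-only */
                          0);       /* no row limit */
        if (ret == SPI_OK_SELECT)
        {
            for (i = 0; i < SPI_processed; i++)
            {
                bool    isnull;
                Datum   d = SPI_getbinval(SPI_tuptable->vals[i],
                                          SPI_tuptable->tupdesc, 1, &isnull);

                if (!isnull)
                    remember_candidate(DatumGetObjectId(d));
            }
        }
        SPI_finish();
    }

The appeal is that the selection policy becomes an ordinary query; the robustness and injection concerns raised in the replies still apply.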
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Tom Lane
Date:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes: > I actually wonder if instead of doing all the the hard way in C whether > we should just use SPI for each worker to build it's list of tables. The > big advantage that would provide is the ability for users to customize > the scheduling, but I suspect it'd make the code simpler too. By that you mean "user can write a SQL query that determines autovacuum targets"? -1. That would bring us back to the bad old days where a poorly-thought-out vacuum cron job would miss tables and lead to a database shutdown. Not to mention SQL injection risks. If we need to improve autovac's strategy, let's do that, but not by deeming it the user's problem. regards, tom lane
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Fri, 18 Mar 2016 09:39:34 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> > I actually wonder whether, instead of doing all this the hard way in C, we should just use SPI for each worker to build its list of tables. The big advantage that would provide is the ability for users to customize the scheduling, but I suspect it'd make the code simpler too.
>
> By that you mean "user can write a SQL query that determines autovacuum targets"? -1. That would bring us back to the bad old days where a poorly-thought-out vacuum cron job would miss tables and lead to a database shutdown. Not to mention SQL injection risks.
>
> If we need to improve autovac's strategy, let's do that, but not by deeming it the user's problem.

I have some thoughts for a different approach. In short, the stats collector actually knows what needs vacuuming, because queries that create dead tuples tell it. I'm considering having the stats collector maintain a queue of vacuum work and having autovacuum request work from the stats collector. When I have something more concrete I'll post it on hackers.

-dg

-- 
David Gould   510 282 0869   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Alvaro Herrera
Date:
David Gould wrote: > I have some thoughts for a different approach. In short, the stats collector > actually knows what needs vacuuming because queries that create dead tuples > tell it. I'm considering have the stats collector maintain a queue of > vacuum work and that autovacuum request work from the stats collector. When I > have something more concrete I'll post it on hackers. Hm, ideally we would have something backpatchable, but this new idea doesn't sound so to me. I think that having some variant of the patch you proposed previously would be better for the stable branches. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Tomas Vondra
Date:
Hi,

On 03/18/2016 11:13 PM, Alvaro Herrera wrote:

> David Gould wrote:
> > I have some thoughts for a different approach. In short, the stats collector actually knows what needs vacuuming, because queries that create dead tuples tell it. I'm considering having the stats collector maintain a queue of vacuum work and having autovacuum request work from the stats collector. When I have something more concrete I'll post it on hackers.
>
> Hm, ideally we would have something backpatchable, but this new idea doesn't sound so to me. I think that having some variant of the patch you proposed previously would be better for the stable branches.

I believe implementing such a queue in pgstat will be rather difficult, for example because you can tweak the limits in reloptions, and those are only accessible in the database. So pgstat does not know them and thus can't really decide what needs autovacuuming etc.

regards

-- 
Tomas Vondra   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
Tom Lane
Date:
David Gould <daveg@sonic.net> writes: > I have some thoughts for a different approach. In short, the stats collector > actually knows what needs vacuuming because queries that create dead tuples > tell it. I'm considering have the stats collector maintain a queue of > vacuum work and that autovacuum request work from the stats collector. When I > have something more concrete I'll post it on hackers. Uh, what? The autovacuum code already looks at the stats maintained by the collector. If what you said means anything, it means "let's move the autovac scheduling logic into the collector", which seems neither useful nor sound from a modularity standpoint. regards, tom lane
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Fri, 18 Mar 2016 19:13:17 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > David Gould wrote: > > > I have some thoughts for a different approach. In short, the stats collector > > actually knows what needs vacuuming because queries that create dead tuples > > tell it. I'm considering have the stats collector maintain a queue of > > vacuum work and that autovacuum request work from the stats collector. When I > > have something more concrete I'll post it on hackers. > > Hm, ideally we would have something backpatchable, but this new idea > doesn't sound so to me. I think that having some variant of the patch > you proposed previously would be better for the stable branches. Sorry for the alarm. My plan is to update the current patch to address the problem you described. This is intended to be backpatchable. The other thoughts would be for new development for a future release. -dg -- David Gould 510 282 0869 daveg@sonic.net If simplicity worked, the world would be overrun with insects.
Re: BUG #13750: Autovacuum slows down with large numbers of tables. More workers makes it slower.
From
David Gould
Date:
On Fri, 18 Mar 2016 18:23:51 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> David Gould <daveg@sonic.net> writes:
> > I have some thoughts for a different approach. In short, the stats collector actually knows what needs vacuuming, because queries that create dead tuples tell it. I'm considering having the stats collector maintain a queue of vacuum work and having autovacuum request work from the stats collector. When I have something more concrete I'll post it on hackers.
>
> Uh, what? The autovacuum code already looks at the stats maintained by the collector. If what you said means anything, it means "let's move the autovac scheduling logic into the collector", which seems neither useful nor sound from a modularity standpoint.

Well, there is that. That's why I'm still considering and not yet posting a concrete proposal on hackers. Really, there is no convenient location for this decision making, as no single process has all the information needed to optimize autovacuum scheduling across databases. It's a bit of a puzzle.

-dg

-- 
David Gould   510 282 0869   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.