On Fri, 30 Oct 2015 12:51:43 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> Good point ... shouldn't we have already checked the stats before ever
> >> deciding to try to claim the table?
>
> > The second check is there to allow for some other worker (or manual
> > vacuum) having vacuumed it after we first checked, but which had
> > finished before we check the array of current jobs.
>
> I wonder whether that check costs more than it saves.
It does indeed. It drives the stats collector wild. And of course, when there
are lots of tables and indexes, the stats temp file gets very large, so
rewriting it can take a long time (seconds). This happens once per worker for
each table that is a candidate for vacuuming.
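You can see the rewrites directly by watching the stats temp files. A minimal
sketch, assuming a 9.x instance with the default stats_temp_directory (the
path is just illustrative):

    # Watch the stats temp files being rewritten; on 9.x these live in
    # pg_stat_tmp as global.stat plus one db_<oid>.stat per database.
    PGDATA=/path/to/test/instance/data    # assumed data directory
    while sleep 1; do
        ls -l --time-style=+%H:%M:%S "$PGDATA"/pg_stat_tmp/
    done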
Since it would not be convenient to provide a copy of the client's 8TB
database, I have made a standalone reproduction. The attached files:
build_test_instance.sh - create a test instance
datagen.py - used by above to populate it with lots of tables
logbyv.awk - count auto analyze actions in postgres log
trace.sh - strace the stats collector and autovacuum workers (sketched below)
tracereport.sh - list the top 50 calls in the strace output (sketched below)
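For reference, the tracing side is roughly this (a sketch, not the exact
scripts; finding the pid via the process title is an assumption about your
setup):

    # trace.sh, roughly: attach strace to the stats collector for a minute
    # and count syscalls; -c prints a per-syscall summary when strace detaches.
    stats_pid=$(ps -u postgres -o pid=,cmd= | awk '/stats collector/ {print $1}')
    timeout -s INT 60 strace -c -p "$stats_pid"

    # tracereport.sh, roughly: tally the top 50 syscalls in a raw strace log,
    # where each line looks like: write(3, "..."..., 8192) = 8192
    sed 's/(.*//' strace.out | awk '{print $NF}' | sort | uniq -c | sort -rn | head -50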
The test process is to run the build_test_instance script to create an
instance containing one database with a large number of tiny tables;
autovacuum is off during the setup. Then make a tarball of the instance for
reuse. For each test case, untar the instance, set the number of workers, and
start it. After a short time autovacuum will start workers to analyze the new
tables; expect to see the stats collector doing lots of writing.
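In other words, one test case looks roughly like this (the paths, the worker
count, and the wait are placeholders):

    # One test case: restore the prebuilt instance, set the worker count,
    # start it, then count the auto analyze actions in the log.
    tar xf test_instance.tar -C /mnt/test
    echo "autovacuum_max_workers = 8" >> /mnt/test/data/postgresql.conf
    pg_ctl -D /mnt/test/data -l logfile start
    sleep 120                        # give the workers time to get going
    awk -f logbyv.awk logfile        # count auto analyze actions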
You may want to use tmpfs or a ramdisk for the data dir when building the
test instance. The configuration is sized for a reasonable desktop: 8 to
16 GB of memory and an SSD.
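For example (the mount point and size are just a guess for this workload):

    # Mount a tmpfs large enough to hold the test instance's data directory.
    sudo mkdir -p /mnt/test
    sudo mount -t tmpfs -o size=4g tmpfs /mnt/test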
-dg
--
David Gould 510 282 0869 daveg@sonic.net
If simplicity worked, the world would be overrun with insects.