Re: Postgres 9.1.4 - high stats collector IO usage - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Postgres 9.1.4 - high stats collector IO usage
Date
Msg-id CAMkU=1z2mCOuue7XktPDHt9D2HWR4D+8FyvCKmv+YDfd0ZfBHQ@mail.gmail.com
Whole thread Raw
In response to Postgres 9.1.4 - high stats collector IO usage  (David Barton <dave@oneit.com.au>)
Responses Re: Postgres 9.1.4 - high stats collector IO usage
List pgsql-performance
On Fri, Jul 27, 2012 at 9:33 PM, David Barton <dave@oneit.com.au> wrote:
> Hi,
>
> I am running postgres 9.1.4 on Ubuntu 12.04 and the stats collector is
> generating very high IO usage even when nothing appears to be happening on
> the system.
>
> I have roughly 150 different databases, each of which is running in 1 of
> roughly 30 tablespaces.  The databases are small (the dump of most is are
> under 100M, and all but 3 are under 1G, nothing larger than 2G).

It isn't the size of the data that matters, but the number of objects.
 It sounds like your databases have about 150 statistics-containing
objects each, in order to come up with a 3.5MB stats file.

What do you gain by using databases rather than schema to do the segregation?

> Previously iotop reported the disk write speed, at ~6MB / second.

So that corresponds to about 2 physical write-outs of the stats file
per second.  Are you using ext4?  It has the peculiar (to me) property
that when a file is renamed out of existence, it writes out all of
that file's obsolete dirty buffers, rather than just dropping them as
uninteresting to anyone.  That generates about 10 times the physical
IO as the ext3 file system does.  And of course about infinite times
the physical IO as a tmpfs.

> FWIW, I just migrated all these databases over to this new server by
> restoring from pg_dump  I was previously experiencing this on 8.3, which was
> why I upgraded to 9.1 and I also have another server with similar problems
> on 9.1.
>
> Any help would be sincerely appreciated.

I think the first line of defense would be using /dev/shm to hold the
stats file.  I don't see any downside to that.  You are reading and
writing that file so ferociously anyway that it is always going to be
taking up RAM, no matter where you put it.  Indeed, under ext4 you
might use even have several copies of it all locked into RAM as they
wait to reach the disk before being dropped.

Increasing the naptime, as you have already done, will also decrease
the physical IO, but that has the trade-off of risking bloat.  (But
since you are running 150 databases on one machine, I doubt any of
them are active enough for the risk of bloat to be all that great).
However using /dev/shm should eliminate the IO entirely with no
trade-off at all.

But with /dev/shm the CPU usage of repeatedly formatting, writing,
reading, and parsing the stat file will still be considerable, while
increasing the naptime will reduce that as well.

As far as coding changes to overcome the fundamental problem:

A relatively easy change would be to make any given autovacuum worker
on start up tolerate a stats file that is out of date by up to, say,
naptime/5.  That would greatly reduce the amount of writing the stats
collector needs to do (assuming that few tables actually need
vacuuming during any given cycle), but wouldn't change the amount of
reading a worker needs to do because it still needs to read the file
each time as it doesn't inherit the stats from anyone.  I don't think
it would be a problem that a table which becomes eligible for
vacuuming in the last 20% of a cycle would have to wait for one more
round.  Especially as this change might motivate one to reduce the
naptime since doing so will be cheaper.

But it seems like maybe the stats collector could use a ground-up
recoding.  Maybe it could use a shared relation to store the stats
within the database cluster itself, so that edits could be done in
place per database rather than re-writing the entire cluster's stats?
 But I certainly am not volunteering to take on that task.

A compromise might be to have one stats file per database.  That way
any given backend only needs to read in the database file it cares
about, and the stat's collector only needs to write out the one
database asked of it.  This change could be mostly localized to just
pgstat.c, I think.

Cheers,

Jeff


pgsql-performance by date:

Previous
From: Anthony Presley
Date:
Subject: Improve DB Size / Performance with Table Refactoring
Next
From: Jeff Davis
Date:
Subject: Re: Deferred constraints performance impact ?