Re: Postgres stats collector showing high disk I/O - Mailing list pgsql-general

From Justin Pasher
Subject Re: Postgres stats collector showing high disk I/O
Date
Msg-id 4BF5974D.8030505@newmediagateway.com
In response to Re: Postgres stats collector showing high disk I/O  (Justin Pasher <justinp@newmediagateway.com>)
Responses Re: Postgres stats collector showing high disk I/O  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-general
----- Original Message -----
> From: Justin Pasher <justinp@newmediagateway.com>
> Date: Fri, 23 Apr 2010 17:46:16 -0500
> Subject: Re: [GENERAL] Postgres stats collector showing high disk I/O
> To: Alvaro Herrera <alvherre@commandprompt.com>
> CC: pgsql-general@postgresql.org
>
> ----- Original Message -----
> From: Alvaro Herrera <alvherre@commandprompt.com>
> Date: Fri, 23 Apr 2010 18:28:03 -0400
> Subject: Re: [GENERAL] Postgres stats collector showing high disk I/O
> To: Justin Pasher <justinp@newmediagateway.com>
> CC: depesz@depesz.com, pgsql-general@postgresql.org
>> Justin Pasher wrote:
>>
>>
>>> Agh... I used pg_stats_reset (with an s) when searching for it. I
>>> ran the function and it returned true, but the stats file only
>>> shrunk by ~100k (still over 18MB total). Is there something else I
>>> need to do? Does this mean the file is mostly bloated with bogus
>>> data that it can't "reset"? I'm guessing I should just try to delete
>>> the file outright?
>>>
>>
>> Err, yeah, pg_stat_reset only resets the stats for the current database.
>> You need to reset for all databases, or alternatively, shut down the
>> server, remove the file, and restart.
>>
>
> Ahh, yes. I probably should have realized that. I ran the function on
> all of the databases (138 total), and now the stats file is down to
> ~400k. The disk I/O is also practically nothing now.
>
> So now as a continuation of my original message, what would cause the
> stats file to get so big for what seems like (IMO) a small number of
> databases? I have a Postgres 7.4 cluster that has about 250 databases,
> but its stats file is only 3.5MB. Do I need to look into avoiding a
> bunch of CREATE/DROP statements (a suggestion in the archives)? I
> don't know the actual usage patterns of the code base (I'll have to
> get with the developers), but I think at worst, they might create temp
> tables from time to time. As a matter of fact, I just checked the
> stats file again (about 10 minutes later) and it's doubled to 800K. Is
> Postgres just trying to store too much information in the statistics
> file? Ultimately, the main statistics I care about are current
> connections and queries being run.
>
> A previous post in the archives from Tom said that vacuum (even
> autovacuum) should clean up potential stat file bloat. Do I need to
> tweak my autovacuum settings? Everything is currently set to the
> defaults because I've never had any performance issues that warranted
> tweaking the settings.
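
For anyone else who runs into this: resetting the stats in every database can
be scripted with a loop along these lines (just a sketch; run it as a
superuser and add whatever connection options your setup needs):

    for db in $(psql -At -c "SELECT datname FROM pg_database WHERE datallowconn"); do
        psql -d "$db" -c "SELECT pg_stat_reset();"
    done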

Can anyone provide any more direction on this one?

Whenever I clear out the stats for all of the databases, the file
shrinks down to <1MB. However, it only takes about a day for it to get
back up to ~18MB, and then the stats collector process starts the heavy
disk writing again. I do know there are some tables in the database that
are filled and emptied quite a bit (they are used as temporary "queue"
tables). The code will VACUUM FULL ANALYZE after the table is emptied to
get the physical size back down and update the (empty) stats. A plain
ANALYZE is also run right after the table is filled but before it starts
processing, so the planner will have good stats on the contents of the
table. Would this lead to pg_stat file bloat like I'm seeing? Would a
CLUSTER then ANALYZE instead of a VACUUM FULL ANALYZE make any
difference? The VACUUM FULL code was set up quite a while back before the
coders knew about CLUSTER.
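
To make the comparison concrete, the maintenance cycle on those queue tables
today, versus the alternative I'm asking about, would look roughly like this
(table and index names are placeholders; older releases spell the second form
as CLUSTER indexname ON tablename):

    -- after the queue table is refilled, before processing starts:
    ANALYZE queue_table;

    -- current approach once the table is emptied:
    VACUUM FULL ANALYZE queue_table;

    -- alternative being considered (CLUSTER needs an index to order by):
    CLUSTER queue_table USING queue_table_pkey;
    ANALYZE queue_table;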

Thanks.

--
Justin Pasher
