Large pgstat.stat file causes I/O storm

From: Cristian Gafton
Hello all,

I have a ~150GB server containing two databases that are active in
mostly read mode. I have noticed lately that the global/pgstat.stat file
is somewhere around 1MB fresh after a restart, but after a few hours of
uptime it balloons to 74MB in size for no apparent reason. Needless to
say, having the stats collector dump 74MB of stuff to disk on its every
loop takes a big bite out of the I/O capabilities of this box.

Looking at all the other replicas I have of this database (which are
under a more lightweight read load), the pgstat.stat file is again
rather small. Am I right to assume that a 74MB pgstat.stat file is not
normal - and what might have caused it?

Thanks,

Cristian
-- 
Cristian Gafton
rPath, Inc.



Re: Large pgstat.stat file causes I/O storm

From: Cristian Gafton
On Tue, 29 Jan 2008, Cristian Gafton wrote:

> I have a ~150GB server containing two databases that are active in
> mostly read mode. I have noticed lately that the global/pgstat.stat file
> is somewhere around 1MB fresh after a restart, but after a few hours of
> uptime it balloons to 74MB in size for no apparent reason. Needless to
> say, having the stats collector dump 74MB of stuff to disk on its every
> loop takes a big bite out of the I/O capabilities of this box.

Of course, I left out the most important thing - this is PostgreSQL
8.2.6 on x86_64.

> Looking at all the other replicas I have of this database (which are
> under a more lightweight read load), the pgstat.stat file is again
> rather small. Am I right to assume that a 74MB pgstat.stat file is not
> normal - and what might have caused it?

Cristian
-- 
Cristian Gafton
rPath, Inc.



Re: Large pgstat.stat file causes I/O storm

From: Tom Lane
Cristian Gafton <gafton@rpath.com> writes:
> On Tue, 29 Jan 2008, Cristian Gafton wrote:
>> I have a ~150GB server containing two databases that are active in
>> mostly read mode. I have noticed lately that the global/pgstat.stat file
>> is somewhere around 1MB fresh after a restart, but after a few hours of
>> uptime it balloons to 74MB in size for no apparent reason. Needless to
>> say, having the stats collector dump 74MB of stuff to disk on its every
>> loop takes a big bite out of the I/O capabilities of this box.

> Of course, I left out the most important thing - this is PostgreSQL
> 8.2.6 on x86_64.

Hmm ... do you have autovacuum enabled?  If not, what's the vacuuming
policy on that box?  I'm wondering if this is triggered by something
deciding to vacuum or analyze a bunch of otherwise-unused tables, and
thereby causing stats entries to be created for those tables.

You could investigate by comparing the contents of the stats views
before and after the file balloons.  I would expect to see a lot more
rows, and the key is exactly what non-null activity is recorded in
the extra rows.
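
For instance, a quick sketch along these lines (untested; the columns
are the ones in the standard pg_stat_all_tables view) could be run
before and after, and the row sets diffed:

    -- list every table with any recorded activity
    SELECT relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
      FROM pg_stat_all_tables
     WHERE coalesce(seq_scan,0) + coalesce(idx_scan,0)
         + coalesce(n_tup_ins,0) + coalesce(n_tup_upd,0)
         + coalesce(n_tup_del,0) > 0
     ORDER BY relname;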
        regards, tom lane


Re: Large pgstat.stat file causes I/O storm

From: Cristian Gafton
On Tue, 29 Jan 2008, Tom Lane wrote:

>> On Tue, 29 Jan 2008, Cristian Gafton wrote:
>>> I have a ~150GB server containing two databases that are active in
>>> mostly read mode. I have noticed lately that the global/pgstat.stat file
>>> is somewhere around 1MB fresh after a restart, but after a few hours of
>>> uptime it balloons to 74MB in size for no apparent reason. Needless to
>>> say, having the stats collector dump 74MB of stuff to disk on its every
>>> loop takes a big bite out of the I/O capabilities of this box.
>
>> Of course, I left out the most important thing - this is PostgreSQL
>> 8.2.6 on x86_64.
>
> Hmm ... do you have autovacuum enabled?  If not, what's the vacuuming
> policy on that box?  I'm wondering if this is triggered by something
> deciding to vacuum or analyze a bunch of otherwise-unused tables, and
> thereby causing stats entries to be created for those tables.

Autovacuum is disabled, since the database is mostly read only. There is a 
"vacuumdb -a -z" running nightly on the box. However, the application that 
queries it does a lot of work with temporary tables - would those bloat 
the stats at all?

> You could investigate by comparing the contents of the stats views
> before and after the file balloons.  I would expect to see a lot more
> rows, and the key is exactly what non-null activity is recorded in
> the extra rows.

Any one of the stats views in particular? Currently all of the stats_*
flags are set to "on".

Thanks,

Cristian
-- 
Cristian Gafton
rPath, Inc.



Re: Large pgstat.stat file causes I/O storm

From: Tom Lane
Cristian Gafton <gafton@rpath.com> writes:
> Autovacuum is disabled, since the database is mostly read only. There is a 
> "vacuumdb -a -z" running nightly on the box. However, the application that 
> queries it does a lot of work with temporary tables - would those bloat 
> the stats at all?

Conceivably, if you mean a lot of short-lived tables rather than a lot
of operations on a few tables.  However, I'd think that would result in
a steady accumulation of stats entries, not a sudden jump as you seemed
to describe.
        regards, tom lane


Re: Large pgstat.stat file causes I/O storm

From: Cristian Gafton
On Tue, 29 Jan 2008, Tom Lane wrote:

> Cristian Gafton <gafton@rpath.com> writes:
>> Autovacuum is disabled, since the database is mostly read only. There is a
>> "vacuumdb -a -z" running nightly on the box. However, the application that
>> queries it does a lot of work with temporary tables - would those bloat
>> the stats at all?
>
> Conceivably, if you mean a lot of short-lived tables rather than a lot
> of operations on a few tables.  However, I'd think that would result in
> a steady accumulation of stats entries, not a sudden jump as you seemed
> to describe.

We are churning through a bunch of short-lived temp tables. Since I 
reported the problem, the pgstat file is now sitting at 85M, yet the 
pg_stat* tables barely have any entries in them:
                       count(*)
pg_stats                    298
pg_statistic                298
pg_stat_all_indexes          76
pg_stat_all_tables           76
pg_statio_all_tables         56
pg_statio_all_indexes        76

Is there a way to inspect the pgstat file and see what's in it that is
taking up all this space? (It's not the space that bothers me, it's the
fact that the statistics collector has to dump 85MB of stuff to disk
once a second...)

Thanks,

Cristian
-- 
Cristian Gafton
rPath, Inc.



Re: Large pgstat.stat file causes I/O storm

From: Tom Lane
Cristian Gafton <gafton@rpath.com> writes:
> We are churning through a bunch of short-lived temp tables.

I think that's probably the root of the problem ...

> Since I 
> reported the problem, the pgstat file is now sitting at 85M, yet the 
> pg_stat* tables barely have any entries in them:

>                        count(*)
> pg_stats                    298
> pg_statistic                298
> pg_stat_all_indexes          76
> pg_stat_all_tables           76
> pg_statio_all_tables         56
> pg_statio_all_indexes        76

Those views are joins against pg_class, so only tables that have live
pg_class rows can possibly show up there.  You could try remembering the
OIDs of some temp tables and probing the underlying pg_stat_get_xxx()
functions to see if there are stats-table entries for them.
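
For example (a sketch; 123456 stands for a remembered temp-table OID):

    -- nonzero answers for an already-dropped table mean the collector
    -- is still carrying a stats entry for it
    SELECT pg_stat_get_numscans(123456),
           pg_stat_get_tuples_inserted(123456),
           pg_stat_get_blocks_fetched(123456);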

(Pokes around in the code...)  I think the problem here is that the only
active mechanism for flushing dead stats-table entries is
pgstat_vacuum_tabstat(), which is invoked by a VACUUM command or an
autovacuum.  Once-a-day VACUUM isn't gonna cut it for you under those
circumstances.  What you might do is just issue a VACUUM on some
otherwise-uninteresting small table, once an hour or however often you
need to keep the stats file bloat to a reasonable level.
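
E.g., a cron job issuing something like this once an hour (a sketch;
"dummy" stands for any small, otherwise-idle table --- the useful side
effect is the dead-entry sweep that any VACUUM performs):

    VACUUM dummy;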

There is a pgstat_drop_relation() function to tell the stats collector
to drop a single table entry, but it's not being called from anyplace.
We probably ought to try a bit harder to make that work.  The problem
is described here:

2007-07-08 18:23  tgl

    * src/: backend/postmaster/pgstat.c, backend/storage/smgr/smgr.c,
    include/pgstat.h (REL8_1_STABLE), backend/postmaster/pgstat.c,
    backend/storage/smgr/smgr.c, include/pgstat.h (REL8_2_STABLE),
    backend/postmaster/pgstat.c, backend/storage/smgr/smgr.c,
    include/pgstat.h: Remove the pgstat_drop_relation() call from
    smgr_internal_unlink(), because we don't know at that point which
    relation OID to tell pgstat to forget.  The code was passing the
    relfilenode, which is incorrect, and could possibly cause some
    other relation's stats to be zeroed out.  While we could try to
    clean this up, it seems much simpler and more reliable to let the
    next invocation of pgstat_vacuum_tabstat() fix things; which indeed
    is how it worked before I introduced the buggy code into 8.1.3 and
    later :-(.  Problem noticed by Itagaki Takahiro, fix is per
    subsequent discussion.
 
        regards, tom lane


Re: Large pgstat.stat file causes I/O storm

From: Cristian Gafton
On Tue, 29 Jan 2008, Tom Lane wrote:

> (Pokes around in the code...)  I think the problem here is that the only
> active mechanism for flushing dead stats-table entries is
> pgstat_vacuum_tabstat(), which is invoked by a VACUUM command or an
> autovacuum.  Once-a-day VACUUM isn't gonna cut it for you under those
> circumstances.  What you might do is just issue a VACUUM on some
> otherwise-uninteresting small table, once an hour or however often you
> need to keep the stats file bloat to a reasonable level.

I just ran a vacuumdb -a on the box - the pgstat file is still >90MB in 
size. If vacuum is supposed to clean up the cruft from pgstat, then I 
don't know if we're looking at the right cruft - I kind of expected the 
pgstat file to go down in size and the I/O storm to subside, but that is 
not happening after vacuum.

I will try to instrument the application to record the oids of the temp 
tables it creates and investigate from that angle, but in the meantime is 
there any way to reset the stats collector altogether? Could this be a 
corrupt stat file that gets read and written right back on every loop 
without any sort of validation?

Thanks,

Cristian
-- 
Cristian Gafton
rPath, Inc.



Re: Large pgstat.stat file causes I/O storm

From: Tom Lane
Cristian Gafton <gafton@rpath.com> writes:
> I just ran a vacuumdb -a on the box - the pgstat file is still >90MB in
> size. If vacuum is supposed to clean up the cruft from pgstat, then I
> don't know if we're looking at the right cruft - I kind of expected the
> pgstat file to go down in size and the I/O storm to subside, but that is
> not happening after vacuum.

Hmph ... I did a simple test here involving creating a lot of temp
tables, and indeed it made the stats file bigger, but the size went
right down again after vacuuming.  Is it possible that the vacuumdb
failed to connect to the particular database in which the temp tables
are coming and going?
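
(For reference, a test of that general shape --- a sketch with made-up
names, and it needs plpgsql installed --- looks about like this:

    -- churn through short-lived temp tables; each round leaves a dead
    -- stats entry behind until the next pgstat_vacuum_tabstat() sweep
    CREATE FUNCTION churn_temp_tables(n int) RETURNS void AS $$
    BEGIN
        FOR i IN 1 .. n LOOP
            EXECUTE 'CREATE TEMP TABLE churn_tmp (x int)';
            EXECUTE 'INSERT INTO churn_tmp VALUES (1)';
            EXECUTE 'DROP TABLE churn_tmp';
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    SELECT churn_temp_tables(10000);

Run it, watch global/pgstat.stat grow, then VACUUM and watch it shrink.)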

> I will try to instrument the application to record the oids of the temp
> tables it creates and investigate from that angle, but in the meantime is
> there any way to reset the stats collector altogether? Could this be a
> corrupt stat file that gets read and written right back on every loop
> without any sort of validation?

There's stats_reset_on_server_start (sp?), and I think 8.2 also has a
stats-reset function.  But what might be more interesting is to pull the
file-reading function out of pgstat.c and dump out the stats file in
readable form to see what the heck is in there.  (If you decide to try
resetting the stats, I'd suggest saving a copy of the stats file first
for possible analysis later.)  I have the beginnings of such a program
lying about, which I'll attach --- note that it was last used for 8.1
and might require some tweaks for 8.2, and that you'd need to flesh it
out a lot if you want details about individual entries instead of just
a count.
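
(The reset function, if you go that route, is pg_stat_reset(); run it
as superuser in the affected database:

    -- zeroes the statistics counters for the current database
    SELECT pg_stat_reset();

but again, save the bloated file first.)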

            regards, tom lane

/*
 * dumpstat --- simple standalone program to read and analyze a PG stats
 * file.  Based on pgstat_read_statsfile() from 8.1 sources.
 *
 * Currently works with either 8.0 or 8.1 formats depending on which
 * headers it is compiled against.
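 *
 * Build sketch (untested): compile against the matching server headers,
 * e.g.  gcc -I`pg_config --includedir-server` -o dumpstat dumpstat.c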
 */
#include "postgres.h"

#include "pgstat.h"


int
main(int argc, char **argv)
{
    PgStat_StatDBEntry dbbuf;
    PgStat_StatTabEntry tabbuf;
    PgStat_StatBeEntry beentry;
    FILE       *fpin;
    int32        format_id;
    int            maxbackends = 0;
    int            havebackends = 0;
    int            havedbs = 0;
    int            havetabs = 0;

    /* Expect exactly one argument: the path of the stats file to read. */
    if (argc != 2)
    {
        fprintf(stderr, "usage: %s STATSFILE\n", argv[0]);
        return 1;
    }

    /*
     * Try to open the stats file.
     */
    if ((fpin = fopen(argv[1], "rb")) == NULL)
    {
        perror(argv[1]);
        return 1;
    }

    /*
     * Verify it's of the expected format.
     */
#ifdef PGSTAT_FILE_FORMAT_ID
    if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id)
        || format_id != PGSTAT_FILE_FORMAT_ID)
    {
        fprintf(stderr, "corrupted pgstat.stat file\n");
        goto done;
    }
#endif

    /*
     * We found an existing collector stats file. Read it and put all the
     * hashtable entries into place.
     */
    for (;;)
    {
        switch (fgetc(fpin))
        {
                /*
                 * 'D'    A PgStat_StatDBEntry struct describing a database
                 * follows. Subsequently, zero to many 'T' entries will follow
                 * until a 'd' is encountered.
                 */
            case 'D':
                if (fread(&dbbuf, 1, sizeof(dbbuf), fpin) != sizeof(dbbuf))
                {
                    fprintf(stderr, "corrupted pgstat.stat file\n");
                    goto done;
                }
                havedbs++;
                break;

                /*
                 * 'd'    End of this database.
                 */
            case 'd':
                break;

                /*
                 * 'T'    A PgStat_StatTabEntry follows.
                 */
            case 'T':
                if (fread(&tabbuf, 1, sizeof(tabbuf), fpin) != sizeof(tabbuf))
                {
                    fprintf(stderr, "corrupted pgstat.stat file\n");
                    goto done;
                }
                havetabs++;
                break;

                /*
                 * 'M'    The maximum number of backends to expect follows.
                 */
            case 'M':
                if (fread(&maxbackends, 1, sizeof(maxbackends), fpin) !=
                    sizeof(maxbackends))
                {
                    fprintf(stderr, "corrupted pgstat.stat file\n");
                    goto done;
                }
                if (maxbackends == 0)
                    goto done;
                break;

                /*
                 * 'B'    A PgStat_StatBeEntry follows.
                 */
            case 'B':
                /*
                 * Read it into a local buffer; this program only counts
                 * the entries.
                 */
                if (fread(&beentry, 1, sizeof(PgStat_StatBeEntry), fpin) !=
                    sizeof(PgStat_StatBeEntry))
                {
                    fprintf(stderr, "corrupted pgstat.stat file\n");
                    goto done;
                }

                havebackends++;
                break;

                /*
                 * 'E'    The EOF marker of a complete stats file.
                 */
            case 'E':
                goto done;

            default:
                fprintf(stderr, "corrupted pgstat.stat file at %ld\n",
                        ftell(fpin) - 1);
                goto done;
        }
    }

done:
    fclose(fpin);

    printf("found %d backends of %d (%ld bytes)\n", havebackends, maxbackends,
           havebackends * (long) sizeof(PgStat_StatBeEntry));
    printf("%d databases (%ld bytes)\n", havedbs,
           havedbs * (long) sizeof(PgStat_StatDBEntry));
    printf("%d tables (%ld bytes)\n", havetabs,
           havetabs * (long) sizeof(PgStat_StatTabEntry));

    return 0;
}