Thread: corrupted statistics file "pg_stat_tmp/pgstat.stat"

corrupted statistics file "pg_stat_tmp/pgstat.stat"

From
"Carl von Clausewitz"
Date:

Hi All,

 

I’ve just made a clean install for PostgreSQL 9.1 (postgresql-server-9.1.4, postgresql-contrib-9.1.4) on a FreeBSD 9 (FreeBSD 9.0-RELEASE-p3 FreeBSD 9.0-RELEASE-p3 #0: Tue Jun 12 02:52:29 UTC 2012     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64).

 

I’ve restored from TAR backup our databases, and everything looked fine. Without changing any setting in postgresql.conf (or in kernel settings) – only “track_counts=on”, after 2-3 days, I’m receiving huge number (~5-10 PCS in every second) of error messages like that in /var/log/postgresql.log:

 

*** Aug 15 06:27:26 eurodb postgres[77652]: [43-1] WARNING:  corrupted statistics file "pg_stat_tmp/pgstat.stat"

 

The logs are going to the syslog, and there are no more Warning, Error, or other messages. If I do a full Vacuum on the database, and I restart the system, everything works fine, no more error messages, until 2-3 days – and everything starts from the beginning.

 

Two weeks ago, we just bought a new server, I’ve installed the same version of FreeBSD and PostgreSQL, and there is the same problem on that machine too – without any usage, or load, this is a test server for us now.

 

The disks (SATA3 HDD), and the controllers has been tested, they are OK. All the directories’ permissons are the defaults, pg_stat_temp is owned by pgsql:pgsql – so I think this is not a permission issue.

 

I’m trying to tune my config with pgtune, and I’ve made some changes on sysctl for the following:

 

kern.ipc.semmni="512"

kern.ipc.semmns="1024"

kern.ipc.semume="64"

kern.ipc.semmnu="512"

 

I will restart the machines today, and will see, what’s going on. Did anybody has seen this problem before? On our early version 9.0.8 PostgreSQL server, there wasn’t any kind of problem, but it was a FreeBSD 8.2. How can I investigate, what could be the problem? Kernel settings? Or any other idea?

 

Thank you in advance.

 

Best regards,

Csaba

Re: corrupted statistics file "pg_stat_tmp/pgstat.stat"

From
Amitabh Kant
Date:
I run the same config (FreeBSD 9 with PG 9.1.x) on couple of servers, and they seem to be working fine without any error messages. The only other setting I have in my sysctl.conf is kern.maxfiles .

Amitabh

On Wed, Aug 15, 2012 at 1:27 PM, Carl von Clausewitz <clausewitz45@gmail.com> wrote:

Hi All,

 

I’ve just made a clean install for PostgreSQL 9.1 (postgresql-server-9.1.4, postgresql-contrib-9.1.4) on a FreeBSD 9 (FreeBSD 9.0-RELEASE-p3 FreeBSD 9.0-RELEASE-p3 #0: Tue Jun 12 02:52:29 UTC 2012     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64).

 

I’ve restored from TAR backup our databases, and everything looked fine. Without changing any setting in postgresql.conf (or in kernel settings) – only “track_counts=on”, after 2-3 days, I’m receiving huge number (~5-10 PCS in every second) of error messages like that in /var/log/postgresql.log:

 

*** Aug 15 06:27:26 eurodb postgres[77652]: [43-1] WARNING:  corrupted statistics file "pg_stat_tmp/pgstat.stat"

 

The logs are going to the syslog, and there are no more Warning, Error, or other messages. If I do a full Vacuum on the database, and I restart the system, everything works fine, no more error messages, until 2-3 days – and everything starts from the beginning.

 

Two weeks ago, we just bought a new server, I’ve installed the same version of FreeBSD and PostgreSQL, and there is the same problem on that machine too – without any usage, or load, this is a test server for us now.

 

The disks (SATA3 HDD), and the controllers has been tested, they are OK. All the directories’ permissons are the defaults, pg_stat_temp is owned by pgsql:pgsql – so I think this is not a permission issue.

 

I’m trying to tune my config with pgtune, and I’ve made some changes on sysctl for the following:

 

kern.ipc.semmni="512"

kern.ipc.semmns="1024"

kern.ipc.semume="64"

kern.ipc.semmnu="512"

 

I will restart the machines today, and will see, what’s going on. Did anybody has seen this problem before? On our early version 9.0.8 PostgreSQL server, there wasn’t any kind of problem, but it was a FreeBSD 8.2. How can I investigate, what could be the problem? Kernel settings? Or any other idea?

 

Thank you in advance.

 

Best regards,

Csaba


Re: corrupted statistics file "pg_stat_tmp/pgstat.stat"

From
Alban Hertroys
Date:
On 15 August 2012 09:57, Carl von Clausewitz <clausewitz45@gmail.com> wrote:
> I’ve restored from TAR backup our databases, and everything looked fine.

What exactly is in that TAR backup? Is that a tar/gzipped PG dump or a
file-system snapshot?
If the latter:
- did you halt the database while creating the snapshot or at least
force a checkpoint?
- is the version of PG that you're restoring to exactly the same (same
CPU type, same major/minor version of PG) as the one that the backup
is from?

I suspect that you restored a database with corrupt statistics and
that those only get found out once the corrupted statistics are
needed.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: corrupted statistics file "pg_stat_tmp/pgstat.stat"

From
Tom Lane
Date:
"Carl von Clausewitz" <clausewitz45@gmail.com> writes:
> I’ve restored from TAR backup our databases, and everything looked fine. Without changing any setting in
postgresql.conf(or in kernel settings) – only “track_counts=on”, after 2-3 days, I’m receiving huge number (~5-10 PCS
inevery second) of error messages like that in /var/log/postgresql.log: 
> *** Aug 15 06:27:26 eurodb postgres[77652]: [43-1] WARNING:  corrupted statistics file "pg_stat_tmp/pgstat.stat"

Huh.  The stats collector process ought to rewrite that file fairly
often, so this suggests it's consistently failing to rewrite it.

You might take a look at what the file looks like after a day or so
of normal operation (eg, how big is it, how often does it get updated)
and then compare to what it looks like after the errors start.

Also, try strace'ing the stats collector process for a little while
(long enough to capture a stats file rewrite sequence) during normal
operation, and then again after the errors start.

I don't want to speculate too much in advance of the data, but I'm
wondering about a ulimit setting that limits how much data the stats
collector can write during its lifetime (ulimit -f or local equivalent).
That would eventually cause problems for any postgres process, but if
you did accidentally have one in place when starting the postmaster,
maybe the stats collector would be first to show symptoms.

            regards, tom lane


Re: corrupted statistics file "pg_stat_tmp/pgstat.stat"

From
"Carl von Clausewitz"
Date:
Hi,

I've made the backups with pg_dump, compressed, and there wasn't any kind of error messages. I've made the restoration
withpgAdmin (but it also uses pg_dump for restore).  

How can I check if the statistics are corrupt?

regards,
Csaba

-----Original Message-----
From: Alban Hertroys [mailto:haramrae@gmail.com]
Sent: Wednesday, August 15, 2012 3:33 PM
To: Carl von Clausewitz
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] corrupted statistics file "pg_stat_tmp/pgstat.stat"

On 15 August 2012 09:57, Carl von Clausewitz <clausewitz45@gmail.com> wrote:
> I’ve restored from TAR backup our databases, and everything looked fine.

What exactly is in that TAR backup? Is that a tar/gzipped PG dump or a file-system snapshot?
If the latter:
- did you halt the database while creating the snapshot or at least force a checkpoint?
- is the version of PG that you're restoring to exactly the same (same CPU type, same major/minor version of PG) as the
onethat the backup is from? 

I suspect that you restored a database with corrupt statistics and that those only get found out once the corrupted
statisticsare needed. 

--
If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.



Re: corrupted statistics file "pg_stat_tmp/pgstat.stat"

From
"Carl von Clausewitz"
Date:
Hi,

I've made the kernel changes that I wrote in my original e-mail, and I've created some additional logging (both csvlog
andsyslog), to gather more informations.  

/boot/loader.conf:

kern.ipc.semmni="512"
kern.ipc.semmns="1024"
kern.ipc.semume="64"
kern.ipc.semmnu="512"

/etc/sysctl.cong:

kern.ipc.shmall=262144
kern.ipc.shmmax=1073742336
kern.ipc.semmap=256

pgTune made this config changes for me in /usr/local/pgsql/data/postgresql.conf (the server has 4GB RAM)
default_statistics_target         = 50         # pgtune wizard 2012-08-15
maintenance_work_mem         = 240MB     # pgtune wizard 2012-08-15
constraint_exclusion             = on        # pgtune wizard 2012-08-15
checkpoint_completion_target     = 0.9         # pgtune wizard 2012-08-15
effective_cache_size             = 2816MB     # pgtune wizard 2012-08-15
work_mem                 = 24MB     # pgtune wizard 2012-08-15
wal_buffers                 = 8MB         # pgtune wizard 2012-08-15
checkpoint_segments             = 16         # pgtune wizard 2012-08-15
shared_buffers             = 960MB     # pgtune wizard 2012-08-15
max_connections             = 80         # pgtune wizard 2012-08-15

After a day, the file is 412kb large. I've just installed strace, and I try to capture a 2-4 hours work, and check what
isgoing on.  

ulimit (& ulimit -f) output is unlimited.

I'll be back (:-)) within few days with the results. Thank you all the informations.

Regards,
Csaba

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, August 15, 2012 3:34 PM
To: Carl von Clausewitz
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] corrupted statistics file "pg_stat_tmp/pgstat.stat"

"Carl von Clausewitz" <clausewitz45@gmail.com> writes:
> I’ve restored from TAR backup our databases, and everything looked fine. Without changing any setting in
postgresql.conf(or in kernel settings) – only “track_counts=on”, after 2-3 days, I’m receiving huge number
(~5-10PCS in every second) of error messages like that in /var/log/postgresql.log: 
> *** Aug 15 06:27:26 eurodb postgres[77652]: [43-1] WARNING:  corrupted statistics file "pg_stat_tmp/pgstat.stat"

Huh.  The stats collector process ought to rewrite that file fairly often, so this suggests it's consistently failing
torewrite it. 

You might take a look at what the file looks like after a day or so of normal operation (eg, how big is it, how often
doesit get updated) and then compare to what it looks like after the errors start. 

Also, try strace'ing the stats collector process for a little while (long enough to capture a stats file rewrite
sequence)during normal operation, and then again after the errors start. 

I don't want to speculate too much in advance of the data, but I'm wondering about a ulimit setting that limits how
muchdata the stats collector can write during its lifetime (ulimit -f or local equivalent). 
That would eventually cause problems for any postgres process, but if you did accidentally have one in place when
startingthe postmaster, maybe the stats collector would be first to show symptoms. 

            regards, tom lane