Thread: corrupted statistics file "pg_stat_tmp/pgstat.stat"
Hi All,
I’ve just made a clean install for PostgreSQL 9.1 (postgresql-server-9.1.4, postgresql-contrib-9.1.4) on a FreeBSD 9 (FreeBSD 9.0-RELEASE-p3 FreeBSD 9.0-RELEASE-p3 #0: Tue Jun 12 02:52:29 UTC 2012 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64).
I’ve restored from TAR backup our databases, and everything looked fine. Without changing any setting in postgresql.conf (or in kernel settings) – only “track_counts=on”, after 2-3 days, I’m receiving huge number (~5-10 PCS in every second) of error messages like that in /var/log/postgresql.log:
*** Aug 15 06:27:26 eurodb postgres[77652]: [43-1] WARNING: corrupted statistics file "pg_stat_tmp/pgstat.stat"
The logs are going to the syslog, and there are no more Warning, Error, or other messages. If I do a full Vacuum on the database, and I restart the system, everything works fine, no more error messages, until 2-3 days – and everything starts from the beginning.
Two weeks ago, we just bought a new server, I’ve installed the same version of FreeBSD and PostgreSQL, and there is the same problem on that machine too – without any usage, or load, this is a test server for us now.
The disks (SATA3 HDD), and the controllers has been tested, they are OK. All the directories’ permissons are the defaults, pg_stat_temp is owned by pgsql:pgsql – so I think this is not a permission issue.
I’m trying to tune my config with pgtune, and I’ve made some changes on sysctl for the following:
kern.ipc.semmni="512"
kern.ipc.semmns="1024"
kern.ipc.semume="64"
kern.ipc.semmnu="512"
I will restart the machines today, and will see, what’s going on. Did anybody has seen this problem before? On our early version 9.0.8 PostgreSQL server, there wasn’t any kind of problem, but it was a FreeBSD 8.2. How can I investigate, what could be the problem? Kernel settings? Or any other idea?
Thank you in advance.
Best regards,
Csaba
Hi All,
I’ve just made a clean install for PostgreSQL 9.1 (postgresql-server-9.1.4, postgresql-contrib-9.1.4) on a FreeBSD 9 (FreeBSD 9.0-RELEASE-p3 FreeBSD 9.0-RELEASE-p3 #0: Tue Jun 12 02:52:29 UTC 2012 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64).
I’ve restored from TAR backup our databases, and everything looked fine. Without changing any setting in postgresql.conf (or in kernel settings) – only “track_counts=on”, after 2-3 days, I’m receiving huge number (~5-10 PCS in every second) of error messages like that in /var/log/postgresql.log:
*** Aug 15 06:27:26 eurodb postgres[77652]: [43-1] WARNING: corrupted statistics file "pg_stat_tmp/pgstat.stat"
The logs are going to the syslog, and there are no more Warning, Error, or other messages. If I do a full Vacuum on the database, and I restart the system, everything works fine, no more error messages, until 2-3 days – and everything starts from the beginning.
Two weeks ago, we just bought a new server, I’ve installed the same version of FreeBSD and PostgreSQL, and there is the same problem on that machine too – without any usage, or load, this is a test server for us now.
The disks (SATA3 HDD), and the controllers has been tested, they are OK. All the directories’ permissons are the defaults, pg_stat_temp is owned by pgsql:pgsql – so I think this is not a permission issue.
I’m trying to tune my config with pgtune, and I’ve made some changes on sysctl for the following:
kern.ipc.semmni="512"
kern.ipc.semmns="1024"
kern.ipc.semume="64"
kern.ipc.semmnu="512"
I will restart the machines today, and will see, what’s going on. Did anybody has seen this problem before? On our early version 9.0.8 PostgreSQL server, there wasn’t any kind of problem, but it was a FreeBSD 8.2. How can I investigate, what could be the problem? Kernel settings? Or any other idea?
Thank you in advance.
Best regards,
Csaba
On 15 August 2012 09:57, Carl von Clausewitz <clausewitz45@gmail.com> wrote: > I’ve restored from TAR backup our databases, and everything looked fine. What exactly is in that TAR backup? Is that a tar/gzipped PG dump or a file-system snapshot? If the latter: - did you halt the database while creating the snapshot or at least force a checkpoint? - is the version of PG that you're restoring to exactly the same (same CPU type, same major/minor version of PG) as the one that the backup is from? I suspect that you restored a database with corrupt statistics and that those only get found out once the corrupted statistics are needed. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
"Carl von Clausewitz" <clausewitz45@gmail.com> writes: > I’ve restored from TAR backup our databases, and everything looked fine. Without changing any setting in postgresql.conf(or in kernel settings) – only “track_counts=on”, after 2-3 days, I’m receiving huge number (~5-10 PCS inevery second) of error messages like that in /var/log/postgresql.log: > *** Aug 15 06:27:26 eurodb postgres[77652]: [43-1] WARNING: corrupted statistics file "pg_stat_tmp/pgstat.stat" Huh. The stats collector process ought to rewrite that file fairly often, so this suggests it's consistently failing to rewrite it. You might take a look at what the file looks like after a day or so of normal operation (eg, how big is it, how often does it get updated) and then compare to what it looks like after the errors start. Also, try strace'ing the stats collector process for a little while (long enough to capture a stats file rewrite sequence) during normal operation, and then again after the errors start. I don't want to speculate too much in advance of the data, but I'm wondering about a ulimit setting that limits how much data the stats collector can write during its lifetime (ulimit -f or local equivalent). That would eventually cause problems for any postgres process, but if you did accidentally have one in place when starting the postmaster, maybe the stats collector would be first to show symptoms. regards, tom lane
Hi, I've made the backups with pg_dump, compressed, and there wasn't any kind of error messages. I've made the restoration withpgAdmin (but it also uses pg_dump for restore). How can I check if the statistics are corrupt? regards, Csaba -----Original Message----- From: Alban Hertroys [mailto:haramrae@gmail.com] Sent: Wednesday, August 15, 2012 3:33 PM To: Carl von Clausewitz Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] corrupted statistics file "pg_stat_tmp/pgstat.stat" On 15 August 2012 09:57, Carl von Clausewitz <clausewitz45@gmail.com> wrote: > I’ve restored from TAR backup our databases, and everything looked fine. What exactly is in that TAR backup? Is that a tar/gzipped PG dump or a file-system snapshot? If the latter: - did you halt the database while creating the snapshot or at least force a checkpoint? - is the version of PG that you're restoring to exactly the same (same CPU type, same major/minor version of PG) as the onethat the backup is from? I suspect that you restored a database with corrupt statistics and that those only get found out once the corrupted statisticsare needed. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Hi, I've made the kernel changes that I wrote in my original e-mail, and I've created some additional logging (both csvlog andsyslog), to gather more informations. /boot/loader.conf: kern.ipc.semmni="512" kern.ipc.semmns="1024" kern.ipc.semume="64" kern.ipc.semmnu="512" /etc/sysctl.cong: kern.ipc.shmall=262144 kern.ipc.shmmax=1073742336 kern.ipc.semmap=256 pgTune made this config changes for me in /usr/local/pgsql/data/postgresql.conf (the server has 4GB RAM) default_statistics_target = 50 # pgtune wizard 2012-08-15 maintenance_work_mem = 240MB # pgtune wizard 2012-08-15 constraint_exclusion = on # pgtune wizard 2012-08-15 checkpoint_completion_target = 0.9 # pgtune wizard 2012-08-15 effective_cache_size = 2816MB # pgtune wizard 2012-08-15 work_mem = 24MB # pgtune wizard 2012-08-15 wal_buffers = 8MB # pgtune wizard 2012-08-15 checkpoint_segments = 16 # pgtune wizard 2012-08-15 shared_buffers = 960MB # pgtune wizard 2012-08-15 max_connections = 80 # pgtune wizard 2012-08-15 After a day, the file is 412kb large. I've just installed strace, and I try to capture a 2-4 hours work, and check what isgoing on. ulimit (& ulimit -f) output is unlimited. I'll be back (:-)) within few days with the results. Thank you all the informations. Regards, Csaba -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Wednesday, August 15, 2012 3:34 PM To: Carl von Clausewitz Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] corrupted statistics file "pg_stat_tmp/pgstat.stat" "Carl von Clausewitz" <clausewitz45@gmail.com> writes: > I’ve restored from TAR backup our databases, and everything looked fine. Without changing any setting in postgresql.conf(or in kernel settings) – only “track_counts=onâ€, after 2-3 days, I’m receiving huge number (~5-10PCS in every second) of error messages like that in /var/log/postgresql.log: > *** Aug 15 06:27:26 eurodb postgres[77652]: [43-1] WARNING: corrupted statistics file "pg_stat_tmp/pgstat.stat" Huh. The stats collector process ought to rewrite that file fairly often, so this suggests it's consistently failing torewrite it. You might take a look at what the file looks like after a day or so of normal operation (eg, how big is it, how often doesit get updated) and then compare to what it looks like after the errors start. Also, try strace'ing the stats collector process for a little while (long enough to capture a stats file rewrite sequence)during normal operation, and then again after the errors start. I don't want to speculate too much in advance of the data, but I'm wondering about a ulimit setting that limits how muchdata the stats collector can write during its lifetime (ulimit -f or local equivalent). That would eventually cause problems for any postgres process, but if you did accidentally have one in place when startingthe postmaster, maybe the stats collector would be first to show symptoms. regards, tom lane