Thread: Stats collector on rampage (8.2.3)
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, it seems that the stats collector on my box is using more CPU than it did in the past. This is what I'm observing: CPU usage for the stat process: 25% flat $ psql -c "select version()" version - -------------------------------------------------------------------------------------------------------PostgreSQL 8.2.3on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3) (1 row) $ strace -tt -p 10773 [...] 09:47:37.867655 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 09:47:37.867738 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 823) = 823 09:47:37.867820 close(3) = 0 09:47:37.867862 munmap(0xb7ced000, 4096) = 0 09:47:37.867906 rename("global/pgstat.tmp", "global/pgstat.stat") = 0 09:47:37.868188 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1 09:47:37.868245 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\1\0\0\0\347"..., 1000, 0) = 976 09:47:37.868317 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 500}}, NULL) = 0 09:47:37.868372 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1 09:47:37.868428 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\0\0\0\0g\n\0"..., 1000, 0) = 976 09:47:37.868501 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1 09:47:37.868559 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\0\0\0\0i\n\0"..., 1000, 0) = 976 09:47:37.868629 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1 09:47:37.868687 recv(7, "\1\0\0\0\4\3\0\0\rg\0\0\v\0\0\0\0\0\0\0\0\0\0\0^\n\0\0"..., 1000, 0) = 772 09:47:37.868757 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1 09:47:37.868815 recv(7, "\1\0\0\0\240\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\354"..., 1000, 0) = 160 09:47:37.868886 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1 09:47:37.868944 recv(7, "\1\0\0\0\240\0\0\0\rg\0\0\2\0\0\0\1\0\0\0\0\0\0\0<\n\0"..., 1000, 0) = 160 09:47:37.869012 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1 09:47:37.869070 recv(7, "\1\0\0\0l\1\0\0\0\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\354\4\0"..., 1000, 0) = 364 09:47:37.869141 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1 09:47:37.869198 recv(7, "\1\0\0\0\300\2\0\0\rg\0\0\n\0\0\0\1\0\0\0\0\0\0\0007\n"..., 1000, 0) = 704 09:47:37.869267 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1 09:47:37.869329 recv(7, "\1\0\0\0(\1\0\0\rg\0\0\4\0\0\0\1\0\0\0\0\0\0\0/\n\0\0\0"..., 1000, 0) = 296 09:47:37.869398 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1 09:47:37.869456 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\1\0\0\0\347"..., 1000, 0) = 976 09:47:37.869524 --- SIGALRM (Alarm clock) @ 0 (0) --- 09:47:37.869575 sigreturn() = ? (mask now []) 09:47:37.869659 getppid() = 10768 09:47:37.869702 open("global/pgstat.tmp", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3 09:47:37.869775 fstat64(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 09:47:37.869871 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ced000 09:47:37.869928 write(3, "\226\274\245\1D\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 09:47:37.870252 write(3, ";\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 [...] and doing the statistics on the system calls: $ time strace -c -p 10773 Process 10773 attached - interrupt to quit Process 10773 detached % time seconds usecs/call calls errors syscall - ------ ----------- ----------- --------- --------- ----------------62.70 4.980721 16 307851 write25.72 2.043299 156 13058 3039 poll 9.61 0.763046 248 3078 rename 0.64 0.050992 17 3079 open 0.49 0.038819 4 10019 recv 0.26 0.020469 7 3078 munmap 0.13 0.010344 3 3078 close 0.12 0.009425 3 3079 mmap2 0.11 0.008353 3 3079 setitimer 0.09 0.007114 2 3079 3039 sigreturn 0.07 0.005923 2 3079 fstat64 0.06 0.004734 2 3079 getppid - ------ ----------- ----------- --------- --------- ---------------- 100.00 7.943239 358636 6078 total real 0m16.313s user 0m1.428s sys 0m3.802s so instead of 32 or such rename it did the rename 3K times. To solve the problem is it possible to kill that process? (will it be respawned?) Regards Gaetano Mendola -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHJZ9I7UpzwH2SGd4RAqwrAJ9vyt5fd1rdEu+uTnef6QpFYVBbhwCePB69 b0jA7Ko85TyEfMqAmVVRy/w= =ax+l -----END PGP SIGNATURE-----
Gaetano Mendola wrote: > Hi all, > it seems that the stats collector on my box is using more CPU than > it did in the past. This is a known bug that was fixed in 8.2.4, so you need to upgrade. //Magnus
On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote: > it seems that the stats collector on my box is using more CPU than > it did in the past. it's well known bug, and it was fixed in 8.2.4: http://www.postgresql.org/docs/current/interactive/release-8-2-4.html ... Prevent the statistics collector from writing to disk too frequently (Tom) ... best regards, depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
hubert depesz lubaczewski wrote: > On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote: >> it seems that the stats collector on my box is using more CPU than >> it did in the past. > > it's well known bug, and it was fixed in 8.2.4: > http://www.postgresql.org/docs/current/interactive/release-8-2-4.html > ... > Prevent the statistics collector from writing to disk too frequently > (Tom) > ... I saw that, upgrading the DB at this very moment is not doable, killing that process will the postmaster respawn another one? BTW I discover that it was triggered by the time change due the daylight saving. Regards Gaetano Mendola
hubert depesz lubaczewski wrote: > > On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote: >> >> it seems that the stats collector on my box is using more CPU than >> >> it did in the past. > > > > it's well known bug, and it was fixed in 8.2.4: > > http://www.postgresql.org/docs/current/interactive/release-8-2-4.html > > ... > > Prevent the statistics collector from writing to disk too frequently > > (Tom) > > ... I saw that, upgrading the DB at this very moment is not doable, killing that process will the postmaster respawn another one? BTW I discover that it was triggered by the time change due the daylight saving. Regards Gaetano Mendola
Gaetano Mendola wrote: > hubert depesz lubaczewski wrote: >> On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote: >>> it seems that the stats collector on my box is using more CPU than >>> it did in the past. >> it's well known bug, and it was fixed in 8.2.4: >> http://www.postgresql.org/docs/current/interactive/release-8-2-4.html >> ... >> Prevent the statistics collector from writing to disk too frequently >> (Tom) >> ... > > I saw that, upgrading the DB at this very moment is not doable, killing > that process will the postmaster respawn another one? BTW I discover that > it was triggered by the time change due the daylight saving. IIRC, it will. You need to change postgresql.conf and disable the stats collector. If you do that, it won't be started. Shouldn't be trigged by DST. //Magnus
Gaetano Mendola wrote: > hubert depesz lubaczewski wrote: > >>> On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote: >>> >>>>> it seems that the stats collector on my box is using more CPU than >>>>> it did in the past. >>>>> >>> it's well known bug, and it was fixed in 8.2.4: >>> http://www.postgresql.org/docs/current/interactive/release-8-2-4.html >>> ... >>> Prevent the statistics collector from writing to disk too frequently >>> (Tom) >>> ... >>> > > I saw that, upgrading the DB at this very moment is not doable, killing > that process will the postmaster respawn another one? BTW I discover that > it was triggered by the time change due the daylight saving. > > > you do realize that this upgrade wouldn't require a dump/restore, don't you? It would be close to instantaneous. cheers andrew
Magnus Hagander wrote: > Gaetano Mendola wrote: >> hubert depesz lubaczewski wrote: >>> On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote: >>>> it seems that the stats collector on my box is using more CPU than >>>> it did in the past. >>> it's well known bug, and it was fixed in 8.2.4: >>> http://www.postgresql.org/docs/current/interactive/release-8-2-4.html >>> ... >>> Prevent the statistics collector from writing to disk too frequently >>> (Tom) >>> ... >> I saw that, upgrading the DB at this very moment is not doable, killing >> that process will the postmaster respawn another one? BTW I discover that >> it was triggered by the time change due the daylight saving. > > IIRC, it will. You need to change postgresql.conf and disable the stats > collector. If you do that, it won't be started. > > Shouldn't be trigged by DST. The high cpu usage started at that time, may be that is another problem of some applications stuck on it. I will investigate further. Gaetano
Andrew Dunstan wrote: > > > Gaetano Mendola wrote: >> hubert depesz lubaczewski wrote: >> >>>> On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote: >>>> >>>>>> it seems that the stats collector on my box is using more CPU than >>>>>> it did in the past. >>>>>> >>>> it's well known bug, and it was fixed in 8.2.4: >>>> http://www.postgresql.org/docs/current/interactive/release-8-2-4.html >>>> ... >>>> Prevent the statistics collector from writing to disk too frequently >>>> (Tom) >>>> ... >>>> >> >> I saw that, upgrading the DB at this very moment is not doable, killing >> that process will the postmaster respawn another one? BTW I discover that >> it was triggered by the time change due the daylight saving. >> >> >> > > you do realize that this upgrade wouldn't require a dump/restore, don't > you? It would be close to instantaneous. Sure I do, for me it's even easier than that, I use DRBD so I can just shut down one node upgrade it and then upgrade the other one; however upgrade the DB means for me "touch" an entire satellite trasmission platform, I have to schedule a platform maintenance for that... Regards Gaetano Mendola