Stats collector on rampage (8.2.3) - Mailing list pgsql-hackers

From Gaetano Mendola
Subject Stats collector on rampage (8.2.3)
Date
Msg-id 47259F48.1010107@bigfoot.com
Whole thread Raw
Responses Re: Stats collector on rampage (8.2.3)
Re: Stats collector on rampage (8.2.3)
List pgsql-hackers
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,
it seems that the stats collector on my box is using more CPU than
it did in the past.

This is what I'm observing:

CPU usage for the stat process: 25% flat

$ psql -c "select version()"                                               version
- -------------------------------------------------------------------------------------------------------PostgreSQL
8.2.3on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3)
 
(1 row)


$ strace -tt -p 10773
[...]
09:47:37.867655 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
09:47:37.867738 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 823) = 823
09:47:37.867820 close(3)                = 0
09:47:37.867862 munmap(0xb7ced000, 4096) = 0
09:47:37.867906 rename("global/pgstat.tmp", "global/pgstat.stat") = 0
09:47:37.868188 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868245 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\1\0\0\0\347"..., 1000, 0) = 976
09:47:37.868317 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 500}}, NULL) = 0
09:47:37.868372 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868428 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\0\0\0\0g\n\0"..., 1000, 0) = 976
09:47:37.868501 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868559 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\0\0\0\0i\n\0"..., 1000, 0) = 976
09:47:37.868629 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868687 recv(7, "\1\0\0\0\4\3\0\0\rg\0\0\v\0\0\0\0\0\0\0\0\0\0\0^\n\0\0"..., 1000, 0) = 772
09:47:37.868757 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868815 recv(7, "\1\0\0\0\240\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\354"..., 1000, 0) = 160
09:47:37.868886 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868944 recv(7, "\1\0\0\0\240\0\0\0\rg\0\0\2\0\0\0\1\0\0\0\0\0\0\0<\n\0"..., 1000, 0) = 160
09:47:37.869012 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869070 recv(7, "\1\0\0\0l\1\0\0\0\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\354\4\0"..., 1000, 0) = 364
09:47:37.869141 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869198 recv(7, "\1\0\0\0\300\2\0\0\rg\0\0\n\0\0\0\1\0\0\0\0\0\0\0007\n"..., 1000, 0) = 704
09:47:37.869267 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869329 recv(7, "\1\0\0\0(\1\0\0\rg\0\0\4\0\0\0\1\0\0\0\0\0\0\0/\n\0\0\0"..., 1000, 0) = 296
09:47:37.869398 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869456 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\1\0\0\0\347"..., 1000, 0) = 976
09:47:37.869524 --- SIGALRM (Alarm clock) @ 0 (0) ---
09:47:37.869575 sigreturn()             = ? (mask now [])
09:47:37.869659 getppid()               = 10768
09:47:37.869702 open("global/pgstat.tmp", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
09:47:37.869775 fstat64(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
09:47:37.869871 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ced000
09:47:37.869928 write(3, "\226\274\245\1D\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
09:47:37.870252 write(3, ";\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[...]

and doing the statistics on the system calls:


$ time strace -c -p 10773
Process 10773 attached - interrupt to quit
Process 10773 detached
% time     seconds  usecs/call     calls    errors syscall
- ------ ----------- ----------- --------- --------- ----------------62.70    4.980721          16    307851
write25.72   2.043299         156     13058      3039 poll 9.61    0.763046         248      3078           rename 0.64
  0.050992          17      3079           open 0.49    0.038819           4     10019           recv 0.26    0.020469
        7      3078           munmap 0.13    0.010344           3      3078           close 0.12    0.009425
3     3079           mmap2 0.11    0.008353           3      3079           setitimer 0.09    0.007114           2
3079     3039 sigreturn 0.07    0.005923           2      3079           fstat64 0.06    0.004734           2      3079
         getppid
 
- ------ ----------- ----------- --------- --------- ----------------
100.00    7.943239                358636      6078 total

real    0m16.313s
user    0m1.428s
sys     0m3.802s


so instead of 32 or such rename it did the rename 3K times.

To solve the problem is it possible to kill that process? (will it be respawned?)


Regards
Gaetano Mendola











-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHJZ9I7UpzwH2SGd4RAqwrAJ9vyt5fd1rdEu+uTnef6QpFYVBbhwCePB69
b0jA7Ko85TyEfMqAmVVRy/w=
=ax+l
-----END PGP SIGNATURE-----


pgsql-hackers by date:

Previous
From: ITAGAKI Takahiro
Date:
Subject: Re: PANIC caused by open_sync on Linux
Next
From: Magnus Hagander
Date:
Subject: Re: Stats collector on rampage (8.2.3)