Re: cvs tip - stats buffer process consuming 100% cpu - Mailing list pgsql-hackers

From: Bruce Momjian
Subject: Re: cvs tip - stats buffer process consuming 100% cpu
Msg-id: 200601031953.k03JrNP02379@candle.pha.pa.us
In response to: cvs tip - stats buffer process consuming 100% cpu (Joe Conway <mail@joeconway.com>)
Responses: Re: cvs tip - stats buffer process consuming 100% cpu
List: pgsql-hackers

Interesting.  Here is the patch I just applied:

    http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/postmaster/pgstat.c.diff?r1=1.116&r2=1.117

My only guess is that select() is modifying the timeout structure on
return, but I didn't think it did that.  Does it?

Googling shows that Linux does modify the structure (see the bottom of
this thread):

    http://groups.google.com/group/comp.unix.programmer/browse_frm/thread/a53c7c4a71cb48e5/5f0bbcc9fe0230a2?lnk=st&q=select+timeout+modify&rnum=9#5f0bbcc9fe0230a2

so I will fix the code accordingly.  Patch attached and applied.
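
For anyone following along, here is a minimal sketch of the pattern the fix
relies on. It is not the pgstat.c code: count_timeouts() and its parameters
are hypothetical names made up for illustration. The point is only that the
struct timeval is filled in again before every select() call. If it were set
once outside the loop, a platform that writes the unslept remainder back
(as Linux does) would leave it at zero after the first timeout, and every
later select() would return immediately, which would explain the busy loop.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from pgstat.c): poll fd for readability up to
 * `rounds` times, waiting at most `usec` microseconds per round.
 * Returns the number of rounds that timed out before data arrived or the
 * rounds ran out, or -1 on a select() error other than EINTR.
 */
int
count_timeouts(int fd, int rounds, long usec)
{
    int timeouts = 0;

    assert(fd >= 0 && fd < FD_SETSIZE);

    for (int i = 0; i < rounds; i++)
    {
        fd_set          rfds;
        struct timeval  timeout;
        int             rc;

        /* select() also overwrites the fd sets, so rebuild them too. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* Re-arm on every pass; some systems modify this on return. */
        timeout.tv_sec = usec / 1000000;
        timeout.tv_usec = usec % 1000000;

        rc = select(fd + 1, &rfds, NULL, NULL, &timeout);
        if (rc < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal; try again */
            return -1;
        }
        if (rc == 0)
            timeouts++;         /* nothing to read within the timeout */
        else
            break;              /* fd is readable */
    }
    return timeouts;
}
```

Called on the read end of an empty pipe, each round should wait the full
interval rather than spinning; once a byte has been written to the pipe, the
first select() reports the descriptor as readable.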

---------------------------------------------------------------------------

Joe Conway wrote:
> I just noticed that the stats buffer process is consuming 100% cpu as
> soon as a backend is started, and continues after that backend is ended:
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 15150 postgres  25   0 27004  948  508 S 99.9  0.0   0:30.97 postmaster
>
>
> # ps -ef |grep 15150
> postgres 15150 15143 78 11:29 pts/3    00:00:38 postgres: stats buffer process
> postgres 15151 15150  0 11:29 pts/3    00:00:00 postgres: stats collector process
>
>
> (gdb) bt
> #0  0x000000383b8c2633 in __select_nocancel () from /lib64/libc.so.6
> #1  0x000000000055e896 in PgstatBufferMain (argc=Variable "argc" is not available.
> ) at pgstat.c:1921
> #2  0x000000000055f73b in pgstat_start () at pgstat.c:614
> #3  0x0000000000562fda in reaper (postgres_signal_arg=Variable "postgres_signal_arg" is not available.
> ) at postmaster.c:2175
> #4  <signal handler called>
> #5  0x000000383b8c2633 in __select_nocancel () from /lib64/libc.so.6
> #6  0x0000000000560d0f in ServerLoop () at postmaster.c:1180
> #7  0x0000000000562443 in PostmasterMain (argc=7, argv=0x88df20) at postmaster.c:943
> #8  0x00000000005217fe in main (argc=7, argv=0x88df20) at main.c:263
>
> I noticed a recent discussion on the stats collector -- is this related
> to a recent change?
>
> Joe

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/backend/postmaster/pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.117
diff -c -c -r1.117 pgstat.c
*** src/backend/postmaster/pgstat.c    3 Jan 2006 16:42:17 -0000    1.117
--- src/backend/postmaster/pgstat.c    3 Jan 2006 19:52:14 -0000
***************
*** 1871,1884 ****
      msgbuffer = (char *) palloc(PGSTAT_RECVBUFFERSZ);

      /*
-      * Wait for some work to do; but not for more than 10 seconds. (This
-      * determines how quickly we will shut down after an ungraceful
-      * postmaster termination; so it needn't be very fast.)
-      */
-     timeout.tv_sec = 10;
-     timeout.tv_usec = 0;
-
-     /*
       * Loop forever
       */
      for (;;)
--- 1871,1876 ----
***************
*** 1918,1923 ****
--- 1910,1924 ----
                  maxfd = writePipe;
          }

+         /*
+          * Wait for some work to do; but not for more than 10 seconds. (This
+          * determines how quickly we will shut down after an ungraceful
+          * postmaster termination; so it needn't be very fast.)  struct timeout
+          * is modified by some operating systems.
+          */
+         timeout.tv_sec = 10;
+         timeout.tv_usec = 0;
+
          if (select(maxfd + 1, &rfds, &wfds, NULL, &timeout) < 0)
          {
              if (errno == EINTR)
