Runaway backend at 100% CPU PostgreSQL v8.3.5 - Mailing list pgsql-hackers

From Stephen R. van den Berg
Subject Runaway backend at 100% CPU PostgreSQL v8.3.5
Date
Msg-id 20090106231134.GA13379@cuci.nl
Whole thread Raw
Responses Re: Runaway backend at 100% CPU PostgreSQL v8.3.5  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I'm running Debian PostgreSQL v8.3.5-1 on x86 in 32-bit mode.
Every once in a while, some backends start taking 100% CPU, as can be seen
below in the excerpt from the process table:

27256 ?        Ss     0:04 /usr/lib/postgresql/8.3/bin/postgres -D /var/lib/post
27299 ?        Ss     0:00  \_ postgres: writer process                        
27300 ?        Ss     0:00  \_ postgres: wal writer process                    
27301 ?        Ss     0:00  \_ postgres: autovacuum launcher process           
27302 ?        Ss     0:26  \_ postgres: stats collector process               
12076 ?        Rs   5681:50  \_ postgres: cms cms 10.0.0.5(59125) PARSE   
12853 ?        Rs   5549:25  \_ postgres: cms cms 10.0.0.5(48437) PARSE   
25985 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.18(45807) idle    
25986 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.18(45808) idle    

It seems that the backend is stuck in some kind of endless loop.  Since
it's a production Debian server, the backend is not compiled with debugging
turned on.  The best I can do is ltrace it, in hopes of someone recognising
the infinite sequence.
Ltracing the 12853 process reveals the following libc calls:

Label a:
[pid 12853] __sigsetjmp(0xbfec7000, 0, 0x852e740, 0xb5b3fc16, 0x85e9228) = 0
[pid 12853] strlen("ExecutorState")              = 13
[pid 12853] strcpy(0x84fb0e8, "ExecutorState")   = 0x84fb0e8
[pid 12853] malloc(8192)                         = 0x860da48
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x84fb060, "ExprContext")     = 0x84fb060
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x84fbda0, "ExprContext")     = 0x84fbda0
[pid 12853] strncpy(0x860e498, "parent", 64)     = 0x860e498
[pid 12853] strncpy(0x860e500, "sid", 64)        = 0x860e500
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x85cd810, "ExprContext")     = 0x85cd810
[pid 12853] malloc(8256)                         = 0x8600a28
[pid 12853] memmove(0x860e738, 0x860e600, 40, 0, 0x82eada9) = 0x860e738
[pid 12853] strncpy(0x860e7f0, "parent", 64)     = 0x860e7f0
[pid 12853] strncpy(0x860e858, "sid", 64)        = 0x860e858
[pid 12853] strlen("SPI TupTable")               = 12
[pid 12853] strcpy(0x85cd1b0, "SPI TupTable")    = 0x85cd1b0
[pid 12853] malloc(8192)                         = 0x860fa50
[pid 12853] memmove(0x860e738, 0x860e600, 40, 0x830fab4, 0x84fb088) = 0x860e738
[pid 12853] free(0x8600a28)                      = <void>
[pid 12853] free(0x860da48)                      = <void>
[pid 12853] free(0x860fa50)                      = <void>
[pid 12853] __sigsetjmp(0xbfec7000, 0, 0x852e740, 0xb5b3fc16, 0x85e9228) = 0
[pid 12853] strlen("ExecutorState")              = 13
[pid 12853] strcpy(0x85cd1b0, "ExecutorState")   = 0x85cd1b0
[pid 12853] malloc(8192)                         = 0x860da48
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x84fb0e8, "ExprContext")     = 0x84fb0e8
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x84fb060, "ExprContext")     = 0x84fb060
[pid 12853] strncpy(0x860e498, "parent", 64)     = 0x860e498
[pid 12853] strncpy(0x860e500, "sid", 64)        = 0x860e500
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x84fbda0, "ExprContext")     = 0x84fbda0
[pid 12853] malloc(8256)                         = 0x8600a28
[pid 12853] memmove(0x860e738, 0x860e600, 40, 0, 0x82eada9) = 0x860e738
[pid 12853] strncpy(0x860e7f0, "parent", 64)     = 0x860e7f0
[pid 12853] strncpy(0x860e858, "sid", 64)        = 0x860e858
[pid 12853] strlen("SPI TupTable")               = 12
[pid 12853] strcpy(0x85cd810, "SPI TupTable")    = 0x85cd810
[pid 12853] malloc(8192)                         = 0x860fa50
[pid 12853] memmove(0x860e738, 0x860e600, 40, 0x830fab4, 0x85cd150) = 0x860e738
[pid 12853] free(0x8600a28)                      = <void>
[pid 12853] free(0x860da48)                      = <void>
[pid 12853] free(0x860fa50)                      = <void>
[pid 12853] __sigsetjmp(0xbfec7000, 0, 0x852e740, 0xb5b3fc16, 0x85e9228) = 0
[pid 12853] strlen("ExecutorState")              = 13
[pid 12853] strcpy(0x85cd810, "ExecutorState")   = 0x85cd810
[pid 12853] malloc(8192)                         = 0x860da48
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x85cd1b0, "ExprContext")     = 0x85cd1b0
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x84fb0e8, "ExprContext")     = 0x84fb0e8
[pid 12853] strncpy(0x860e498, "parent", 64)     = 0x860e498
[pid 12853] strncpy(0x860e500, "sid", 64)        = 0x860e500
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x84fb060, "ExprContext")     = 0x84fb060
[pid 12853] malloc(8256)                         = 0x8600a28
[pid 12853] memmove(0x860e738, 0x860e600, 40, 0, 0x82eada9) = 0x860e738
[pid 12853] strncpy(0x860e7f0, "parent", 64)     = 0x860e7f0
[pid 12853] strncpy(0x860e858, "sid", 64)        = 0x860e858
[pid 12853] strlen("SPI TupTable")               = 12
[pid 12853] strcpy(0x84fbda0, "SPI TupTable")    = 0x84fbda0
[pid 12853] malloc(8192)                         = 0x860fa50
[pid 12853] memmove(0x860e738, 0x860e600, 40, 0x830fab4, 0x85cd7b0) = 0x860e738
[pid 12853] free(0x8600a28)                      = <void>
[pid 12853] free(0x860da48)                      = <void>
[pid 12853] free(0x860fa50)                      = <void>
[pid 12853] __sigsetjmp(0xbfec7000, 0, 0x852e740, 0xb5b3fc16, 0x85e9228) = 0
[pid 12853] strlen("ExecutorState")              = 13
[pid 12853] strcpy(0x84fbda0, "ExecutorState")   = 0x84fbda0
[pid 12853] malloc(8192)                         = 0x860da48
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x85cd810, "ExprContext")     = 0x85cd810
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x85cd1b0, "ExprContext")     = 0x85cd1b0
[pid 12853] strncpy(0x860e498, "parent", 64)     = 0x860e498
[pid 12853] strncpy(0x860e500, "sid", 64)        = 0x860e500
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x84fb0e8, "ExprContext")     = 0x84fb0e8
[pid 12853] malloc(8256)                         = 0x8600a28
[pid 12853] memmove(0x860e738, 0x860e600, 40, 0, 0x82eada9) = 0x860e738
[pid 12853] strncpy(0x860e7f0, "parent", 64)     = 0x860e7f0
[pid 12853] strncpy(0x860e858, "sid", 64)        = 0x860e858
[pid 12853] strlen("SPI TupTable")               = 12
[pid 12853] strcpy(0x84fb060, "SPI TupTable")    = 0x84fb060
[pid 12853] malloc(8192)                         = 0x860fa50
[pid 12853] memmove(0x860e738, 0x860e600, 40, 0x830fab4, 0x84fbd40) = 0x860e738
[pid 12853] free(0x8600a28)                      = <void>
[pid 12853] free(0x860da48)                      = <void>
[pid 12853] free(0x860fa50)                      = <void>
[pid 12853] __sigsetjmp(0xbfec7000, 0, 0x852e740, 0xb5b3fc16, 0x85e9228) = 0
[pid 12853] strlen("ExecutorState")              = 13
[pid 12853] strcpy(0x84fb060, "ExecutorState")   = 0x84fb060
[pid 12853] malloc(8192)                         = 0x860da48
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x84fbda0, "ExprContext")     = 0x84fbda0
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x85cd810, "ExprContext")     = 0x85cd810
[pid 12853] strncpy(0x860e498, "parent", 64)     = 0x860e498
[pid 12853] strncpy(0x860e500, "sid", 64)        = 0x860e500
[pid 12853] strlen("ExprContext")                = 11
[pid 12853] strcpy(0x85cd1b0, "ExprContext")     = 0x85cd1b0
[pid 12853] malloc(8256)                         = 0x8600a28
[pid 12853] memmove(0x860e738, 0x860e600, 40, 0, 0x82eada9) = 0x860e738
[pid 12853] strncpy(0x860e7f0, "parent", 64)     = 0x860e7f0
[pid 12853] strncpy(0x860e858, "sid", 64)        = 0x860e858
[pid 12853] strlen("SPI TupTable")               = 12
[pid 12853] strcpy(0x84fb0e8, "SPI TupTable")    = 0x84fb0e8
[pid 12853] malloc(8192)                         = 0x860fa50
[pid 12853] memmove(0x860e738, 0x860e600, 40, 0x830fab4, 0x84fb000) = 0x860e738
[pid 12853] free(0x8600a28)                      = <void>
[pid 12853] free(0x860da48)                      = <void>
[pid 12853] free(0x860fa50)                      = <void>
Jump back to label a:, repeat ad infinitum, chewing up 100% CPU

Below a complete process list of all running backends:

27256 ?        Ss     0:04 /usr/lib/postgresql/8.3/bin/postgres -D /var/lib/post
27299 ?        Ss     0:00  \_ postgres: writer process                        
27300 ?        Ss     0:00  \_ postgres: wal writer process                    
27301 ?        Ss     0:00  \_ postgres: autovacuum launcher process           
27302 ?        Ss     0:26  \_ postgres: stats collector process               
12076 ?        Rs   5681:50  \_ postgres: cms cms 10.0.0.5(59125) PARSE   
12853 ?        Ts   5549:25  \_ postgres: cms cms 10.0.0.5(48437) PARSE   
25985 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.18(45807) idle    
25986 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.18(45808) idle    
21100 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.12(45544) idle    
21101 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.12(45545) idle    
26777 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.12(43123) idle    
24711 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.5(34684) idle     
24715 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.5(34687) idle     
24716 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.5(34688) idle     
24721 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.5(34693) idle     8355 ?        Ss     0:01  \_ postgres: cms
cms10.0.0.5(59188) idle     
 
13504 ?        Ss     0:05  \_ postgres: cms cms 10.0.0.5(35244) idle     
17568 ?        Ss     0:01  \_ postgres: cms cms 10.0.0.5(50414) idle     
17615 ?        Ss     0:02  \_ postgres: cms cms 10.0.0.5(42313) idle     
17617 ?        Ss     0:01  \_ postgres: cms cms 10.0.0.5(42315) idle     
20412 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.5(43902) idle     
20425 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.5(43905) idle     
20427 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.5(43907) idle     
20428 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.5(43908) idle     
20429 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.5(43909) idle     
20430 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.5(43910) idle     
20431 ?        Ss     0:00  \_ postgres: cms cms 10.0.0.5(43911) idle     

-- 
Sincerely,          Stephen R. van den Berg.

"Very funny, Mr. Scott. Now beam down my clothes!"


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Re: [COMMITTERS] pgsql: This makes all the \dX commands (most importantly to most: \df)
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Re: [COMMITTERS] pgsql: This makes all the \dX commands (most importantly to most: \df)