Thread: Immortal backend .... (kill -9 ?..)
Immortal backend .... (kill -9 ?..)
From
"Rajesh Kumar Mallah." (by way of Rajesh Kumar Mallah.
Date:
Hi i have just discovered a query in my database system that would take extremely long durations to finish. the query plan is like: explain SELECT source_id , cnt from (SELECT source_id,count(source_id) as cnt from email_source group by source_id ) subsel join sources using(source_id) order by source_id tradein_clients-# ; NOTICE: QUERY PLAN: Merge Join (cost=145610.94..147300.41 rows=112596 width=16) -> Sort (cost=145597.57..145597.57 rows=112596 width=4) -> Subquery Scan subsel (cost=130520.56..136150.35 rows=112596 width=4) -> Aggregate (cost=130520.56..136150.35 rows=112596 width=4) -> Group (cost=130520.56..133335.46 rows=1125957 width=4) -> Sort (cost=130520.56..130520.56 rows=1125957 width=4) -> Seq Scan on email_source (cost=0.00..17346.57 rows=1125957 width=4) -> Sort (cost=13.37..13.37 rows=213 width=4) -> Seq Scan on sources (cost=0.00..5.13 rows=213 width=4) EXPLAIN unfortunately it has been running since yesterday and i am discovering it now. now even kill -9 <pid of backend> does not releases it below is the current status of backends: postgres 8594 0.0 0.0 139416 148 ? S May11 0:25 /usr/local/pgsql/bin/postmaster postgres 8595 1.3 0.1 140408 1040 ? S May11 78:47 postgres: stats buffer process postgres 8597 1.1 0.0 139704 324 ? S May11 65:18 postgres: stats collector process postgres 32444 0.0 0.0 238700 0 ? DW May14 0:20 postgres: tradein tradein_clients 130.94.20.27 SELECT postgres 32482 0.0 0.0 139808 0 ? DW May14 0:00 postgres: checkpoint subprocess postgres 16907 0.0 0.3 140296 3912 ? S 17:51 0:00 postgres: postgres tradein_clients 203.196.129.235 idle postgres 20382 1.3 4.1 140816 42588 ? S 18:27 0:01 postgres: tradein tradein_clients 130.94.20.27 idle in transaction postgres 20388 0.5 6.7 141200 68940 ? S 18:27 0:00 postgres: tradein tradein_clients 130.94.20.27 idle in transaction The pid of the backend that is executing the query is 32444 . also note that pid 32482 is also (running?) is since may 14 and 32444 also is (running?) since may 14. (is there any relation between them?) what is checkpoint subprocess? the load average is [root@linux10320 root2]# uptime 6:30pm up 4 days, 6:26, 1 user, load average: 11.08, 10.54, 10.78 [root@linux10320 root2]# can anyone tell me what might me causing so much load in the system, or someway to get see what is going on to make such a load? also if i were to bring down postgresql what should have been a safe (and effective) way. Disk I/O is like: # iostat -d 2 6 Linux 2.4.9 (linux10320) 05/15/02 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn dev8-0 32.25 450.01 445.92 166025844 164517760 dev8-1 34.26 434.75 640.21 160395794 236199552 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn dev8-0 2.50 24.00 12.00 48 24 dev8-1 1.00 0.00 12.00 0 24 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn dev8-0 13.00 0.00 112.00 0 224 dev8-1 13.00 0.00 112.00 0 224 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn dev8-0 23.50 0.00 200.00 0 400 dev8-1 23.50 0.00 200.00 0 400 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn dev8-0 3.00 0.00 44.00 0 88 dev8-1 3.00 0.00 44.00 0 88 Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn dev8-0 3.00 0.00 36.00 0 72 dev8-1 3.00 0.00 36.00 0 72 [root@linux10320 root2]# Regds Mallah. -- Rajesh Kumar Mallah, Project Manager (Development) Infocom Network Limited, New Delhi phone: +91(11)6152172 (221) (L) ,9811255597 (M) Visit http://www.trade-india.com , India's Leading B2B eMarketplace.
"Rajesh Kumar Mallah." <mallah@trade-india.com>(by way of Rajesh Kumar Mallah. <mallah@trade-india.com>) writes: > now even kill -9 <pid of backend> does not releases it below is > the current status of backends: I think there is something broken about your system. It is not normal to be in a state where you have unkillable processes. I'm wondering about disk drive problems leading to I/O requests that never complete, myself... regards, tom lane
On Wednesday 15 May 2002 07:46 pm, Tom Lane wrote: > "Rajesh Kumar Mallah." <mallah@trade-india.com>(by way of Rajesh Kumar Mallah. <mallah@trade-india.com>) writes: > > now even kill -9 <pid of backend> does not releases it below is > > the current status of backends: > I have checked by diskdrive there are no badblocks at least in the drive. It is 34+34 GB SCSI IN raid 1 configuration. But does "iostat" hints too much disk i/o at the moment? is there anyway i can bring the load avergage to normal < 1 with rebooting and fscking... can i give you more information ? regds mallah. the system load has be > 10 for past 1 hrs at least > I think there is something broken about your system. It is not normal > to be in a state where you have unkillable processes. I'm wondering > about disk drive problems leading to I/O requests that never complete, > myself... > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html -- Rajesh Kumar Mallah, Project Manager (Development) Infocom Network Limited, New Delhi phone: +91(11)6152172 (221) (L) ,9811255597 (M) Visit http://www.trade-india.com , India's Leading B2B eMarketplace.
Hi, Two of the backends are showing status "D" which "man ps" describes as "uninterruptible sleep (usually IO)" since kill -9 does not get me rid of these should i expect postmaster which is still the parent process for those two spoilt kids to terminate them? regds Mallah On Wednesday 15 May 2002 07:46 pm, Tom Lane wrote: > "Rajesh Kumar Mallah." <mallah@trade-india.com>(by way of Rajesh Kumar Mallah. <mallah@trade-india.com>) writes: > > now even kill -9 <pid of backend> does not releases it below is > > the current status of backends: > > I think there is something broken about your system. It is not normal > to be in a state where you have unkillable processes. I'm wondering > about disk drive problems leading to I/O requests that never complete, > myself... > > regards, tom lane -- Rajesh Kumar Mallah, Project Manager (Development) Infocom Network Limited, New Delhi phone: +91(11)6152172 (221) (L) ,9811255597 (M) Visit http://www.trade-india.com , India's Leading B2B eMarketplace.
Rajesh Kumar Mallah. wrote: > Hi, > > Two of the backends are showing status "D" which "man ps" > describes as "uninterruptible sleep (usually IO)" > > since kill -9 does not get me rid of these should i expect > postmaster which is still the parent process for those two > spoilt kids to terminate them? That does sound like a hardware-related problem. I saw the same thing on a flaky Jaz drive. The only thing I could do was reboot, which got rid of the offending processes. But that won't fix your hardware problem. --Jeremy > On Wednesday 15 May 2002 07:46 pm, Tom Lane wrote: > >>"Rajesh Kumar Mallah." <mallah@trade-india.com>(by way of Rajesh Kumar Mallah. <mallah@trade-india.com>) writes: >> >>>now even kill -9 <pid of backend> does not releases it below is >>>the current status of backends: >> >>I think there is something broken about your system. It is not normal >>to be in a state where you have unkillable processes. I'm wondering >>about disk drive problems leading to I/O requests that never complete, >>myself... >> >> regards, tom lane > >