Thread: pg_stat_get_backend_pid seems to be listing nonexistent PIDs
Hi folks, please help.

There seems to be too much lag between the access collector and the system status. Even the PIDs of the backends do not seem to match.

tradein_clients=# SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
                         pg_stat_get_backend_activity(s.backendid) AS current_query
                  FROM (SELECT pg_stat_get_backend_idset() AS backendid) s;

 procpid | current_query
---------+-------------------------------
   27134 | <IDLE> in transaction
   26958 | <IDLE> in transaction
   26953 | <IDLE> in transaction
   26960 | <IDLE> in transaction
   27008 | <IDLE> in transaction
   12839 | <IDLE>
   26977 | <IDLE> in transaction
   27012 | <IDLE> in transaction
   31354 | <IDLE>
   27014 | <IDLE> in transaction
   27015 | <IDLE> in transaction
   26978 | <IDLE> in transaction
   26985 | <IDLE> in transaction
   27135 | select count(*) from ( select distinct on (email_id) email_id,email,contact from email_bank a join (select email_id from email_export_category where category_id in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 1
   12262 | SELECT source_id , cnt from (SELECT source_id,count(source_id) as cnt from email_source group by source_id ) subsel join sources using(source_id) order by source_id
   27136 | <IDLE> in transaction
(16 rows)

tradein_clients=#

Why does the above not match the "top" output taken at the same time?

==============================================================================
  4:10pm  up 2 days, 23:06,  2 users,  load average: 6.21, 6.06, 5.60
69 processes: 66 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: 55.6% user,  2.0% system,  0.0% nice, 42.3% idle
Mem:  1028484K av,  980320K used,   48164K free,       0K shrd,    3744K buff
Swap:  971004K av,  102532K used,  868472K free                  912724K cached

  PID USER     PRI  NI   SIZE   RSS  SHARE STAT LIB  %CPU  %MEM   TIME COMMAND
 5456 postgres  17   0  59456   57M  57156 R      0  99.1   5.7   3:12 postmaster
 6601 postgres   9   0  79964   77M  78328 S      0   3.3   7.7   0:01 postmaster
 6779 postgres   9   0  88412   86M  86752 S      0   1.9   8.5   0:00 postmaster
 6703 postgres   9   0  81668   79M  80276 S      0   1.7   7.9   0:01 postmaster
 6943 postgres   9   0  78732   76M  77520 S      0   1.7   7.6   0:01 postmaster
 6940 postgres   9   0  44180   42M  42668 S      0   0.5   4.2   0:00 postmaster
 6776 postgres   9   0   121M  121M   119M S      0   0.3  12.0   0:01 postmaster
 5597 postgres   8   0    624   248    216 S      0   0.0   0.0   0:24 postmaster
 5598 postgres   9   0   1440     4      4 D      0   0.0   0.0   2:31 postmaster
 5599 postgres   9   0   2052     4      4 S      0   0.0   0.0  28:05 postmaster
12262 postgres   9   0  88564     4      4 D      0   0.0   0.0   0:19 postmaster
13039 postgres   9   0    656     4      4 D      0   0.0   0.0   0:00 postmaster
29440 postgres   9   0  20928   19M  20332 S      0   0.0   1.9   0:01 postmaster
 1652 postgres   9   0   3356  2324   2144 S      0   0.0   0.2   3:21 postmaster
 2219 postgres   9   0   2744  2120   2068 S      0   0.0   0.2   0:00 postmaster
 6772 postgres   9   0   100M  100M  99.3M S      0   0.0   9.9   0:00 postmaster
 6805 postgres   9   0   4440  4168   3532 S      0   0.0   0.4   0:00 postmaster
 6809 postgres   9   0  35280   34M  33948 S      0   0.0   3.4   0:00 postmaster
 6846 postgres   9   0  98.9M   98M  99804 S      0   0.0   9.8   0:01 postmaster
 6931 postgres   9   0  21744   20M  20428 S      0   0.0   2.0   0:02 postmaster
 6934 postgres   9   0  19020   18M  17868 S      0   0.0   1.8   0:00 postmaster
 6941 postgres   9   0  63280   61M  61756 S      0   0.0   6.1   0:01 postmaster
================================================================================

[root@linux10320 root2]# kill -INT 27135
bash: kill: (27135) - No such pid
[root@linux10320 root2]#

And "kill -INT 12262" does not actually kill it?

regds
mallah.

--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.
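One way to cross-check the PIDs reported by pg_stat_get_backend_pid against the operating system is to probe each one with signal 0, which performs the existence check without delivering anything. The following is only a minimal sketch in Python, assuming a Unix-like host; the helper names pid_exists and stale_pids are made up for illustration and are not part of PostgreSQL.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if some process with this PID currently exists."""
    if pid <= 0:
        # 0 and negative values address process groups, not single processes.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: error checking only, no signal is sent
    except ProcessLookupError:
        return False     # ESRCH: no such process
    except PermissionError:
        return True      # EPERM: the process exists but belongs to another user
    return True


def stale_pids(reported):
    """Return the reported PIDs that no longer match any running process.

    Caveat: a PID that does exist may have been reused by an unrelated
    process, so this can only prove that an entry is stale, not that it is live.
    """
    return [pid for pid in reported if not pid_exists(pid)]
```

Feeding it the procpid column from the query above would list the entries, such as 27135, for which the shell's kill reports "No such pid".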
Rajesh Kumar Mallah. wrote:
> Hi Folks,
> please help ,
>
> therse seems to be too much lag between the access collector
> and system status. even the pids of backend does not seems to be matching.

The delay is on average 250 milliseconds for a busy database (1/4 second). The controlling definition is in

src/include/pgstat.h:
    #define PGSTAT_STAT_INTERVAL 500

This means that, from the moment ANY statistics packet arrives at the collector, it waits 500 milliseconds before writing out all information. Thus, the 250 millisecond average above holds only under a constant flow of packets.

And, before you discover this one: the backends send their statistics to the collector via UDP packets. Under heavy database load, some of these packets can get lost, so the statistics will not be 100% accurate. This is a deliberate feature, implemented on purpose! Counting the number of scans is not considered as important as responding to the client as fast as possible during rush hour.
Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
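The 250 millisecond figure in the reply above follows from simple arithmetic: if the collector writes its file once per PGSTAT_STAT_INTERVAL and changes arrive evenly spread over that interval, each change waits on average half the interval. The sketch below only illustrates that arithmetic in Python; it is a simplified model assuming a constant packet flow, not the collector's actual code, and mean_lag_ms is a hypothetical name.

```python
PGSTAT_STAT_INTERVAL_MS = 500  # value quoted from src/include/pgstat.h in the thread


def mean_lag_ms(interval_ms: float, samples: int = 1000) -> float:
    """Average wait until the next write-out for evenly spread events.

    Simplified model (an assumption): write-outs happen exactly once per
    interval, and events fall at evenly spaced offsets within it.
    """
    step = interval_ms / samples
    # Use the midpoint of each slice as the event's offset into the interval.
    offsets = [(i + 0.5) * step for i in range(samples)]
    # An event at offset t waits (interval_ms - t) for the next write-out.
    return sum(interval_ms - t for t in offsets) / samples


if __name__ == "__main__":
    print(mean_lag_ms(PGSTAT_STAT_INTERVAL_MS))
```

With an idle database the model no longer applies: a single packet after a quiet period waits the full 500 milliseconds, which is the caveat about a constant flow of packets.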
Hi,

My postmaster died just now; only a bunch of backends are still running.

[rmallah@server rmallah]$ psql -h 130.94.22.209 -U tradein tradein_clients
psql: could not connect to server: Connection refused
        Is the server running on host 130.94.22.209 and accepting
        TCP/IP connections on port 5432?
[rmallah@server rmallah]$

output of ps
============

[root@linux10320 root2]# ps auxwww| grep post
postgres  5598  0.0  0.0 140412    4 ?  D  May07   2:31 postgres: stats buffer process
postgres  5599  1.1  0.0 142396   20 ?  R  May07  48:48 postgres: stats collector process
postgres 12262  0.0  0.0 238712    4 ?  D  May09   0:19 postgres: tradein tradein_clients 130.94.20.27 SELECT
postgres 13039  0.0  0.0 139812    4 ?  D  May09   0:00 postgres: checkpoint subprocess
postgres 29440  0.0  0.8 140664 9256 ?  S  14:35   0:01 postgres: tradein tradein_clients 203.196.129.235 idle
postgres  6805  0.0  0.0 140196    4 ?  S  16:08   0:00 postgres: tradein tradein_clients 203.196.129.235 idle
postgres 10154  0.0  0.0 140196    4 ?  S  16:38   0:00 postgres: tradein tradein_clients 203.196.129.235 idle
postgres 10446  0.0  0.0 140164    4 ?  S  16:43   0:00 postgres: postgres tradein_clients 203.196.129.235 idle
[root@linux10320 root2]#

=============
output of top
=============
  5:29pm  up 3 days, 24 min,  4 users,  load average: 5.68, 5.61, 5.70
54 processes: 52 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  5.8% user, 44.3% system,  0.0% nice, 49.8% idle
Mem:  1028484K av,  900084K used,  128400K free,       0K shrd,    2968K buff
Swap:  971004K av,   99288K used,  871716K free                  857220K cached

  PID USER     PRI  NI   SIZE   RSS  SHARE STAT LIB  %CPU  %MEM   TIME COMMAND
 5599 postgres  17   0   2064    20     20 R      0  99.9   0.0  53:25 postmaster
 5598 postgres   9   0   1440     4      4 D      0   0.0   0.0   2:31 postmaster
12262 postgres   9   0  88564     4      4 D      0   0.0   0.0   0:19 postmaster
13039 postgres   9   0    656     4      4 D      0   0.0   0.0   0:00 postmaster
29440 postgres   9   0  10512  9256   9256 S      0   0.0   0.8   0:01 postmaster
 6805 postgres   9   0    972     4      4 S      0   0.0   0.0   0:00 postmaster
10154 postgres   9   0    968     4      4 S      0   0.0   0.0   0:00 postmaster
10446 postgres   9   0    964     4      4 S      0   0.0   0.0   0:00 postmaster
==========================================================================

On Friday 10 May 2002 04:18 pm, Jan Wieck wrote:
> The delay is on average 250 milliseconds for a busy database
> (1/4 second). The controlling definition is in
>
> src/include/pgstat.h:
> #define PGSTAT_STAT_INTERVAL 500
I also wanted to highlight that the "stats collector process" (PID 5599) is the one taking up 99% CPU in my top output:

postgres  5599  1.3  0.0 142396   20 ?  S  May07  57:13 postgres: stats collector process

On Friday 10 May 2002 05:13 pm, Rajesh Kumar Mallah. wrote:
> Hi
> my postmaster died just now,
>
> only a bunch of backends running
Hi folks,

Please tell me how to bring down the PostgreSQL system. The postmaster is dead, but some backends still seem to be running.

Please help, I do not want to lose data again!

[root@linux10320 root2]# su - postgres
bash-2.03$ pg_ctl stop
/usr/local/pgsql/bin/pg_ctl: kill: (5597) - No such pid
waiting for postmaster to shut down................................................................ failed
pg_ctl: postmaster does not shut down
bash-2.03$

On Friday 10 May 2002 05:13 pm, Rajesh Kumar Mallah. wrote:
> Hi
> my postmaster died just now,
>
> only a bunch of backends running
A core file was also found in postgres's home directory.

regds
mallah.

On Friday 10 May 2002 05:13 pm, Rajesh Kumar Mallah. wrote:
> Hi
> my postmaster died just now,
>
> only a bunch of backends running

"Rajesh Kumar Mallah." <mallah@trade-india.com> writes:
> A core file was also found in postgres's Home directory

Can you provide a gdb backtrace from the core file?

regards, tom lane
Hi,

This is what I have.

regds
mallah.

[root@linux10320 data]# gdb -c core-10may
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
Core was generated by `/usr/local/pgsql/bin/postmaster'.
Program terminated with signal 25, File size limit exceeded.
#0  0x40272af4 in ?? ()
(gdb)

On Friday 10 May 2002 09:26 pm, Tom Lane wrote:
> "Rajesh Kumar Mallah." <mallah@trade-india.com> writes:
> > A core file was also found in postgres's Home directory
>
> Can you provide a gdb backtrace from the core file?
>
> regards, tom lane

"Rajesh Kumar Mallah." <mallah@trade-india.com> writes:
> This is what i have
> Core was generated by `/usr/local/pgsql/bin/postmaster'.
> Program terminated with signal 25, File size limit exceeded.

Well, there you have it. Probably the postmaster's log output exceeded whatever ulimit setting you have it running under.

It's a real good idea to start postmasters under "ulimit -a unlimited" (or whatever your local syntax is). I'd recommend putting such a command directly into the postmaster start script you use.

regards, tom lane
Jan Wieck <janwieck@yahoo.com> writes:
> And, before you discover this one: The backends send their
> statistic collection information via UDP packets. In the case
> of heavy database load, some of these packets can get lost so
> that the statistics will not be 100% accurate.

Recently the SourceForge DBAs got quite confused by this: under load, the pg_stat_activity view would show query-in-progress entries for backends that were not only no longer busy, but had actually terminated long before. It took a while to realize that this was pgstats operating as designed and not a symptom of serious problems.

Although it's okay for pgstats to lag the true state of affairs by some amount of time, it's not good for a view that claims to show the current state to be wrong for indefinitely long periods. Would it be possible to improve the reliability of transmission of backend-quit messages somehow?

One idea that comes to mind is for pgstats to look through the shared memory PROC list occasionally to see if its idea of active processes still matches reality. Both idle and dead processes could be reliably detected that way; also, we could detect busy (or at least in-a-transaction) processes and change their viewable state to "<unknown query>" if we hadn't gotten any query text from them.

Interestingly, this approach would allow a somewhat useful pg_stat_activity view to be maintained even without *any* messages transmitted by backends.

regards, tom lane
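The reconciliation idea in the message above can be sketched in a few lines. This is only an illustration in Python of the proposed logic, not PostgreSQL code: the function reconcile, the dictionary shapes, and the state strings "idle" and "busy" are all assumptions made up for the example, standing in for the collector's table and the shared memory PROC list.

```python
UNKNOWN_QUERY = "<unknown query>"
IDLE = "<IDLE>"


def reconcile(stats_view: dict, live_procs: dict) -> dict:
    """Rebuild the activity view from the authoritative process list.

    stats_view: pid -> last query text received from that backend (may be stale)
    live_procs: pid -> "idle" or "busy", as read from the process list
    """
    result = {}
    for pid, state in live_procs.items():
        if state == "idle":
            # The process list says idle, whatever the last packet claimed.
            result[pid] = IDLE
        else:
            # Busy: show the reported text, or a placeholder if none arrived.
            result[pid] = stats_view.get(pid) or UNKNOWN_QUERY
    # PIDs present only in stats_view are dead backends and are dropped.
    return result


if __name__ == "__main__":
    stats = {27135: "select count(*) from ...", 12262: "SELECT source_id ..."}
    live = {12262: "busy", 29440: "idle", 6805: "busy"}
    print(reconcile(stats, live))
```

Because the loop is driven by the process list rather than by received messages, the result stays usable even when stats_view is empty, which matches the closing remark that the view could be maintained without any messages from backends.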
Yes, you are quite right:

the postmaster's log had exceeded the OS file size limit, which was ~2.1 GB.

I wanted to rotate these logs, but I do not know how to make the postmaster recreate a log file while it is running.

I start the postmaster as

$ pg_ctl -l /var/log/pgsql start

for a reason I will explain in a separate post.

Thanks very much,

On Saturday 11 May 2002 08:31 pm, Tom Lane wrote:
> Well, there you have it. Probably the postmaster's log output exceeded
> whatever ulimit setting you have it running under.
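The per-process file size limit discussed above is the one behind signal 25 (reported by gdb as "File size limit exceeded"). As a rough sketch, assuming a Unix-like system, it can be inspected and raised from Python's standard resource module; the function names here are hypothetical, and in a shell start script the corresponding bash builtin would be "ulimit -f". Note the assumption: this only covers the rlimit, so a 2 GB cap imposed by the filesystem or by a build without large-file support would not be lifted this way.

```python
import resource


def file_size_limit():
    """Return the (soft, hard) file size limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_FSIZE)


def describe(limit: int) -> str:
    """Render one limit value the way a human would read it."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


def raise_soft_limit_to_hard():
    """Raise the soft limit as far as an unprivileged process may: to the hard limit.

    The new limit applies to this process and to children it starts afterwards,
    so this would have to run in whatever launches the server.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    resource.setrlimit(resource.RLIMIT_FSIZE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_FSIZE)


if __name__ == "__main__":
    soft, hard = file_size_limit()
    print("soft:", describe(soft), "hard:", describe(hard))
```

Raising the limit only removes the crash trigger; the log still grows without bound unless it is rotated.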
"Rajesh Kumar Mallah." <mallah@trade-india.com> wrote: > Yes you are very correct, > > postmaster's log had exceed the OS file limit > which was ~ 2.1 GB . > > i wanted to rotate these log but i do not know > how to make postmaster recreate a log file while > it is running. Well, the most correct way to do a logrotate is ( Redhat ): 1) Put on your postgresql.conf the following lines: syslog = 2 syslog_facility = 'LOCAL0' syslog_ident = 'postgres' 2) Put on the directory /etc/logrotate.d a file called 'postgres' with the following lines: /var/log/postgresql.log { compress rotate 2 size=10000k errors mendola@bigfoot.com create 0664 postgres postgres daily postrotate /usr/bin/killall -HUP syslogd endscript } change the email address of course :-) 3) Put the following line on your /etc/syslog.conf # Save postgresql logs LOCAL0.* /var/log/postgresql.log Ciao Gaetano -- #exclude <windows> #include <CSRSS> printf("\t\t\b\b\b\b\b\b");. printf("\t\t\b\b\b\b\b\b");