Thread: pg_stat_get_backend_pid seems to be listing non existant pids !!

pg_stat_get_backend_pid seems to be listing non existant pids !!

From

"Rajesh Kumar Mallah."

Date:

10 May 2002, 06:27:19

Hi Folks,
please help,

There seems to be too much lag between the access collector
and system status. Even the PIDs of the backends do not seem to match.

tradein_clients=# SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
pg_stat_get_backend_activity(s.backendid) AS current_query FROM (SELECT
pg_stat_get_backend_idset() AS backendid) s;

 procpid |   current_query
---------+-------------------------------
   27134 | <IDLE> in transaction
   26958 | <IDLE> in transaction
   26953 | <IDLE> in transaction
   26960 | <IDLE> in transaction
   27008 | <IDLE> in transaction
   12839 | <IDLE>
   26977 | <IDLE> in transaction
   27012 | <IDLE> in transaction
   31354 | <IDLE>
   27014 | <IDLE> in transaction
   27015 | <IDLE> in transaction
   26978 | <IDLE> in transaction
   26985 | <IDLE> in transaction
   27135 | select count(*) from ( select distinct on (email_id)
email_id,email,contact from  email_bank a  join (select email_id from
email_export_category where category_id in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
12, 14, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 1
   12262 | SELECT source_id , cnt from (SELECT  source_id,count(source_id) as
cnt from email_source group by source_id ) subsel join sources
using(source_id) order by source_id
   27136 | <IDLE> in transaction
(16 rows)



tradein_clients=#
why does the above not match with the "top" output at
the same time:

==============================================================================
4:10pm  up 2 days, 23:06,  2 users,  load average: 6.21, 6.06, 5.60
69 processes: 66 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: 55.6% user,  2.0% system,  0.0% nice, 42.3% idle
Mem:  1028484K av,  980320K used,   48164K free,       0K shrd,    3744K buff
Swap:  971004K av,  102532K used,  868472K free                  912724K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
 5456 postgres  17   0 59456  57M 57156 R       0 99.1  5.7   3:12 postmaster
 6601 postgres   9   0 79964  77M 78328 S       0  3.3  7.7   0:01 postmaster
 6779 postgres   9   0 88412  86M 86752 S       0  1.9  8.5   0:00 postmaster
 6703 postgres   9   0 81668  79M 80276 S       0  1.7  7.9   0:01 postmaster
 6943 postgres   9   0 78732  76M 77520 S       0  1.7  7.6   0:01 postmaster
 6940 postgres   9   0 44180  42M 42668 S       0  0.5  4.2   0:00 postmaster
 6776 postgres   9   0  121M 121M  119M S       0  0.3 12.0   0:01 postmaster
 5597 postgres   8   0   624  248   216 S       0  0.0  0.0   0:24 postmaster
 5598 postgres   9   0  1440    4     4 D       0  0.0  0.0   2:31 postmaster
 5599 postgres   9   0  2052    4     4 S       0  0.0  0.0  28:05 postmaster
12262 postgres   9   0 88564    4     4 D       0  0.0  0.0   0:19 postmaster
13039 postgres   9   0   656    4     4 D       0  0.0  0.0   0:00 postmaster
29440 postgres   9   0 20928  19M 20332 S       0  0.0  1.9   0:01 postmaster
 1652 postgres   9   0  3356 2324  2144 S       0  0.0  0.2   3:21 postmaster
 2219 postgres   9   0  2744 2120  2068 S       0  0.0  0.2   0:00 postmaster
 6772 postgres   9   0  100M 100M 99.3M S       0  0.0  9.9   0:00 postmaster
 6805 postgres   9   0  4440 4168  3532 S       0  0.0  0.4   0:00 postmaster
 6809 postgres   9   0 35280  34M 33948 S       0  0.0  3.4   0:00 postmaster
 6846 postgres   9   0 98.9M  98M 99804 S       0  0.0  9.8   0:01 postmaster
 6931 postgres   9   0 21744  20M 20428 S       0  0.0  2.0   0:02 postmaster
 6934 postgres   9   0 19020  18M 17868 S       0  0.0  1.8   0:00 postmaster
 6941 postgres   9   0 63280  61M 61756 S       0  0.0  6.1   0:01 postmaster
================================================================================


[root@linux10320 root2]# kill -INT 27135
bash: kill: (27135) - No such pid
[root@linux10320 root2]#


and # kill -INT 12262 does not actually kill it ??
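
The check done by hand with kill above can be scripted. What follows is only a minimal sketch in Python, assuming a Unix-like system; the helper names pid_exists and missing_pids are hypothetical and not part of PostgreSQL. It sends signal 0, which performs the existence and permission checks of kill without delivering any signal:

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID exists (hypothetical helper)."""
    if pid <= 0:
        # PIDs <= 0 address process groups in kill(2), not a single process.
        return False
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False  # no such process, as kill reported for 27135
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def missing_pids(reported):
    """Return the reported PIDs that the operating system does not know."""
    return [pid for pid in reported if not pid_exists(pid)]
```

Feeding it the procpid column, for example missing_pids([27134, 27135, 12262]), would list the entries that the statistics view still shows but that no longer exist as processes.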

regds
mallah.


--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.

Re: pg_stat_get_backend_pid seems to be listing non existant

From

Jan Wieck

Date:

10 May 2002, 06:49:50

Rajesh Kumar Mallah. wrote:
> Hi Folks,
> please help ,
>
> There seems to be too much lag between the access collector
> and system status. Even the PIDs of the backends do not seem to match.

    The delay is on average 250 milliseconds for a busy database
    (1/4 second). The controlling definition is in

        src/include/pgstat.h:
        #define PGSTAT_STAT_INTERVAL    500

    This means that, from the moment ANY statistics packet has arrived
    in the collector, it waits 500 milliseconds before writing
    out all information. Thus, the above 250 milliseconds
    average is only true assuming a constant flow of packets.
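
The 250 millisecond figure is simply half of the write interval. A minimal sketch in Python, assuming a constant flow of packets so that the collector writes once per interval, and assuming a reader that looks at the statistics at evenly spread moments within that interval (the function name average_staleness_ms is made up for illustration):

```python
PGSTAT_STAT_INTERVAL_MS = 500  # value quoted from src/include/pgstat.h above


def average_staleness_ms(interval_ms: float, samples: int = 10_000) -> float:
    """Average age of the last written snapshot, as seen by a reader.

    Assumption: the collector writes exactly once per interval, and readers
    arrive at evenly spaced instants between two consecutive writes.
    """
    step = interval_ms / samples
    # A reader arriving t ms after the last write sees data that is t ms old.
    ages = [(i + 0.5) * step for i in range(samples)]
    return sum(ages) / samples


print(average_staleness_ms(PGSTAT_STAT_INTERVAL_MS))
```

Under these assumptions the result is half the interval. With an irregular flow of packets the real delay can differ, since the 500 millisecond wait only starts when a packet arrives.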
 
    And, before you discover this one: The backends send their
    statistics collection information via UDP packets. In the case
    of heavy database load, some of these packets can get lost, so
    the statistics will not be 100% accurate. This is an
    intended feature, implemented on purpose! Counting the number
    of scans isn't considered as important as responding to the
    client as fast as possible during rush hour.
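
The trade-off can be illustrated with a fire-and-forget UDP send. This is only a sketch in Python of the general technique, not the backend's actual code; the function name send_stat_packet and the payload are invented for the example. The sender never waits for an acknowledgement, so a full receive buffer on the other side costs a lost packet rather than a delayed client:

```python
import socket


def send_stat_packet(payload: bytes, address) -> int:
    """Send one datagram without waiting for any acknowledgement.

    Returns the number of bytes handed to the kernel, or 0 if the send
    failed; in that case the packet is dropped and never retried.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        sender.setblocking(False)  # never stall the caller on a busy socket
        try:
            return sender.sendto(payload, address)
        except OSError:
            return 0  # losing a statistics packet is acceptable here


if __name__ == "__main__":
    # Stand-in for the collector: a bound socket that nobody reads from.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    print(send_stat_packet(b"example-stat-message", receiver.getsockname()))
    receiver.close()
```

Whether the datagram is ever read is invisible to the sender, which is exactly why counters fed this way can undercount under heavy load.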


Jan



--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

postmaster dead now !!

From

"Rajesh Kumar Mallah."

Date:

10 May 2002, 07:44:00


Hi
my postmaster died just now,

only a bunch of backends running

[rmallah@server rmallah]$ psql -h 130.94.22.209  -U tradein  tradein_clients
psql: could not connect to server: Connection refused
        Is the server running on host 130.94.22.209 and accepting
        TCP/IP connections on port 5432?
[rmallah@server rmallah]$

output of ps
============

[root@linux10320 root2]# ps auxwww| grep post
postgres  5598  0.0  0.0 140412   4 ?        D    May07   2:31 postgres: stats buffer process
postgres  5599  1.1  0.0 142396  20 ?        R    May07  48:48 postgres: stats collector process
postgres 12262  0.0  0.0 238712   4 ?        D    May09   0:19 postgres: tradein tradein_clients 130.94.20.27 SELECT
postgres 13039  0.0  0.0 139812   4 ?        D    May09   0:00 postgres: checkpoint subprocess
postgres 29440  0.0  0.8 140664 9256 ?       S    14:35   0:01 postgres: tradein tradein_clients 203.196.129.235 idle
postgres  6805  0.0  0.0 140196   4 ?        S    16:08   0:00 postgres: tradein tradein_clients 203.196.129.235 idle
postgres 10154  0.0  0.0 140196   4 ?        S    16:38   0:00 postgres: tradein tradein_clients 203.196.129.235 idle
postgres 10446  0.0  0.0 140164   4 ?        S    16:43   0:00 postgres: postgres tradein_clients 203.196.129.235 idle
[root@linux10320 root2]#

=============
output of top
=============
  5:29pm  up 3 days, 24 min,  4 users,  load average: 5.68, 5.61, 5.70
54 processes: 52 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  5.8% user, 44.3% system,  0.0% nice, 49.8% idle
Mem:  1028484K av,  900084K used,  128400K free,       0K shrd,    2968K buff
Swap:  971004K av,   99288K used,  871716K free                  857220K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
 5599 postgres  17   0  2064   20    20 R       0 99.9  0.0  53:25 postmaster
 5598 postgres   9   0  1440    4     4 D       0  0.0  0.0   2:31 postmaster
12262 postgres   9   0 88564    4     4 D       0  0.0  0.0   0:19 postmaster
13039 postgres   9   0   656    4     4 D       0  0.0  0.0   0:00 postmaster
29440 postgres   9   0 10512 9256  9256 S       0  0.0  0.8   0:01 postmaster
 6805 postgres   9   0   972    4     4 S       0  0.0  0.0   0:00 postmaster
10154 postgres   9   0   968    4     4 S       0  0.0  0.0   0:00 postmaster
10446 postgres   9   0   964    4     4 S       0  0.0  0.0   0:00 postmaster
==========================================================================











Re: postmaster dead now !!

From

"Rajesh Kumar Mallah."

Date:

10 May 2002, 07:50:12

Also wanted to highlight that the "stats collector process" (PID: 5599)
is the one taking up 99% CPU in my top output:

postgres  5599  1.3  0.0 142396  20 ?        S    May07  57:13 postgres: stats collector process


pg_ctl stop does not work.

From

"Rajesh Kumar Mallah."

Date:

10 May 2002, 07:55:58

Hi Folks,


Please tell me how to bring down the PostgreSQL system.
The postmaster is dead, but some backends
still seem to be running...

Please help, I do not want to lose data again !!

[root@linux10320 root2]# su - postgres
bash-2.03$ pg_ctl  stop
/usr/local/pgsql/bin/pg_ctl: kill: (5597) - No such pid
waiting for postmaster to shut down................................................................ failed
pg_ctl: postmaster does not shut down
bash-2.03$


core file found...

From

"Rajesh Kumar Mallah."

Date:

10 May 2002, 08:03:35


A core file was also found in postgres's home directory.

regds
mallah.


On Friday 10 May 2002 05:13 pm, Rajesh Kumar Mallah. wrote:
> Hi
> my postmaster died just now,
>
> only a bunch of backends running
>
> [rmallah@server rmallah]$ psql -h 130.94.22.209  -U tradein
> tradein_clients psql: could not connect to server: Connection refused
>         Is the server running on host 130.94.22.209 and accepting
>         TCP/IP connections on port 5432?
> [rmallah@server rmallah]$
>
> output of ps
> ============
>
> [root@linux10320 root2]# ps auxwww| grep post
> postgres  5598  0.0  0.0 140412   4 ?        D    May07   2:31 postgres:
> stats buffer process
> postgres  5599  1.1  0.0 142396  20 ?        R    May07  48:48 postgres:
> stats collector process
> postgres 12262  0.0  0.0 238712   4 ?        D    May09   0:19 postgres:
> tradein tradein_clients 130.94.20.27 SELECT
> postgres 13039  0.0  0.0 139812   4 ?        D    May09   0:00 postgres:
> checkpoint subprocess
> postgres 29440  0.0  0.8 140664 9256 ?       S    14:35   0:01 postgres:
> tradein tradein_clients 203.196.129.235 idle
> postgres  6805  0.0  0.0 140196   4 ?        S    16:08   0:00 postgres:
> tradein tradein_clients 203.196.129.235 idle
> postgres 10154  0.0  0.0 140196   4 ?        S    16:38   0:00 postgres:
> tradein tradein_clients 203.196.129.235 idle
> postgres 10446  0.0  0.0 140164   4 ?        S    16:43   0:00 postgres:
> postgres tradein_clients 203.196.129.235 idle
> [root@linux10320 root2]#
>
> =============
> output of top
> =============
>   5:29pm  up 3 days, 24 min,  4 users,  load average: 5.68, 5.61, 5.70
> 54 processes: 52 sleeping, 2 running, 0 zombie, 0 stopped
> CPU states:  5.8% user, 44.3% system,  0.0% nice, 49.8% idle
> Mem:  1028484K av,  900084K used,  128400K free,       0K shrd,    2968K
> buff Swap:  971004K av,   99288K used,  871716K free
> 857220K cached
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
>  5599 postgres  17   0  2064   20    20 R       0 99.9  0.0  53:25
> postmaster 5598 postgres   9   0  1440    4     4 D       0  0.0  0.0
> 2:31 postmaster 12262 postgres   9   0 88564    4     4 D       0  0.0  0.0
>   0:19 postmaster 13039 postgres   9   0   656    4     4 D       0  0.0
> 0.0   0:00 postmaster 29440 postgres   9   0 10512 9256  9256 S       0
> 0.0  0.8   0:01 postmaster 6805 postgres   9   0   972    4     4 S       0
>  0.0  0.0   0:00 postmaster 10154 postgres   9   0   968    4     4 S
> 0  0.0  0.0   0:00 postmaster 10446 postgres   9   0   964    4     4 S
>   0  0.0  0.0   0:00 postmaster
> ==========================================================================
>
> On Friday 10 May 2002 04:18 pm, Jan Wieck wrote:
> > Rajesh Kumar Mallah. wrote:
> > > Hi Folks,
> > > please help ,
> > >
> > > therse seems to be too much lag between the access collector
> > > and system status. even the pids of backend does not seems to be
> > > matching.
> >
> >     The  delay is on average 250 milliseconds for a busy database
> >     (1/4 second).  The controlling definition is in
> >
> >         src/include/pgstat.h:
> >         #define PGSTAT_STAT_INTERVAL    500
> >
> >     This means, from the moment ANY statistic packet has  arrived
> >     in  the  collector,  it waits 500 milliseconds before writing
> >     out  all  information.   Thus,  the  above  250  milliseconds
> >     average is only true assuming a constant flow of packets.
> >
> >     And,  before  you  discover this one: The backends send their
> >     statistic collection information via UDP packets. In the case
> >     of heavy database load, some of these packets can get lost so
> >     that the statistics will not be  100%  accurate.  This  is  a
> >     wanted  feature  and  implemented  on purpose!  It is because
> >     counting  the  number  of  scans  isn't  considered  as  much
> >     important  as  responding  to  the client as fast as possible
> >     during the rushhour.
> >
> >
> > Jan
> >
> > > tradein_clients=# SELECT pg_stat_get_backend_pid(s.backendid) AS
> > > procpid, pg_stat_get_backend_activity(s.backendid) AS current_query
> > > FROM (SELECT pg_stat_get_backend_idset() AS backendid) s;
> > >
> > >  procpid |   current_query
> > > ---------+-------------------------------
> > >    27134 | <IDLE> in transaction
> > >    26958 | <IDLE> in transaction
> > >    26953 | <IDLE> in transaction
> > >    26960 | <IDLE> in transaction
> > >    27008 | <IDLE> in transaction
> > >    12839 | <IDLE>
> > >    26977 | <IDLE> in transaction
> > >    27012 | <IDLE> in transaction
> > >    31354 | <IDLE>
> > >    27014 | <IDLE> in transaction
> > >    27015 | <IDLE> in transaction
> > >    26978 | <IDLE> in transaction
> > >    26985 | <IDLE> in transaction
> > >    27135 | select count(*) from ( select distinct on (email_id)
> > > email_id,email,contact from  email_bank a  join (select email_id from
> > > email_export_category where category_id in (1, 2, 3, 4, 5, 6, 7, 8, 9,
> > > 10, 12, 14, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 1
> > >    12262 | SELECT source_id , cnt from (SELECT
> > > source_id,count(source_id) as cnt from email_source group by source_id
> > > ) subsel join sources
> > > using(source_id) order by source_id
> > >    27136 | <IDLE> in transaction
> > > (16 rows)
> > >
> > >
> > >
> > > tradein_clients=#
> > > why does the above not match with the "top" output at
> > > the same time:
> > >
> > > ======================================================================
> > >  4:10pm  up 2 days, 23:06,  2 users,  load average: 6.21, 6.06, 5.60
> > > 69 processes: 66 sleeping, 3 running, 0 zombie, 0 stopped
> > > CPU states: 55.6% user,  2.0% system,  0.0% nice, 42.3% idle
> > > Mem:  1028484K av,  980320K used,   48164K free,       0K shrd,    3744K buff
> > > Swap:  971004K av,  102532K used,  868472K free                  912724K cached
> > >
> > >   PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
> > >  5456 postgres  17   0 59456  57M 57156 R       0 99.1  5.7   3:12 postmaster
> > >  6601 postgres   9   0 79964  77M 78328 S       0  3.3  7.7   0:01 postmaster
> > >  6779 postgres   9   0 88412  86M 86752 S       0  1.9  8.5   0:00 postmaster
> > >  6703 postgres   9   0 81668  79M 80276 S       0  1.7  7.9   0:01 postmaster
> > >  6943 postgres   9   0 78732  76M 77520 S       0  1.7  7.6   0:01 postmaster
> > >  6940 postgres   9   0 44180  42M 42668 S       0  0.5  4.2   0:00 postmaster
> > >  6776 postgres   9   0  121M 121M  119M S       0  0.3 12.0   0:01 postmaster
> > >  5597 postgres   8   0   624  248   216 S       0  0.0  0.0   0:24 postmaster
> > >  5598 postgres   9   0  1440    4     4 D       0  0.0  0.0   2:31 postmaster
> > >  5599 postgres   9   0  2052    4     4 S       0  0.0  0.0  28:05 postmaster
> > > 12262 postgres   9   0 88564    4     4 D       0  0.0  0.0   0:19 postmaster
> > > 13039 postgres   9   0   656    4     4 D       0  0.0  0.0   0:00 postmaster
> > > 29440 postgres   9   0 20928  19M 20332 S       0  0.0  1.9   0:01 postmaster
> > >  1652 postgres   9   0  3356 2324  2144 S       0  0.0  0.2   3:21 postmaster
> > >  2219 postgres   9   0  2744 2120  2068 S       0  0.0  0.2   0:00 postmaster
> > >  6772 postgres   9   0  100M 100M 99.3M S       0  0.0  9.9   0:00 postmaster
> > >  6805 postgres   9   0  4440 4168  3532 S       0  0.0  0.4   0:00 postmaster
> > >  6809 postgres   9   0 35280  34M 33948 S       0  0.0  3.4   0:00 postmaster
> > >  6846 postgres   9   0 98.9M  98M 99804 S       0  0.0  9.8   0:01 postmaster
> > >  6931 postgres   9   0 21744  20M 20428 S       0  0.0  2.0   0:02 postmaster
> > >  6934 postgres   9   0 19020  18M 17868 S       0  0.0  1.8   0:00 postmaster
> > >  6941 postgres   9   0 63280  61M 61756 S       0  0.0  6.1   0:01 postmaster
> > > ======================================================================
> > >
> > >
> > > [root@linux10320 root2]# kill -INT 27135
> > > bash: kill: (27135) - No such pid
> > > [root@linux10320 root2]#
> > >
> > >
> > > and "# kill -INT 12262" does not actually kill it?
> > >
> > > regds
> > > mallah.
> > >
> > >
> > > --
> > > Rajesh Kumar Mallah,
> > > Project Manager (Development)
> > > Infocom Network Limited, New Delhi
> > > phone: +91(11)6152172 (221) (L) ,9811255597 (M)
> > >
> > > Visit http://www.trade-india.com ,
> > > India's Leading B2B eMarketplace.
> > >

--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
phone: +91(11)6152172 (221) (L) ,9811255597 (M)

Visit http://www.trade-india.com ,
India's Leading B2B eMarketplace.

Re: core file found...

From

Tom Lane

Date:

10 May 2002, 11:56:51

"Rajesh Kumar Mallah." <mallah@trade-india.com> writes:
> A core file was also found  in postgres's Home directory

Can you provide a gdb backtrace from the core file?
        regards, tom lane

Re: core file found...

From

"Rajesh Kumar Mallah."

Date:

11 May 2002, 00:50:12

Hi,

This is what I have:

regds
mallah.

[root@linux10320 data]# gdb -c core-10may
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux".
Core was generated by `/usr/local/pgsql/bin/postmaster'.
Program terminated with signal 25, File size limit exceeded.
#0  0x40272af4 in ?? ()
(gdb)

On Friday 10 May 2002 09:26 pm, Tom Lane wrote:
> "Rajesh Kumar Mallah." <mallah@trade-india.com> writes:
> > A core file was also found  in postgres's Home directory
>
> Can you provide a gdb backtrace from the core file?
>
>             regards, tom lane


Re: core file found...

From

Tom Lane

Date:

11 May 2002, 11:02:05

"Rajesh Kumar Mallah." <mallah@trade-india.com> writes:
> This is what i have

> Core was generated by `/usr/local/pgsql/bin/postmaster'.
> Program terminated with signal 25, File size limit exceeded.

Well, there you have it.  Probably the postmaster's log output exceeded
whatever ulimit setting you have it running under.

It's a real good idea to start postmasters under "ulimit -a unlimited"
(or whatever your local syntax is).  I'd recommend putting such a
command directly into the postmaster start script you use.
        regards, tom lane
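
A minimal sketch of the start-script advice above, assuming a bash-like
shell. In such shells "ulimit -a" only lists the current limits; the flag
that sets the maximum file size is "-f". The log path is the one mentioned
elsewhere in this thread, the function names are made up for illustration,
and the pg_ctl line is left commented out, so treat this as an illustration
rather than a finished script. If the ceiling comes from the file system or
from a build without large-file support rather than from ulimit, raising
the limit may not help.

```shell
#!/bin/sh
# Sketch of a postmaster start wrapper (assumed layout, not an official script).
PGLOG=/var/log/pgsql    # log path as used in this thread

# Print the file-size limit currently in effect (a block count or "unlimited").
current_fsize_limit() {
    ulimit -f
}

# Try to lift the file-size limit. This fails harmlessly when the hard limit
# is lower and the caller is not privileged.
lift_fsize_limit() {
    ulimit -f unlimited 2>/dev/null ||
        echo "could not raise file size limit" >&2
}

lift_fsize_limit
echo "file size limit now: $(current_fsize_limit)"

# In a real start script the postmaster would be launched here, e.g.:
# pg_ctl -l "$PGLOG" start
```

Because the limit is inherited by child processes, it has to be set in the
same shell that launches the postmaster, before the launch.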

Re: pg_stat_get_backend_pid seems to be listing non existant

From

Tom Lane

Date:

11 May 2002, 11:33:33

Jan Wieck <janwieck@yahoo.com> writes:
>     And, before you discover this one: the backends send their
>     statistics collection information via UDP packets. Under
>     heavy database load, some of these packets can get lost, so
>     the statistics will not be 100% accurate.

Recently the SourceForge DBAs got quite confused by this: under load,
the pg_stat_activity view would show query-in-progress entries for
backends that were not only no longer busy, but had actually
terminated long before.  It took a while to realize that this was pgstats
operating as designed and not a symptom of serious problems.

Although it's okay for pgstats to lag the true state of affairs by
some amount of time, it's not good for a view that claims to show
current state to be wrong for indefinitely long periods.  Would it
be possible to improve the reliability of transmission of backend-quit
messages somehow?

One idea that comes to mind is for pgstats to look through the shared
memory PROC list occasionally to see if its idea of active processes
still matches reality.  Both idle and dead processes could be reliably
detected that way; also, we could detect busy (or at least
in-a-transaction) processes and change their viewable state to
"<unknown query>" if we hadn't gotten any query text from them.
Interestingly, this approach would allow a somewhat useful
pg_stat_activity view to be maintained even without *any* messages
transmitted by backends.
        regards, tom lane
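
The cross-check described above (comparing what the collector believes with
what actually exists) can be approximated from outside the server. The
sketch below is not the shared-memory PROC list approach; it swaps in an
OS-level "kill -0" probe, and the function names pid_alive and stale_pids
are made up for illustration. It only tells you which reported PIDs have no
live process, assuming it runs as root or as the postgres user.

```shell
#!/bin/sh
# pid_alive PID: succeeds if a process with that PID exists and we are
# allowed to signal it. Signal 0 performs the check without sending anything.
# Run as a different unprivileged user, a live backend would look dead here.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# stale_pids: read PIDs (one per line) on stdin and print those for which
# no live process was found.
stale_pids() {
    while read -r pid; do
        [ -n "$pid" ] || continue
        pid_alive "$pid" || echo "$pid"
    done
}
```

Fed with the PIDs from the query at the top of this thread, for example

  psql -At -c "SELECT pg_stat_get_backend_pid(s.backendid) FROM (SELECT pg_stat_get_backend_idset() AS backendid) s" tradein_clients | stale_pids

it would list entries such as 27135 that the stats view still shows but
the operating system no longer knows about.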

Re: core file found...

From

"Rajesh Kumar Mallah."

Date:

12 May 2002, 01:41:27

Yes, you are quite right.

The postmaster's log had exceeded the OS file size limit,
which was about 2.1 GB.

I wanted to rotate these logs, but I do not know
how to make the postmaster recreate its log file while
it is running.

I start the postmaster as

$ pg_ctl  -l /var/log/pgsql start

for a reason I will explain in a separate post.


thanks very much ,



On Saturday 11 May 2002 08:31 pm, Tom Lane wrote:
> "Rajesh Kumar Mallah." <mallah@trade-india.com> writes:
> > This is what i have
> >
> > Core was generated by `/usr/local/pgsql/bin/postmaster'.
> > Program terminated with signal 25, File size limit exceeded.
>
> Well, there you have it.  Probably the postmaster's log output exceeded
> whatever ulimit setting you have it running under.
>
> It's a real good idea to start postmasters under "ulimit -a unlimited"
> (or whatever your local syntax is).  I'd recommend putting such a
> command directly into the postmaster start script you use.
>
>             regards, tom lane


Re: core file found...

From

"Gaetano Mendola"

Date:

12 May 2002, 06:33:46

"Rajesh Kumar Mallah." <mallah@trade-india.com> wrote:
> Yes you are very correct,
>
> The postmaster's log had exceeded the OS file size limit,
> which was about 2.1 GB.
>
> I wanted to rotate these logs, but I do not know
> how to make the postmaster recreate its log file while
> it is running.

Well,
the most correct way to set up log rotation (on Red Hat) is:

1) Put the following lines in your postgresql.conf:

syslog = 2
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'

2) Put a file called 'postgres' in the directory /etc/logrotate.d
with the following lines:

/var/log/postgresql.log {
    compress
    rotate 2
    size=10000k
    errors mendola@bigfoot.com
    create 0664 postgres postgres
    daily
    postrotate
        /usr/bin/killall -HUP syslogd
    endscript
}

Change the email address, of course :-)

3) Put the following line in your /etc/syslog.conf:

# Save postgresql logs
LOCAL0.*    /var/log/postgresql.log


Ciao
Gaetano

--
#exclude <windows>
#include <CSRSS>
printf("\t\t\b\b\b\b\b\b");.
printf("\t\t\b\b\b\b\b\b");
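
One way to sanity-check step 3 of the recipe above is to confirm that
syslog.conf really contains an uncommented rule for the facility. This is
only a sketch: has_facility_rule is a made-up helper name, and the match is
deliberately simple (it looks for a line starting with the facility name
followed by a dot, ignoring case).

```shell
#!/bin/sh
# has_facility_rule FACILITY FILE: succeeds if FILE has an uncommented line
# whose selector starts with FACILITY, e.g. "LOCAL0.*" (case-insensitive).
has_facility_rule() {
    grep -Eiq "^[[:space:]]*$1\." "$2"
}

CONF=/etc/syslog.conf    # path used in the recipe above
if has_facility_rule local0 "$CONF" 2>/dev/null; then
    echo "local0 rule found in $CONF"
else
    echo "no local0 rule found in $CONF"
fi
```

With the rule in place and syslogd reloaded, sending a test message with
"logger -p local0.info -t postgres test" should append a line to
/var/log/postgresql.log.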