Thread: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)

Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)

From
Alexander Farber
Date:
Hello Postgres users,

I have a Linux website with phpBB serving a small Facebook game:

# uname -a
Linux XXXXX 2.6.18-194.17.1.el5 #1 SMP Wed Sep 29 12:50:31 EDT 2010
x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
CentOS release 5.5 (Final)
# rpm -qa | grep -i postgres
postgresql-devel-8.1.21-1.el5_5.1
postgresql-8.1.21-1.el5_5.1
postgresql-server-8.1.21-1.el5_5.1
postgresql-devel-8.1.21-1.el5_5.1
postgresql-libs-8.1.21-1.el5_5.1
postgresql-libs-8.1.21-1.el5_5.1
postgresql-docs-8.1.21-1.el5_5.1

It works generally ok and with little load since June.
(My game has less than 100 users and I have AMD/Quad+4GB)

On Friday I've installed a cronjob (full source code at the bottom):

3       3       *       *       *       psql -a -f
$HOME/bin/clean-phpbb-forum.sql

and on Monday I've found out, that the site is overloaded:

top - 10:16:56 up 3 days, 23:56,  1 user,  load average: 20.55, 21.38, 22.92
Tasks: 157 total,  24 running, 133 sleeping,   0 stopped,   0 zombie
Cpu(s): 98.9%us,  1.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4019028k total,  3073968k used,   945060k free,    50604k buffers
Swap:  2104496k total,      132k used,  2104364k free,  2316196k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28974 postgres  16   0  122m  14m  10m R 73.3  0.4  20:56.06 postmaster
28727 postgres  16   0  121m  13m  10m R 54.7  0.3  37:58.58 postmaster
28714 postgres  16   0  122m  14m  10m R 50.4  0.4  38:38.98 postmaster
29412 postgres  16   0  121m  13m  10m R 46.5  0.4   8:03.96 postmaster
28542 postgres  16   0  122m  14m  10m R 46.2  0.4  58:49.38 postmaster
28482 postgres  16   0  122m  14m  10m R 45.8  0.4  61:53.37 postmaster
28468 postgres  16   0  122m  14m  10m R 44.2  0.4  62:46.17 postmaster
29017 postgres  16   0  122m  14m  10m R 43.9  0.4  19:17.06 postmaster
28929 postgres  15   0  122m  14m  10m R 42.2  0.4  22:01.43 postmaster
28500 postgres  16   0  122m  14m  10m R 41.3  0.4  59:40.23 postmaster
28460 postgres  16   0  122m  14m  10m R 40.6  0.4  64:17.16 postmaster
28894 postgres  16   0  122m  14m  10m R 38.6  0.4  23:35.53 postmaster
28489 postgres  16   0  122m  14m  10m R 36.0  0.4  60:32.59 postmaster
28719 postgres  15   0  121m  13m  10m R 25.2  0.3  38:10.33 postmaster
29496 postgres  16   0  121m  13m  10m R 22.9  0.4   4:20.32 postmaster
28556 postgres  15   0  122m  14m  10m R 17.7  0.4  57:32.62 postmaster
28735 postgres  15   0  122m  14m  10m R 15.7  0.4  36:09.45 postmaster
29602 postgres  15   0  119m  11m 9680 S  8.2  0.3   0:00.25 postmaster
28457 postgres  17   0  122m  14m   9m R  3.6  0.4  64:34.38 postmaster
26092 apache    15   0  238m  16m 3740 S  0.3  0.4   0:03.38 httpd
29596 afarber   15   0 12744 1116  800 R  0.3  0.0   0:00.09 top
    1 root      15   0 10352  700  592 S  0.0  0.0   0:01.69 init

I understand, that I probably supply not enough info,
but how would you approach debugging this problem?

I've run my cronjob numerous times from CLI - it works ok
and takes only few seconds. I've installed it as an every-3 mins
cronjob on my development VM - it works ok there too.

My biggest problem is, that I don't see any information
from Postgres at the production site - why did it have to start
so many postmaster's (whatever those processes do).

The only log file I've found has been /var/lib/pgsql/pgstartup.log
and there is nothing suspicious there. (Also nothing related to Postgres
in messages, mcelog (empty), audit.log, access_log, error_log).

Please give me some hints
Alex

# cat bin/clean-phpbb-forum.sql
start transaction;

delete from phpbb_users
where user_type=1 and user_inactive_reason=1 and user_lastvisit=0 and
age(to_timestamp(user_regdate))>interval '3 days';

create temp table old_topics (topic_id integer) on commit delete rows;
create temp table old_posts (post_id integer) on commit delete rows;

insert into old_topics select topic_id from phpbb_topics
where forum_id=5 and topic_poster=1 and
age(to_timestamp(topic_time))>interval '7 days';

-- select count(topic_id) as "old topics:" from old_topics;

insert into old_posts select p.post_id from phpbb_posts p, old_topics t
where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id;

-- select count(post_id) as "old posts:" from old_posts;

delete from phpbb_posts where post_id in (select post_id from old_posts);
delete from phpbb_topics where topic_id in (select topic_id from old_topics);

update phpbb_config set
config_value = (select count(topic_id) from phpbb_topics)
where config_name = 'num_topics';

update phpbb_config set
config_value = (select count(post_id) from phpbb_posts)
where config_name = 'num_posts';

update phpbb_users set
user_posts = (select count(post_id) from phpbb_posts where poster_id=1)
where user_id = 1;

update phpbb_forums set
forum_posts = (select count(post_id) from phpbb_posts),
forum_topics = (select count(topic_id) from phpbb_topics),
forum_topics_real = (select count(topic_id) from phpbb_topics)
where forum_id = 5;

commit

Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)

From
Rajesh Kumar Mallah
Date:
Dear Farber,
Is there anything that stops you from upgrading to a more respectable release number?
Eg 9.0 or at least latest 8.4.X

Regds
Rajesh Kumar Mallah.

Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)

From
Robert Gravsjö
Date:

Alexander Farber skrev 2010-10-04 11.20:
> Hello Postgres users,
>
> I have a Linux website with phpBB serving a small Facebook game:
>
> # uname -a
> Linux XXXXX 2.6.18-194.17.1.el5 #1 SMP Wed Sep 29 12:50:31 EDT 2010
> x86_64 x86_64 x86_64 GNU/Linux
> # cat /etc/redhat-release
> CentOS release 5.5 (Final)
> # rpm -qa | grep -i postgres
> postgresql-devel-8.1.21-1.el5_5.1
> postgresql-8.1.21-1.el5_5.1
> postgresql-server-8.1.21-1.el5_5.1
> postgresql-devel-8.1.21-1.el5_5.1
> postgresql-libs-8.1.21-1.el5_5.1
> postgresql-libs-8.1.21-1.el5_5.1
> postgresql-docs-8.1.21-1.el5_5.1
>
> It works generally ok and with little load since June.
> (My game has less than 100 users and I have AMD/Quad+4GB)
>
> On Friday I've installed a cronjob (full source code at the bottom):
>
> 3       3       *       *       *       psql -a -f
> $HOME/bin/clean-phpbb-forum.sql
>
> and on Monday I've found out, that the site is overloaded:
>
> top - 10:16:56 up 3 days, 23:56,  1 user,  load average: 20.55, 21.38, 22.92
> Tasks: 157 total,  24 running, 133 sleeping,   0 stopped,   0 zombie
> Cpu(s): 98.9%us,  1.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   4019028k total,  3073968k used,   945060k free,    50604k buffers
> Swap:  2104496k total,      132k used,  2104364k free,  2316196k cached
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 28974 postgres  16   0  122m  14m  10m R 73.3  0.4  20:56.06 postmaster
> 28727 postgres  16   0  121m  13m  10m R 54.7  0.3  37:58.58 postmaster
> 28714 postgres  16   0  122m  14m  10m R 50.4  0.4  38:38.98 postmaster
> 29412 postgres  16   0  121m  13m  10m R 46.5  0.4   8:03.96 postmaster
> 28542 postgres  16   0  122m  14m  10m R 46.2  0.4  58:49.38 postmaster
> 28482 postgres  16   0  122m  14m  10m R 45.8  0.4  61:53.37 postmaster
> 28468 postgres  16   0  122m  14m  10m R 44.2  0.4  62:46.17 postmaster
> 29017 postgres  16   0  122m  14m  10m R 43.9  0.4  19:17.06 postmaster
> 28929 postgres  15   0  122m  14m  10m R 42.2  0.4  22:01.43 postmaster
> 28500 postgres  16   0  122m  14m  10m R 41.3  0.4  59:40.23 postmaster
> 28460 postgres  16   0  122m  14m  10m R 40.6  0.4  64:17.16 postmaster
> 28894 postgres  16   0  122m  14m  10m R 38.6  0.4  23:35.53 postmaster
> 28489 postgres  16   0  122m  14m  10m R 36.0  0.4  60:32.59 postmaster
> 28719 postgres  15   0  121m  13m  10m R 25.2  0.3  38:10.33 postmaster
> 29496 postgres  16   0  121m  13m  10m R 22.9  0.4   4:20.32 postmaster
> 28556 postgres  15   0  122m  14m  10m R 17.7  0.4  57:32.62 postmaster
> 28735 postgres  15   0  122m  14m  10m R 15.7  0.4  36:09.45 postmaster
> 29602 postgres  15   0  119m  11m 9680 S  8.2  0.3   0:00.25 postmaster
> 28457 postgres  17   0  122m  14m   9m R  3.6  0.4  64:34.38 postmaster
> 26092 apache    15   0  238m  16m 3740 S  0.3  0.4   0:03.38 httpd
> 29596 afarber   15   0 12744 1116  800 R  0.3  0.0   0:00.09 top
>      1 root      15   0 10352  700  592 S  0.0  0.0   0:01.69 init
>
> I understand, that I probably supply not enough info,
> but how would you approach debugging this problem?

I would connect to the db with psql and query the pg_stat_activity and
the pg_locks views.

>
> I've run my cronjob numerous times from CLI - it works ok
> and takes only few seconds. I've installed it as an every-3 mins
> cronjob on my development VM - it works ok there too.
>
> My biggest problem is, that I don't see any information
> from Postgres at the production site - why did it have to start
> so many postmaster's (whatever those processes do).

Those are backends running queries.

>
> The only log file I've found has been /var/lib/pgsql/pgstartup.log
> and there is nothing suspicious there. (Also nothing related to Postgres
> in messages, mcelog (empty), audit.log, access_log, error_log).

You might want to increase logging. Take a look at
http://www.postgresql.org/docs/8.1/interactive/runtime-config-logging.html
for details.

Regards
/roppert

>
> Please give me some hints
> Alex
>
> # cat bin/clean-phpbb-forum.sql
> start transaction;
>
> delete from phpbb_users
> where user_type=1 and user_inactive_reason=1 and user_lastvisit=0 and
> age(to_timestamp(user_regdate))>interval '3 days';
>
> create temp table old_topics (topic_id integer) on commit delete rows;
> create temp table old_posts (post_id integer) on commit delete rows;
>
> insert into old_topics select topic_id from phpbb_topics
> where forum_id=5 and topic_poster=1 and
> age(to_timestamp(topic_time))>interval '7 days';
>
> -- select count(topic_id) as "old topics:" from old_topics;
>
> insert into old_posts select p.post_id from phpbb_posts p, old_topics t
> where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id;
>
> -- select count(post_id) as "old posts:" from old_posts;
>
> delete from phpbb_posts where post_id in (select post_id from old_posts);
> delete from phpbb_topics where topic_id in (select topic_id from old_topics);
>
> update phpbb_config set
> config_value = (select count(topic_id) from phpbb_topics)
> where config_name = 'num_topics';
>
> update phpbb_config set
> config_value = (select count(post_id) from phpbb_posts)
> where config_name = 'num_posts';
>
> update phpbb_users set
> user_posts = (select count(post_id) from phpbb_posts where poster_id=1)
> where user_id = 1;
>
> update phpbb_forums set
> forum_posts = (select count(post_id) from phpbb_posts),
> forum_topics = (select count(topic_id) from phpbb_topics),
> forum_topics_real = (select count(topic_id) from phpbb_topics)
> where forum_id = 5;
>
> commit
>

Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)

From
Alexander Farber
Date:
Hello again,

ouch, I have these postmaster's spinning up again -
even though I've disabled the cronjob and rebooted:

top - 11:42:31 up  1:22,  2 users,  load average: 9.15, 7.59, 5.03
Tasks: 135 total,  10 running, 125 sleeping,   0 stopped,   0 zombie
Cpu0  : 31.7%us,  0.5%sy,  0.0%ni, 67.4%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 32.9%us,  0.7%sy,  0.0%ni, 65.8%id,  0.1%wa,  0.0%hi,  0.5%si,  0.0%st
Cpu2  : 58.8%us,  1.3%sy,  0.0%ni, 38.2%id,  1.3%wa,  0.0%hi,  0.5%si,  0.0%st
Cpu3  : 21.7%us,  0.5%sy,  0.0%ni, 77.1%id,  0.1%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   4019028k total,  1512432k used,  2506596k free,    21828k buffers
Swap:  2104496k total,        0k used,  2104496k free,  1152844k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3208 postgres  17   0  122m  14m   9m R 143.9  0.4  30:22.79 postmaster
 3193 postgres  18   0  122m  14m   9m R 117.7  0.4  34:04.08 postmaster
 3263 postgres  18   0  120m  13m   9m R 109.4  0.3  11:06.24 postmaster
 3315 postgres  17   0  121m  13m 9.9m R 90.7  0.4   1:59.89 postmaster
 3275 postgres  18   0  121m  13m   9m R 83.2  0.4   6:44.74 postmaster
 3243 postgres  18   0  122m  14m   9m R 69.0  0.4  20:04.06 postmaster
 3233 postgres  17   0  122m  14m   9m R 47.2  0.4  25:01.18 postmaster
 3256 postgres  18   0  122m  14m   9m R 35.2  0.4  14:48.54 postmaster


On Mon, Oct 4, 2010 at 11:34 AM, Robert Gravsjö <robert@blogg.se> wrote:
> I would connect to the db with psql and query the pg_stat_activity and the
> pg_locks views.

pref=> select * from pg_stat_activity;
 datid | datname | procpid | usesysid | usename |        current_query
        | query_start |         backend_start         | client_addr |
client_port

-------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+-------------
 16384 | pref    |    2681 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 10:22:53.051483+02 |             |
         -1
 16384 | pref    |    3193 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 11:17:40.081829+02 |             |
         -1
 16384 | pref    |    3208 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 11:20:18.763179+02 |             |
         -1
 16384 | pref    |    3233 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 11:22:50.818701+02 |             |
         -1
 16384 | pref    |    3243 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 11:25:34.110573+02 |             |
         -1
 16384 | pref    |    3256 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 11:28:33.355689+02 |             |
         -1
 16384 | pref    |    3263 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 11:32:02.116546+02 |             |
         -1
 16384 | pref    |    3275 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 11:35:31.611361+02 |             |
         -1
 16384 | pref    |    3315 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 11:40:00.119433+02 |             |
         -1
 16384 | pref    |    3355 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 11:42:51.827796+02 |             |
         -1
 16384 | pref    |    3356 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 11:43:04.285152+02 |             |
         -1
(11 rows)

pref=> select * from pg_locks;
   locktype    | database | relation | page | tuple | transactionid |
classid | objid | objsubid | transaction | pid  |      mode       |
granted

---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+-----------------+---------
 relation      |    16384 |    17755 |      |       |               |
       |       |          |     5677546 | 3275 | AccessShareLock | t
 relation      |    16384 |    17313 |      |       |               |
       |       |          |     5677546 | 3275 | AccessShareLock | t
 relation      |    16384 |    16858 |      |       |               |
       |       |          |     5677313 | 3243 | AccessShareLock | t
 transactionid |          |          |      |       |       5677313 |
       |       |          |     5677313 | 3243 | ExclusiveLock   | t
 relation      |    16384 |    16858 |      |       |               |
       |       |          |     5677742 | 3315 | AccessShareLock | t
 relation      |    16384 |    17755 |      |       |               |
       |       |          |     5677395 | 3256 | AccessShareLock | t
 relation      |    16384 |    17716 |      |       |               |
       |       |          |     5677313 | 3243 | AccessShareLock | t
 relation      |    16384 |    17313 |      |       |               |
       |       |          |     5677313 | 3243 | AccessShareLock | t
 relation      |    16384 |    17755 |      |       |               |
       |       |          |     5677070 | 3193 | AccessShareLock | t
 relation      |    16384 |    17716 |      |       |               |
       |       |          |     5677234 | 3233 | AccessShareLock | t
 relation      |    16384 |    10342 |      |       |               |
       |       |          |     5677823 | 3355 | AccessShareLock | t
 transactionid |          |          |      |       |       5677234 |
       |       |          |     5677234 | 3233 | ExclusiveLock   | t
 relation      |    16384 |    17716 |      |       |               |
       |       |          |     5677454 | 3263 | AccessShareLock | t
 transactionid |          |          |      |       |       5677167 |
       |       |          |     5677167 | 3208 | ExclusiveLock   | t
 relation      |    16384 |    17716 |      |       |               |
       |       |          |     5677546 | 3275 | AccessShareLock | t
 relation      |    16384 |    17716 |      |       |               |
       |       |          |     5677167 | 3208 | AccessShareLock | t
 relation      |    16384 |    17313 |      |       |               |
       |       |          |     5677070 | 3193 | AccessShareLock | t
 relation      |    16384 |    17721 |      |       |               |
       |       |          |     5677167 | 3208 | AccessShareLock | t
 transactionid |          |          |      |       |       5677070 |
       |       |          |     5677070 | 3193 | ExclusiveLock   | t
 relation      |    16384 |    16858 |      |       |               |
       |       |          |     5677167 | 3208 | AccessShareLock | t
 relation      |    16384 |    17721 |      |       |               |
       |       |          |     5677742 | 3315 | AccessShareLock | t
 transactionid |          |          |      |       |       5677823 |
       |       |          |     5677823 | 3355 | ExclusiveLock   | t
 transactionid |          |          |      |       |       5677454 |
       |       |          |     5677454 | 3263 | ExclusiveLock   | t
 relation      |    16384 |    16858 |      |       |               |
       |       |          |     5677546 | 3275 | AccessShareLock | t
 relation      |    16384 |    17721 |      |       |               |
       |       |          |     5677313 | 3243 | AccessShareLock | t
 relation      |    16384 |    17755 |      |       |               |
       |       |          |     5677454 | 3263 | AccessShareLock | t
 relation      |    16384 |    17716 |      |       |               |
       |       |          |     5677742 | 3315 | AccessShareLock | t
 relation      |    16384 |    17313 |      |       |               |
       |       |          |     5677167 | 3208 | AccessShareLock | t
 relation      |    16384 |    16858 |      |       |               |
       |       |          |     5677454 | 3263 | AccessShareLock | t
 relation      |    16384 |    17716 |      |       |               |
       |       |          |     5677070 | 3193 | AccessShareLock | t
 relation      |    16384 |    17721 |      |       |               |
       |       |          |     5677234 | 3233 | AccessShareLock | t
 relation      |    16384 |    17721 |      |       |               |
       |       |          |     5677546 | 3275 | AccessShareLock | t
 relation      |    16384 |    17755 |      |       |               |
       |       |          |     5677234 | 3233 | AccessShareLock | t
 transactionid |          |          |      |       |       5677742 |
       |       |          |     5677742 | 3315 | ExclusiveLock   | t
 relation      |    16384 |    16858 |      |       |               |
       |       |          |     5677070 | 3193 | AccessShareLock | t
 relation      |    16384 |    17716 |      |       |               |
       |       |          |     5677395 | 3256 | AccessShareLock | t
 relation      |    16384 |    17755 |      |       |               |
       |       |          |     5677167 | 3208 | AccessShareLock | t
 relation      |    16384 |    17313 |      |       |               |
       |       |          |     5677742 | 3315 | AccessShareLock | t
 relation      |    16384 |    17313 |      |       |               |
       |       |          |     5677234 | 3233 | AccessShareLock | t
 relation      |    16384 |    16858 |      |       |               |
       |       |          |     5677395 | 3256 | AccessShareLock | t
 relation      |    16384 |    17755 |      |       |               |
       |       |          |     5677742 | 3315 | AccessShareLock | t
 relation      |    16384 |    17721 |      |       |               |
       |       |          |     5677070 | 3193 | AccessShareLock | t
 transactionid |          |          |      |       |       5677395 |
       |       |          |     5677395 | 3256 | ExclusiveLock   | t
 relation      |    16384 |    17721 |      |       |               |
       |       |          |     5677395 | 3256 | AccessShareLock | t
 relation      |    16384 |    16858 |      |       |               |
       |       |          |     5677234 | 3233 | AccessShareLock | t
 relation      |    16384 |    17755 |      |       |               |
       |       |          |     5677313 | 3243 | AccessShareLock | t
 relation      |    16384 |    17313 |      |       |               |
       |       |          |     5677454 | 3263 | AccessShareLock | t
 relation      |    16384 |    17721 |      |       |               |
       |       |          |     5677454 | 3263 | AccessShareLock | t
 transactionid |          |          |      |       |       5677546 |
       |       |          |     5677546 | 3275 | ExclusiveLock   | t
 relation      |    16384 |    17313 |      |       |               |
       |       |          |     5677395 | 3256 | AccessShareLock | t
(50 rows)

Regards
Alex

Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)

From
Alexander Farber
Date:
I wish I could see those hanging queries, what SQL do they try to execute:

pref=> select * from pg_stat_activity;
 datid | datname | procpid | usesysid | usename |        current_query
       | query_start |         backend_start         | client_addr |
client_port

-------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+-------------
 16384 | pref    |    2681 |    16385 | pref    | <command string not
enabled> |             | 2010-10-04 10:22:53.051483+02 |             |
        -1
.........

I have usually just one Postgres connection from my game
(a perl script running as daemon). And then I have several
connections from phpBB. But when I restart httpd,
the spinning postmaster's don't disappear:

top - 11:48:11 up  1:28,  1 user,  load average: 9.85, 8.68, 6.25
Tasks: 126 total,  12 running, 114 sleeping,   0 stopped,   0 zombie
Cpu0  : 36.1%us,  0.5%sy,  0.0%ni, 63.0%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 37.1%us,  0.8%sy,  0.0%ni, 61.6%id,  0.1%wa,  0.0%hi,  0.5%si,  0.0%st
Cpu2  : 61.4%us,  1.3%sy,  0.0%ni, 35.7%id,  1.2%wa,  0.0%hi,  0.5%si,  0.0%st
Cpu3  : 26.6%us,  0.6%sy,  0.0%ni, 72.1%id,  0.1%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   4019028k total,  1428256k used,  2590772k free,    22324k buffers
Swap:  2104496k total,        0k used,  2104496k free,  1199036k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3263 postgres  16   0  122m  14m   9m R 116.0  0.4  16:16.83 postmaster
 3208 postgres  16   0  122m  14m   9m R 112.0  0.4  35:16.08 postmaster
 3275 postgres  16   0  121m  13m   9m R 110.6  0.4  11:34.32 postmaster
 3315 postgres  16   0  121m  13m   9m R 100.5  0.4   7:16.93 postmaster
 3193 postgres  16   0  122m  14m   9m R 96.4  0.4  38:44.78 postmaster
 3233 postgres  16   0  122m  14m   9m R 68.8  0.4  29:28.90 postmaster
 3243 postgres  16   0  122m  14m   9m R 53.3  0.4  25:13.96 postmaster
 3256 postgres  16   0  122m  14m   9m R 34.4  0.4  19:23.93 postmaster

Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)

From
Craig Ringer
Date:
On 4/10/2010 5:20 PM, Alexander Farber wrote:

> # rpm -qa | grep -i postgres
> postgresql-8.1.21-1.el5_5.1

Mmm, elderly.

> On Friday I've installed a cronjob (full source code at the bottom):
>
> 3       3       *       *       *       psql -a -f
> $HOME/bin/clean-phpbb-forum.sql

Do you rate-control it or provide any sort of interlocking so that you
don't land up with multiple copies running at once?

See "man 2 flock" for a convenient way to do that reliably from the shell.

> My biggest problem is, that I don't see any information
> from Postgres at the production site - why did it have to start
> so many postmaster's (whatever those processes do).

http://wiki.postgresql.org/wiki/FAQ#Why_does_PostgreSQL_have_so_many_processes.2C_even_when_idle.3F

As for what they're doing: see pg_stat_activity.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)

From
Alexander Farber
Date:
Thank you, I've enabled logging,
but couldn't see anything suspicious there.

Also the 60%-120% CPU-eating postmaster
processes are now suddenly gone.

I suspect this is some bug in the
postgresql-server-8.1.21-1.el5_5.1 delivered
with CenOS 5.5/64 bit, because I don't
have many simultaneous users ( < 100)
and not that many queries.

I'll try to upgrade to 8.4.4 as described at
http://www.pgrpms.org/ later. I'm a bit confused
though that they do not support "yum update" anymore
(I think that worked earlier with PGDG....)

Regards
Alex

Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)

From
Robert Gravsjö
Date:

Alexander Farber skrev 2010-10-04 11.48:
> I wish I could see those hanging queries, what SQL do they try to execute:

To see the query you need to enable stats_command_string in your
postgresql.conf. See:
http://www.postgresql.org/docs/8.1/interactive/runtime-config-statistics.html

You might also benefit from chapter 24:
http://www.postgresql.org/docs/8.1/interactive/monitoring.html

Regards,
roppert

>
> pref=>  select * from pg_stat_activity;
>   datid | datname | procpid | usesysid | usename |        current_query
>         | query_start |         backend_start         | client_addr |
> client_port
>
-------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+-------------
>   16384 | pref    |    2681 |    16385 | pref    |<command string not
> enabled>  |             | 2010-10-04 10:22:53.051483+02 |             |
>          -1
> .........
>
> I have usually just one Postgres connection from my game
> (a perl script running as daemon). And then I have several
> connections from phpBB. But when I restart httpd,
> the spinning postmaster's don't disappear:
>
> top - 11:48:11 up  1:28,  1 user,  load average: 9.85, 8.68, 6.25
> Tasks: 126 total,  12 running, 114 sleeping,   0 stopped,   0 zombie
> Cpu0  : 36.1%us,  0.5%sy,  0.0%ni, 63.0%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  : 37.1%us,  0.8%sy,  0.0%ni, 61.6%id,  0.1%wa,  0.0%hi,  0.5%si,  0.0%st
> Cpu2  : 61.4%us,  1.3%sy,  0.0%ni, 35.7%id,  1.2%wa,  0.0%hi,  0.5%si,  0.0%st
> Cpu3  : 26.6%us,  0.6%sy,  0.0%ni, 72.1%id,  0.1%wa,  0.0%hi,  0.5%si,  0.0%st
> Mem:   4019028k total,  1428256k used,  2590772k free,    22324k buffers
> Swap:  2104496k total,        0k used,  2104496k free,  1199036k cached
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   3263 postgres  16   0  122m  14m   9m R 116.0  0.4  16:16.83 postmaster
>   3208 postgres  16   0  122m  14m   9m R 112.0  0.4  35:16.08 postmaster
>   3275 postgres  16   0  121m  13m   9m R 110.6  0.4  11:34.32 postmaster
>   3315 postgres  16   0  121m  13m   9m R 100.5  0.4   7:16.93 postmaster
>   3193 postgres  16   0  122m  14m   9m R 96.4  0.4  38:44.78 postmaster
>   3233 postgres  16   0  122m  14m   9m R 68.8  0.4  29:28.90 postmaster
>   3243 postgres  16   0  122m  14m   9m R 53.3  0.4  25:13.96 postmaster
>   3256 postgres  16   0  122m  14m   9m R 34.4  0.4  19:23.93 postmaster
>

Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)

From
Alexander Farber
Date:
I've upgraded my CentOS 5.5 VM to:

postgresql-libs-8.4.4-1PGDG.rhel5
postgresql-devel-8.4.4-1PGDG.rhel5
postgresql-8.4.4-1PGDG.rhel5
postgresql-contrib-8.4.4-1PGDG.rhel5
postgresql-docs-8.4.4-1PGDG.rhel5
postgresql-server-8.4.4-1PGDG.rhel5
postgresql-plperl-8.4.4-1PGDG.rhel5

and then have loaded my backup and
then run my cleanup script (source below)
and it would stuck:

pref=> \i bin/clean-phpbb-forum.sql
START TRANSACTION
DELETE 6
CREATE TABLE
CREATE TABLE
INSERT 0 26699
INSERT 0 129903
[... and here nothing happens for minutes....]

And in the
/var/lib/pgsql/data/pg_log/postgresql-Mon.log
see (probably because of the many inserts?)

LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
LOG:  statement: start transaction;
LOG:  statement: delete from phpbb_users
        where user_type=1 and user_inactive_reason=1 and
user_lastvisit=0 and age(to_timestamp(user_regdate))>interval '3
days';
LOG:  statement: create temp table old_topics (topic_id integer) on
commit delete rows;
LOG:  statement: create temp table old_posts (post_id integer) on
commit delete rows;
LOG:  statement: insert into old_topics select topic_id from phpbb_topics
        where forum_id=5 and topic_poster=1 and
age(to_timestamp(topic_time))>interval '7 days';
LOG:  statement: insert into old_posts select p.post_id from
phpbb_posts p, old_topics t
        where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id;
LOG:  statement: delete from phpbb_posts where post_id in (select
post_id from old_posts);
LOG:  checkpoints are occurring too frequently (29 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (29 seconds apart)

Then after some minutes waiting,
I've pressed ctrl-Z on the pgsql and then "fg"
and this has waken PostgreSQL up for some reason:

[1]+  Stopped                 psql
afarber@centos:~> fg
psql
DELETE 129903
DELETE 26699
UPDATE 1
UPDATE 1
UPDATE 1
UPDATE 1
COMMIT

Could it be that my script is somehow making
PostgreSQL hang? (or wait for some input?):

cat ~/bin/clean-phpbb-forum.sql
start transaction;

delete from phpbb_users
where user_type=1 and user_inactive_reason=1 and user_lastvisit=0 and
age(to_timestamp(user_regdate))>interval '3 days';

create temp table old_topics (topic_id integer) on commit delete rows;
create temp table old_posts (post_id integer) on commit delete rows;

insert into old_topics select topic_id from phpbb_topics
where forum_id=5 and topic_poster=1 and
age(to_timestamp(topic_time))>interval '7 days';

-- select count(topic_id) as "old topics:" from old_topics;

insert into old_posts select p.post_id from phpbb_posts p, old_topics t
where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id;

-- select count(post_id) as "old posts:" from old_posts;

delete from phpbb_posts where post_id in (select post_id from old_posts);
delete from phpbb_topics where topic_id in (select topic_id from old_topics);

update phpbb_config set
config_value = (select count(topic_id) from phpbb_topics)
where config_name = 'num_topics';

update phpbb_config set
config_value = (select count(post_id) from phpbb_posts)
where config_name = 'num_posts';

update phpbb_users set
user_posts = (select count(post_id) from phpbb_posts where poster_id=1)
where user_id = 1;

update phpbb_forums set
forum_posts = (select count(post_id) from phpbb_posts),
forum_topics = (select count(topic_id) from phpbb_topics),
forum_topics_real = (select count(topic_id) from phpbb_topics)
where forum_id = 5;

commit

Thank you for any insights
Alex

PS: I haven't modified postgresql.conf
       or other conf files besides logging:

# diff postgresql.conf{.OLD,}
353c353
< #log_statement = 'none'                       # none, ddl, mod, all
---
> log_statement = 'all'                 # none, ddl, mod, all

# grep checkpoint postgresql.conf
#checkpoint_segments = 3                # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min              # range 30s-1h
#checkpoint_completion_target = 0.5     # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s               # 0 disables
#log_checkpoints = off

Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)

From
"Eric Comeau"
Date:
Specifically look into setting..... 

log_min_duration_statement (integer)

in the postgresql.conf file.

-----Original Message-----
From: Robert Gravsjö [mailto:tekniksupport@blogg.se] 
Sent: Monday, October 04, 2010 7:22 AM
To: Alexander Farber
Cc: pgsql-general@postgresql.org
Subject: Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)



Alexander Farber skrev 2010-10-04 11.48:
> I wish I could see those hanging queries, what SQL do they try to execute:

To see the query you need to enable stats_command_string in your 
postgresql.conf. See: 
http://www.postgresql.org/docs/8.1/interactive/runtime-config-statistics.html

You might also benefit from chapter 24:
http://www.postgresql.org/docs/8.1/interactive/monitoring.html

Regards,
roppert

>
> pref=>  select * from pg_stat_activity;
>   datid | datname | procpid | usesysid | usename |        current_query
>         | query_start |         backend_start         | client_addr |
> client_port
>
-------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+-------------
>   16384 | pref    |    2681 |    16385 | pref    |<command string not
> enabled>  |             | 2010-10-04 10:22:53.051483+02 |             |
>          -1
> .........
>
> I have usually just one Postgres connection from my game
> (a perl script running as daemon). And then I have several
> connections from phpBB. But when I restart httpd,
> the spinning postmaster's don't disappear:
>
> top - 11:48:11 up  1:28,  1 user,  load average: 9.85, 8.68, 6.25
> Tasks: 126 total,  12 running, 114 sleeping,   0 stopped,   0 zombie
> Cpu0  : 36.1%us,  0.5%sy,  0.0%ni, 63.0%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  : 37.1%us,  0.8%sy,  0.0%ni, 61.6%id,  0.1%wa,  0.0%hi,  0.5%si,  0.0%st
> Cpu2  : 61.4%us,  1.3%sy,  0.0%ni, 35.7%id,  1.2%wa,  0.0%hi,  0.5%si,  0.0%st
> Cpu3  : 26.6%us,  0.6%sy,  0.0%ni, 72.1%id,  0.1%wa,  0.0%hi,  0.5%si,  0.0%st
> Mem:   4019028k total,  1428256k used,  2590772k free,    22324k buffers
> Swap:  2104496k total,        0k used,  2104496k free,  1199036k cached
>
>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   3263 postgres  16   0  122m  14m   9m R 116.0  0.4  16:16.83 postmaster
>   3208 postgres  16   0  122m  14m   9m R 112.0  0.4  35:16.08 postmaster
>   3275 postgres  16   0  121m  13m   9m R 110.6  0.4  11:34.32 postmaster
>   3315 postgres  16   0  121m  13m   9m R 100.5  0.4   7:16.93 postmaster
>   3193 postgres  16   0  122m  14m   9m R 96.4  0.4  38:44.78 postmaster
>   3233 postgres  16   0  122m  14m   9m R 68.8  0.4  29:28.90 postmaster
>   3243 postgres  16   0  122m  14m   9m R 53.3  0.4  25:13.96 postmaster
>   3256 postgres  16   0  122m  14m   9m R 34.4  0.4  19:23.93 postmaster
>