Thread: Postmaster can't stop with pg_ctl

Postmaster can't stop with pg_ctl

From
takuya koide
Date:
============================================================================
                        POSTGRESQL BUG REPORT
============================================================================


Your name        : Takuya Koide
Your email address    : koide-txa (at) necst (dot) nec (dot) co (dot) jp

Category        : runtime: back-end:
Severity        : serious

Summary: Postmaster can't stop with pg_ctl

System Configuration
--------------------
  Operating System   : Red Hat Enterprise Linux ES release 4 (Nahant Update 3)

  PostgreSQL version : PostgreSQL 8.2.4 on i686-redhat-linux-gnu,
                       compiled by GCC gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2)

  Note: I use the following RPM packages.

            $ rpm -qa|grep -i postgresql
            postgresql-server-8.2.4-1PGDG
            postgresql-plperl-8.2.4-1PGDG
            postgresql-8.2.4-1PGDG
            postgresql-contrib-8.2.4-1PGDG
            postgresql-docs-8.2.4-1PGDG
            postgresql-plpython-8.2.4-1PGDG
            postgresql-test-8.2.4-1PGDG
            postgresql-libs-8.2.4-1PGDG
            postgresql-devel-8.2.4-1PGDG
            postgresql-pltcl-8.2.4-1PGDG

  Compiler used      : gcc

Hardware:
---------
  x86

Versions of other tools:
------------------------


--------------------------------------------------------------------------

Problem Description:
--------------------
  I found that pg_ctl cannot stop the postmaster under some conditions.

  If a PostgreSQL process is in an abnormal (stalled) state, I would like to
  stop PostgreSQL (and restart it) with /etc/rc.d/init.d/postgresql,
  but I could not stop it.



--------------------------------------------------------------------------

Test Case (reproduce procedures):
---------------------------------
  I can reproduce the problem with the following steps.

  1) Confirm the current status.
  # ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
  postgres 3507 0.1 1.0 21352 2800 ? S 18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
  postgres 3509 0.0 0.2 11132  568 ? S 18:48 0:00 postgres: logger process
  postgres 3514 0.0 0.3 21352  844 ? S 18:48 0:00 postgres: writer process
  postgres 3515 0.0 0.2 12132  564 ? S 18:48 0:00 postgres: stats buffer process
  postgres 3516 0.0 0.2 11364  748 ? S 18:48 0:00 postgres: stats collector process

  2) Connect with psql as the postgres user.
  $ id
  uid=26(postgres) gid=26(postgres) group=26(postgres) context=user_u:system_r:unconfined_t
  -bash-3.1$ psql template1
  template1=#

  3) Check the status again.
  # ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
  postgres 3507 0.0 1.1 21352 2804 ?     S  18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
  postgres 3509 0.0 0.2 11132  568 ?     S  18:48 0:00 postgres: logger process
  postgres 3514 0.0 0.3 21352  852 ?     S  18:48 0:00 postgres: writer process
  postgres 3515 0.0 0.2 12132  564 ?     S  18:48 0:00 postgres: stats buffer process
  postgres 3516 0.0 0.3 11364  772 ?     S  18:48 0:00 postgres: stats collector process
  postgres 3618 0.0 0.6  8476 1752 pts/3 S+ 18:54 0:00 psql template1
  postgres 3619 0.0 0.8 22012 2124 ?     S  18:54 0:00 postgres: postgres template1 [local] idle

  4) Send SIGSTOP to the postgres backend process.
  # kill -SIGSTOP 3619
  # ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
  postgres 3507 0.0 1.1 21352 2804 ?     S  18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
  postgres 3509 0.0 0.2 11132  568 ?     S  18:48 0:00 postgres: logger process
  postgres 3514 0.0 0.3 21352  852 ?     S  18:48 0:00 postgres: writer process
  postgres 3515 0.0 0.2 12132  564 ?     S  18:48 0:00 postgres: stats buffer process
  postgres 3516 0.0 0.3 11364  772 ?     S  18:48 0:00 postgres: stats collector process
  postgres 3618 0.0 0.6  8476 1752 pts/3 S+ 18:54 0:00 psql template1
  postgres 3619 0.0 0.8 22012 2124 ?     T  18:54 0:00 postgres: postgres template1 [local] idle

  5) Try to stop PostgreSQL with the normal method.
  # /etc/rc.d/init.d/postgresql stop
  postgresql stopping service: [fail]

  6) Check the status and confirm that PostgreSQL has not stopped.
  (This is the problem.)
  # ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
  postgres 3507 0.0 1.1 21352 2816 ?     S  18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
  postgres 3509 0.0 0.2 11132  568 ?     S  18:48 0:00 postgres: logger process
  postgres 3514 0.0 0.3 21352  852 ?     S  18:48 0:00 postgres: writer process
  postgres 3515 0.0 0.2 12132  564 ?     S  18:48 0:00 postgres: stats buffer process
  postgres 3516 0.0 0.3 11364  772 ?     S  18:48 0:00 postgres: stats collector process
  postgres 3618 0.0 0.6  8476 1752 pts/3 S+ 18:54 0:00 psql template1
  postgres 3619 0.0 0.8 22012 2124 ?     T  18:54 0:00 postgres: postgres template1 [local] idle

  7) Try to stop PostgreSQL by sending SIGINT to the postmaster.
  # kill -SIGINT 3507

  8) Check the status and confirm that PostgreSQL has not stopped.
  (This is a problem, too.)
  # ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
  postgres 3507 0.0 1.1 21352 2816 ?     S  18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
  postgres 3509 0.0 0.2 11132  568 ?     S  18:48 0:00 postgres: logger process
  postgres 3514 0.0 0.3 21352  852 ?     S  18:48 0:00 postgres: writer process
  postgres 3515 0.0 0.2 12132  564 ?     S  18:48 0:00 postgres: stats buffer process
  postgres 3516 0.0 0.3 11364  772 ?     S  18:48 0:00 postgres: stats collector process
  postgres 3618 0.0 0.6  8476 1752 pts/3 S+ 18:54 0:00 psql template1
  postgres 3619 0.0 0.8 22012 2124 ?     T  18:54 0:00 postgres: postgres template1 [local] idle

  9) Try to stop PostgreSQL with SIGKILL.
  # kill -SIGKILL 3507
  # ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
  postgres 3509 0.0 0.2 11132  564 ?     S  18:48 0:00 postgres: logger process
  postgres 3618 0.0 0.5  8476 1520 pts/3 S+ 18:54 0:00 psql template1
  postgres 3619 0.0 0.7 22012 1976 ?     T  18:54 0:00 postgres: postgres template1 [local] idle
  # kill -SIGKILL 3509
  # ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
  postgres 3618 0.0 0.5  8476 1520 pts/3 S+ 18:54 0:00 psql template1
  postgres 3619 0.0 0.7 22012 1976 ?     T  18:54 0:00 postgres: postgres template1 [local] idle
  # kill -SIGKILL 3619
  # ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
  postgres 3618 0.0 0.5 8476 1520 pts/3 S+ 18:54 0:00 psql template1
  # kill -SIGKILL 3618
  # ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
  #

--------------------------------------------------------------------------

Solution:
---------
  I would like to suggest a way to resolve this issue.
  If you think the idea is good, please use it.

  [current status]
  Part of /etc/rc.d/init.d/postgresql (the PostgreSQL rc script):
  -------------------------------------------------------------------
   stop(){
     echo -n $"Stopping ${NAME} service: "
      $SU -l postgres -c "$PGENGINE/pg_ctl stop -D '$PGDATA' -s -m fast" > /dev/null 2>&1 < /dev/null
      ret=$?
  -------------------------------------------------------------------

  When PostgreSQL processes are stalled:

  1. Run '/etc/rc.d/init.d/postgresql stop'.
  2. The pg_ctl command on the third line of stop() runs and returns
     error code 1 (see the following).

    $ pg_ctl stop -m fast
    waiting for postmaster to shut down........ failed
    pg_ctl: postmaster does not shut down
    $ echo $?
    1

  So I propose adding a script to the PostgreSQL rc script that runs
  when pg_ctl fails (see the added part below).


  /etc/rc.d/init.d/postgresql (rc scripts of postgresql)
  -------------------------------------------------------------------
   stop(){
     echo -n $"Stopping ${NAME} service: "
      ...snip...
      $SU -l postgres -c "$PGENGINE/pg_ctl stop -D '$PGDATA' -s -m fast" > /dev/null 2>&1 < /dev/null
      ret=$?

      # when pg_ctl fails, perform the following steps
      if [ ret value is 1 ]; then

        # try again to stop postgresql with pg_ctl
        until $SU -l postgres -c "$PGENGINE/pg_ctl stop -D '$PGDATA' -s -m fast" > /dev/null 2>&1 < /dev/null; do

          # pg_ctl may need some time to stop postgresql
          sleep (user-designated time)

          # if pg_ctl still cannot stop postgresql after a few retries
          if [ loop count equals the user-designated limit ]; then

            # give up using pg_ctl
            exit loop
          fi
        done

        # forcibly terminate postgresql
        if [ user wants postgresql forcibly terminated ]; then
            1. send SIGCONT to the suspended postgresql processes.
            2. if postgresql still does not stop, send SIGKILL to the postgresql processes.
            3. release the shared memory used by postgresql.
               (on Linux, ipcclean can be used)
        fi
      fi
  -------------------------------------------------------------------
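
The outline above might be turned into runnable shell roughly as follows. This is a sketch under stated assumptions, not the shipped init script: the function names retry_stop and force_stop, and their arguments, are hypothetical. The shared-memory cleanup step is left out. Note also that the PostgreSQL documentation advises against using SIGKILL to shut down the server, because it prevents the server from releasing shared memory and semaphores, so the last step should be treated as a last resort.

```shell
#!/bin/sh
# Sketch only: the names and arguments below are illustrative assumptions.

# retry_stop MAX_TRIES DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most MAX_TRIES times, sleeping DELAY
# seconds after each failed attempt. Returns 0 on success, 1 otherwise.
retry_stop() {
    max_tries=$1
    delay=$2
    shift 2
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$delay"
    done
    return 1
}

# force_stop GRACE PID...
# Send SIGCONT to each PID so that signals already pending on a suspended
# process can be delivered, wait GRACE seconds, then send SIGKILL to any
# PID that still exists.
force_stop() {
    grace=$1
    shift
    for pid in "$@"; do
        kill -CONT "$pid" 2>/dev/null
    done
    sleep "$grace"
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            kill -KILL "$pid" 2>/dev/null
        fi
    done
}

# Possible use inside stop(), with $stalled_pids collected elsewhere:
#   retry_stop 3 5 $SU -l postgres -c "$PGENGINE/pg_ctl stop -D '$PGDATA' -s -m fast" \
#       || force_stop 5 $stalled_pids
```

Keeping the retry loop and the forced termination in separate functions lets an administrator enable the retries without also opting in to SIGKILL.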


--------------------------------------------------------------------------


---
Takuya Koide
NEC System Technologies, Ltd.

Re: Postmaster can't stop with pg_ctl

From
Alvaro Herrera
Date:
takuya koide wrote:

>   4) send 'SIGSTOP' signal to postgres
>   # kill -SIGSTOP 3619
>   # ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
>   postgres 3507 0.0 1.1 21352 2804 ?     S  18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
>   postgres 3509 0.0 0.2 11132  568 ?     S  18:48 0:00 postgres: logger process
>   postgres 3514 0.0 0.3 21352  852 ?     S  18:48 0:00 postgres: writer process
>   postgres 3515 0.0 0.2 12132  564 ?     S  18:48 0:00 postgres: stats buffer process
>   postgres 3516 0.0 0.3 11364  772 ?     S  18:48 0:00 postgres: stats collector process
>   postgres 3618 0.0 0.6  8476 1752 pts/3 S+ 18:54 0:00 psql template1
>   postgres 3619 0.0 0.8 22012 2124 ?     T  18:54 0:00 postgres: postgres template1 [local] idle

If you "stop" a process by SIGSTOP you must make it run again with
SIGCONT.  Otherwise it's just not processing signals, so it'll obviously
not shut down.  I don't think this is a bug.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Postmaster can't stop with pg_ctl

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> takuya koide wrote:
>> 4) send 'SIGSTOP' signal to postgres

> If you "stop" a process by SIGSTOP you must make it run again with
> SIGCONT.  Otherwise it's just not processing signals, so it'll obviously
> not shut down.  I don't think this is a bug.

SIGSTOP is a debugging tool, which would be rendered nigh useless if the
postmaster tried to override it automatically.  So definitely NOTABUG
in my opinion too.

            regards, tom lane

Re: Postmaster can't stop with pg_ctl

From
takuya koide
Date:
Thank you for your reply.

On Wed, 25 Apr 2007 09:44:47 -0400
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> takuya koide wrote:
>
> >   4) send 'SIGSTOP' signal to postgres
> >   # kill -SIGSTOP 3619
> >   # ps axuw|grep -i postgres|grep -Ev 'grep|bash|su -'
> >   postgres 3507 0.0 1.1 21352 2804 ?     S  18:48 0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
> >   postgres 3509 0.0 0.2 11132  568 ?     S  18:48 0:00 postgres: logger process
> >   postgres 3514 0.0 0.3 21352  852 ?     S  18:48 0:00 postgres: writer process
> >   postgres 3515 0.0 0.2 12132  564 ?     S  18:48 0:00 postgres: stats buffer process
> >   postgres 3516 0.0 0.3 11364  772 ?     S  18:48 0:00 postgres: stats collector process
> >   postgres 3618 0.0 0.6  8476 1752 pts/3 S+ 18:54 0:00 psql template1
> >   postgres 3619 0.0 0.8 22012 2124 ?     T  18:54 0:00 postgres: postgres template1 [local] idle
>
> If you "stop" a process by SIGSTOP you must make it run again with
> SIGCONT.  Otherwise it's just not processing signals, so it'll obviously
> not shut down.  I don't think this is a bug.

I am sorry that my explanation about SIGSTOP was insufficient.

[assumed premise]
  I run PostgreSQL under a third-party cluster system, and I was evaluating
  whether the cluster system can stop PostgreSQL when PostgreSQL gets into an
  abnormal state. I used SIGSTOP to simulate such a state.

[result]
  The cluster system detected the stalled postgres process and tried to stop
  PostgreSQL with /etc/rc.d/init.d/postgresql, but pg_ctl could not stop it.
  (I confirmed that the cluster system ran /etc/rc.d/init.d/postgresql.)

[expectation]
  I expect pg_ctl to be able to stop PostgreSQL even if one or more
  postgres processes are stalled.


Thank you.
Best Regards
---
Takuya Koide
NEC System Technologies, Ltd.

Re: Postmaster can't stop with pg_ctl

From
takuya koide
Date:
Thank you for your reply.

On Wed, 25 Apr 2007 10:22:25 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > takuya koide wrote:
> >> 4) send 'SIGSTOP' signal to postgres
>
> > If you "stop" a process by SIGSTOP you must make it run again with
> > SIGCONT.  Otherwise it's just not processing signals, so it'll obviously
> > not shut down.  I don't think this is a bug.
>
> SIGSTOP is a debugging tool, which would be rendered nigh useless if the
> postmaster tried to override it automatically.  So definitely NOTABUG
> in my opinion too.

For end users, debugging is not important, and it is not part of daily operation either.
So if PostgreSQL had a debug option for developers, I think there would be no problem.
(When the debug option is used, PostgreSQL would not shut down, as in the current behavior.)


Thank you.
Best Regards
---
Takuya Koide
NEC System Technologies, Ltd.