Thread: Database server crash ! URGENT !

Database server crash ! URGENT !

From
"Sunit Bhatia"
Date:
Hi all,

I'm running PostgreSQL 7.1.1 on Solaris and mostly use the JDBC driver to
connect to the database.

Recently I've been seeing very weird behaviour: the database server shuts
down on its own, and when I restart it, it shuts down again as soon as it
starts, or sometimes as soon as I try to establish a connection. This
happens 3-4 times, after which it becomes stable again and runs for days
without any problems. I've seen this on 2-3 production systems.

Here is how I'm starting the server.
'postmaster -B 5- -N 25 -D/usr/local/pgsql/data -i'

Here is what I see on the console on two different instances

First Instance:
Sun Microsystems Inc.    SunOS 5.7    Generic    October 1998
stty: : No such device or address
DEBUG:  database system was shut down at 2000-10-17 21:04:30 PDT
DEBUG:  CheckPoint record at (0, 34075216)
DEBUG:  Redo record at (0, 34075216); Undo record at (0, 0); Shutdown TRUE
DEBUG:  NextTransactionId: 21473; NextOid: 68943
DEBUG:  database system is in production state
Fast Shutdown request at Tue Oct 17 21:40:23 2000
DEBUG:  shutting down
DEBUG:  database system is shut down
******************************************


Second Instance:
Sun Microsystems Inc.    SunOS 5.7    Generic    October 1998
stty: : No such device or address
DEBUG:  database system was interrupted at 2000-10-17 05:07:20 PDT
DEBUG:  CheckPoint record at (0, 33362784)
DEBUG:  Redo record at (0, 33362784); Undo record at (0, 0); Shutdown TRUE
DEBUG:  NextTransactionId: 21144; NextOid: 60751
DEBUG:  database system was not properly shut down; automatic recovery in progress...
DEBUG:  redo starts at (0, 33362848)
DEBUG:  ReadRecord: record with zero len at (0, 34075152)
DEBUG:  redo done at (0, 34075112)
DEBUG:  database system is in production state
pq_recvbuf: unexpected EOF on client connection
pq_recvbuf: unexpected EOF on client connection
/usr/local/pgsql/bin/postmaster: dumpstatus:
/usr/local/pgsql/bin/postmaster: dumpstatus:
Fast Shutdown request at Tue Oct 17 21:04:28 2000
Aborting any active transaction...
FATAL 1:  This connection has been terminated by the administrator.
FATAL 1:  This connection has been terminated by the administrator.
DEBUG:  shutting down
DEBUG:  MoveOfflineLogs: remove 0000000000000001
DEBUG:  database system is shut down
********************************************


I tried the following precautions/solutions:

* I made sure that I shut down the database server properly, but it
happens even if the last shutdown was a 'Fast Shutdown'.
* I issue the 'VACUUM' command every time my application (which is built on
top of JDBC) starts, and also automatically once every 24 hours.
But I see this problem even on the systems where I run the automatic VACUUM.


Can somebody suggest how to debug this problem, or has anybody
experienced this before?

Also, please tell me what the 'Redo record at...' messages printed on the
console mean.

thanks
Sunit













_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp


Re: Database server crash ! URGENT !

From
Justin Clift
Date:
Hi Sunit,

PostgreSQL 7.1.3 has a few bugfixes which make it a better choice than
7.1.1; if you can upgrade, it's a good move.

Also, the "-B 5-" looks interesting.  I'm not sure what the - on the end
of the 5 will do; it probably won't help, though.

Apart from this, I'll defer to the guys with more experience debugging
these kinds of problems.

Regards and best wishes,

Justin Clift


Sunit Bhatia wrote:
>
> Hi all,
>
> I'm running PostgreSQL 7.1.1 on Solaris and mostly use JDBC Driver to
> connect to the Database.
>
> Recently I'm seeing very weird behaviour, Database Server automatically
> shuts down and when I restart it, it again shuts down as soon as I start it
> or sometimes it shuts down as soon as I try to establish the connection. It
> happens 3-4 times and after that it again becomes stable and would run for
> days without any problems. I've seen this in 2-3 production systems.
>
> Here is how I'm starting the server.
> 'postmaster -B 5- -N 25 -D/usr/local/pgsql/data -i'
>
> Here is what I see on the console on two different instances
>
> First Instance:
> Sun Microsystems Inc.   SunOS 5.7       Generic October 1998
> stty: : No such device or address
> DEBUG:  database system was shut down at 2000-10-17 21:04:30 PDT
> DEBUG:  CheckPoint record at (0, 34075216)
> DEBUG:  Redo record at (0, 34075216); Undo record at (0, 0); Shutdown TRUE
> DEBUG:  NextTransactionId: 21473; NextOid: 68943
> DEBUG:  database system is in production state
> Fast Shutdown request at Tue Oct 17 21:40:23 2000
> DEBUG:  shutting down
> DEBUG:  database system is shut down
> ******************************************
>
> Second Instance:
> Sun Microsystems Inc.   SunOS 5.7       Generic October 1998
> stty: : No such device or address
> DEBUG:  database system was interrupted at 2000-10-17 05:07:20 PDT
> DEBUG:  CheckPoint record at (0, 33362784)
> DEBUG:  Redo record at (0, 33362784); Undo record at (0, 0); Shutdown TRUE
> DEBUG:  NextTransactionId: 21144; NextOid: 60751
> DEBUG:  database system was not properly shut down; automatic recovery in
> progress...
> DEBUG:  redo starts at (0, 33362848)
> DEBUG:  ReadRecord: record with zero len at (0, 34075152)
> DEBUG:  redo done at (0, 34075112)
> DEBUG:  database system is in production state
> pq_recvbuf: unexpected EOF on client connection
> pq_recvbuf: unexpected EOF on client connection
> /usr/local/pgsql/bin/postmaster: dumpstatus:
> /usr/local/pgsql/bin/postmaster: dumpstatus:
> Fast Shutdown request at Tue Oct 17 21:04:28 2000
> Aborting any active transaction...
> FATAL 1:  This connection has been terminated by the administrator.
> FATAL 1:  This connection has been terminated by the administrator.
> DEBUG:  shutting down
> DEBUG:  MoveOfflineLogs: remove 0000000000000001
> DEBUG:  database system is shut down
> ********************************************
>
> I tried the following precautions/solutions:
>
> * I tried to make sure that I properly shut down the database server, but it
> happens even if the last shutdown was 'Fast Shutdown'.
> * I issue 'VACUUM' command, every time my application (which is built on top
> of JDBC) starts. and I do it automatically once in 24 hrs.
> But I see this problem on those systems too, where I do auto VACUUM.
>
> Can somebody suggest how to debug this problem or if somebody has
> experienced this before !!
>
> Also please tell what is the meaning of 'Redo Record at..' messages printed
> on the console.
>
> thanks
> Sunit
>

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
   - Indira Gandhi

Re: Database server crash ! URGENT !

From
Tom Lane
Date:
"Sunit Bhatia" <sunit_bhatia@hotmail.com> writes:
> Fast Shutdown request at Tue Oct 17 21:40:23 2000

Something is sending SIGINT signals to your postmaster.

You might need to start the postmaster using nohup,
and/or be more careful to detach it from the foreground
shell.  (For example, on HPUX it's a real good idea to
explicitly redirect the postmaster's stdin, stdout, *and*
stderr away from the terminal, else it won't be detached
from the foreground process group and will still receive
signals from the terminal.)

            regards, tom lane
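
A minimal sketch of the launch style described above, with all three standard streams pointed away from the terminal. `sleep 300` is only a stand-in for the real postmaster command line, and the log path is a made-up example; neither comes from the thread.

```shell
#!/bin/sh
# Sketch only: `sleep 300` stands in for the real server command,
# and LOGFILE is a hypothetical path chosen for this example.
LOGFILE="${TMPDIR:-/tmp}/server-sketch.$$.log"

# stdin comes from /dev/null; stdout and stderr are appended to the log.
# nohup makes the command ignore SIGHUP; it does not block SIGINT.
nohup sleep 300 </dev/null >>"$LOGFILE" 2>&1 &
SERVER_PID=$!

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$SERVER_PID" 2>/dev/null; then RUNNING=yes; else RUNNING=no; fi
echo "pid=$SERVER_PID running=$RUNNING log=$LOGFILE"

# Clean up the stand-in process and its log.
kill "$SERVER_PID"
wait "$SERVER_PID" 2>/dev/null || true
rm -f "$LOGFILE"
```

Redirection by itself does not move the process into a different process group, so on some systems this may not be enough to keep a Control-C typed at the terminal from reaching it.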

Re: Database server crash ! URGENT !

From
"Sunit Bhatia"
Date:
Let me make some corrections to my previous mail.
I start the PostgreSQL Server as 'root' like  this:

su - pgsql -c "$DB_ROOT/bin/postmaster -B 50 -N 25  -D$DB_ROOT/data -i" >>
$LOGFILE 2>&1 &

So I'm starting it in the background and redirecting stdout and stderr
away from the terminal. Still, I don't understand why it receives a Fast
Shutdown request (SIGINT signal).

Does anybody have any ideas?

I also don't see the advantage of using 'nohup' over what I'm doing, since
starting it with 'nohup' will only stop it from receiving the SIGHUP signal.


thanks for your ideas...
Sunit




>
>"Sunit Bhatia" <sunit_bhatia@hotmail.com> writes:
> > Fast Shutdown request at Tue Oct 17 21:40:23 2000
>
>Something is sending SIGINT signals to your postmaster.
>
>You might need to start the postmaster using nohup,
>and/or be more careful to detach it from the foreground
>shell.  (For example, on HPUX it's a real good idea to
>explicitly redirect the postmaster's stdin, stdout, *and*
>stderr away from the terminal, else it won't be detached
>from the foreground process group and will still receive
>signals from the terminal.)
>
>            regards, tom lane




Re: Database server crash ! URGENT !

From
Doug McNaught
Date:
"Sunit Bhatia" <sunit_bhatia@hotmail.com> writes:

> Let me make some corrections to my previous mail.
> I start the PostgreSQL Server as 'root' like  this:
>
> su - pgsql -c "$DB_ROOT/bin/postmaster -B 50 -N 25  -D$DB_ROOT/data -i" >>
> $LOGFILE 2>&1 &
>
>
> So I'm starting it in the background and I'm redirecting the stdout
> and stderr away from the terminal. Still I don't understand why it
> receives Fast Shutdown Request or (SIGINT signal).

You might need to also run it with 'nohup'.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863

7.2 pg_hba.conf load on SIGHUP?

From
"Command Prompt, Inc."
Date:
Hello,

Question. In 7.2 the pg_hba.conf file will only be read on a
SIGHUP (kill -HUP). What are the external factors to this?
Will a HUP cause the child postgres processes to restart as well?

What if a transaction is occurring at the time of the HUP?


J

--
--
by way of pgsql-general@commandprompt.com
http://www.postgresql.info/
http://www.commandprompt.com/


Re: 7.2 pg_hba.conf load on SIGHUP?

From
Bruce Momjian
Date:
> Hello,
>
> Question. In 7.2 the pg_hba.conf file will only be read on a
> SIGHUP (kill -HUP). What are the external factors to this?
> Will a HUP cause the child postgres processes to restart as well?

No, SIGHUP to postmaster.  It is caught and reloads config files,
similar to other Internet daemons.

> What if a transaction is occurring at the time of the HUP?

Again, no effect on children of postmaster.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: 7.2 pg_hba.conf load on SIGHUP?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Will a HUP cause the child postgres processes to restart as well?
>> What if a transaction is occurring at the time of the HUP?

> Again, no effect on children of postmaster.

Not so.  The postmaster responds to the signal as soon as it's idle,
rereads the conf file itself, and rebroadcasts SIGHUP to all its
children.  The children then reread the conf file immediately after they
next receive a query from their clients.  See postmaster/postmaster.c
and tcop/postgres.c.

A lot of the configuration file entries are not allowed to change in a
running backend, so the children will ignore attempted changes in those
entries.  But for entries that can be changed on the fly, the response
is reasonably prompt across the board.

            regards, tom lane
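
The reload-on-SIGHUP pattern described above can be modelled with a toy shell daemon. This is not PostgreSQL code; the file names and the `greeting` setting are invented for the illustration. The background process traps HUP and re-reads its configuration file the next time it is idle, without restarting:

```shell
#!/bin/sh
# Toy model only: file names and the `greeting` setting are hypothetical.
WORK=$(mktemp -d)
CONF="$WORK/toy.conf"
STATE="$WORK/state"
echo "greeting=old" >"$CONF"

(
    # Re-read the config file and publish the value currently in effect.
    reload() { . "$CONF"; echo "$greeting" >"$STATE"; }
    trap reload HUP
    reload
    # Idle loop: a pending HUP trap runs when the current sleep returns.
    while :; do sleep 1; done
) &
DAEMON_PID=$!

sleep 1                       # give the daemon time to install its trap
echo "greeting=new" >"$CONF"  # edit the configuration...
kill -HUP "$DAEMON_PID"       # ...and ask the daemon to re-read it
sleep 3                       # allow the trap to run

RELOADED=$(cat "$STATE")
echo "daemon now uses greeting=$RELOADED"

# Clean up the toy daemon and its files.
kill "$DAEMON_PID"
wait "$DAEMON_PID" 2>/dev/null || true
rm -rf "$WORK"
```

The daemon keeps the same PID throughout, which mirrors the point that a HUP causes a re-read rather than a restart.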

Re: 7.2 pg_hba.conf load on SIGHUP?

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> Will a HUP cause the child postgres processes to restart as well?
> >> What if a transaction is occurring at the time of the HUP?
>
> > Again, no effect on children of postmaster.
>
> Not so.  The postmaster responds to the signal as soon as it's idle,
> rereads the conf file itself, and rebroadcasts SIGHUP to all its
> children.  The children then reread the conf file immediately after they
> next receive a query from their clients.  See postmaster/postmaster.c
> and tcop/postgres.c.
>
> A lot of the configuration file entries are not allowed to change in a
> running backend, so the children will ignore attempted changes in those
> entries.  But for entries that can be changed on the fly, the response
> is reasonably prompt across the board.

Yes, true.  The specific question was about pg_hba.conf, which only
affects the postmaster.  postgresql.conf is a file read by the postmaster
and the backends, and it is re-read by the children on SIGHUP.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Database server crash ! URGENT !

From
"Sunit Bhatia"
Date:
I've narrowed down the problem. Here is what I did:

As suggested by everybody, I started the server like this:

su - pgsql -c "nohup $DB_ROOT/bin/postmaster -B 50 -N 25  -D$DB_ROOT/data
-i" </dev/null >>$LOGFILE 2>&1  </dev/null &

After this, the process is started in the background. Now if I type
'Control-C' on the terminal, it generates a 'Fast Shutdown' request to the
server process, and the server shuts down.

This happens if the database user 'pgsql' is created with the BOURNE SHELL
as its default shell.

I tried the same on another system, where the 'pgsql' user was created with
tcsh as its default shell, and there typing 'Control-C' on the terminal DOES
NOT send the Fast Shutdown request to the server.

Can anybody tell me how I can avoid this in the Bourne shell? Detaching
'stdin' from the terminal doesn't seem to help here!

thanks
Sunit





>"Sunit Bhatia" <sunit_bhatia@hotmail.com> writes:
>
> > Let me make some corrections to my previous mail.
> > I start the PostgreSQL Server as 'root' like  this:
> >
> > su - pgsql -c "$DB_ROOT/bin/postmaster -B 50 -N 25  -D$DB_ROOT/data -i" >>
> > $LOGFILE 2>&1 &
> >
> >
> > So I'm starting it in the background and I'm redirecting the stdout
> > and stderr away from the terminal. Still I don't understand why it
> > receives Fast Shutdown Request or (SIGINT signal).
>
>You might need to also run it with 'nohup'.
>
>-Doug
>--
>Let us cross over the river, and rest under the shade of the trees.
>    --T. J. Jackson, 1863
>


Re: Database server crash ! URGENT !

From
Tom Lane
Date:
"Sunit Bhatia" <sunit_bhatia@hotmail.com> writes:
> I've narrowed down the problem. Here is what I did:
> As suggested by everybody, I started the server like this:

> su - pgsql -c "nohup $DB_ROOT/bin/postmaster -B 50 -N 25  -D$DB_ROOT/data
> -i" </dev/null >>$LOGFILE 2>&1  </dev/null &

Not sure, but maybe the correct spelling is

su - pgsql -c "nohup $DB_ROOT/bin/postmaster -B 50 -N 25  -D$DB_ROOT/data
-i </dev/null >>$LOGFILE 2>&1" &

As is, you're redirecting the stdin etc of su, not of the
eventually-launched shell that execs the postmaster.  I don't know if
that's the problem, but it seems unlikely to be a good idea.

            regards, tom lane

Re: Database server crash ! URGENT !

From
"Sunit Bhatia"
Date:
I investigated further and found that the BOURNE SHELL is not a job control
shell, whereas the C shell is.
So in the Bourne shell, even though the process runs as a background job,
its process group ID is the same as that of the shell, which is the
terminal's foreground process group.
Any signal generated by the terminal is sent to every process in that
process group.
That's why the SIGINT generated by Control-C is sent to the database server
running in the background, and the server SHUTS DOWN !!

This is not the case for a background job started from the C shell.
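
One way to check this diagnosis on a given system is to compare process group IDs. The sketch below assumes a `ps` that accepts `-o pgid=` (with a Linux `/proc` fallback), and `sleep 300` is a stand-in for the server. Run from a script, where job control is off, the background job is expected to report the same process group as the shell that started it:

```shell
#!/bin/sh
# Print the process group ID of the given PID.
pgid_of() {
    pgid=$(ps -o pgid= -p "$1" 2>/dev/null | tr -d ' ')
    # Fallback for Linux systems without ps: field 5 of /proc/PID/stat.
    if [ -z "$pgid" ] && [ -r "/proc/$1/stat" ]; then
        pgid=$(awk '{print $5}' "/proc/$1/stat")
    fi
    echo "$pgid"
}

# Stand-in for the server, started as an ordinary background job.
sleep 300 </dev/null >/dev/null 2>&1 &
CHILD_PID=$!

SHELL_PGID=$(pgid_of $$)
CHILD_PGID=$(pgid_of "$CHILD_PID")
echo "shell pgid=$SHELL_PGID  background job pgid=$CHILD_PGID"

# Clean up the stand-in process.
kill "$CHILD_PID"
wait "$CHILD_PID" 2>/dev/null || true
```

If the two numbers match, the background job shares the shell's process group and can be hit by terminal-generated signals aimed at that group.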

Now my question is HOW DO I DISSOCIATE MY SERVER PROCESS FROM THE TERMINAL
??

Any Ideas or suggestions ? Please share !

btw, I separated out 'su' and 'postmaster', but this problem is still there,
as explained above.

thanks
Sunit




> > I've narrowed down the problem. Here is what I did:
> > As suggested by everybody, I started the server like this:
>
> > su - pgsql -c "nohup $DB_ROOT/bin/postmaster -B 50 -N 25  -D$DB_ROOT/data
> > -i" </dev/null >>$LOGFILE 2>&1  </dev/null &
>
>Not sure, but maybe the correct spelling is
>
>su - pgsql -c "nohup $DB_ROOT/bin/postmaster -B 50 -N 25  -D$DB_ROOT/data
>-i </dev/null >>$LOGFILE 2>&1" &
>
>As is, you're redirecting the stdin etc of su, not of the
>eventually-launched shell that execs the postmaster.  I don't know if
>that's the problem, but it seems unlikely to be a good idea.
>
>            regards, tom lane




Re: Database server crash ! URGENT !

From
speedboy
Date:
> Now my question is HOW DO I DISSOCIATE MY SERVER PROCESS FROM THE TERMINAL
> ??

Put:

#!/bin/bash

at the top of the script.
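
One way to take the server out of the terminal's process group, whichever shell runs the script, is to start it in a session of its own. The sketch below assumes a `setsid` utility is installed (it ships with util-linux on Linux and may well be absent on the Solaris 7 machines in this thread) and uses `sleep 300` as a stand-in for the postmaster command:

```shell
#!/bin/sh
# Assumption: a `setsid` utility is available (e.g. from util-linux).

# Print the process group ID of the given PID.
pgid_of() {
    pgid=$(ps -o pgid= -p "$1" 2>/dev/null | tr -d ' ')
    # Fallback for Linux systems without ps: field 5 of /proc/PID/stat.
    if [ -z "$pgid" ] && [ -r "/proc/$1/stat" ]; then
        pgid=$(awk '{print $5}' "/proc/$1/stat")
    fi
    echo "$pgid"
}

# Start the stand-in server in a new session, detached from the terminal.
setsid sleep 300 </dev/null >/dev/null 2>&1 &
SERVER_PID=$!
sleep 1   # give setsid time to create the session and exec the command

SHELL_PGID=$(pgid_of $$)
SERVER_PGID=$(pgid_of "$SERVER_PID")
echo "shell pgid=$SHELL_PGID  server pid=$SERVER_PID pgid=$SERVER_PGID"

# Clean up the stand-in process.
kill "$SERVER_PID"
wait "$SERVER_PID" 2>/dev/null || true
```

A process in its own session has no controlling terminal, so a Control-C typed at the original terminal is not delivered to it. When the launching shell has job control enabled, `setsid` may fork first, in which case `$!` may not be the server's PID.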