Thread: Database server crash ! URGENT !
Hi all,

I'm running PostgreSQL 7.1.1 on Solaris and mostly use the JDBC driver to connect to the database.

Recently I've been seeing very weird behaviour: the database server shuts down on its own, and when I restart it, it shuts down again as soon as it starts, or sometimes as soon as I try to establish a connection. This happens 3-4 times, after which it becomes stable again and runs for days without any problems. I've seen this on 2-3 production systems.

Here is how I'm starting the server:

    'postmaster -B 5- -N 25 -D/usr/local/pgsql/data -i'

Here is what I see on the console in two different instances.

First Instance:

    Sun Microsystems Inc. SunOS 5.7 Generic October 1998
    stty: : No such device or address
    DEBUG: database system was shut down at 2000-10-17 21:04:30 PDT
    DEBUG: CheckPoint record at (0, 34075216)
    DEBUG: Redo record at (0, 34075216); Undo record at (0, 0); Shutdown TRUE
    DEBUG: NextTransactionId: 21473; NextOid: 68943
    DEBUG: database system is in production state
    Fast Shutdown request at Tue Oct 17 21:40:23 2000
    DEBUG: shutting down
    DEBUG: database system is shut down
    ******************************************

Second Instance:

    Sun Microsystems Inc. SunOS 5.7 Generic October 1998
    stty: : No such device or address
    DEBUG: database system was interrupted at 2000-10-17 05:07:20 PDT
    DEBUG: CheckPoint record at (0, 33362784)
    DEBUG: Redo record at (0, 33362784); Undo record at (0, 0); Shutdown TRUE
    DEBUG: NextTransactionId: 21144; NextOid: 60751
    DEBUG: database system was not properly shut down; automatic recovery in progress...
    DEBUG: redo starts at (0, 33362848)
    DEBUG: ReadRecord: record with zero len at (0, 34075152)
    DEBUG: redo done at (0, 34075112)
    DEBUG: database system is in production state
    pq_recvbuf: unexpected EOF on client connection
    pq_recvbuf: unexpected EOF on client connection
    /usr/local/pgsql/bin/postmaster: dumpstatus:
    /usr/local/pgsql/bin/postmaster: dumpstatus:
    Fast Shutdown request at Tue Oct 17 21:04:28 2000
    Aborting any active transaction...
    FATAL 1: This connection has been terminated by the administrator.
    FATAL 1: This connection has been terminated by the administrator.
    DEBUG: shutting down
    DEBUG: MoveOfflineLogs: remove 0000000000000001
    DEBUG: database system is shut down
    ********************************************

I tried the following precautions/solutions:

* I made sure that I shut down the database server properly, but the problem happens even when the last shutdown was a 'Fast Shutdown'.
* I issue a VACUUM command every time my application (which is built on top of JDBC) starts, and I also run it automatically once every 24 hours. But I see this problem on the systems with automatic VACUUM too.

Can somebody suggest how to debug this problem, or has somebody experienced this before?

Also, please tell me what the 'Redo Record at..' messages printed on the console mean.

thanks
Sunit
Hi Sunit,

PostgreSQL 7.1.3 has a few bugfixes which make it a better choice than 7.1.1; if you can upgrade, it's a good move.

Also, the "-B 5-" looks interesting. I'm not sure what the - on the end of the 5 will do, but it probably won't help.

Apart from this, I'll defer to the guys with more experience debugging these kinds of problems.

Regards and best wishes,
Justin Clift

Sunit Bhatia wrote:
> Here is how I'm starting the server.
> 'postmaster -B 5- -N 25 -D/usr/local/pgsql/data -i'
"Sunit Bhatia" <sunit_bhatia@hotmail.com> writes:
> Fast Shutdown request at Tue Oct 17 21:40:23 2000

Something is sending SIGINT signals to your postmaster.

You might need to start the postmaster using nohup, and/or be more careful to detach it from the foreground shell. (For example, on HPUX it's a real good idea to explicitly redirect the postmaster's stdin, stdout, *and* stderr away from the terminal, else it won't be detached from the foreground process group and will still receive signals from the terminal.)

regards, tom lane
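For readers following along, the advice above (nohup plus redirecting stdin, stdout, and stderr away from the terminal) might be sketched roughly like this. The helper name start_detached is hypothetical and is not part of PostgreSQL. Note that this only closes the terminal file descriptors and ignores SIGHUP; it does not by itself move the process into a different process group.

```shell
# start_detached LOGFILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND in the background with all three
# standard streams pointed away from the terminal, and print its PID.
start_detached() {
    logfile=$1
    shift
    # stdin from /dev/null, stdout and stderr appended to the log file
    nohup "$@" </dev/null >>"$logfile" 2>&1 &
    echo $!
}

# Possible usage, with the flags quoted in this thread:
#   start_detached "$LOGFILE" "$DB_ROOT/bin/postmaster" -B 50 -N 25 -D"$DB_ROOT/data" -i
```

Because nothing in the started process still points at the terminal, the helper can also be called from a command substitution (pid=$(start_detached ...)) without blocking.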
Let me make some corrections to my previous mail. I start the PostgreSQL server as 'root' like this:

    su - pgsql -c "$DB_ROOT/bin/postmaster -B 50 -N 25 -D$DB_ROOT/data -i" >> $LOGFILE 2>&1 &

So I'm starting it in the background and redirecting stdout and stderr away from the terminal. I still don't understand why it receives a Fast Shutdown request (the SIGINT signal). Does anybody have any ideas?

I also don't see the advantage of using 'nohup' over what I'm doing, since starting it with 'nohup' will only stop it from receiving the SIGHUP signal.

thanks for your ideas...
Sunit
"Sunit Bhatia" <sunit_bhatia@hotmail.com> writes:
> Let me make some corrections to my previous mail.
> I start the PostgreSQL Server as 'root' like this:
>
> su - pgsql -c "$DB_ROOT/bin/postmaster -B 50 -N 25 -D$DB_ROOT/data -i" >> $LOGFILE 2>&1 &
>
> So I'm starting it in the background and I'm redirecting the stdout
> and stderr away from the terminal. Still I don't understand why it
> receives Fast Shutdown Request or (SIGINT signal).

You might need to also run it with 'nohup'.

-Doug
Hello,

Question. In 7.2 the pg_hba.conf file will only be read on a SIGHUP (kill -HUP). What are the side effects of this? Will a HUP cause the child postgres processes to restart as well? What if a transaction is occurring at the time of the HUP?

J
> Hello,
>
> Question. In 7.2 the pg_hba.conf file will only be read on a
> SIGHUP (kill -HUP). What are the external factors to this?
> Will a HUP cause the child postgres processes to restart as well?

No. The SIGHUP goes to the postmaster, which catches it and reloads its config files, similar to other Internet daemons.

> What if a transaction is occurring at the time of the HUP?

Again, no effect on children of postmaster.

-- Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Will a HUP cause the child postgres processes to restart as well?
>> What if a transaction is occurring at the time of the HUP?

> Again, no effect on children of postmaster.

Not so. The postmaster responds to the signal as soon as it's idle, rereads the conf file itself, and rebroadcasts SIGHUP to all its children. The children then reread the conf file immediately after they next receive a query from their clients. See postmaster/postmaster.c and tcop/postgres.c.

A lot of the configuration file entries are not allowed to change in a running backend, so the children will ignore attempted changes in those entries. But for entries that can be changed on the fly, the response is reasonably prompt across the board.

regards, tom lane
> Not so. The postmaster responds to the signal as soon as it's idle,
> rereads the conf file itself, and rebroadcasts SIGHUP to all its
> children. The children then reread the conf file immediately after they
> next receive a query from their clients.

Yes, true. The specific question was about pg_hba.conf, which only affects the postmaster. postgresql.conf is read by both the postmaster and the backends, and it is re-read by the children on SIGHUP.

-- Bruce Momjian
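As a rough illustration of the reload path discussed in this sub-thread, sending the SIGHUP by hand could look something like the sketch below. The function name reload_postmaster is hypothetical, and the sketch assumes that the first line of postmaster.pid in the data directory holds the postmaster's PID; check that against your own installation before relying on it.

```shell
# reload_postmaster DATADIR
# Hypothetical helper: send SIGHUP to the postmaster that owns DATADIR so
# that it rereads its configuration files.
reload_postmaster() {
    pidfile=$1/postmaster.pid
    if [ ! -r "$pidfile" ]; then
        echo "no readable pid file at $pidfile" >&2
        return 1
    fi
    # Assumption: the first line of postmaster.pid is the postmaster's PID.
    pid=$(head -n 1 "$pidfile")
    kill -HUP "$pid"
}

# Possible usage:
#   reload_postmaster /usr/local/pgsql/data
```

Only the postmaster is signalled here; per the explanation above, it is the postmaster itself that passes the SIGHUP on to its children.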
I've narrowed down the problem. Here is what I did.

As suggested by everybody, I started the server like this:

    su - pgsql -c "nohup $DB_ROOT/bin/postmaster -B 50 -N 25 -D$DB_ROOT/data -i" </dev/null >>$LOGFILE 2>&1 </dev/null &

After this the process is started in the background. Now if I type Control-C on the terminal, it generates a 'Fast Shutdown' request to the server process, and the server shuts down. This happens if the database user 'pgsql' was created with the Bourne shell as its default shell.

I tried the same on another system, where the 'pgsql' user was created with tcsh as its default shell, and there typing Control-C on the terminal does NOT send the Fast Shutdown request to the server.

Can anybody tell me how I can keep this from happening with the Bourne shell? Detaching stdin from the terminal doesn't seem to help here!

thanks
Sunit
"Sunit Bhatia" <sunit_bhatia@hotmail.com> writes:
> I've narrowed down the problem. Here is what I did:
> As suggested by everybody, I started the server like this:

> su - pgsql -c "nohup $DB_ROOT/bin/postmaster -B 50 -N 25 -D$DB_ROOT/data -i" </dev/null >>$LOGFILE 2>&1 </dev/null &

Not sure, but maybe the correct spelling is

    su - pgsql -c "nohup $DB_ROOT/bin/postmaster -B 50 -N 25 -D$DB_ROOT/data -i </dev/null >>$LOGFILE 2>&1" &

As is, you're redirecting the stdin etc of su, not of the eventually-launched shell that execs the postmaster. I don't know if that's the problem, but it seems unlikely to be a good idea.

regards, tom lane
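The difference between the two spellings can be seen without su at all. In the sketch below, a plain sh -c stands in for su - pgsql -c (an assumption made only so the example runs anywhere), and a cd inside the quoted command makes it visible which shell actually opens the log file.

```shell
# Scratch area; 'inner' is the directory the child shell changes into.
work=$(mktemp -d)
mkdir "$work/inner"
cd "$work" || exit 1

# Redirection OUTSIDE the quotes: the calling shell opens outer.log in its
# own working directory before the child shell even starts.
sh -c "cd inner && echo hello" >outer.log

# Redirection INSIDE the quotes: the child shell parses it and opens
# inner.log after its cd, so the file lands in the 'inner' directory.
sh -c "cd inner && echo hello >inner.log"

ls "$work" "$work/inner"
```

With su the same rule decides which user and which process open the files: outside the quotes it is the caller's shell, inside the quotes it is the shell that su launches.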
I investigated it further and figured out that the Bourne shell is a non-job-control shell, whereas the C shell is a job-control shell.

So in the Bourne shell, even though the process is running as a background job, its process group ID is the same as that of the terminal's foreground process group. Any signal generated by the terminal is sent to all the processes that have that process group ID. That's why the SIGINT generated by Control-C is sent to the database server running in the background, and the server SHUTS DOWN!

This is not the case for a background job running in the C shell.

Now my question is HOW DO I DISSOCIATE MY SERVER PROCESS FROM THE TERMINAL ??

Any ideas or suggestions? Please share!

btw, I separated out 'su' and 'postmaster', but the problem is still there, as explained above.

thanks
Sunit
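One way to check the process-group explanation above on a given system is to compare the process group ID of the shell with that of a background job it starts. This is only a diagnostic sketch: the helper name pgid_of is made up for illustration, and it assumes a ps that accepts the POSIX -o and -p options.

```shell
# pgid_of PID: print the process group ID of PID (hypothetical helper).
pgid_of() {
    ps -o pgid= -p "$1" | tr -d ' '
}

# Start a harmless background job in place of the postmaster.
sleep 5 &
job_pid=$!

shell_pgid=$(pgid_of $$)
job_pgid=$(pgid_of "$job_pid")
echo "shell: pid=$$ pgid=$shell_pgid"
echo "job:   pid=$job_pid pgid=$job_pgid"

if [ "$shell_pgid" = "$job_pgid" ]; then
    echo "job shares the shell's process group (no job control)"
else
    echo "job runs in its own process group"
fi

kill "$job_pid"
```

Run as a script, a shell without job control would be expected to report a shared group; typed at the prompt of a job-control shell, the background job would normally get a group of its own.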
> Now my question is HOW DO I DISSOCIATE MY SERVER PROCESS FROM THE TERMINAL ??

Put:

    #!/bin/bash

at the top of the script.