Thread: Postgres not starting at boot(FreeBSD) - startup script not releasing
Try this on for size... Recently, during a reboot (the first in about 3 months for this particular server), our entire rc.d directory failed to start. After some hacking of the rc file to output some helpful debugging, it was apparent that the 010.pgsql.sh script in /usr/local/etc/rc.d was timing out and causing any directives thereafter not to be processed.

Running the script manually as root starts the postmaster but doesn't return you to the command prompt. Pressing ^C and checking the errlog shows:

Waiting for postmaster starting up..DEBUG: Data Base System is starting up at Sat Mar 9 17:05:45 2002
DEBUG: Data Base System was shut down at Sat Mar 9 17:05:39 2002
DEBUG: Data Base System is in production state at Sat Mar 9 17:05:45 2002
Fast Shutdown request at Sat Mar 9 17:05:48 2002
DEBUG: Data Base System shutting down at Sat Mar 9 17:05:48 2002
DEBUG: Data Base System shut down at Sat Mar 9 17:05:48 2002

I can force it to return to the command prompt by adding a "&" and pressing return twice:

web1# /usr/local/etc/rc.d/010.pgsql.sh start &
[1] 4635
web1#
[1] + Suspended (tty output) /usr/local/etc/rc.d/010.pgsql.sh start
web1#

Postgres then stays up and the terminal is freed. Output in errlog for this is:

Waiting for postmaster starting up..DEBUG: Data Base System is starting up at Sat Mar 9 17:07:21 2002
DEBUG: Data Base System was shut down at Sat Mar 9 17:05:48 2002
DEBUG: Data Base System is in production state at Sat Mar 9 17:07:21 2002

No idea what could be causing the script not to function, as it is the EXACT same script as on the other servers we are operating (did a diff just to be sure).

In the interim we removed the script from the startup dir. Any ideas as to why this is occurring?

Installed from the port and left the port startup script as is (listed below). Appreciate any feedback/comments.
Dave

# $FreeBSD: ports/databases/postgresql7/files/pgsql.sh.tmpl,v 1.9 2000/12/11 03:22:07 steve Exp $
#
# For postmaster startup options, edit $PGDATA/postmaster.opts.default
# Preinstalled options are -i -o "-F"

case $1 in
start)
    [ -d /usr/local/pgsql/lib ] && /sbin/ldconfig -m /usr/local/pgsql/lib
    [ -x /usr/local/pgsql/bin/pg_ctl ] && {
        su -l pgsql -c \
            'exec /usr/local/pgsql/bin/pg_ctl -w start > /usr/local/pgsql/errlog 2>&1'
        echo -n ' pgsql'
    }
    ;;

stop)
    [ -x /usr/local/pgsql/bin/pg_ctl ] && {
        exec su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl -w -m fast stop'
    }
    ;;

status)
    [ -x /usr/local/pgsql/bin/pg_ctl ] && {
        exec su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl status'
    }
    ;;

*)
    echo "usage: `basename $0` {start|stop|status}" >&2
    exit 64
    ;;
esac

"Dave" <dave@hawk-systems.com> writes:
> DEBUG: Data Base System is starting up at Sat Mar 9 17:05:45 2002
> DEBUG: Data Base System was shut down at Sat Mar 9 17:05:39 2002
> DEBUG: Data Base System is in production state at Sat Mar 9 17:05:45 2002
> Fast Shutdown request at Sat Mar 9 17:05:48 2002
> DEBUG: Data Base System shutting down at Sat Mar 9 17:05:48 2002
> DEBUG: Data Base System shut down at Sat Mar 9 17:05:48 2002

It looks like something is hitting the postmaster with a SIGINT signal as soon as it starts. Got any idea what might be doing that? It's not pg_ctl, for sure (unless the "something" is firing your init script with a 'stop' option). In any case I think you should be looking for outside agencies, not a problem directly in this init script.

regards, tom lane
Sorry, I should point out that the stop is the result of my pressing ^C after running the script manually.

The script runs and postgres starts, but from reading the startup script, it is waiting for the pid file to appear before reporting success, and it isn't doing this. Or at least it isn't exiting and leaving the postmaster running. It just sits there, thus the ^C to regain the terminal.

Opening two terminals, I can run the start script and, while the first terminal is sitting there waiting for the script to release control, move to the second terminal and view the results: postmaster running fine, pid file there, all normal.

If I execute the script with the & behind it, it allows everything through after entering another <cr>, which from what I can see suspends the session, which then clears normally. (Making sense?)

Still confused as to the cause or how to rectify it.

Dave

>-----Original Message-----
>From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>Sent: Sunday, March 10, 2002 11:22 AM
>To: Dave
>Cc: pgsql-admin@postgresql.org
>Subject: Re: [ADMIN] Postgres not starting at boot(FreeBSD) - startup
>script not releasing
>
>"Dave" <dave@hawk-systems.com> writes:
>> DEBUG: Data Base System is starting up at Sat Mar 9 17:05:45 2002
>> DEBUG: Data Base System was shut down at Sat Mar 9 17:05:39 2002
>> DEBUG: Data Base System is in production state at Sat Mar 9 17:05:45 2002
>> Fast Shutdown request at Sat Mar 9 17:05:48 2002
>> DEBUG: Data Base System shutting down at Sat Mar 9 17:05:48 2002
>> DEBUG: Data Base System shut down at Sat Mar 9 17:05:48 2002
>
>It looks like something is hitting the postmaster with a SIGINT signal
>as soon as it starts. Got any idea what might be doing that? It's
>not pg_ctl, for sure (unless the "something" is firing your init
>script with a 'stop' option). In any case I think you should be looking
>for outside agencies, not a problem directly in this init script.
>
> regards, tom lane
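For context on the SIGINT remark: the postmaster treats SIGTERM as a smart shutdown, SIGINT as a fast shutdown, and SIGQUIT as an immediate shutdown, and pg_ctl's -m option selects among them. A ^C typed at the terminal delivers SIGINT, which is consistent with the "Fast Shutdown request" line in the errlog. A minimal sketch of that mapping follows; mode_signal is a hypothetical helper name used only for illustration, not something pg_ctl provides.

```sh
#!/bin/sh
# mode_signal MODE
# Print the signal name the postmaster associates with a pg_ctl shutdown
# mode. mode_signal is a hypothetical name for illustration only.
mode_signal() {
    case $1 in
        smart)     echo TERM ;;  # wait for clients to disconnect first
        fast)      echo INT ;;   # abort active transactions, then shut down
        immediate) echo QUIT ;;  # stop at once, without a clean shutdown
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}
```

Read this way, the log entry points at whatever sent SIGINT, which in this case was the ^C itself rather than the init script.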
hold the farm...

>>> Try this on for size... recently during a reboot (first in about 3 months for
>>> this particular server) our entire rc.d directory failed to start... after some
>>> hacking of the rc file to output some helpful debugging, it was apparent that the
>>> 010.pgsql.sh script in /usr/local/etc/rc.d was timing out and causing any
>>> directives thereafter not to be processed.
>>
>>have you tried manually doing "pg_ctl restart" to see if any problems
>>pop-up? Maybe it is not a script error, but some other issue with the db
>>server.

Did the following: stopped the server totally, then ran this.

web5# su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl start'
postmaster successfully started up.
web5# DEBUG: Data Base System is starting up at Sun Mar 10 14:32:46 2002
DEBUG: Data Base System was shut down at Sun Mar 10 14:32:04 2002
DEBUG: Data Base System is in production state at Sun Mar 10 14:32:46 2002

web5#
web5# su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl restart'
Smart Shutdown request at Sun Mar 10 14:33:25 2002
Waiting for postmaster shutting down..................................The Data Base System is shutting down
..........The Data Base System is shutting down
...The Data Base System is shutting down
....The Data Base System is shutting down
...The Data Base System is shutting down
.........pg_ctl: postmaster does not shut down
web5# The Data Base System is shutting down
The Data Base System is shutting down
The Data Base System is shutting down
The Data Base System is shutting down

Hmmm... check that it's still running...

web5# ps -aux | grep pgsql
pgsql 81016 0.0 0.1 628 452 p0 I 2:32PM 0:00.00 /bin/sh /usr/loca
pgsql 81018 0.0 0.3 4080 2404 p0 I 2:32PM 0:00.03 /usr/local/pgsql/
pgsql 81082 0.0 0.4 4508 3008 p0 I 2:33PM 0:00.03 /usr/local/pgsql/
pgsql 81083 0.0 0.4 4556 3364 p0 I 2:33PM 0:00.06 /usr/local/pgsql/
web5#

OK, let's try and use the rc.d script...

web5# /usr/local/etc/rc.d/010* stop
Fast Shutdown request at Sun Mar 10 14:37:28 2002
Aborting any active transaction...
Waiting for postmaster shutting down..FATAL 1: The system is shutting down
FATAL 1: The system is shutting down
NOTICE: AbortTransaction and not in in-progress state
.NOTICE: AbortTransaction and not in in-progress state
DEBUG: Data Base System shutting down at Sun Mar 10 14:37:28 2002
DEBUG: Data Base System shut down at Sun Mar 10 14:37:28 2002
done.
postmaster successfully shut down.
web5#

That's interesting, perhaps pg_ctl is hosed?

web5# ps -aux | grep pgsql
web5#

Ideas?

Dave
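The ps check can be complemented by looking at the pid file mentioned earlier in the thread. This is a minimal sketch, assuming the postmaster writes its process ID as the first line of postmaster.pid in the data directory and that the data directory is /usr/local/pgsql/data; both are assumptions here, and postmaster_alive is a hypothetical name.

```sh
#!/bin/sh
# postmaster_alive [PGDATA]
# Succeed if PGDATA/postmaster.pid exists and names a live process.
# The default data directory below is an assumption, not taken from the thread.
postmaster_alive() {
    pidfile="${1:-/usr/local/pgsql/data}/postmaster.pid"
    [ -r "$pidfile" ] || return 1
    # The first line of the pid file is assumed to hold the postmaster's PID.
    pid=$(head -n 1 "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    # kill -0 delivers no signal; it only tests whether the process exists
    # and may be signalled by the caller.
    kill -0 "$pid" 2>/dev/null
}
```

Because kill -0 fails for processes the caller may not signal, this would need to run as root or as the pgsql user, for example: postmaster_alive && echo "postmaster is up".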
On Sun, 10 Mar 2002, Dave wrote:

D> Running the script manually as root starts the postmaster but doesn't return you
D> to the command prompt.
D> [...]

I use the following lines (at /usr/local/etc/rc.d/pgsql.sh)

-- 8< --
#!/bin/sh

PGBIN=/usr/local/pgsql/bin

cmd="$1"
: ${cmd:=start}

case $cmd in
start)
    [ -d /usr/local/pgsql/lib ] && /sbin/ldconfig -m /usr/local/pgsql/lib
    [ -x ${PGBIN}/pg_ctl ] && {
        echo -n 'pgsql '
        su -l pgsql -c \
            '[ -d ${PGDATA} ] && exec /usr/local/pgsql/bin/pg_ctl start -s -l ~pgsql/log/errlog'
    }
    ;;

stop)
    [ -x ${PGBIN}/pg_ctl ] && {
        echo -n 'pgsql '
        su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl stop -s -m fast'
    }
    ;;

status)
    [ -x ${PGBIN}/pg_ctl ] && {
        exec su -l pgsql -c 'exec /usr/local/pgsql/bin/pg_ctl status'
    }
    ;;

*)
    echo "usage: `basename $0` {start|stop|status}" >&2
    exit 64
    ;;
esac
-- 8< --

Sincerely,
D.Marck [DM5020, DM268-RIPE, DM3-RIPN]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------
Re: Postgres not starting at boot(FreeBSD) - startup script not releasing
From: "Matthew D. Fuller"
On Sun, Mar 10, 2002 at 09:11:11AM -0500 I heard the voice of Dave, and lo! it spake thus:
> Try this on for size... recently during a reboot (first in about 3 months for
> this particular server) our entire rc.d directory failed to start... after some
> hacking of the rc file to output some helpful debugging, it was apparent that the
> 010.pgsql.sh script in /usr/local/etc/rc.d was timing out and causing any
> directives thereafter not to be processed.

At a guess, you've set it up to not automatically trust local users, so the default options, which 'wait' for the server to come up (and "wait" by having psql try connecting as the postgres user), wait for a long long time for somebody to give it the password it now requires.

I find that rather annoying, and miss it every time, until the rc script hangs. Check the options and figure out which one it is you have to take out; I can't recall offhand.

--
Matthew Fuller (MF4839) | fullermd@over-yonder.net
Unix Systems Administrator | fullermd@futuresouth.com
Specializing in FreeBSD | http://www.over-yonder.net/

"The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet"
Re: Postgres not starting at boot(FreeBSD) - startup script not releasing < solved
From: "Dave"
Bingo!

Dumb move. I switched everything to password authentication a few months back and never had the occasion to restart after that. Will work on tweaking the pg_hba.conf.

Thanks Matthew... if you are ever in Toronto, I owe you a beer.

Dave

>At a guess, you've set it up to not automatically trust local users, so
>the default options which 'wait' for the server to come up (and "waits"
>by having psql try connecting as the postgres user) waits for a long long
>time for somebody to give it the password it now requires.
>
>I find that rather annoying, and miss it every time, until the rc script
>hangs. Check the options and figure out which one it is you have to take
>out, I can't recall offhand.
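One way to spot the entries that trigger the prompt is to scan pg_hba.conf for rules that ask for a password. A rough sketch under stated assumptions: password_rules is a hypothetical helper, "password" and "crypt" are taken as the password-style method keywords, and because the column layout of pg_hba.conf differs between PostgreSQL versions the sketch checks every field rather than a fixed position (so a database that happens to be named "password" would be a false positive).

```sh
#!/bin/sh
# password_rules FILE
# Print the local/host rules in a pg_hba.conf-style FILE whose fields
# include a password-style auth method. Hypothetical helper; the keyword
# list is an assumption and may need extending for other versions.
password_rules() {
    awk '
        /^[ \t]*#/ { next }                       # skip comment lines
        $1 == "local" || $1 == "host" {
            for (i = 2; i <= NF; i++) {
                if ($i == "password" || $i == "crypt") {
                    print                         # report the whole rule
                    break
                }
            }
        }
    ' "$1"
}
```

For instance, password_rules /usr/local/pgsql/data/pg_hba.conf (the path is an assumption about the port's layout) would list the rules to review; any rule that matches the connection psql makes during startup is a candidate for the hang.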
Re: Postgres not starting at boot(FreeBSD) - startup script not releasing < solved
From: "Matthew D. Fuller"
On Sun, Mar 10, 2002 at 06:11:21PM -0500 I heard the voice of Dave, and lo! it spake thus:
> Bingo! Dumb move. Dropped everything to password a few months back, never had
> the occasion to restart after that. Will work on tweaking the pg_hba.conf

FWIW (after a quick glance at the default script and the manpage), "-w" is the pg_ctl option that makes it wait. I just take it out; it only takes PG a few seconds to initialize, so it's ready to go long before something would need to connect to it.

It could also be said that having -w implemented as invoking psql to try to connect as the DB superuser assuming no password is a rather inappropriate way of going about it, but that's another can of worms.

--
Matthew Fuller (MF4839) | fullermd@over-yonder.net
Unix Systems Administrator | fullermd@futuresouth.com
Specializing in FreeBSD | http://www.over-yonder.net/

"The only reason I'm burning my candle at both ends, is because I haven't figured out how to light the middle yet"
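Taking -w out of the start line could be previewed along these lines. This is only a sketch: strip_start_wait is a hypothetical helper, and it assumes the start line contains the exact text "pg_ctl -w start" as in the port script posted at the top of the thread. It prints the modified script to standard output and does not change the file; the stop line keeps its -w.

```sh
#!/bin/sh
# strip_start_wait FILE
# Print FILE with -w removed from the "pg_ctl -w start" invocation only.
# Hypothetical helper; it relies on the exact wording of the port script.
strip_start_wait() {
    sed 's|pg_ctl -w start|pg_ctl start|' "$1"
}
```

Piping the result into diff, as in strip_start_wait /usr/local/etc/rc.d/010.pgsql.sh | diff /usr/local/etc/rc.d/010.pgsql.sh - , should show a one-line change to review before editing the real script.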
At 02:54 AM 3/11/2002 , Matthew D. Fuller wrote:
>It could also be said that having -w implemented as invoking psql to try
>to connect as the DB superuser assuming no password is a rather
>inappropriate way of going about it, but that's another can of worms.

It doesn't wait for the PID file to be created (at least, not on our 7.1.2 systems). It attempts to connect to a database using psql, and loops until that connection is successful. Which it won't be if you've got a password, because the script will wait for some entity to type the password, and hang.

My fix here was a hack in pg_ctl, right at the bottom where the script is looping on a psql attempt to connect to a database to prove the system is up. I added a "-h localhost" to the psql invocation to force a TCP connection, and then used "ident" instead of password for the authorization.

-crl
--
Chad R. Larson (CRL22) chad@eldocomp.com
Eldorado Computing, Inc. 602-604-3100
5353 North 16th Street, Suite 400
Phoenix, Arizona 85016-3228
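The kind of loop described here can be sketched as a bounded retry. This is not pg_ctl's actual code: wait_for and WAIT_INTERVAL are hypothetical names, and the probe is whatever command the caller supplies.

```sh
#!/bin/sh
# wait_for TRIES COMMAND [ARGS...]
# Run COMMAND until it succeeds, at most TRIES times, pausing
# WAIT_INTERVAL seconds (default 1) between attempts.
# Returns 0 once COMMAND succeeds, 1 after the last failed attempt.
# Hypothetical helper for illustration only.
wait_for() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "${WAIT_INTERVAL:-1}"
        fi
    done
    return 1
}
```

A call such as wait_for 30 psql -h localhost -l would mirror the "-h localhost" change, assuming ident authorization is configured for that connection. The attempt limit only counts completed probes; it cannot interrupt a probe that is itself blocked at a password prompt, so the probe has to be able to authenticate without anyone typing.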