Thread: postgres startup failure
Hi,
I'm attempting to start postgres on a standby machine from a backup. Both the primary and standby are running postgres 8.4, fedora 12.
The backup on the primary is done nightly using the following commands:
# psql -d operations -U postgres -c "SELECT pg_start_backup('/data/postgres')";
# tar zcvf /mnt/thresheresc/postgres/backup.tar.gz /data/postgres/ --exclude "/data/postgres/pg_xlog";
# psql -d operations -U postgres -c "SELECT pg_stop_backup()";
I also have WAL enabled on the primary and writing to the standby but it doesn't seem to make a difference whether I'm using recovery.conf or not so I have left that step out of the equation for now while I figure out why the plain backup doesn't work.
After extracting the backup onto the standby machine and verifying the correct postgres user permissions it refuses to start.
-bash-4.0$ postgres -d 3 -D /data/postgres/
DEBUG: postgres: PostmasterMain: initial environ dump:
DEBUG: -----------------------------------------
DEBUG: HOSTNAME=xxx.xxx.xxx
DEBUG: SHELL=/bin/bash
DEBUG: TERM=xterm
DEBUG: HISTSIZE=1000
DEBUG: USER=postgres
DEBUG: LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
DEBUG: MAIL=/var/spool/mail/postgres
DEBUG: PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
DEBUG: PWD=/var/lib/pgsql
DEBUG: LANG=en_US.UTF-8
DEBUG: SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
DEBUG: HISTCONTROL=ignoreboth
DEBUG: SHLVL=1
DEBUG: HOME=/var/lib/pgsql
DEBUG: LOGNAME=postgres
DEBUG: PGDATA=/data/postgres
DEBUG: LESSOPEN=|/usr/bin/lesspipe.sh %s
DEBUG: G_BROKEN_FILENAMES=1
DEBUG: _=/usr/bin/postgres
DEBUG: PGLOCALEDIR=/usr/share/locale
DEBUG: PGSYSCONFDIR=/etc/sysconfig/pgsql
DEBUG: LC_COLLATE=en_US.UTF-8
DEBUG: LC_CTYPE=en_US.UTF-8
DEBUG: LC_MESSAGES=en_US.UTF-8
DEBUG: LC_MONETARY=C
DEBUG: LC_NUMERIC=C
DEBUG: LC_TIME=C
DEBUG: -----------------------------------------
DEBUG: invoking IpcMemoryCreate(size=32595968)
DEBUG: max_safe_fds = 984, usable_fds = 1000, already_open = 6
DEBUG: logger shutting down
DEBUG: shmem_exit(0): 0 callbacks to make
DEBUG: proc_exit(0): 0 callbacks to make
DEBUG: exit(0)
DEBUG: shmem_exit(-1): 0 callbacks to make
DEBUG: proc_exit(-1): 0 callbacks to make
There is nothing in the pgstartup.log file... If I initdb on an empty /data/postgres directory it starts up fine. I just can't start the database using the backup.
Anybody have a clue what is going on here? The memory messages seem suspicious to me...
Thank you,
Matt
I'm attempting to start postgres on a standby machine from a backup. Both the primary and standby are running postgres 8.4, fedora 12.
The backup on the primary is done nightly using the following commands:
# psql -d operations -U postgres -c "SELECT pg_start_backup('/data/postgres')";
# tar zcvf /mnt/thresheresc/postgres/backup.tar.gz /data/postgres/ --exclude "/data/postgres/pg_xlog";
# psql -d operations -U postgres -c "SELECT pg_stop_backup()";
I also have WAL enabled on the primary and writing to the standby but it doesn't seem to make a difference whether I'm using recovery.conf or not so I have left that step out of the equation for now while I figure out why the plain backup doesn't work.
After extracting the backup onto the standby machine and verifying the correct postgres user permissions it refuses to start.
-bash-4.0$ postgres -d 3 -D /data/postgres/
DEBUG: postgres: PostmasterMain: initial environ dump:
DEBUG: -----------------------------------------
DEBUG: HOSTNAME=xxx.xxx.xxx
DEBUG: SHELL=/bin/bash
DEBUG: TERM=xterm
DEBUG: HISTSIZE=1000
DEBUG: USER=postgres
DEBUG: LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
DEBUG: MAIL=/var/spool/mail/postgres
DEBUG: PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
DEBUG: PWD=/var/lib/pgsql
DEBUG: LANG=en_US.UTF-8
DEBUG: SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
DEBUG: HISTCONTROL=ignoreboth
DEBUG: SHLVL=1
DEBUG: HOME=/var/lib/pgsql
DEBUG: LOGNAME=postgres
DEBUG: PGDATA=/data/postgres
DEBUG: LESSOPEN=|/usr/bin/lesspipe.sh %s
DEBUG: G_BROKEN_FILENAMES=1
DEBUG: _=/usr/bin/postgres
DEBUG: PGLOCALEDIR=/usr/share/locale
DEBUG: PGSYSCONFDIR=/etc/sysconfig/pgsql
DEBUG: LC_COLLATE=en_US.UTF-8
DEBUG: LC_CTYPE=en_US.UTF-8
DEBUG: LC_MESSAGES=en_US.UTF-8
DEBUG: LC_MONETARY=C
DEBUG: LC_NUMERIC=C
DEBUG: LC_TIME=C
DEBUG: -----------------------------------------
DEBUG: invoking IpcMemoryCreate(size=32595968)
DEBUG: max_safe_fds = 984, usable_fds = 1000, already_open = 6
DEBUG: logger shutting down
DEBUG: shmem_exit(0): 0 callbacks to make
DEBUG: proc_exit(0): 0 callbacks to make
DEBUG: exit(0)
DEBUG: shmem_exit(-1): 0 callbacks to make
DEBUG: proc_exit(-1): 0 callbacks to make
There is nothing in the pgstartup.log file... If I initdb on an empty /data/postgres directory it starts up fine. I just can't start the database using the backup.
Anybody have a clue what is going on here? The memory messages seem suspicious to me...
Thank you,
Matt
Matt Bartolome <mattxbart@gmail.com> writes: > I'm attempting to start postgres on a standby machine from a backup. Both > the primary and standby are running postgres 8.4, fedora 12. 8.4.what exactly? > DEBUG: invoking IpcMemoryCreate(size=32595968) > DEBUG: max_safe_fds = 984, usable_fds = 1000, already_open = 6 > DEBUG: logger shutting down > DEBUG: shmem_exit(0): 0 callbacks to make > DEBUG: proc_exit(0): 0 callbacks to make > DEBUG: exit(0) > DEBUG: shmem_exit(-1): 0 callbacks to make > DEBUG: proc_exit(-1): 0 callbacks to make Hmm. I'm thinking that the postmaster must have simply crashed at some time between starting the logger subprocess and starting the startup subprocess. You could try running the postmaster under gdb to see if you can get a stack trace, along the lines of $ gdb /usr/bin/postgres gdb> run -d 3 -D /data/postgres/ ... crash gdb> bt ... printout gdb> quit You might need to "continue" if it decides to stop at the point of forking off the logger. regards, tom lane
Hi Tom,
Do you have any other suggestions?
Thank you,
Matt
On Thu, May 20, 2010 at 11:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
gdb output...
DEBUG: -----------------------------------------
DEBUG: invoking IpcMemoryCreate(size=32595968)
DEBUG: max_safe_fds = 980, usable_fds = 1000, already_open = 10
Detaching after fork from child process 18310.
Detaching after fork from child process 18311.
Program exited with code 01.
DEBUG: logger shutting down
DEBUG: shmem_exit(0): 0 callbacks to make
DEBUG: proc_exit(0): 0 callbacks to make
DEBUG: exit(0)
DEBUG: shmem_exit(-1): 0 callbacks to make
DEBUG: proc_exit(-1): 0 callbacks to make
(gdb) bt
No stack.
Matt Bartolome <mattxbart@gmail.com> writes:8.4.what exactly?
> I'm attempting to start postgres on a standby machine from a backup. Both
> the primary and standby are running postgres 8.4, fedora 12.
I'm running 8.4.2 on the primary and 8.4.4 on the standby via yum. I wouldn't think a minor release change would cause what I'm seeing... I've also built 8.4.2 from source on the standby and I get the same debug output. I should also mention these are both VM's (vmware) and are running under a windows host OS.
> DEBUG: invoking IpcMemoryCreate(size=32595968)Hmm. I'm thinking that the postmaster must have simply crashed at some
> DEBUG: max_safe_fds = 984, usable_fds = 1000, already_open = 6
> DEBUG: logger shutting down
> DEBUG: shmem_exit(0): 0 callbacks to make
> DEBUG: proc_exit(0): 0 callbacks to make
> DEBUG: exit(0)
> DEBUG: shmem_exit(-1): 0 callbacks to make
> DEBUG: proc_exit(-1): 0 callbacks to make
time between starting the logger subprocess and starting the startup
subprocess. You could try running the postmaster under gdb to see
if you can get a stack trace, along the lines of
$ gdb /usr/bin/postgres
gdb> run -d 3 -D /data/postgres/
... crash
gdb> bt
... printout
gdb> quit
DEBUG: -----------------------------------------
DEBUG: invoking IpcMemoryCreate(size=32595968)
DEBUG: max_safe_fds = 980, usable_fds = 1000, already_open = 10
Detaching after fork from child process 18310.
Detaching after fork from child process 18311.
Program exited with code 01.
DEBUG: logger shutting down
DEBUG: shmem_exit(0): 0 callbacks to make
DEBUG: proc_exit(0): 0 callbacks to make
DEBUG: exit(0)
DEBUG: shmem_exit(-1): 0 callbacks to make
DEBUG: proc_exit(-1): 0 callbacks to make
(gdb) bt
No stack.
You might need to "continue" if it decides to stop at the point of
forking off the logger.
regards, tom lane
Do you have any other suggestions?
Thank you,
Matt
Matt Bartolome <mattxbart@gmail.com> writes: > gdb output... > DEBUG: ----------------------------------------- > DEBUG: invoking IpcMemoryCreate(size=32595968) > DEBUG: max_safe_fds = 980, usable_fds = 1000, already_open = 10 > Detaching after fork from child process 18310. > Detaching after fork from child process 18311. > Program exited with code 01. Huh. So it called exit(1) somewhere, without generating any error message first. That's pretty unfriendly. Try setting a breakpoint at exit(), and if you can get it to stop there, get a stack trace from that point. Dunno if you're familiar with gdb, but the quick way to do this is to say b exit and then the "run" command. It's possible that the attempt to set the breakpoint will fail because libc.so isn't loaded yet, in which case you need two steps: b main run ... when control stops at main: b exit continue regards, tom lane
On Thu, May 20, 2010 at 2:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Setting the breakpoint (b exit) got me a little farther...
DEBUG: invoking IpcMemoryCreate(size=32595968)
DEBUG: max_safe_fds = 980, usable_fds = 1000, already_open = 10
Detaching after fork from child process 19066.
Detaching after fork from child process 19067.
Breakpoint 1, 0x0046efb6 in exit () from /lib/libc.so.6
(gdb) bt
#0 0x0046efb6 in exit () from /lib/libc.so.6
#1 0x082286de in proc_exit ()
#2 0x0820bbdf in ?? ()
#3 0x0820eb63 in ?? ()
#4 <signal handler called>
#5 0x003ec424 in __kernel_vsyscall ()
#6 0x0051433d in ___newselect_nocancel () from /lib/libc.so.6
#7 0x0820d489 in ?? ()
#8 0x08210489 in PostmasterMain ()
#9 0x081b92df in main ()
(gdb) quit
Matt Bartolome <mattxbart@gmail.com> writes:> gdb output...Huh. So it called exit(1) somewhere, without generating any error
> DEBUG: -----------------------------------------
> DEBUG: invoking IpcMemoryCreate(size=32595968)
> DEBUG: max_safe_fds = 980, usable_fds = 1000, already_open = 10
> Detaching after fork from child process 18310.
> Detaching after fork from child process 18311.
> Program exited with code 01.
message first. That's pretty unfriendly. Try setting a breakpoint
at exit(), and if you can get it to stop there, get a stack trace
from that point.
Dunno if you're familiar with gdb, but the quick way to do this is
to say
b exit
and then the "run" command. It's possible that the attempt to set
the breakpoint will fail because libc.so isn't loaded yet, in
which case you need two steps:
b main
run ...
when control stops at main:
b exit
continue
Setting the breakpoint (b exit) got me a little farther...
DEBUG: invoking IpcMemoryCreate(size=32595968)
DEBUG: max_safe_fds = 980, usable_fds = 1000, already_open = 10
Detaching after fork from child process 19066.
Detaching after fork from child process 19067.
Breakpoint 1, 0x0046efb6 in exit () from /lib/libc.so.6
(gdb) bt
#0 0x0046efb6 in exit () from /lib/libc.so.6
#1 0x082286de in proc_exit ()
#2 0x0820bbdf in ?? ()
#3 0x0820eb63 in ?? ()
#4 <signal handler called>
#5 0x003ec424 in __kernel_vsyscall ()
#6 0x0051433d in ___newselect_nocancel () from /lib/libc.so.6
#7 0x0820d489 in ?? ()
#8 0x08210489 in PostmasterMain ()
#9 0x081b92df in main ()
(gdb) quit
regards, tom lane
Thanks Tom,
Matt
Matt Bartolome <mattxbart@gmail.com> writes: > Setting the breakpoint (b exit) got me a little farther... > DEBUG: invoking IpcMemoryCreate(size=32595968) > DEBUG: max_safe_fds = 980, usable_fds = 1000, already_open = 10 > Detaching after fork from child process 19066. > Detaching after fork from child process 19067. > Breakpoint 1, 0x0046efb6 in exit () from /lib/libc.so.6 > (gdb) bt > #0 0x0046efb6 in exit () from /lib/libc.so.6 > #1 0x082286de in proc_exit () > #2 0x0820bbdf in ?? () > #3 0x0820eb63 in ?? () > #4 <signal handler called> > #5 0x003ec424 in __kernel_vsyscall () > #6 0x0051433d in ___newselect_nocancel () from /lib/libc.so.6 > #7 0x0820d489 in ?? () > #8 0x08210489 in PostmasterMain () > #9 0x081b92df in main () Hmm. Not tremendously helpful --- if you were to install the appropriate postgresql-debuginfo RPM, the stack trace might get more useful. However, this does appear to confirm the theory that postmaster.c is doing proc_exit(1), and I don't see any code paths in there where there isn't a error message logged first. So right now I'm wondering whether you're not failing to notice a relevant log message(s). You've evidently managed to launch the syslogger --- where is it configured to write the postmaster log? regards, tom lane
On Thu, May 20, 2010 at 3:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gosh, you know I never even bothered looking in pg_log. Stupid! I was expecting to see some sort of error when doing:
$ postgres -d 3 -D /data/postgres/
I even have this written down in my own recovery instructions:
4. Create pg_xlog directory as postgres user, remove backup_label
I had forgotten when doing a backup with WAL enabled you want to exclude the pg_xlog directory because you are restoring from the WAL's written on the standby server in the event of failure. In the pg_log it clearly prints the error.
Matt Bartolome <mattxbart@gmail.com> writes:> Setting the breakpoint (b exit) got me a little farther...Hmm. Not tremendously helpful --- if you were to install the
> DEBUG: invoking IpcMemoryCreate(size=32595968)
> DEBUG: max_safe_fds = 980, usable_fds = 1000, already_open = 10
> Detaching after fork from child process 19066.
> Detaching after fork from child process 19067.
> Breakpoint 1, 0x0046efb6 in exit () from /lib/libc.so.6
> (gdb) bt
> #0 0x0046efb6 in exit () from /lib/libc.so.6
> #1 0x082286de in proc_exit ()
> #2 0x0820bbdf in ?? ()
> #3 0x0820eb63 in ?? ()
> #4 <signal handler called>
> #5 0x003ec424 in __kernel_vsyscall ()
> #6 0x0051433d in ___newselect_nocancel () from /lib/libc.so.6
> #7 0x0820d489 in ?? ()
> #8 0x08210489 in PostmasterMain ()
> #9 0x081b92df in main ()
appropriate postgresql-debuginfo RPM, the stack trace might get
more useful. However, this does appear to confirm the theory
that postmaster.c is doing proc_exit(1), and I don't see any code
paths in there where there isn't a error message logged first.
So right now I'm wondering whether you're not failing to notice
a relevant log message(s). You've evidently managed to launch
the syslogger --- where is it configured to write the postmaster
log?
Gosh, you know I never even bothered looking in pg_log. Stupid! I was expecting to see some sort of error when doing:
$ postgres -d 3 -D /data/postgres/
I even have this written down in my own recovery instructions:
4. Create pg_xlog directory as postgres user, remove backup_label
I had forgotten when doing a backup with WAL enabled you want to exclude the pg_xlog directory because you are restoring from the WAL's written on the standby server in the event of failure. In the pg_log it clearly prints the error.
regards, tom lane
Thanks Tom, I probably would have been running in circles for days without your help.
-Matt
Matt Bartolome <mattxbart@gmail.com> writes: > On Thu, May 20, 2010 at 3:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So right now I'm wondering whether you're not failing to notice >> a relevant log message(s). You've evidently managed to launch >> the syslogger --- where is it configured to write the postmaster >> log? > Gosh, you know I never even bothered looking in pg_log. Stupid! I was > expecting to see some sort of error when doing: > $ postgres -d 3 -D /data/postgres/ I wonder whether it'd be a good idea to write a log message on the order of "switching to syslogger" just before cutting over the log output during startup. That's probably not quite the best wording, but something like that might help to remind people where to look. regards, tom lane