Thread: postgres startup failure

postgres startup failure

From
Matt Bartolome
Date:
Hi,
I'm attempting to start postgres on a standby machine from a backup. Both the primary and standby are running postgres 8.4, fedora 12.

The backup on the primary is done nightly using the following commands:

# psql -d operations -U postgres -c "SELECT pg_start_backup('/data/postgres')";
# tar zcvf /mnt/thresheresc/postgres/backup.tar.gz /data/postgres/ --exclude "/data/postgres/pg_xlog";
# psql -d operations -U postgres -c "SELECT pg_stop_backup()";

I also have WAL enabled on the primary and writing to the standby but it doesn't seem to make a difference whether I'm using recovery.conf or not so I have left that step out of the equation for now while I figure out why the plain backup doesn't work.

After extracting the backup onto the standby machine and verifying the correct postgres user permissions it refuses to start.

-bash-4.0$ postgres -d 3 -D /data/postgres/
DEBUG:  postgres: PostmasterMain: initial environ dump:
DEBUG:  -----------------------------------------
DEBUG:          HOSTNAME=xxx.xxx.xxx
DEBUG:          SHELL=/bin/bash
DEBUG:          TERM=xterm
DEBUG:          HISTSIZE=1000
DEBUG:          USER=postgres
DEBUG:          LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
DEBUG:          MAIL=/var/spool/mail/postgres
DEBUG:          PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
DEBUG:          PWD=/var/lib/pgsql
DEBUG:          LANG=en_US.UTF-8
DEBUG:          SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
DEBUG:          HISTCONTROL=ignoreboth
DEBUG:          SHLVL=1
DEBUG:          HOME=/var/lib/pgsql
DEBUG:          LOGNAME=postgres
DEBUG:          PGDATA=/data/postgres
DEBUG:          LESSOPEN=|/usr/bin/lesspipe.sh %s
DEBUG:          G_BROKEN_FILENAMES=1
DEBUG:          _=/usr/bin/postgres
DEBUG:          PGLOCALEDIR=/usr/share/locale
DEBUG:          PGSYSCONFDIR=/etc/sysconfig/pgsql
DEBUG:          LC_COLLATE=en_US.UTF-8
DEBUG:          LC_CTYPE=en_US.UTF-8
DEBUG:          LC_MESSAGES=en_US.UTF-8
DEBUG:          LC_MONETARY=C
DEBUG:          LC_NUMERIC=C
DEBUG:          LC_TIME=C
DEBUG:  -----------------------------------------
DEBUG:  invoking IpcMemoryCreate(size=32595968)
DEBUG:  max_safe_fds = 984, usable_fds = 1000, already_open = 6
DEBUG:  logger shutting down
DEBUG:  shmem_exit(0): 0 callbacks to make
DEBUG:  proc_exit(0): 0 callbacks to make
DEBUG:  exit(0)
DEBUG:  shmem_exit(-1): 0 callbacks to make
DEBUG:  proc_exit(-1): 0 callbacks to make


There is nothing in the pgstartup.log file... If I initdb on an empty /data/postgres directory it starts up fine. I just can't start the database using the backup.

Anybody have a clue what is going on here? The memory messages seem suspicious to me...

Thank you,
Matt


Re: postgres startup failure

From
Tom Lane
Date:
Matt Bartolome <mattxbart@gmail.com> writes:
> I'm attempting to start postgres on a standby machine from a backup. Both
> the primary and standby are running postgres 8.4, fedora 12.

8.4.what exactly?

> DEBUG:  invoking IpcMemoryCreate(size=32595968)
> DEBUG:  max_safe_fds = 984, usable_fds = 1000, already_open = 6
> DEBUG:  logger shutting down
> DEBUG:  shmem_exit(0): 0 callbacks to make
> DEBUG:  proc_exit(0): 0 callbacks to make
> DEBUG:  exit(0)
> DEBUG:  shmem_exit(-1): 0 callbacks to make
> DEBUG:  proc_exit(-1): 0 callbacks to make

Hmm.  I'm thinking that the postmaster must have simply crashed at some
time between starting the logger subprocess and starting the startup
subprocess.  You could try running the postmaster under gdb to see
if you can get a stack trace, along the lines of

    $ gdb /usr/bin/postgres
    gdb> run -d 3 -D /data/postgres/
    ... crash
    gdb> bt
    ... printout
    gdb> quit

You might need to "continue" if it decides to stop at the point of
forking off the logger.

            regards, tom lane

Re: postgres startup failure

From
Matt Bartolome
Date:
Hi Tom,

On Thu, May 20, 2010 at 11:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Bartolome <mattxbart@gmail.com> writes:
> I'm attempting to start postgres on a standby machine from a backup. Both
> the primary and standby are running postgres 8.4, fedora 12.

8.4.what exactly?
 
I'm running 8.4.2 on the primary and 8.4.4 on the standby via yum. I wouldn't think a minor release change would cause what I'm seeing... I've also built 8.4.2 from source on the standby and I get the same debug output. I should also mention these are both VM's (vmware) and are running under a windows host OS.

> DEBUG:  invoking IpcMemoryCreate(size=32595968)
> DEBUG:  max_safe_fds = 984, usable_fds = 1000, already_open = 6
> DEBUG:  logger shutting down
> DEBUG:  shmem_exit(0): 0 callbacks to make
> DEBUG:  proc_exit(0): 0 callbacks to make
> DEBUG:  exit(0)
> DEBUG:  shmem_exit(-1): 0 callbacks to make
> DEBUG:  proc_exit(-1): 0 callbacks to make

Hmm.  I'm thinking that the postmaster must have simply crashed at some
time between starting the logger subprocess and starting the startup
subprocess.  You could try running the postmaster under gdb to see
if you can get a stack trace, along the lines of

       $ gdb /usr/bin/postgres
       gdb> run -d 3 -D /data/postgres/
       ... crash
       gdb> bt
       ... printout
       gdb> quit

 
gdb output...

DEBUG:  -----------------------------------------
DEBUG:  invoking IpcMemoryCreate(size=32595968)
DEBUG:  max_safe_fds = 980, usable_fds = 1000, already_open = 10
Detaching after fork from child process 18310.
Detaching after fork from child process 18311.

Program exited with code 01.
DEBUG:  logger shutting down
DEBUG:  shmem_exit(0): 0 callbacks to make
DEBUG:  proc_exit(0): 0 callbacks to make
DEBUG:  exit(0)
DEBUG:  shmem_exit(-1): 0 callbacks to make
DEBUG:  proc_exit(-1): 0 callbacks to make
(gdb) bt
No stack.


You might need to "continue" if it decides to stop at the point of
forking off the logger.

                       regards, tom lane

Do you have any other suggestions?

Thank you,
Matt

Re: postgres startup failure

From
Tom Lane
Date:
Matt Bartolome <mattxbart@gmail.com> writes:
> gdb output...

> DEBUG:  -----------------------------------------
> DEBUG:  invoking IpcMemoryCreate(size=32595968)
> DEBUG:  max_safe_fds = 980, usable_fds = 1000, already_open = 10
> Detaching after fork from child process 18310.
> Detaching after fork from child process 18311.

> Program exited with code 01.

Huh.  So it called exit(1) somewhere, without generating any error
message first.  That's pretty unfriendly.  Try setting a breakpoint
at exit(), and if you can get it to stop there, get a stack trace
from that point.

Dunno if you're familiar with gdb, but the quick way to do this is
to say
    b exit
and then the "run" command.  It's possible that the attempt to set
the breakpoint will fail because libc.so isn't loaded yet, in
which case you need two steps:
    b main
    run ...
    when control stops at main:
    b exit
    continue

            regards, tom lane

Re: postgres startup failure

From
Matt Bartolome
Date:


On Thu, May 20, 2010 at 2:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Bartolome <mattxbart@gmail.com> writes:
> gdb output...

> DEBUG:  -----------------------------------------
> DEBUG:  invoking IpcMemoryCreate(size=32595968)
> DEBUG:  max_safe_fds = 980, usable_fds = 1000, already_open = 10
> Detaching after fork from child process 18310.
> Detaching after fork from child process 18311.

> Program exited with code 01.

Huh.  So it called exit(1) somewhere, without generating any error
message first.  That's pretty unfriendly.  Try setting a breakpoint
at exit(), and if you can get it to stop there, get a stack trace
from that point.

Dunno if you're familiar with gdb, but the quick way to do this is
to say
       b exit
and then the "run" command.  It's possible that the attempt to set
the breakpoint will fail because libc.so isn't loaded yet, in
which case you need two steps:
       b main
       run ...
       when control stops at main:
       b exit
       continue

Setting the breakpoint (b exit) got me a little farther...

DEBUG:  invoking IpcMemoryCreate(size=32595968)
DEBUG:  max_safe_fds = 980, usable_fds = 1000, already_open = 10
Detaching after fork from child process 19066.
Detaching after fork from child process 19067.

Breakpoint 1, 0x0046efb6 in exit () from /lib/libc.so.6
(gdb) bt
#0  0x0046efb6 in exit () from /lib/libc.so.6
#1  0x082286de in proc_exit ()
#2  0x0820bbdf in ?? ()
#3  0x0820eb63 in ?? ()
#4  <signal handler called>
#5  0x003ec424 in __kernel_vsyscall ()
#6  0x0051433d in ___newselect_nocancel () from /lib/libc.so.6
#7  0x0820d489 in ?? ()
#8  0x08210489 in PostmasterMain ()
#9  0x081b92df in main ()
(gdb) quit
 

                       regards, tom lane


Thanks Tom,
Matt

Re: postgres startup failure

From
Tom Lane
Date:
Matt Bartolome <mattxbart@gmail.com> writes:
> Setting the breakpoint (b exit) got me a little farther...

> DEBUG:  invoking IpcMemoryCreate(size=32595968)
> DEBUG:  max_safe_fds = 980, usable_fds = 1000, already_open = 10
> Detaching after fork from child process 19066.
> Detaching after fork from child process 19067.

> Breakpoint 1, 0x0046efb6 in exit () from /lib/libc.so.6
> (gdb) bt
> #0  0x0046efb6 in exit () from /lib/libc.so.6
> #1  0x082286de in proc_exit ()
> #2  0x0820bbdf in ?? ()
> #3  0x0820eb63 in ?? ()
> #4  <signal handler called>
> #5  0x003ec424 in __kernel_vsyscall ()
> #6  0x0051433d in ___newselect_nocancel () from /lib/libc.so.6
> #7  0x0820d489 in ?? ()
> #8  0x08210489 in PostmasterMain ()
> #9  0x081b92df in main ()

Hmm.  Not tremendously helpful --- if you were to install the
appropriate postgresql-debuginfo RPM, the stack trace might get
more useful.  However, this does appear to confirm the theory
that postmaster.c is doing proc_exit(1), and I don't see any code
paths in there where there isn't a error message logged first.
So right now I'm wondering whether you're not failing to notice
a relevant log message(s).  You've evidently managed to launch
the syslogger --- where is it configured to write the postmaster
log?

            regards, tom lane

Re: postgres startup failure

From
Matt Bartolome
Date:


On Thu, May 20, 2010 at 3:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Bartolome <mattxbart@gmail.com> writes:
> Setting the breakpoint (b exit) got me a little farther...

> DEBUG:  invoking IpcMemoryCreate(size=32595968)
> DEBUG:  max_safe_fds = 980, usable_fds = 1000, already_open = 10
> Detaching after fork from child process 19066.
> Detaching after fork from child process 19067.

> Breakpoint 1, 0x0046efb6 in exit () from /lib/libc.so.6
> (gdb) bt
> #0  0x0046efb6 in exit () from /lib/libc.so.6
> #1  0x082286de in proc_exit ()
> #2  0x0820bbdf in ?? ()
> #3  0x0820eb63 in ?? ()
> #4  <signal handler called>
> #5  0x003ec424 in __kernel_vsyscall ()
> #6  0x0051433d in ___newselect_nocancel () from /lib/libc.so.6
> #7  0x0820d489 in ?? ()
> #8  0x08210489 in PostmasterMain ()
> #9  0x081b92df in main ()

Hmm.  Not tremendously helpful --- if you were to install the
appropriate postgresql-debuginfo RPM, the stack trace might get
more useful.  However, this does appear to confirm the theory
that postmaster.c is doing proc_exit(1), and I don't see any code
paths in there where there isn't a error message logged first.
So right now I'm wondering whether you're not failing to notice
a relevant log message(s).  You've evidently managed to launch
the syslogger --- where is it configured to write the postmaster
log?


Gosh, you know I never even bothered looking in pg_log. Stupid! I was expecting to see some sort of error when doing:
$ postgres -d 3 -D /data/postgres/

I even have this written down in my own recovery instructions:

4. Create pg_xlog directory as postgres user, remove backup_label

I had forgotten when doing a backup with WAL enabled you want to exclude the pg_xlog directory because you are restoring from the WAL's written on the standby server in the event of failure. In the pg_log it clearly prints the error.
 
                       regards, tom lane

Thanks Tom, I probably would have been running in circles for days without your help.

-Matt

Re: postgres startup failure

From
Tom Lane
Date:
Matt Bartolome <mattxbart@gmail.com> writes:
> On Thu, May 20, 2010 at 3:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So right now I'm wondering whether you're not failing to notice
>> a relevant log message(s).  You've evidently managed to launch
>> the syslogger --- where is it configured to write the postmaster
>> log?

> Gosh, you know I never even bothered looking in pg_log. Stupid! I was
> expecting to see some sort of error when doing:
> $ postgres -d 3 -D /data/postgres/

I wonder whether it'd be a good idea to write a log message on the order
of "switching to syslogger" just before cutting over the log output
during startup.  That's probably not quite the best wording, but
something like that might help to remind people where to look.

            regards, tom lane