Thread: psql hangs when trying to access a database
Running into an interesting problem when trying to do a psql from the command line.

My infrastructure:

  Intel Architecture
  Red Hat 7.1
  PostgreSQL 7.0.3 (from the stock RPMs of the Red Hat 7.1 distribution)
  320MB RAM

Kernel shared memory increases:

  # Up the maximum shared memory values
  kernel.shmall = 207618048
  kernel.shmmax = 207618048

pg_ctl startup in /etc/rc.d/init.d/postgresql:

  su -l postgres -c "/usr/bin/pg_ctl -D $PGDATA -p /usr/bin/postmaster -o '-d 5 -i -B 24576' start >/tmp/postgres.log 2>&1" < /dev/null

Database environment: 5 databases on this one server. The one in question has about 5 tables, averaging 60,000 rows per table. The entire directory under $PGDATA is only about 160MB.

At the command line I type 'psql datawarehouse'. I expect to see the banner from the psql command, but in this case, nothing. The system seems to just hang. Running top shows that my process is taking up 99% of the user CPU, but according to iostat there is no activity on the hard drive.

Here's what the 'ps -awx | grep post' command shows me:

  3073 pts/1 S 0:00 /bin/sh /usr/bin/pg_ctl -D /db/pgsql -p /usr/bin/postmaster -o -d 5 -i -B 24576 start
  3075 pts/1 S 0:00 /usr/bin/postmaster -d 5 -i -B 24576
  3085 pts/1 R 0:11 /usr/bin/postgres localhost mnickel datawarehouse startup

Generally I would expect the /usr/bin/postgres line to eventually change to 'idle' instead of 'startup'...

Also, pg_dump and vacuumdb do not work. The postmaster happily spawns off another /usr/bin/postgres, but the state never switches out of 'idle' for pg_dump or vacuumdb.

Upon inspection of the /tmp/postgres.log file (included at the end), I see that the processing is hanging right after 'InitPostgres'. Normally there is a bunch of other "stuff" there when normal entry into the psql command completes...

I have no problem using psql on any of the other databases on this server. Everything runs smoothly. This is the only database that is giving us grief. This is the first problem we've come across in almost a year of PostgreSQL use. I'm totally baffled... :)

Thanks so much for looking at this, as I'm getting the "see, I told you that open source/freeware stuff wasn't any good" pressure. :)

Mark Nickel

Here is the logfile in question:

  postmaster successfully started up.
  FindExec: found "/usr/bin/postgres" using argv[0]
  binding ShmemCreate(key=52e2c1, size=205273088)
  011105.20:22:27.451 [3076] DEBUG: Data Base System is starting up at Mon Nov 5 20:22:27 2001
  011105.20:22:27.451 [3076] DEBUG: Data Base System was shut down at Mon Nov 5 20:22:13 2001
  011105.20:22:27.453 [3076] DEBUG: Data Base System is in production state at Mon Nov 5 20:22:27 2001
  011105.20:22:27.453 [3076] proc_exit(0)
  011105.20:22:27.454 [3076] shmem_exit(0)
  011105.20:22:27.454 [3076] exit(0)
  /usr/bin/postmaster: reaping dead processes...
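For anyone reproducing this configuration: the shared-memory increases above would normally live in /etc/sysctl.conf (an assumption; they could equally be written directly into /proc/sys/kernel), and the -B 24576 setting works out to 24576 x 8KB = 192MB of shared buffers, which roughly matches the ShmemCreate size in the log. A minimal sketch:

  # /etc/sysctl.conf -- raise the SysV shared memory limits so the
  # postmaster can allocate its ~200MB shared buffer pool (-B 24576)
  kernel.shmall = 207618048
  kernel.shmmax = 207618048

  # apply the settings without a reboot
  sysctl -p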
  /usr/bin/postmaster: ServerLoop: handling reading 5
  /usr/bin/postmaster: ServerLoop: handling reading 5
  /usr/bin/postmaster: ServerLoop: handling writing 5
  /usr/bin/postmaster: BackendStartup: environ dump:
  -----------------------------------------
  PWD=/var/lib/pgsql
  HOSTNAME=topaz.dunsirn.com
  ODBCINI=/etc/odbc.ini
  PVM_RSH=/usr/bin/rsh
  QTDIR=/usr/lib/qt-2.3.0
  LESSOPEN=|/usr/bin/lesspipe.sh %s
  XPVM_ROOT=/usr/share/pvm3/xpvm
  KDEDIR=/usr
  USER=postgres
  LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:
  MACHTYPE=i386-redhat-linux-gnu
  MAIL=/var/spool/mail/postgres
  INPUTRC=/etc/inputrc
  LANG=en_US
  LOGNAME=postgres
  SHLVL=2
  MQSERVER=OPAL.AMBER/TCP/opal(1414)
  SHELL=/bin/bash
  PGLIB=/usr/lib/pgsql
  HOSTTYPE=i386
  OSTYPE=linux-gnu
  HISTSIZE=1000
  LAMHELPFILE=/etc/lam/lam-helpfile
  PVM_ROOT=/usr/share/pvm3
  TERM=xterm
  HOME=/var/lib/pgsql
  SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
  PGDATA=/db/pgsql
  PATH=/usr/kerberos/bin:/bin:/usr/bin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin
  _=/usr/bin/postmaster
  POSTPORT=5432
  POSTID=2147483646
  IPC_KEY=5432000
  PG_USER=mnickel
  -----------------------------------------
  /usr/bin/postmaster: BackendStartup: pid 3085 user mnickel db datawarehouse socket 5
  /usr/bin/postmaster child[3085]: starting with (/usr/bin/postgres -d5 -B 24576 -v131072 -p datawarehouse )
  FindExec: found "/usr/bin/postgres" using argv[0]
  011105.20:22:46.186 [3085] started: host=localhost user=mnickel database=datawarehouse
  011105.20:22:46.187 [3085] InitPostgres
  011105.20:27:45.645 [3075] pmdie 15
  Smart Shutdown request at Mon Nov 5 20:27:45 2001
  /usr/bin/pg_ctl: line 1: 3075 Killed /usr/bin/postmaster -d 5 -i -B 24576
Mark Nickel <mnickel@dunsirn.com> writes:
> Running into an interesting problem when trying to do a psql from the
> command line.

Can't say I've heard of a similar problem before. Can you attach to the looping process with gdb and get a stack trace?

You might need to rebuild from source with --enable-debug to get anything really useful from gdb, but let's first try it with the executables you have.

			regards, tom lane
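For reference, a minimal attach-and-backtrace session of the kind Tom suggests looks roughly like this (the PID is whatever ps shows for the spinning backend; 'bt' and 'info stack' are equivalent):

  # find the stuck backend
  ps -awx | grep postgres

  # attach to it with the installed binary and dump the call stack
  gdb /usr/bin/postgres <PID>
  (gdb) bt
  (gdb) detach
  (gdb) quit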
Tom Lane wrote:
> Mark Nickel <mnickel@dunsirn.com> writes:
> > Running into an interesting problem when trying to do a psql from the
> > command line.
>
> Can't say I've heard of a similar problem before. Can you attach to the
> looping process with gdb and get a stack trace?

Are there utilities to retrieve data from the raw files on the hard drive, or must everything be done via the postmaster? Kind of like an fsck for the database...

I've attached with gdb and run 'info stack'. Not sure how much to include, but here goes!

  [root@topaz /db]# gdb /usr/bin/postgres 3712
  GNU gdb 5.0rh-5 Red Hat Linux 7.1
  Copyright 2001 Free Software Foundation, Inc.
  GDB is free software, covered by the GNU General Public License, and you are
  welcome to change it and/or distribute copies of it under certain conditions.
  Type "show copying" to see the conditions.
  There is absolutely no warranty for GDB. Type "show warranty" for details.
  This GDB was configured as "i386-redhat-linux"...(no debugging symbols found)...
  /db/3712: No such file or directory.
  Attaching to program: /usr/bin/postgres, process 3712
  Reading symbols from /lib/libcrypt.so.1...done.
  Loaded symbols for /lib/libcrypt.so.1
  Reading symbols from /lib/libnsl.so.1...done.
  Loaded symbols for /lib/libnsl.so.1
  Reading symbols from /lib/libdl.so.2...done.
  Loaded symbols for /lib/libdl.so.2
  Reading symbols from /lib/i686/libm.so.6...done.
  Loaded symbols for /lib/i686/libm.so.6
  Reading symbols from /lib/libutil.so.1...done.
  Loaded symbols for /lib/libutil.so.1
  Reading symbols from /usr/lib/libreadline.so.4.1...done.
  Loaded symbols for /usr/lib/libreadline.so.4.1
  Reading symbols from /lib/libtermcap.so.2...done.
  Loaded symbols for /lib/libtermcap.so.2
  Reading symbols from /usr/lib/libncurses.so.5...done.
  Loaded symbols for /usr/lib/libncurses.so.5
  Reading symbols from /lib/i686/libc.so.6...done.
  Loaded symbols for /lib/i686/libc.so.6
  Reading symbols from /lib/ld-linux.so.2...done.
  Loaded symbols for /lib/ld-linux.so.2
  Reading symbols from /lib/libnss_files.so.2...done.
  Loaded symbols for /lib/libnss_files.so.2
  0x080f6900 in SpinAcquire () at eval.c:41
  41      eval.c: No such file or directory.
          in eval.c
  (gdb) info stack
  #0  0x080f6900 in SpinAcquire () at eval.c:41
  #1  0x0807875d in _bt_getroot () at eval.c:41
  #2  0x08079e0a in _bt_search () at eval.c:41
  #3  0x0807acab in _bt_first () at eval.c:41
  #4  0x0807926a in btgettuple () at eval.c:41
  #5  0x08071fc7 in index_getnext () at eval.c:41
  #6  0x08088edb in CatalogHasIndex () at eval.c:41
  #7  0x08089748 in ClassNameIndexScan () at eval.c:41
  #8  0x08136d0a in RelationSetIndexSupport () at eval.c:41
  #9  0x08137a0c in RelationSetIndexSupport () at eval.c:41
  #10 0x08138047 in RelationNameGetRelation () at eval.c:41
  #11 0x0806ff18 in heap_openr () at eval.c:41
  #12 0x08135031 in float8_to_char () at eval.c:41
  #13 0x08135e74 in SearchSysCache () at eval.c:41
  #14 0x08139ce6 in SearchSysCacheTuple () at eval.c:41
  #15 0x0813da2e in SetUserId () at eval.c:41
  #16 0x0813e21d in InitPostgres () at eval.c:41
  #17 0x080fe4f1 in PostgresMain () at eval.c:41
  #18 0x080e5e40 in PostmasterMain () at eval.c:41
  #19 0x080e59e0 in PostmasterMain () at eval.c:41
  #20 0x080e4b29 in PostmasterMain () at eval.c:41
  #21 0x080e44ac in PostmasterMain () at eval.c:41
  #22 0x080b5047 in main () at eval.c:41
  #23 0x40126177 in __libc_start_main (main=0x80b4fa0 <main>, argc=6,
      ubp_av=0xbffffa1c, init=0x806486c <_init>, fini=0x81461fc <_fini>,
      rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffffa0c)
      at ../sysdeps/generic/libc-start.c:129
  (gdb)

> You might need to rebuild from source with --enable-debug to get
> anything really useful from gdb, but let's first try it with the
> executables you have.

Ok, if necessary I should be able to rebuild the PostgreSQL stuff from the source RPMs...

Thanks again for looking at this...

Mark Nickel
Mark Nickel <mnickel@dunsirn.com> writes:
> (gdb) info stack
> #0  0x080f6900 in SpinAcquire () at eval.c:41
> #1  0x0807875d in _bt_getroot () at eval.c:41
> #2  0x08079e0a in _bt_search () at eval.c:41
> #3  0x0807acab in _bt_first () at eval.c:41
> #4  0x0807926a in btgettuple () at eval.c:41

Hm. Don't think I believe this backtrace very much --- _bt_getroot doesn't call SpinAcquire, at least not directly. Probably the next step should be to rebuild with debug support.

However, before you do that ... it would seem that you have either a stuck-lock problem or a corrupted btree index on one of the critical system tables (pg_class, looks like). The locking issue could be checked by restarting the postmaster to see whether the problem goes away. If there's a corrupted index, you can use REINDEX DATABASE to rebuild it --- but read the directions carefully! You need to do it in a properly-started standalone backend, else you'll not get far enough to do the reindex.

If neither of those things gets you up and running, it's time to roll out the debugger.

			regards, tom lane
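A rough outline of the two checks Tom describes, as shell commands (the paths and the -O/-P standalone flags are taken from the follow-up message later in this thread; treat this as a sketch, not the official procedure):

  # 1. Rule out a stuck lock: cleanly restart the postmaster
  su -l postgres -c "/usr/bin/pg_ctl -D $PGDATA stop"
  su -l postgres -c "/usr/bin/pg_ctl -D $PGDATA -p /usr/bin/postmaster -o '-i -B 24576' start"

  # 2. If the hang persists, stop the postmaster again and start a
  #    standalone backend against the affected database, then reindex
  /usr/bin/postgres -D $PGDATA -O -P datawarehouse
  backend> REINDEX DATABASE datawarehouse FORCE;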
I did something really stupid. Our server was out of disk space and I deleted the files in the pg_xlog directory (or whatever the heck it's called). Now I can't start Postgres. Help!

Also, how do I deal with these logs that grow and grow and take up space?

Thank you
> away. If there's a corrupted index, you can use REINDEX DATABASE
> to rebuild it --- but read the directions carefully! You need to do
> it in a properly-started standalone backend, else you'll not get
> far enough to do the reindex.

Bingo! Corrupted index it was! The problem was solved, as you said, by starting a standalone backend:

  /usr/bin/postgres -d2 -B 24576 -D /db/pgsql -O -P datawarehouse

Once I was in, I performed a REINDEX DATABASE datawarehouse FORCE; and for good measure I ran a VACUUM, which treated me to several errors asking me to recreate indices. Running

  REINDEX TABLE <table name>
  REINDEX INDEX <index name>

on those objects that VACUUM reported as needing to be recreated fixed all of my problems. For good measure I ran a vacuumdb datawarehouse as the postgres superuser, after bringing the database up with pg_ctl, and all is well.

Thank you, thank you, thank you! You've helped me to dramatically increase my knowledge of this database. I recognized the REINDEX command from the manual, but I really didn't know the appropriate time to use such a command. In "normal" daily database use, your indices don't get corrupted. Thanks for helping me to discover when to use this command! :)

Would having --enable-debug in the build of postgres have helped to show more information in the stack trace? Would that extra information have been more helpful in determining the problem in this case? While waiting for majordomo to send your replies, I built an --enable-debug version of postgres from the 7.1 SRPM just in case the REINDEX didn't work. Fortunately I can just stash those RPMs away for a rainy day... :)

I'm off to plan an upgrade to 7.1.3 :)

Mark Nickel
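Pulled together, the recovery sequence described above amounts to something like the following (database name and paths come from the messages in this thread; the exact list of indices to rebuild depends on what VACUUM complains about):

  # standalone backend, run as the postgres user with the postmaster stopped
  /usr/bin/postgres -d2 -B 24576 -D /db/pgsql -O -P datawarehouse
  backend> REINDEX DATABASE datawarehouse FORCE;
  backend> VACUUM;
  backend> REINDEX TABLE <table name>;   -- for each object VACUUM complains about
  backend> REINDEX INDEX <index name>;

  # then back in multi-user mode
  pg_ctl -D /db/pgsql start
  vacuumdb datawarehouse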
I don't know what to tell you about fixing the deleted logs problem; but upgrading to 7.1.3 may fix the growing log problem -- it worked for me.

Best of luck,

Andrew Gould

--- Stephanie Brail <zhaanlives@yahoo.com> wrote:
> I did something really stupid. Our server was out of
> disk space and I deleted the files in the pg_xlog
> directory (or whatever the heck it's called). Now I
> can't start Postgres. Help!
>
> Also, how do I deal with these logs that grow and
> grow and take up space?
>
> Thank you
On Mon, Nov 05, 2001 at 08:57:22PM -0800, Stephanie Brail wrote:
> I did something really stupid. Our server was out of
> disk space and I deleted the files in the pg_xlog
> directory (or whatever the heck it's called). Now I
> can't start Postgres. Help!

That's bad. Really bad. In recent releases there is a program called resetxlog or something which will rebuild it for you, but only back as far as the last vacuum, IIRC :(. Do you have backups?

> Also, how do I deal with these logs that grow and
> grow and take up space?

They grow a little bit for each transaction you do. In older releases the only fix was a complete dump and restore of the database. Apparently recent releases handle this much better.

HTH,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.
I try to dump my database but I get this message:

  dumpIndices(): failed sanity check, table pe_toast_880262 was not found

How do I go about fixing this? My system is Trustix 1.4.80 and I'm running PostgreSQL 7.1.3.

Thanks!

Carlo
Mark Nickel <mnickel@dunsirn.com> writes:
> Would having --enable-debug in the build of postgres have helped to
> show more information in the stack trace? Would that extra information
> have been more helpful in determining the problem in this case?

Yes, and maybe. If you're using gcc then there's really no reason not to --enable-debug. The executable files will balloon by several megabytes' worth of debugging information, but this is just extra disk space; there's no run-time penalty to having it there, and someday you might be glad you did. (On a non-gcc compiler there may be some runtime penalty, since --enable-debug defeats code optimization in most other compilers.)

> I'm off to plan an upgrade to 7.1.3 :)

Good idea. I can think of several scenarios that can lead to a corrupted index in 7.0.*, but we've tried to defend against each of them in 7.1.

			regards, tom lane
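For anyone building from source rather than SRPMs, enabling debug symbols is just a configure switch (a minimal sketch; add whatever --prefix and other options you normally use):

  # rebuild PostgreSQL with debugging information (gcc assumed)
  ./configure --enable-debug
  make
  make install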
Stephanie Brail <zhaanlives@yahoo.com> writes:
> I did something really stupid. Our server was out of
> disk space and I deleted the files in the pg_xlog
> directory (or whatever the heck it's called). Now I
> can't start Postgres. Help!

If you shut down the postmaster cleanly before you zapped pg_xlog, then I think it will work safely to run contrib/pg_resetxlog and start the postmaster. If you zapped pg_xlog in a running system or after a crash, you can still do the resetxlog, but there's no guarantee that your last few transactions will have applied cleanly --- you may find that some or all of the effects of "committed" xacts are missing.

> Also, how do I deal with these logs that grow and
> grow and take up space?

7.1.3 is better about that.

			regards, tom lane
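A hedged sketch of that recovery path, assuming the contrib module is built and installed (check contrib/pg_resetxlog's README for the exact invocation on your release):

  # with the postmaster stopped, reset the transaction log
  pg_ctl -D $PGDATA stop
  pg_resetxlog $PGDATA

  # restart and immediately take a dump in case anything is inconsistent
  pg_ctl -D $PGDATA start
  pg_dumpall > backup.sql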