Thread: psql hangs when trying to access a database

psql hangs when trying to access a database

From
Mark Nickel
Date:

Running into an interesting problem when trying to do a psql from the
commandline.

My infrastructure:
Intel Architecture
Redhat 7.1
PostgreSQL 7.0.3 (from stock RPM's of the 7.1 distribution)
320MB RAM

Kernel shared memory increases:
# Up the maximum shared memory values
kernel.shmall = 207618048
kernel.shmmax = 207618048

pg_ctl startup in /etc/rc.d/init.d/postgresql:
su -l postgres -c "/usr/bin/pg_ctl  -D $PGDATA -p /usr/bin/postmaster -o
'-d 5 -i -B 24576' start >/tmp/postgres.log 2>&1" < /dev/null

Database Environment
5 databases on this one server.  The one in question has about 5 tables
with 60,000 rows per table average.  The entire directory under $PGDATA
is only about 160MB


At the command line I type 'psql datawarehouse'.  I expect to see the
banner from the psql command but in this case, nothing.  The system
seems to just hang.  Running a top shows that my process is taking up
99% of the User CPU but according to iostat, there is no activity on the
harddrive.

Here's what the 'ps -awx|grep post' command shows me:
 3073 pts/1    S      0:00 /bin/sh /usr/bin/pg_ctl -D /db/pgsql -p
/usr/bin/postmaster -o -d 5 -i -B 24576 start
 3075 pts/1    S      0:00 /usr/bin/postmaster -d 5 -i -B 24576
 3085 pts/1    R      0:11 /usr/bin/postgres localhost mnickel
datawarehouse startup

generally I would expect the /usr/bin/postgres line to eventually change
to 'idle' instead of startup...

Also, pg_dump and vacuumdb do not work.  The postmaster happily spawns
off another /usr/bin/postgres, but the state never switches out of
'idle' for pg_dump or vacuumdb.

Upon inspection of the /tmp/postgres.log file, (included at the end),  I
see that the processing is hanging right after the 'InitPostgres'.
Normally there is a bunch of other "stuff" that is there when normal
entry into the psql command completes...

I have no problem using psql on any of the other 5 databases on this
server.  Everything runs smoothly.  This is the only database that is
giving us grief.  This is the first problem that we've come across in
almost a year of PostgreSQL utilization.  I'm totally baffled... :)

Thanks so much for looking at this, as I'm getting the "see, I told you
that open source/freeware stuff wasn't any good" pressure.  :)

Mark Nickel

Here is the logfile in question:

postmaster successfully started up.
FindExec: found "/usr/bin/postgres" using argv[0]
binding ShmemCreate(key=52e2c1, size=205273088)
011105.20:22:27.451  [3076] DEBUG:  Data Base System is starting up at
Mon Nov  5 20:22:27 2001
011105.20:22:27.451  [3076] DEBUG:  Data Base System was shut down at
Mon Nov  5 20:22:13 2001
011105.20:22:27.453  [3076] DEBUG:  Data Base System is in production
state at Mon Nov  5 20:22:27 2001
011105.20:22:27.453  [3076] proc_exit(0)
011105.20:22:27.454  [3076] shmem_exit(0)
011105.20:22:27.454  [3076] exit(0)
/usr/bin/postmaster: reaping dead processes...
/usr/bin/postmaster: ServerLoop:  handling reading 5
/usr/bin/postmaster: ServerLoop:  handling reading 5
/usr/bin/postmaster: ServerLoop:  handling writing 5
/usr/bin/postmaster: BackendStartup: environ dump:
-----------------------------------------
 PWD=/var/lib/pgsql
 HOSTNAME=topaz.dunsirn.com
 ODBCINI=/etc/odbc.ini
 PVM_RSH=/usr/bin/rsh
 QTDIR=/usr/lib/qt-2.3.0
 LESSOPEN=|/usr/bin/lesspipe.sh %s
 XPVM_ROOT=/usr/share/pvm3/xpvm
 KDEDIR=/usr
 USER=postgres

LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:

 MACHTYPE=i386-redhat-linux-gnu
 MAIL=/var/spool/mail/postgres
 INPUTRC=/etc/inputrc
 LANG=en_US
 LOGNAME=postgres
 SHLVL=2
 MQSERVER=OPAL.AMBER/TCP/opal(1414)
 SHELL=/bin/bash
 PGLIB=/usr/lib/pgsql
 HOSTTYPE=i386
 OSTYPE=linux-gnu
 HISTSIZE=1000
 LAMHELPFILE=/etc/lam/lam-helpfile
 PVM_ROOT=/usr/share/pvm3
 TERM=xterm
 HOME=/var/lib/pgsql
 SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
 PGDATA=/db/pgsql
 PATH=/usr/kerberos/bin:/bin:/usr/bin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin

 _=/usr/bin/postmaster
 POSTPORT=5432
 POSTID=2147483646
 IPC_KEY=5432000
 PG_USER=mnickel
-----------------------------------------
/usr/bin/postmaster: BackendStartup: pid 3085 user mnickel db
datawarehouse socket 5
/usr/bin/postmaster child[3085]: starting with (/usr/bin/postgres -d5 -B
24576 -v131072 -p datawarehouse )
FindExec: found "/usr/bin/postgres" using argv[0]
011105.20:22:46.186  [3085] started: host=localhost user=mnickel
database=datawarehouse
011105.20:22:46.187  [3085] InitPostgres
011105.20:27:45.645  [3075] pmdie 15
Smart Shutdown request at Mon Nov  5 20:27:45 2001
/usr/bin/pg_ctl: line 1:  3075 Killed
/usr/bin/postmaster -d 5 -i -B 24576





Re: psql hangs when trying to access a database

From
Tom Lane
Date:
Mark Nickel <mnickel@dunsirn.com> writes:
> Running into an interesting problem when trying to do a psql from the
> commandline.

Can't say I've heard of a similar problem before.  Can you attach to the
looping process with gdb and get a stack trace?

You might need to rebuild from source with --enable-debug to get
anything really useful from gdb, but let's first try it with the
executables you have.

            regards, tom lane

Re: psql hangs when trying to access a database

From
Mark Nickel
Date:

Tom Lane wrote:

> Mark Nickel <mnickel@dunsirn.com> writes:
> > Running into an interesting problem when trying to do a psql from the
> > commandline.
>
> Can't say I've heard of a similar problem before.  Can you attach to the
> looping process with gdb and get a stack trace?

Are there utilities to retrieve data from the raw files on the harddrive, or
must everything be done via the postmaster?  Kinda like a fsck for the
database...


Doing a 'gdb'.  Once I'm in I've done a 'info stack'.   Not sure how much to
include, but here goes!

[root@topaz /db]# gdb /usr/bin/postgres 3712
GNU gdb 5.0rh-5 Red Hat Linux 7.1
Copyright 2001 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...(no debugging symbols
found)...
/db/3712: No such file or directory.
Attaching to program: /usr/bin/postgres, process 3712
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libm.so.6...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/libutil.so.1...done.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /usr/lib/libreadline.so.4.1...done.
Loaded symbols for /usr/lib/libreadline.so.4.1
Reading symbols from /lib/libtermcap.so.2...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /usr/lib/libncurses.so.5...done.
Loaded symbols for /usr/lib/libncurses.so.5
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0x080f6900 in SpinAcquire () at eval.c:41
41      eval.c: No such file or directory.
        in eval.c
(gdb) info stack
#0  0x080f6900 in SpinAcquire () at eval.c:41
#1  0x0807875d in _bt_getroot () at eval.c:41
#2  0x08079e0a in _bt_search () at eval.c:41
#3  0x0807acab in _bt_first () at eval.c:41
#4  0x0807926a in btgettuple () at eval.c:41
#5  0x08071fc7 in index_getnext () at eval.c:41
#6  0x08088edb in CatalogHasIndex () at eval.c:41
#7  0x08089748 in ClassNameIndexScan () at eval.c:41
#8  0x08136d0a in RelationSetIndexSupport () at eval.c:41
#9  0x08137a0c in RelationSetIndexSupport () at eval.c:41
#10 0x08138047 in RelationNameGetRelation () at eval.c:41
#11 0x0806ff18 in heap_openr () at eval.c:41
#12 0x08135031 in float8_to_char () at eval.c:41
#13 0x08135e74 in SearchSysCache () at eval.c:41
#14 0x08139ce6 in SearchSysCacheTuple () at eval.c:41
#15 0x0813da2e in SetUserId () at eval.c:41
#16 0x0813e21d in InitPostgres () at eval.c:41
#17 0x080fe4f1 in PostgresMain () at eval.c:41
#18 0x080e5e40 in PostmasterMain () at eval.c:41
#19 0x080e59e0 in PostmasterMain () at eval.c:41
#20 0x080e4b29 in PostmasterMain () at eval.c:41
#21 0x080e44ac in PostmasterMain () at eval.c:41
#22 0x080b5047 in main () at eval.c:41
#23 0x40126177 in __libc_start_main (main=0x80b4fa0 <main>, argc=6,
ubp_av=0xbffffa1c, init=0x806486c <_init>, fini=0x81461fc <_fini>,
    rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffffa0c) at
../sysdeps/generic/libc-start.c:129
(gdb)


> You might need to rebuild from source with --enable-debug to get
> anything really useful from gdb, but let's first try it with the
> executables you have.

Ok, if necessary I should be able to rebuild the postgresql stuff from the
source rpms...
Thanks again for looking at this...

Mark Nickel


Re: psql hangs when trying to access a database

From
Tom Lane
Date:
Mark Nickel <mnickel@dunsirn.com> writes:
> (gdb) info stack
> #0  0x080f6900 in SpinAcquire () at eval.c:41
> #1  0x0807875d in _bt_getroot () at eval.c:41
> #2  0x08079e0a in _bt_search () at eval.c:41
> #3  0x0807acab in _bt_first () at eval.c:41
> #4  0x0807926a in btgettuple () at eval.c:41

Hm.  Don't think I believe this backtrace very much --- _bt_getroot
doesn't call SpinAcquire, at least not directly.  Probably the next
step should be to rebuild with debug support.

However, before you do that ... it would seem that you have either
a stuck-lock problem or a corrupted btree index on one of the critical
system tables (pg_class, looks like).  The locking issue could be
checked by restarting the postmaster to see whether the problem goes
away.  If there's a corrupted index, you can use REINDEX DATABASE
to rebuild it --- but read the directions carefully!  You need to do
it in a properly-started standalone backend, else you'll not get
far enough to do the reindex.

If neither of those things get you up and running, it's time to
roll out the debugger.

            regards, tom lane

deleted logs when I shouldn't have...

From
Stephanie Brail
Date:
I did something really stupid. Our server was out of
disk space and I deleted the files in the pg_xlog
directory (or whatever the heck it's called). Now I
can't start Postgres. Help!

Also, how to deal with these logs, that grow and grow
and take up space?

Thank you


__________________________________________________
Do You Yahoo!?
Find a job, post your resume.
http://careers.yahoo.com

Re: psql hangs when trying to access a database

From
Mark Nickel
Date:
> away.  If there's a corrupted index, you can use REINDEX DATABASE
> to rebuild it --- but read the directions carefully!  You need to do
> it in a properly-started standalone backend, else you'll not get
> far enough to do the reindex.
>

bingo!  Corrupted index it was!

The problem was solved, as you said, by starting a standalone backend:
/usr/bin/postgres -d2 -B 24576 -D /db/pgsql -O -P datawarehouse

Once I was in, I performed  a
REINDEX DATABASE datawarehouse FORCE;

for good measure I ran a VACUUM and was treated to several errors
requesting me to recreate indices.

Running REINDEX TABLE <table name>... REINDEX INDEX <index name> on those
objects that VACUUM reported as needed to be recreated fixed all of my
problems.

For good measure I ran a vacuumdb datawarehouse as the postgres super
user, after bringing the database up with pg_ctl, and all is well.

thankyou thankyou thankyou!  You've helped me to dramatically increase my
knowledge of this database.  I recognized the REINDEX command from the
manual, but I really didn't know the appropiate time to use such a
command.  In "normal" daily database use, your indices don't get
corrupted.  Thanks for helping me to discover when to use this command!
:)

Would having the --enable-debug in the build of postgres have helped to
show more information in the stack trace?  Would that extra information
been more helpful to determining the problem in this case?

While waiting for majordomo to send your replies, I built a --enable-debug
version of postgres from the 7.1 SRPM just in case the REINDEX didn't
work.  Fortunately I can just stash those RPM's away for a rainy day... :)

I'm off to plan an upgrade to 7.1.3  :)
Mark Nickel



Re: deleted logs when I shouldn't have...

From
Andrew Gould
Date:
I don't know what to tell you about fixing the deleted
logs problem; but upgrading to 7.1.3 may fix the
growing log problem -- it worked for me.

Best of luck,

Andrew Gould

--- Stephanie Brail <zhaanlives@yahoo.com> wrote:
> I did something really stupid. Our server was out of
> disk space and I deleted the files in the pg_xlog
> directory (or whatever the heck it's called). Now I
> can't start Postgres. Help!
>
> Also, how to deal with these logs, that grow and
> grow
> and take up space?
>
> Thank you
>
>
> __________________________________________________
> Do You Yahoo!?
> Find a job, post your resume.
> http://careers.yahoo.com
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org


__________________________________________________
Do You Yahoo!?
Find a job, post your resume.
http://careers.yahoo.com

Re: deleted logs when I shouldn't have...

From
Martijn van Oosterhout
Date:
On Mon, Nov 05, 2001 at 08:57:22PM -0800, Stephanie Brail wrote:
> I did something really stupid. Our server was out of
> disk space and I deleted the files in the pg_xlog
> directory (or whatever the heck it's called). Now I
> can't start Postgres. Help!

That's bad. Really bad. In recent releases there is a program called
resetxlog or something which will rebuild it for you, but only back as far
as the last vacuum IIRC :(. Do you have backups?

> Also, how to deal with these logs, that grow and grow
> and take up space?

They grow a little bit for each transaction you do. In older releases the
only fix was complete dump and restore of database. Apparently recent
releases handle this much better.

HTH,
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.

pg_dump error

From
"Carlo F. Florendo"
Date:
I try to dump my database but I get this message:
dumpIndices(): failed sanity check, table pe_toast_880262 was not found

How do I go about with this?  My system is a trustix 1.4.80 and I'm running
postgres 7.1.3.

Thanks!

Carlo


Re: psql hangs when trying to access a database

From
Tom Lane
Date:
Mark Nickel <mnickel@dunsirn.com> writes:
> Would having the --enable-debug in the build of postgres have helped to
> show more information in the stack trace?  Would that extra information
> been more helpful to determining the problem in this case?

Yes, and maybe.  If you're using gcc then there's really no reason
not to --enable-debug.  The executable files will balloon by several
megabytes worth of debugging information, but this is just extra disk
space; there's no run-time penalty to having it there, and someday you
might be glad you did.  (On a non-gcc compiler there may be some runtime
penalty, since --enable-debug defeats code optimization in most other
compilers.)

> I'm off to plan an upgrade to 7.1.3  :)

Good idea.  I can think of several scenarios that can lead to corrupted
index in 7.0.*, but we've tried to defend against each of them in 7.1.

            regards, tom lane

Re: deleted logs when I shouldn't have...

From
Tom Lane
Date:
Stephanie Brail <zhaanlives@yahoo.com> writes:
> I did something really stupid. Our server was out of
> disk space and I deleted the files in the pg_xlog
> directory (or whatever the heck it's called). Now I
> can't start Postgres. Help!

If you shut down the postmaster cleanly before you zapped pg_xlog, then
I think it will work safely to run contrib/pg_resetxlog and start the
postmaster.

If you zapped pg_xlog in a running system or after a crash, you can
still do the resetxlog, but there's no guarantee that your last few
transactions will have applied cleanly --- you may find that some or
all of the effects of "committed" xacts are missing.

> Also, how to deal with these logs, that grow and grow
> and take up space?

7.1.3 is better about that.

            regards, tom lane