Postgres "locked up" - Mailing list pgsql-general

From Eric B. Ridge
Subject Postgres "locked up"
Date
Msg-id AA2D4A31-F9C6-41F6-98F9-921B01EBBA1D@tcdi.com
Whole thread Raw
Responses Re: Postgres "locked up"  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
# select version():
 PostgreSQL 8.1.10 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.1)
(I know, I know, it's an old version of PG whose sources aren't even available today.  Nonetheless, we've had great
successwith it.) 

# uname -a
Linux servername 2.6.24.3-TCDI #16 SMP Thu Sep 24 13:52:37 EDT 2009 i686 Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
GenuineIntelGNU/Linux 

The strangest thing happened today...

Postgres locked up.  All existing backends (roughly 100) couldn't execute commands.  They'd just hang.  One random
backendI selected had this backtrace: 

(gdb) bt
#0  0xb7f7f410 in __kernel_vsyscall ()
#1  0xb7e37a6b in semop () from /lib/libc.so.6
#2  0x081916af in PGSemaphoreLock ()
#3  0x081ba3c1 in LWLockAcquire ()
#4  0x081b3b84 in ReceiveSharedInvalidMessages ()
#5  0x0822d52b in AcceptInvalidationMessages ()
#6  0x13f52016 in exec_eval_simple_expr () from /tcdi/pgsql-8.1.10/lib/plpgsql.so
#7  0x13f52336 in exec_eval_boolean () from /tcdi/pgsql-8.1.10/lib/plpgsql.so
#8  0x13f56fe7 in exec_stmt_if () from /tcdi/pgsql-8.1.10/lib/plpgsql.so
#9  0x13f5518f in exec_stmts () from /tcdi/pgsql-8.1.10/lib/plpgsql.so
#10 0x13f54fcb in exec_stmt_block () from /tcdi/pgsql-8.1.10/lib/plpgsql.so
#11 0x13f5790f in plpgsql_exec_function () from /tcdi/pgsql-8.1.10/lib/plpgsql.so
#12 0x13f4df29 in plpgsql_call_handler () from /tcdi/pgsql-8.1.10/lib/plpgsql.so
#13 0x08145190 in ExecMakeFunctionResultNoSets ()
#14 0x08141221 in ExecProject ()
#15 0x0814f2fb in ExecNestLoop ()
#16 0x08140957 in ExecProcNode ()
#17 0x08150565 in ExecSort ()
#18 0x08140993 in ExecProcNode ()
#19 0x0813fc71 in ExecutorRun ()
#20 0x081c3af3 in PortalRunSelect ()
#21 0x081c4806 in PortalRun ()
#22 0x081c1dc1 in PostgresMain ()
#23 0x08199dcb in ServerLoop ()
#24 0x0819acaa in PostmasterMain ()
#25 0x0815d3f9 in main ()

Additionally, it was not possible to fully create new connections (via psql).  The backtrace for that looked like:

(gdb) bt
#0  0xb7f7f410 in __kernel_vsyscall ()
#1  0xb7e37a6b in semop () from /lib/libc.so.6
#2  0x081916af in PGSemaphoreLock ()
#3  0x081ba3c1 in LWLockAcquire ()
#4  0x081b3c49 in InitBackendSharedInvalidationState ()
#5  0x08240b3e in InitPostgres ()
#6  0x081c115d in PostgresMain ()
#7  0x08199dcb in ServerLoop ()
#8  0x0819acaa in PostmasterMain ()
#9  0x0815d3f9 in main ()

On a whim, I ran "pg_ctl reload".  Magically, all the existing backends processed whatever pending commands they had
andthe pending new connections (via psql) connected.  Things ran for a few seconds and then it all locked up again.
Successive"pg_ctl reload"'s would cause things to limp along long enough for us to checkout pg_locks.  There was
nothingabnormal there.  Many AccessShareLocks on views/tables and a few RowExclusiveLocks on tables/indexes that were
currentlybeing updated.  All the locks were granted, nothing was conflicting. 

When Postgres woud lock, even seemingly simple things like psql's table name tab-completion would hang.  "pg_ctl
reload"would cause it to complete. 

The only thing I could think of that might cause Postgres to lockup such that even new connections couldn't be fully
establishedwas 'autovacuum', so I turned that off for now (also turned off all stats collection).  Now we're playing
thewaiting game to see if this happens again. 

Also, "select * from pg_stat_activity" showed a backend that was issuing a COMMIT for quite some time, however the 'ps'
outputline for that pid showed that it was "idle".  I've never seen those two things not match up, but maybe the stats
collectormissed some messages or something? 

I realize that we're running an old version, but reading through the release notes up to 8.1.18, there's no mention of
anythingthat seems related to this particular problem.  And it's one we've never seen.  Any of you experts out there
haveany ideas? 

Thanks!

eric


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: pg_dump and ON DELETE CASCADE problem
Next
From: Glen Barber
Date:
Subject: Restore time differences between full database dumps and separate schema/data dumps