Odd db lockup - investigation advice wanted - Mailing list pgsql-hackers
From | Marc Munro |
---|---|
Subject | Odd db lockup - investigation advice wanted |
Date | |
Msg-id | 1131386995.28004.24.camel@bloodnok.com Whole thread Raw |
Responses |
Re: Odd db lockup - investigation advice wanted
|
List | pgsql-hackers |
Last week I managed to lock-up and then crash a development database. I'm going to try to reproduce it today and would like to know what I can do to further investigate the problem. I am running Linux 2.6.9-11.ELsmp X86_64 on a Quad Dual-Core Opteron I have the following postgres RPMs installed: postgresql-libs-7.4.7-2.RHEL4.1 postgresql-contrib-8.0.3-1PGDG postgresql-libs-8.0.3-1PGDG postgresql-server-8.0.3-1PGDG postgresql-8.0.3-1PGDG PGDATA is installed on a Netapp network storage device. We are using slony 1.1.0 for replication. The (provider) database locked-up after I killed a slony client process (kill -9) on the subscriber. Psql connections would not respond to \d and simply locked up. I was able to run a query to check for blocking locks - this returned no rows. There was a significant test load on the database at the time. I stopped the database but was unable to restart it. I was unable to kill a number of postgres processes and could not release postgres shared memory. Having decided that the database was toast, I discovered that I could not even delete the database files, and eventually the only solution was a full reboot. This hardware was destined to be put into production in the next two weeks but this crash has shaken our confidence somewhat. Any advice on how to further investigate this would be much appreciated Here is an exerpt from the logs at the time of the failure: LOG: duration: 4143.996 ms statement: execute wibble_transaction ( '2012416', '3410660', '2005-11-04 17:39:49 -0600', '20005', '3', '0', '{22000,22011,22001,22002,22003,22004,22005,22006,22007,22008,22009,22010}', '{8,0,0,0,2,1,0,7,2,23,26,0}' ) LOG: duration: 4814.012 ms statement: execute write_wibble ( '2048847', '{{15000,17660,0},{15001,3522,0},{15002,0,0},{15003,3851,0},{15004,0,0},{15005,0,0},{15006,0,0},{15007,0,0},{15011,0,0},{15012,0,0},{15013,0,0},{15014,0,0},{15015,0,0},{15016,0,0},{15017,0,0},{15018,0,0},{15019,0,0},{15020,0,0},{15021,0,0},{15022,0,0},{15023,0,0},{15024,0,0},{15025,0,0},{15026,0,0},{15048,0,0},{15030,0,0},{15031,0,0},{15032,0,0},{15033,0,0},{15035,0,0},{15036,0,0},{15037,0,0},{15038,0,0},{15040,0,0},{15041,1,0},{15042,0,0},{15043,0,0},{15046,0,0},{15047,0,0},{15049,1000,0},{15050,0,0},{15051,0,0},{15052,0,0}}' ) LOG: unexpected EOF on client connection LOG: received fast shutdown request LOG: aborting any active transactions LOG: received immediate shutdown request WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. My lock checking query: select d.datname || '.' || c.relname as object, l.transaction as trans, l.pid, l.mode, case when b.blocked thenb.blocker else null end as blocker from (select w.pid as pid, h.pid as blocker, 't'::bool as blockedfrom pg_locks h, pg_locks wwhere h.granted and notw.granted and ( (h.relation = w.relation and h.database = w.database) or h.transaction = w.transaction) unionselect h.pid, null, 'f'::bool as blockedfrom pg_locks h, pg_locks wwhere h.granted and not w.granted and ( (h.relation = w.relation and h.database = w.database) or h.transaction = w.transaction) ) b, pg_locksl left outer join pg_database d on d.oid = l.database left outer join pg_class c on c.oid = l.relation where l.pid = b.pid order by l.pid; Thanks for any suggestions. __ Marc
pgsql-hackers by date: