Re: hung backends stuck in spinlock heavy endless loop - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: hung backends stuck in spinlock heavy endless loop |
Date | |
Msg-id | 20150115191542.GD14782@awork2.anarazel.de |
In response to | Re: hung backends stuck in spinlock heavy endless loop (Merlin Moncure <mmoncure@gmail.com>) |
Responses |
Re: hung backends stuck in spinlock heavy endless loop
Re: hung backends stuck in spinlock heavy endless loop |
List | pgsql-hackers |
Hi,

> The plot thickens! I looped the test, still stock 9.4 as of this time
> and went to lunch. When I came back, the database was in recovery
> mode. Here is the rough sequence of events.
>

Whoa. That looks scary. Did you see (some of) those errors before? Most
of them should have been emitted independently of being built with
assertions.

> 1) REINDEXED pg_class (wanted clean slate for full reproduction)
> 2) before too long (unfortunately did not configure timestamps in the
> log) starting seeing:
>
> ERROR: root page 3 of index "pg_class_oid_index" has level 0,
> expected 1 at character 8
> QUERY: UPDATE CDSRunTable SET
>   Finished = clock_timestamp(),
>   DidSucceed = _DidSucceed,
>   ErrorMessage = _ErrorMessage
>   WHERE CDSRunTableId = _CdsRunTableId
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 82 at
> SQL statement
> STATEMENT: SELECT CDSReconcileRunTable(2020)
>
> ..and again with a FATAL
> FATAL: root page 3 of index "pg_class_oid_index" has level 0, expected 1
> ERROR: root page 3 of index "pg_class_oid_index" has level 0, expected 1
> CONTEXT: SQL statement "UPDATE CDSRunTable SET
>   Finished = clock_timestamp(),
>   DidSucceed = _DidSucceed,
>   ErrorMessage = _ErrorMessage
>   WHERE CDSRunTableId = _CdsRunTableId"
> PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement

Just to be absolutely clear, those happened *before* the following
errors? And there were no 'during exception cleanup' like errors before?

> 3) shortly (?) after that, I got:
>
> WARNING: did not find subXID 14955 in MyProc
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> WARNING: you don't own a lock of type RowExclusiveLock
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> LOG: could not send data to client: Broken pipe
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> STATEMENT: SELECT CDSReconcileRunTable(2151)
> WARNING: ReleaseLockIfHeld: failed??
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> ERROR: failed to re-find shared proclock object
> CONTEXT: PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> STATEMENT: SELECT CDSReconcileRunTable(2151)
> WARNING: AbortSubTransaction while in ABORT state
> WARNING: did not find subXID 14955 in MyProc
> WARNING: you don't own a lock of type AccessShareLock
> WARNING: ReleaseLockIfHeld: failed??
> ERROR: failed to re-find shared proclock object
> WARNING: AbortSubTransaction while in ABORT state
> WARNING: did not find subXID 14955 in MyProc
> WARNING: you don't own a lock of type AccessShareLock
> WARNING: ReleaseLockIfHeld: failed??
> WARNING: you don't own a lock of type ShareLock
> TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > 0)", File: "lock.c", Line: 1240)
> LOG: server process (PID 10117) was terminated by signal 6: Aborted
> LOG: terminating any other active server processes

Ick.

Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log
entries before? It's hard to know from here, but the 'during exception
cleanup' indicates a problem in abort handling. Were there any deadlock
detected errors closeby?

You're catching deadlock errors in a subtransaction. Hm.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services