Re: hung backends stuck in spinlock heavy endless loop - Mailing list pgsql-hackers

From Andres Freund
Subject Re: hung backends stuck in spinlock heavy endless loop
Date
Msg-id 20150115191542.GD14782@awork2.anarazel.de
Whole thread Raw
In response to Re: hung backends stuck in spinlock heavy endless loop  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: hung backends stuck in spinlock heavy endless loop
Re: hung backends stuck in spinlock heavy endless loop
List pgsql-hackers
Hi,

> The plot thickens!  I looped the test, still stock 9.4 as of this time
> and went to lunch. When I came back, the database was in recovery
> mode.  Here is the rough sequence of events.
>

Whoa. That looks scary. Did you see (some of) those errors before? Most
of them should have been emitted independently of being built with
assertions.

> 1) REINDEXED pg_class (wanted clean slate for full reproduction)
> 2) before too long (unfortunately did not configure timestamps in the
> log) starting seeing:
> 
> ERROR:  root page 3 of index "pg_class_oid_index" has level 0,
> expected 1 at character 8
> QUERY:  UPDATE CDSRunTable SET
>           Finished = clock_timestamp(),
>           DidSucceed = _DidSucceed,
>           ErrorMessage = _ErrorMessage
>         WHERE CDSRunTableId = _CdsRunTableId
> CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 82 at
> SQL statement
> STATEMENT:  SELECT CDSReconcileRunTable(2020)
> 
> ..and again with a FATAL
> FATAL:  root page 3 of index "pg_class_oid_index" has level 0, expected 1
> ERROR:  root page 3 of index "pg_class_oid_index" has level 0, expected 1
> CONTEXT:  SQL statement "UPDATE CDSRunTable SET
>           Finished = clock_timestamp(),
>           DidSucceed = _DidSucceed,
>           ErrorMessage = _ErrorMessage
>         WHERE CDSRunTableId = _CdsRunTableId"
>     PL/pgSQL function cdsreconcileruntable(bigint) line 82 at SQL statement

Just to be absolutely clear, those happened *before* the following
errors? And there were no 'during exception cleanup' like errors before?

> 3) shortly (?) after that, I got:
> 
> WARNING:  did not find subXID 14955 in MyProc
> CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> WARNING:  you don't own a lock of type RowExclusiveLock
> CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> LOG:  could not send data to client: Broken pipe
> CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> STATEMENT:  SELECT CDSReconcileRunTable(2151)
> WARNING:  ReleaseLockIfHeld: failed??
> CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> ERROR:  failed to re-find shared proclock object
> CONTEXT:  PL/pgSQL function cdsreconcileruntable(bigint) line 35
> during exception cleanup
> STATEMENT:  SELECT CDSReconcileRunTable(2151)
> WARNING:  AbortSubTransaction while in ABORT state
> WARNING:  did not find subXID 14955 in MyProc
> WARNING:  you don't own a lock of type AccessShareLock
> WARNING:  ReleaseLockIfHeld: failed??
> ERROR:  failed to re-find shared proclock object
> WARNING:  AbortSubTransaction while in ABORT state
> WARNING:  did not find subXID 14955 in MyProc
> WARNING:  you don't own a lock of type AccessShareLock
> WARNING:  ReleaseLockIfHeld: failed??
> WARNING:  you don't own a lock of type ShareLock
> TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode]
> > 0)", File: "lock.c", Line: 1240)
> LOG:  server process (PID 10117) was terminated by signal 6: Aborted
> LOG:  terminating any other active server processes

Ick.

Were there any 'LOG: Handling deadlock detected on CdsRunTableId' log
entries before? It's hard to know from here, but the 'during exception
cleanup' indicates a problem in abort handling.  Were there any deadlock
detected errors closeby?

You're catching deadlock errors in a subtransaction. Hm.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: XLOG_PARAMETER_CHANGE handling of wal_log_hints
Next
From: Andres Freund
Date:
Subject: Re: hung backends stuck in spinlock heavy endless loop