Re: hung backends stuck in spinlock heavy endless loop - Mailing list pgsql-hackers

From Andres Freund
Subject Re: hung backends stuck in spinlock heavy endless loop
Msg-id 20150116142227.GF16991@alap3.anarazel.de
In response to Re: hung backends stuck in spinlock heavy endless loop  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: hung backends stuck in spinlock heavy endless loop
Re: hung backends stuck in spinlock heavy endless loop
List pgsql-hackers
Hi,

On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote:
> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote:
> > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> >> Running this test on another set of hardware to verify -- if this
> >> turns out to be a false alarm which it may very well be, I can only
> >> offer my apologies!  I've never had a new drive fail like that, in
> >> that manner.  I'll burn the other hardware in overnight and report
> >> back.
> 
> huh -- well, possibly not.  This is on a virtual machine attached to a
> SAN.  It ran clean for several hours (this is 9.4 vanilla, asserts off,
> checksums on), then started having issues:

Damn.

Is there any chance you can package this somehow so that others can run
it locally? It looks hard to find the actual bug here without adding
instrumentation to postgres.

> [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING:  page
> verification failed, calculated checksum 59143 but expected 59137 at
> character 20
> [cds2 21952 2015-01-15 22:54:51.852 CST 5502]QUERY:
>           DELETE FROM "onesitepmc"."propertyguestcard" t
>           WHERE EXISTS
>           (
>             SELECT 1 FROM "propertyguestcard_insert" d
>             WHERE (t."prptyid", t."gcardid") = (d."prptyid", d."gcardid")
>           )

> [cds2 21952 2015-01-15 22:54:51.852 CST 5502]CONTEXT:  PL/pgSQL
> function cdsreconciletable(text,text,text,text,boolean) line 197 at
> EXECUTE statement
>     SQL statement "SELECT        * FROM CDSReconcileTable(
>               t.CDSServer,
>               t.CDSDatabase,
>               t.SchemaName,
>               t.TableName)"
>     PL/pgSQL function cdsreconcileruntable(bigint) line 35 at SQL statement


Was this the first error? Were there none of the 'did not find subXID'
warnings beforehand?


> [cds2 32353 2015-01-16 04:40:57.814 CST 7549]WARNING:  did not find
> subXID 7553 in MyProc
> [cds2 32353 2015-01-16 04:40:57.814 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.018 CST 7549]WARNING:  you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.018 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]LOG:  could not send data
> to client: Broken pipe
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]STATEMENT:  SELECT
> CDSReconcileRunTable(1160)
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:  you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:  you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:  you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:  you don't own a
> lock of type ShareLock
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR:  failed to re-find
> shared lock object
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT:  PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]STATEMENT:  SELECT
> CDSReconcileRunTable(1160)
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> AbortSubTransaction while in ABORT state

This indicates a bug in our subtransaction abort handling. It looks to
me like there might actually be several. But it's probably a consequence
of an earlier bug. It's hard to diagnose the actual issue, because we're
not seeing the original error(s) :(.
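The last WARNING in the log comes from a sanity check at the start of subtransaction abort: in PostgreSQL's xact.c, AbortSubTransaction() warns when the subtransaction is not in the in-progress state. One plausible reading of the log is that the first abort was interrupted by the "failed to re-find shared lock object" ERROR, and error recovery then tried to abort the same subtransaction a second time. The sketch below is a hypothetical two-state model of that check, not the real code; SubState, SubXact, subxact_init() and abort_subxact() are invented names, and the real TransState enum has more members.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical two-state model; the real TransState enum in
 * src/backend/access/transam/xact.c has more members. */
typedef enum { SUB_INPROGRESS, SUB_ABORT } SubState;

typedef struct {
    SubState state;
    int      warnings;          /* number of state warnings emitted */
    char     last_warning[64];  /* text of the most recent warning */
} SubXact;

static void subxact_init(SubXact *x)
{
    memset(x, 0, sizeof *x);
    x->state = SUB_INPROGRESS;
}

static const char *state_name(SubState s)
{
    return s == SUB_INPROGRESS ? "INPROGRESS" : "ABORT";
}

/* Abort is only expected from the in-progress state. Reaching it in any
 * other state means an earlier abort was interrupted (for example by an
 * ERROR raised while releasing locks) and error recovery is aborting the
 * same subtransaction again. */
static void abort_subxact(SubXact *x)
{
    assert(x != NULL);
    if (x->state != SUB_INPROGRESS) {
        snprintf(x->last_warning, sizeof x->last_warning,
                 "AbortSubTransaction while in %s state", state_name(x->state));
        fprintf(stderr, "WARNING:  %s\n", x->last_warning);
        x->warnings++;
    }
    x->state = SUB_ABORT;
    /* The real function would now release locks, buffers, memory, ... */
}
```

Calling abort_subxact() once on a fresh SubXact is silent; calling it a second time on the same object produces the same shape of warning as the last line of the log.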

Could you add an EmitErrorReport() call before the FlushErrorState() in
pl_exec.c's exec_stmt_block()?
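The point of the change is ordering: when a PL/pgSQL EXCEPTION block traps an error, the handler copies the error data and then flushes the error state, so the trapped error never reaches the server log by itself. Reporting it just before the flush makes the original error visible. Below is a self-contained C model of that ordering, not PostgreSQL code: setjmp/longjmp stand in for PG_TRY/PG_CATCH, and raise_error(), emit_error_report(), flush_error_state() and exception_block() are hypothetical stand-ins for ereport(ERROR), EmitErrorReport(), FlushErrorState() and the handler in exec_stmt_block().

```c
#include <setjmp.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the backend error state. */
static char     pending_error[128];  /* message of the error being handled */
static char     server_log[512];     /* what ends up in the server log */
static jmp_buf *exception_stack;     /* innermost active "PG_TRY" */

static void raise_error(const char *msg)
{
    snprintf(pending_error, sizeof pending_error, "%s", msg);
    longjmp(*exception_stack, 1);    /* unwind to the nearest handler */
}

static void emit_error_report(void)
{
    size_t used = strlen(server_log);
    snprintf(server_log + used, sizeof server_log - used,
             "ERROR:  %s\n", pending_error);
}

static void flush_error_state(void)
{
    pending_error[0] = '\0';         /* the trapped error is forgotten */
}

/* Models BEGIN ... EXCEPTION: the body raises body_error and the handler
 * traps it. With report_first set, the error is logged before the flush. */
static void exception_block(const char *body_error, int report_first)
{
    jmp_buf  buf;
    jmp_buf *save = exception_stack; /* not modified after setjmp */

    exception_stack = &buf;
    if (setjmp(buf) == 0)
        raise_error(body_error);     /* the failing statement; never returns */

    /* Only reached via longjmp: the equivalent of PG_CATCH. */
    exception_stack = save;
    if (report_first)
        emit_error_report();         /* the suggested addition */
    flush_error_state();
    /* ... roll back the subtransaction, run the EXCEPTION clause ... */
}
```

With report_first == 0 the log stays empty even though an error occurred, which mirrors why the original error is missing from the output above.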

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


