Re: hung backends stuck in spinlock heavy endless loop - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: hung backends stuck in spinlock heavy endless loop |
Date | |
Msg-id | 20150116142227.GF16991@alap3.anarazel.de |
In response to | Re: hung backends stuck in spinlock heavy endless loop (Merlin Moncure <mmoncure@gmail.com>) |
Responses |
Re: hung backends stuck in spinlock heavy endless loop
Re: hung backends stuck in spinlock heavy endless loop |
List | pgsql-hackers |
Hi,

On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote:
> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote:
> > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> >> Running this test on another set of hardware to verify -- if this
> >> turns out to be a false alarm which it may very well be, I can only
> >> offer my apologies! I've never had a new drive fail like that, in
> >> that manner. I'll burn the other hardware in overnight and report
> >> back.
>
> huh -- well possibly. not. This is on a virtual machine attached to a
> SAN. It ran clean for several (this is 9.4 vanilla, asserts off,
> checksums on) hours then the starting having issues:

Damn.

Is there any chance you can package this somehow so that others can run
it locally? It looks hard to find the actual bug here without adding
instrumentation to postgres.

> [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page
> verification failed, calculated checksum 59143 but expected 59137 at
> character 20
> [cds2 21952 2015-01-15 22:54:51.852 CST 5502]QUERY:
> DELETE FROM "onesitepmc"."propertyguestcard" t
> WHERE EXISTS
> (
> SELECT 1 FROM "propertyguestcard_insert" d
> WHERE (t."prptyid", t."gcardid") = (d."prptyid", d."gcardid")
> )
> [cds2 21952 2015-01-15 22:54:51.852 CST 5502]CONTEXT: PL/pgSQL
> function cdsreconciletable(text,text,text,text,boolean) line 197 at
> EXECUTE statement
> SQL statement "SELECT * FROM CDSReconcileTable(
> t.CDSServer,
> t.CDSDatabase,
> t.SchemaName,
> t.TableName)"
> PL/pgSQL function cdsreconcileruntable(bigint) line 35 at SQL statement

This was the first error? None of the 'could not find subXID' errors
beforehand?

> [cds2 32353 2015-01-16 04:40:57.814 CST 7549]WARNING: did not find
> subXID 7553 in MyProc
> [cds2 32353 2015-01-16 04:40:57.814 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.018 CST 7549]WARNING: you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.018 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]LOG: could not send data
> to client: Broken pipe
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]STATEMENT: SELECT
> CDSReconcileRunTable(1160)
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a
> lock of type ShareLock
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find
> shared lock object
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]STATEMENT: SELECT
> CDSReconcileRunTable(1160)
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> AbortSubTransaction while in ABORT state

This indicates a bug in our subtransaction abort handling. It looks to
me like there actually might be several. But it's probably a consequence
of an earlier bug.

It's hard to diagnose the actual issue, because we're not seeing the
original error(s) :(. Could you add an EmitErrorReport(); before the
FlushErrorState() in pl_exec.c's exec_stmt_block()?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services