Thread: REINDEX deadlock - PostgreSQL 9.1
We are hitting a situation where REINDEX causes PostgreSQL to go into a deadlock state forever. On debugging the issue we found that three connections end up in this deadlocked state:

1. idle in transaction
2. REINDEX waiting
3. SELECT waiting

All these connections were made within the same minute. Once in the deadlocked state we cannot make new connections to the db (so we cannot view pg_locks either). New connections appear as 'startup waiting' in ps output. Initially we suspected the <idle in transaction> session was the result of a connection not being closed, but it seems it got stuck after creating a connection and is not able to proceed.

Any clues?

Thanks
Anoop
Anoop K wrote:
> We are hitting a situation where REINDEX causes PostgreSQL to go into a deadlock state forever.
> [...]
> All these connections were made within the same minute. Once in the deadlocked state we cannot
> make new connections to the db (so we cannot view pg_locks either). New connections appear as
> 'startup waiting' in ps output.

Check the contents of pg_locks:
What locks does the "idle in transaction" session hold?
Who holds the locks that block SELECT, REINDEX and new connections?

Turn on log_statement='all' to see what the "idle in transaction" session did since it started.

Yours,
Laurenz Albe
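When a session can still be opened, a pg_locks check along these lines shows who holds and who waits. This is only a sketch against the 9.1 catalogs (where pg_stat_activity exposes the backend pid as procpid), not output from the reported system:

    -- Dump every lock, who holds or waits for it, and what that backend is doing.
    SELECT l.pid, l.locktype, l.relation::regclass AS relation,
           l.mode, l.granted,
           a.current_query, a.xact_start
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.procpid = l.pid
    ORDER BY l.granted, l.relation, l.pid;

    -- Statement logging is a postgresql.conf setting; after editing
    --   log_statement = 'all'
    -- reload the server so it takes effect:
    SELECT pg_reload_conf();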
I have the setup in the problem state, but I am not able to make psql connections to view the lock details; psql connections are hanging. Is there any other info which can be collected in this state?

Also, we don't know the steps to reproduce the issue.

On Thu, Feb 7, 2013 at 1:23 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> Check the contents of pg_locks:
> What locks does the "idle in transaction" session hold?
> Who holds the locks that block SELECT, REINDEX and new connections?
> [...]
On Thu, Feb 7, 2013 at 2:08 PM, Anoop K <anoopk6@gmail.com> wrote:
> I have the setup in the problem state, but I am not able to make psql
> connections to view the lock details; psql connections are hanging.
> Is there any other info which can be collected in this state?

Try attaching each process involved in the deadlock to gdb and printing the call stack. That may or may not be useful, but given your situation I wonder if you have a deadlock at the LWLock level. Do you have any external modules installed? Or any triggers written in C?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
It sounds like you're running out of connections. Have you tried connecting as postgres? It has 2 or 3 superuser connections reserved by default.

On Thu, Feb 7, 2013 at 1:38 AM, Anoop K <anoopk6@gmail.com> wrote:
> I have the setup in the problem state, but I am not able to make psql
> connections to view the lock details; psql connections are hanging.
> Is there any other info which can be collected in this state?
> [...]
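The reserved slots Scott mentions are governed by superuser_reserved_connections (3 by default); a quick sketch of checking the relevant settings from any open session:

    -- How many slots are held back for superusers, and the overall cap:
    SHOW superuser_reserved_connections;   -- default is 3
    SHOW max_connections;
    -- Raising the reserve means editing postgresql.conf and restarting:
    --   superuser_reserved_connections = 5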
We did run out of connections, as our processes that tried to connect (over a few days) got hung in the 'startup waiting' state. Even the superuser connections are used up.

Thanks
Anoop

On Thu, Feb 7, 2013 at 3:37 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> It sounds like you're running out of connections. Have you tried
> connecting as postgres? It has 2 or 3 superuser connections reserved
> by default.
> [...]
I don't have C triggers. I can attach gdb and get a stack trace. Wondering if doing so will take the processes out of the problem state.

Thanks
Anoop

On Thu, Feb 7, 2013 at 3:33 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> Try attaching each process involved in the deadlock to gdb and printing
> the call stack. That may or may not be useful, but given your situation
> I wonder if you have a deadlock at the LWLock level.
> [...]
So have you tried connecting as a superuser?

On Thu, Feb 7, 2013 at 3:19 AM, Anoop K <anoopk6@gmail.com> wrote:
> We did run out of connections, as our processes that tried to connect (over a
> few days) got hung in the 'startup waiting' state. Even the superuser
> connections are used up.
> [...]
Actually some of our processes connect as a superuser, so even those slots are used up and those connections are in the hung state.

On Thu, Feb 7, 2013 at 4:29 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> So have you tried connecting as a superuser?
> [...]
Processes should always connect as some other role, keeping the superuser connections in reserve for situations like this. Do your processes really need superuser access all the time? If you could turn it off for a bit, you could get into your database and troubleshoot from there. Not being able to connect to your db because you ran out of superuser connections is a bad thing.

On Thu, Feb 7, 2013 at 4:00 AM, Anoop K <anoopk6@gmail.com> wrote:
> Actually some of our processes connect as a superuser, so even those slots are
> used up and those connections are in the hung state.
> [...]
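A sketch of that setup, with hypothetical role and database names (app_user and appdb are not from the thread): the application connects as an ordinary role, and the superuser slots stay free for troubleshooting.

    -- A plain login role for the application; it cannot consume the
    -- superuser_reserved_connections slots.
    CREATE ROLE app_user LOGIN PASSWORD 'changeme' NOSUPERUSER;
    GRANT CONNECT ON DATABASE appdb TO app_user;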
I will try. Here are the gdb stack traces of the hung processes.

<idle in transaction>
---------------------
#0  0x00007fbdfaceb3e2 in recv () from /lib64/libc.so.6
#1  0x000000000058bde6 in secure_read ()
#2  0x000000000059697b in ?? ()
#3  0x0000000000596d7b in pq_getbyte ()
#4  0x00000000006334af in PostgresMain ()
#5  0x00000000005f4d69 in ?? ()
#6  0x00000000005f7501 in PostmasterMain ()
#7  0x0000000000598df0 in main ()

REINDEX
-------
#0  0x00007fbdfacecca7 in semop () from /lib64/libc.so.6
#1  0x00000000005e8927 in PGSemaphoreLock ()
#2  0x0000000000624821 in ProcSleep ()
#3  0x000000000062145c in ?? ()
#4  0x0000000000622c6a in LockAcquireExtended ()
#5  0x0000000000620518 in LockRelationOid ()
#6  0x000000000046efe5 in relation_open ()
#7  0x0000000000476bc3 in index_open ()
#8  0x00000000004b8b14 in reindex_index ()
#9  0x00000000004b8f42 in reindex_relation ()
#10 0x000000000052d223 in ReindexDatabase ()
#11 0x00000000006347f7 in ?? ()
#12 0x000000000063580d in ?? ()
#13 0x0000000000635f92 in PortalRun ()
#14 0x00000000006325db in ?? ()
#15 0x0000000000633553 in PostgresMain ()
#16 0x00000000005f4d69 in ?? ()
#17 0x00000000005f7501 in PostmasterMain ()
#18 0x0000000000598df0 in main ()

SELECT (waiting)
----------------
#0  0x00007fbdfacecca7 in semop () from /lib64/libc.so.6
#1  0x00000000005e8927 in PGSemaphoreLock ()
#2  0x0000000000624821 in ProcSleep ()
#3  0x000000000062145c in ?? ()
#4  0x0000000000622c6a in LockAcquireExtended ()
#5  0x0000000000620518 in LockRelationOid ()
#6  0x000000000046efe5 in relation_open ()
#7  0x0000000000476bc3 in index_open ()
#8  0x00000000005e47d1 in get_relation_info ()
#9  0x00000000005e67e9 in build_simple_rel ()
#10 0x00000000005cf6e2 in add_base_rels_to_query ()
#11 0x00000000005cf6e2 in add_base_rels_to_query ()
#12 0x00000000005cf733 in add_base_rels_to_query ()
#13 0x00000000005d0286 in query_planner ()
#14 0x00000000005d1dab in ?? ()
#15 0x00000000005d3866 in subquery_planner ()
#16 0x00000000005d3b20 in standard_planner ()
#17 0x00000000006315fa in pg_plan_query ()
#18 0x00000000006316e4 in pg_plan_queries ()
#19 0x00000000006326f2 in ?? ()
#20 0x0000000000633553 in PostgresMain ()
#21 0x00000000005f4d69 in ?? ()
#22 0x00000000005f7501 in PostmasterMain ()
#23 0x0000000000598df0 in main ()

NEW CONNECTION
--------------
#0  0x00007fbdfacecca7 in semop () from /lib64/libc.so.6
#1  0x00000000005e8927 in PGSemaphoreLock ()
#2  0x0000000000624821 in ProcSleep ()
#3  0x000000000062145c in ?? ()
#4  0x0000000000622c6a in LockAcquireExtended ()
#5  0x0000000000620518 in LockRelationOid ()
#6  0x000000000046efe5 in relation_open ()
#7  0x0000000000476bc3 in index_open ()
#8  0x00000000006dd185 in InitCatCachePhase2 ()
#9  0x00000000006e74ed in InitCatalogCachePhase2 ()
#10 0x00000000006e549a in RelationCacheInitializePhase3 ()
#11 0x00000000006fba51 in InitPostgres ()
#12 0x0000000000633153 in PostgresMain ()
#13 0x00000000005f4d69 in ?? ()
#14 0x00000000005f7501 in PostmasterMain ()
#15 0x0000000000598df0 in main ()

Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Processes should always connect as some other role, keeping the superuser
> connections in reserve for situations like this.
> [...]
Note that if those processes use persistent connections, you'll need to restart them to free up the connections. In the meantime you could rename your superuser account, cut all of its existing connections, and temporarily turn off superuser on the postgres user. I wouldn't leave it that way all night or anything, as autovacuum usually connects as postgres to do its thing.
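A sketch of that step in SQL; the role name app_super is hypothetical, standing in for whatever account the processes actually connect as:

    -- Rename the superuser account the processes use, so they cannot reconnect
    -- (note: renaming a role may clear an MD5-encrypted password, since the
    -- role name is part of the salt, so it may need to be reset afterwards):
    ALTER ROLE app_super RENAME TO app_super_locked;

    -- Or temporarily demote it, and restore it later with SUPERUSER:
    ALTER ROLE app_super NOSUPERUSER;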
On Wed, Feb 6, 2013 at 11:55 PM, Anoop K <anoopk6@gmail.com> wrote:
> We are hitting a situation where REINDEX causes PostgreSQL to go into a
> deadlock state forever. On debugging the issue we found that three
> connections end up in this deadlocked state:
>
> idle in transaction
> REINDEX waiting
> SELECT waiting
> [...]

This may or may not be a deadlock. Unless you've got a circle it's not a deadlock, it's just a "cascading lock overloading your connection limit" failure. You can get these with slony and vacuums and DDL. Say I want to run a DDL script. Someone is running vacuum (could be the autovac process). I run slony execute to run the DDL and it waits with hard table locks, and all the updates stall behind that. Your db then runs out of connections. What we need to know is what that idle in transaction is just sitting there waiting to do, which is usually a combination of db state and application state.

As a short-term fix you can set some reasonable statement-level timeout on the reindex's connection, user or database. If no reindex ever takes more than a minute and you give it 5 minutes and check the logs for it, you can see how often it fails (once every month, once every minute, etc.); watch that for a while and then set that timeout on that user's connections accordingly. If that user is the superuser, things become problematic.
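The statement-level timeout Scott mentions is statement_timeout; a sketch of the options, with hypothetical role and database names (reindex_user, appdb):

    -- Per role (applies to every session that role opens):
    ALTER ROLE reindex_user SET statement_timeout = '5min';

    -- Per database:
    ALTER DATABASE appdb SET statement_timeout = '5min';

    -- Or only in the session that runs the maintenance job:
    SET statement_timeout = '5min';
    REINDEX DATABASE appdb;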
Anoop K <anoopk6@gmail.com> wrote:
> I will try. Here are the gdb stack traces of the hung processes.
>
> <idle in transaction>
> [...]

Have you tried `kill -SIGTERM` on the "idle in transaction" pid?

-Kevin
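When a superuser session can be opened, the SQL equivalent of that SIGTERM is pg_terminate_backend(); a sketch against the 9.1 catalogs (the pid shown is a placeholder, not a value from the thread):

    -- Locate the idle-in-transaction backend (9.1 column names):
    SELECT procpid, usename, xact_start, current_query
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction';

    -- Terminate it; this sends SIGTERM, same as kill -SIGTERM from the shell:
    SELECT pg_terminate_backend(12345);  -- 12345 is a placeholder pid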
In an attempt to get access, I ended up killing a postgres process, and the whole thing recovered from the hang state. Now I don't have more data points to debug. I feel the trigger is the connection in the <idle in transaction> state. On examining the application-side (Java) stack trace, I found that the other end is trying to make a connection:

at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:71)
at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:272)
at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:269)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:106)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:64)
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:123)
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:28)
at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:20)
at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:30)
at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:22)
at org.postgresql.Driver.makeConnection(Driver.java:391)
at org.postgresql.Driver.connect(Driver.java:265)

Scott Marlowe <scott.marlowe@gmail.com> wrote:
> This may or may not be a deadlock. Unless you've got a circle it's not a
> deadlock, it's just a "cascading lock overloading your connection limit"
> failure.
> [...]
On Thu, Feb 7, 2013 at 8:19 PM, Anoop K <anoopk6@gmail.com> wrote:
> In an attempt to get access, I ended up killing a postgres process, and the
> whole thing recovered from the hang state. Now I don't have more data points
> to debug.

Sorry, I was going to ask what REINDEX was really indexing? System tables? ISTM that the idle in transaction connection was holding some kind of a heavyweight lock on one of the catalog tables, and that may be causing all other transactions to just wait. For example, I can reproduce this by doing the following:

Session 1:
BEGIN;
REINDEX TABLE pg_class;
<stay idle in transaction>

Session 2:
REINDEX TABLE pg_attribute;
<will hang>

Try starting a new Session 3:
<will hang>

The stack traces of these processes will look similar to what you posted. And as soon as you end the transaction on the first session, everything will proceed. You may want to look at your application code and see if you're causing this kind of deadlock (or livelock, not sure which is a better term to describe this situation).

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
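The same repro, written out as a runnable sketch; the catalog names are exactly those in Pavan's example, and the database name appdb is hypothetical:

    -- Session 1: take locks on pg_class, then sit idle in the transaction.
    BEGIN;
    REINDEX TABLE pg_class;
    -- ...no COMMIT or ROLLBACK; this session is now "idle in transaction"...

    -- Session 2: blocks, waiting on a lock held by session 1.
    REINDEX TABLE pg_attribute;

    -- Session 3: even "psql appdb" now hangs during startup, because catalog
    -- cache initialization needs locks on the same system catalog indexes.

    -- Back in session 1: ending the transaction releases everything.
    COMMIT;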
Pavan Deolasee <pavan.deolasee@gmail.com> writes: > Sorry, I was going to ask what REINDEX was really indexing ? System > tables ? The stack trace for the REINDEX process includes ReindexDatabase(), so if it was running as a superuser it would be trying to reindex system catalogs too. We don't actually know that the particular table it's working on at the moment is a system catalog, but that seems like a fairly good guess. The process that's blocked in startup is definitely blocked on somebody's exclusive lock (or at least exclusive lock request) on a system catalog index, and there are not that many operations besides REINDEX that would take out such a lock. I'm guessing that something holds a lock (maybe only AccessShareLock) on a system catalog index, and REINDEX is blocked trying to get exclusive lock on that index, and then all incoming processes are queuing up behind REINDEX's request, since they'll all be trying to open the same set of catcache-supporting indexes. > ISTM that the idle in transaction connection was holding some > kind of a heavy weight lock on one of the catalog tables and that may > be causing all other transactions to just wait. It doesn't need to have been an exclusive lock to block REINDEX. I suspect this theory is correct otherwise, because if it were a true deadlock the deadlock detector should have noticed it. If it's just "everybody is blocked behind that idle transaction", the deadlock detector will not think that it should do anything about it. regards, tom lane
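To see the queue Tom describes once a session can be opened, a rough pg_locks self-join of this shape pairs each waiter with a granted lock on the same relation. It is a simplification that only matches relation-level locks, a sketch rather than a complete blocking-chain query:

    SELECT waiter.pid  AS waiting_pid,
           holder.pid  AS holding_pid,
           waiter.relation::regclass AS relation,
           waiter.mode AS wanted_mode,
           holder.mode AS held_mode
    FROM pg_locks waiter
    JOIN pg_locks holder
      ON holder.relation = waiter.relation
     AND holder.database = waiter.database
     AND holder.granted
     AND holder.pid <> waiter.pid
    WHERE NOT waiter.granted;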
The REINDEX was for the whole database. It seems REINDEX was blocked by the <idle in transaction> process.

What we are not able to explain is how that connection went into the <idle in transaction> state. The app stack trace confirms that the app (JDBC) is trying to open a connection, and we do close connections after use, so we can't see how a transaction was left idle.

Thanks
Anoop

On Fri, Feb 8, 2013 at 12:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm guessing that something holds a lock (maybe only AccessShareLock)
> on a system catalog index, and REINDEX is blocked trying to get
> exclusive lock on that index, and then all incoming processes are
> queuing up behind REINDEX's request, since they'll all be trying
> to open the same set of catcache-supporting indexes.
> [...]
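One way to catch the culprit the next time it happens is to watch for long-lived idle-in-transaction sessions; a sketch using the 9.1 pg_stat_activity columns (the 5-minute threshold is arbitrary):

    SELECT procpid, usename, client_addr, xact_start,
           now() - xact_start AS xact_age
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction'
      AND xact_start < now() - interval '5 minutes'
    ORDER BY xact_start;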
You might want to consider adding a pooler like pgbouncer to the equation, so that the pooler is what runs out of connections and not the database. Then you could at least get into it to fix things.

On Thu, Feb 7, 2013 at 9:04 PM, Anoop K <anoopk6@gmail.com> wrote:
> The REINDEX was for the whole database. It seems REINDEX was blocked by the
> <idle in transaction> process.
> [...]