Thread: Orphaned locks in 7.0?

Orphaned locks in 7.0?

From: Alfred Perlstein
Argh! I can't reproduce this:

NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.

pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.

The connection to the server was lost. Attempting reset: Failed.




Basically, I was running two instances of psql and issued the following in them:

         one                              two

begin;
lock data;  -- some table
                                      lock data;^C -- cancel
                                      select * from data;^C -- cancel
end;

                                      lock data;^C -- HUNG then aborted

It's annoying that I can't seem to reproduce this, and I know LOCKs
are only to be requested during a transaction, but it did happen.

thanks,
-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: Orphaned locks in 7.0?

From: Tom Lane
Alfred Perlstein <bright@wintelcom.net> writes:
> Argh! I can't reproduce this:

Was a core file left behind?  Can you get a backtrace from it?
        regards, tom lane


Re: Orphaned locks in 7.0?

From: Alfred Perlstein
* Tom Lane <tgl@sss.pgh.pa.us> [000511 11:26] wrote:
> Alfred Perlstein <bright@wintelcom.net> writes:
> > Argh! I can't reproduce this:
> 
> Was a core file left behind?  Can you get a backtrace from it?

I enabled assertion checking and debugging; here we go:

Core was generated by `postgres'.
Program terminated with signal 6, Abort trap.
Reading symbols from /usr/lib/libcrypt.so.2...done.
Reading symbols from /usr/lib/libm.so.2...done.
Reading symbols from /usr/lib/libreadline.so.4...done.
Reading symbols from /usr/lib/libncurses.so.5...done.
Reading symbols from /usr/lib/libc.so.4...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0  0x48281fd8 in kill () from /usr/lib/libc.so.4
(gdb) bt
#0  0x48281fd8 in kill () from /usr/lib/libc.so.4
#1  0x482bb4a2 in abort () from /usr/lib/libc.so.4
#2  0x8143c53 in ExcAbort () at excabort.c:27
#3  0x8143bd2 in ExcUnCaught (excP=0x81a5708, detail=0, data=0x0,
    message=0x8189040 "!((result->nHolding > 0) && (result->holders[lockmode] >= 0))") at exc.c:170
#4  0x8143c19 in ExcRaise (excP=0x81a5708, detail=0, data=0x0,
    message=0x8189040 "!((result->nHolding > 0) && (result->holders[lockmode] >= 0))") at exc.c:187
#5  0x8143308 in ExceptionalCondition (
    conditionName=0x8189040 "!((result->nHolding > 0) && (result->holders[lockmode] >= 0))",
    exceptionP=0x81a5708, detail=0x0, fileName=0x8188e0c "lock.c", lineNumber=617)
    at assert.c:73
#6  0x810422e in LockAcquire (lockmethod=1, locktag=0xbfbfe808, lockmode=1)
    at lock.c:617
#7  0x81036d1 in LockRelation (relation=0x8471ba0, lockmode=1) at lmgr.c:148
#8  0x8071957 in heap_open (relationId=1249, lockmode=1) at heapam.c:551
#9  0x813e329 in SearchSysCache (cache=0x847c018, v1=8490746, v2=136106584,
    v3=0, v4=0) at catcache.c:1009
#10 0x8142210 in SearchSysCacheTuple (cacheId=4, key1=8490746, key2=136106584,
    key3=0, key4=0) at syscache.c:532
#11 0x80cfc5d in make_var (pstate=0x81cd020, relid=8490746,
    refname=0x81cd180 "data", attrname=0x81cd258 "referer") at parse_node.c:202
#12 0x80d12be in expandAll (pstate=0x81cd020, relname=0x81cd180 "data",
    ref=0x81cd158, this_resno=0x81cd020) at parse_relation.c:408
#13 0x80d25b8 in ExpandAllTables (pstate=0x81cd020) at parse_target.c:444
#14 0x80d213b in transformTargetList (pstate=0x81cd020, targetlist=0x81ccea8)
    at parse_target.c:139
#15 0x80c0ef6 in transformSelectStmt (pstate=0x81cd020, stmt=0x81ccf50)
    at analyze.c:1423
#16 0x80bf780 in transformStmt (pstate=0x81cd020, parseTree=0x81ccf50)
    at analyze.c:238
#17 0x80bf3c2 in parse_analyze (pl=0x81cd008, parentParseState=0x0)
    at analyze.c:75
#18 0x80cafa1 in parser (str=0x8469018 "select * from data;", typev=0x0,
    nargs=0) at parser.c:64
#19 0x8109923 in pg_parse_and_rewrite (
    query_string=0x8469018 "select * from data;", typev=0x0, nargs=0,
    aclOverride=0 '\000') at postgres.c:395
#20 0x8109bcb in pg_exec_query_dest (
    query_string=0x8469018 "select * from data;", dest=Remote, aclOverride=0)
    at postgres.c:580
#21 0x8109b91 in pg_exec_query (query_string=0x8469018 "select * from data;")
    at postgres.c:562
#22 0x810ab4a in PostgresMain (argc=7, argv=0xbfbff138, real_argc=8,
    real_argv=0xbfbffb98) at postgres.c:1590
#23 0x80f00d6 in DoBackend (port=0x8463000) at postmaster.c:2006
#24 0x80efc7d in BackendStartup (port=0x8463000) at postmaster.c:1775
#25 0x80eeea1 in ServerLoop () at postmaster.c:1035
#26 0x80ee88a in PostmasterMain (argc=8, argv=0xbfbffb98) at postmaster.c:723
#27 0x80bf327 in main (argc=8, argv=0xbfbffb98) at main.c:93
#28 0x80633d5 in _start ()

(gdb) up
#6  0x810422e in LockAcquire (lockmethod=1, locktag=0xbfbfe808, lockmode=1)
    at lock.c:617
617                     Assert((result->nHolding > 0) && (result->holders[lockmode] >= 0));
(gdb) list
612                     XID_PRINT("LockAcquire: new", result);
613             }
614             else
615             {
616                     XID_PRINT("LockAcquire: found", result);
617                     Assert((result->nHolding > 0) && (result->holders[lockmode] >= 0));
618                     Assert(result->nHolding <= lock->nActive);
619             }
620     
621             /* ----------------
(gdb) 

Seems to be what brought things down.

If you need anything else, let me know.



Re: Orphaned locks in 7.0?

From: Alfred Perlstein
* Hiroshi Inoue <Inoue@tpf.co.jp> [000515 02:07] wrote:
> > -----Original Message-----
> > From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> > Behalf Of Alfred Perlstein
> > 
> > Basically I was running two instances of psql, in one I issued:
> > 
> >           one                              two
> > 
> > begin;
> > lock data;  -- some table
> >                                       lock data;^C -- cancel
> >                                       select * from data;^C -- cancel
> > end;
> >                                       
> >                                       lock data;^C -- HUNG then aborted
> > 
> > It's annoying that I can't seem to reproduce this, and I know LOCKs
> > are only to be requested during a transaction, but it did happen.
> >
> 
> Could the following example explain your HUNG problem ?
> 
> Session-1
>     =# begin;
>     BEGIN
>     =# lock t;
>     LOCK TABLE
> 
> Session-2
>     =# begin;
>     BEGIN
>     =# lock t; 
>     [blocked] ^C
>     Cancel request sent
>     ERROR:  Query cancel requested while waiting lock
>     reindex=# select * from t;
>     [blocked]
> 
> Session-1
>     =# commit;
>     COMMIT
> 
> Session-2
>     ERROR:  LockRelation: LockAcquire failed
>     =# abort;
>     ROLLBACK
>     =# lock t;
>     [blocked]

That looks pretty much like the sequence of events that led up to
the problem. The trouble is that I was just manually testing out
the way locks work and didn't write down the exact steps I took.

These are probably exactly the right steps, though.



RE: Orphaned locks in 7.0?

From: "Hiroshi Inoue"
> -----Original Message-----
> From: Hiroshi Inoue
> > -----Original Message-----
> > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > 
> > Anyway, it sounds like we agree that this is the approach to pursue.
> > Do you have time to chase down the details?
> 
> OK,I will examine a little though I'm a little busy this week.
>

Sorry I'm so late; I haven't had much time to examine the details.
I'm now worried about another point: wouldn't this change waste
XIDs in the case of an abort loop?

Anyway, I examined the loop in PostgresMain():

    for (;;)
    {
        ..
        StartTransactionCommand()
        ..
        pg_exec_query()
        ..
        CommitTransactionCommand()  (/ AbortCurrentTransaction())
        ..
    }

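The loop Hiroshi describes can be sketched as follows. This is a hypothetical, heavily simplified model: the three transaction functions are empty stand-ins that only record their names, exec_query() stands in for pg_exec_query(), and elog(ERROR) is modelled as a longjmp() back to a restart point at the top of the loop, where the transaction is aborted before the next query is read (assumed here to mirror the backend's error recovery of that era).

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

/* Hypothetical model only: the stubs below record their names instead of
 * doing real transaction work. */
static jmp_buf restart;            /* stand-in for the error restart point */
static char trace[256];            /* space-separated record of calls */

static void note(const char *s) {
    assert(strlen(trace) + strlen(s) + 2 <= sizeof trace);
    strcat(trace, s);
    strcat(trace, " ");
}

static void StartTransactionCommand(void)  { note("start"); }
static void CommitTransactionCommand(void) { note("commit"); }
static void AbortCurrentTransaction(void)  { note("abort"); }

/* Stand-in for pg_exec_query(); "fails" simulates elog(ERROR), which
 * unwinds straight back to the restart point. */
static void exec_query(int fails) {
    note("exec");
    if (fails)
        longjmp(restart, 1);
}

/* Runs n_queries iterations; query i fails when bit i of fail_mask is set. */
static void main_loop(int n_queries, unsigned fail_mask) {
    volatile int i = 0;            /* volatile so its value survives longjmp */

    trace[0] = '\0';
    if (setjmp(restart) != 0) {
        AbortCurrentTransaction(); /* error path: abort, move to next query */
        i++;
    }
    for (; i < n_queries; i++) {
        StartTransactionCommand();
        exec_query((int)((fail_mask >> i) & 1u));
        CommitTransactionCommand();
    }
}
```

With main_loop(2, 0x1u), where only the first query fails, the trace reads `start exec abort start exec commit `: every command is bracketed by StartTransactionCommand() and then either CommitTransactionCommand() or AbortCurrentTransaction().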
In my view, the calls marked with +? below would be added,
and the ones marked with -? would be removed.

StartTransactionCommand()
    TBLOCK_DEFAULT         StartTransaction()          ->
    TBLOCK_BEGIN                                       -> TBLOCK_INPROGRESS
    TBLOCK_INPROGRESS                                  ->
    TBLOCK_END             CommitTransaction()         ->
                           StartTransaction()          -> TBLOCK_DEFAULT
    TBLOCK_ABORT                                       ->
    TBLOCK_ENDABORT                                    ->

CommitTransactionCommand()
    TBLOCK_DEFAULT         CommitTransaction()         ->
    TBLOCK_BEGIN                                       -> TBLOCK_INPROGRESS
    TBLOCK_INPROGRESS      CommandCounterIncrement()   ->
    TBLOCK_END             CommitTransaction()         -> TBLOCK_DEFAULT
    TBLOCK_ABORT        +? AbortTransaction()
                        +? StartTransaction()          ->
    TBLOCK_ENDABORT     +? AbortTransaction()          -> TBLOCK_DEFAULT

BeginTransactionBlock()  ( <- BEGIN command )
    TRANS_DISABLED                                     ->
    otherwise                                          -> TBLOCK_BEGIN -> TBLOCK_INPROGRESS

UserAbortTransaction()  ( <- ROLLBACK command )
    TRANS_DISABLED                                     ->
    TBLOCK_INPROGRESS   -? AbortTransaction()          -> TBLOCK_ENDABORT
    TBLOCK_ABORT                                       -> TBLOCK_ENDABORT
    otherwise           -? AbortTransaction()          -> TBLOCK_ENDABORT

EndTransactionBlock()  ( <- COMMIT command )
    TRANS_DISABLED                                     ->
    TBLOCK_INPROGRESS                                  -> TBLOCK_END
    TBLOCK_ABORT                                       -> TBLOCK_ENDABORT
    otherwise                                          -> TBLOCK_ENDABORT

AbortCurrentTransaction()  ( elog(ERROR/FATAL) )
    TBLOCK_DEFAULT         AbortTransaction()          ->
    TBLOCK_BEGIN           AbortTransaction()
                        +? StartTransaction()          -> TBLOCK_ABORT
    TBLOCK_INPROGRESS      AbortTransaction()
                        +? StartTransaction()          -> TBLOCK_ABORT
    TBLOCK_END             AbortTransaction()          -> TBLOCK_DEFAULT
    TBLOCK_ABORT        +? AbortTransaction()
                        +? StartTransaction()          ->
    TBLOCK_ENDABORT     +? AbortTransaction()          -> TBLOCK_DEFAULT

AbortOutAnyTransaction()  ( Async_UnlistenOnExit() )
    TRANS_DEFAULT                                      -> TBLOCK_DEFAULT
    otherwise              AbortTransaction()          -> TBLOCK_DEFAULT

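To make the XID question concrete, here is a toy C model of the tables above. It is hypothetical: the functions reuse the xact.c names but only update a block state, an "open transaction" flag and an XID counter, and it assumes (as the question implies) that every StartTransaction() consumes one XID. The `proposed` flag switches the +? and -? lines on. The scenario is BEGIN, one failing command, n more commands while the block is aborted, then ROLLBACK; EndTransactionBlock() and AbortOutAnyTransaction() are left out because that scenario never reaches them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the tables above, NOT xact.c. Assumption: each
 * StartTransaction() consumes one XID. */
typedef enum {
    TBLOCK_DEFAULT, TBLOCK_BEGIN, TBLOCK_INPROGRESS,
    TBLOCK_END, TBLOCK_ABORT, TBLOCK_ENDABORT
} BlockState;

typedef struct {
    BlockState block;
    int  xids_used;   /* StartTransaction() calls so far */
    bool in_xact;     /* a low-level transaction is open */
    bool proposed;    /* apply the "+?" / "-?" changes */
} Model;

static void StartTransaction(Model *m)  { m->in_xact = true; m->xids_used++; }
static void CommitTransaction(Model *m) { m->in_xact = false; }
static void AbortTransaction(Model *m)  { m->in_xact = false; }

static void StartTransactionCommand(Model *m) {
    switch (m->block) {
    case TBLOCK_DEFAULT: StartTransaction(m); break;
    case TBLOCK_BEGIN:   m->block = TBLOCK_INPROGRESS; break;
    case TBLOCK_END:     CommitTransaction(m); StartTransaction(m);
                         m->block = TBLOCK_DEFAULT; break;
    default:             break;   /* INPROGRESS, ABORT, ENDABORT: no change */
    }
}

static void CommitTransactionCommand(Model *m) {
    switch (m->block) {
    case TBLOCK_DEFAULT:    CommitTransaction(m); break;
    case TBLOCK_BEGIN:      m->block = TBLOCK_INPROGRESS; break;
    case TBLOCK_INPROGRESS: break;   /* CommandCounterIncrement() */
    case TBLOCK_END:        CommitTransaction(m); m->block = TBLOCK_DEFAULT; break;
    case TBLOCK_ABORT:
        if (m->proposed) { AbortTransaction(m); StartTransaction(m); }
        break;
    case TBLOCK_ENDABORT:
        if (m->proposed) AbortTransaction(m);
        m->block = TBLOCK_DEFAULT;
        break;
    }
}

static void AbortCurrentTransaction(Model *m) {
    switch (m->block) {
    case TBLOCK_DEFAULT: AbortTransaction(m); break;
    case TBLOCK_BEGIN:
    case TBLOCK_INPROGRESS:
        AbortTransaction(m);
        if (m->proposed) StartTransaction(m);
        m->block = TBLOCK_ABORT;
        break;
    case TBLOCK_END:     AbortTransaction(m); m->block = TBLOCK_DEFAULT; break;
    case TBLOCK_ABORT:
        if (m->proposed) { AbortTransaction(m); StartTransaction(m); }
        break;
    case TBLOCK_ENDABORT:
        if (m->proposed) AbortTransaction(m);
        m->block = TBLOCK_DEFAULT;
        break;
    }
}

/* BEGIN: TBLOCK_BEGIN -> TBLOCK_INPROGRESS, collapsed into one step. */
static void BeginTransactionBlock(Model *m) { m->block = TBLOCK_INPROGRESS; }

/* ROLLBACK: the two "-?" rows drop AbortTransaction() in the proposal. */
static void UserAbortTransaction(Model *m) {
    if (m->block != TBLOCK_ABORT && !m->proposed)
        AbortTransaction(m);
    m->block = TBLOCK_ENDABORT;
}

/* One pass of the PostgresMain() loop; body may be NULL. */
static void run_command(Model *m, void (*body)(Model *), bool fails) {
    StartTransactionCommand(m);
    if (body != NULL) body(m);
    if (fails) AbortCurrentTransaction(m);
    else       CommitTransactionCommand(m);
}

/* BEGIN; one failing command; n more while aborted; ROLLBACK. */
static int xids_for_abort_loop(bool proposed, int n) {
    Model m = { TBLOCK_DEFAULT, 0, false, proposed };
    run_command(&m, BeginTransactionBlock, false);
    run_command(&m, NULL, true);
    for (int i = 0; i < n; i++)
        run_command(&m, NULL, true);
    run_command(&m, UserAbortTransaction, false);
    assert(!m.in_xact && m.block == TBLOCK_DEFAULT);  /* block fully closed */
    return m.xids_used;
}
```

Under these assumptions the current tables use one XID for the whole block regardless of n, while the proposed ones use n + 2: one for BEGIN, one restarted after the first error, and one more for every command issued while the block is aborted. That per-command cost is the abort-loop concern raised above.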

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp