Not your father's question about deadlocks - Mailing list pgsql-general
From | Clarence Gardner |
---|---|
Subject | Not your father's question about deadlocks |
Date | |
Msg-id | Pine.SUN.4.02.10611142235290.21053-100000@beach.silcom.com Whole thread Raw |
Responses |
Re: Not your father's question about deadlocks
|
List | pgsql-general |
Once upon a time, I put a question regarding deadlocks to the group, and Tom Lane immediately answered with this: >The guy waiting on the tuple-specific lock is second in >line to actually mung the tuple. Whoever is first in line behind the >current tenant will be blocked trying to acquire ShareLock on the >current tenant's transaction ID. > >What you appear to have is a situation where two transactions are trying >to lock or update the same two rows in different orders. Without a lot >more info about your application logic, I couldn't guess why this seems >to be associated with having more than two transaction interested in the >same tuple. > >Note that the guy looking for ShareLock on the tuple is evidently either >doing SELECT FOR SHARE on this tuple, or trying to install a new tuple >referencing this one as a foreign key (which does SELECT FOR SHARE under >the hood). But he's blocked by someone who's done either SELECT FOR >UPDATE or an actual UPDATE on that tuple. I'm still creeping up on the problem. I configured postgres to resolve a deadlock only after a month, and now I seem to have a complete snapshot of one of them. The details are below. The part that doesn't fit with what Tom says is that there are only two transactions involved. The first and third "guys" are the same guy! The first one, I'll call it X71, is the one doing the insert. The LedgerDetail table references various other tables; the one of interest is the buys table, and it is inserting a record referencing buy #955. It had done several other inserts before this, some of them also referencing buy #955. (In fact, the immediately preceding statement was one such.) This is evidenced by the RowShareLock that it holds on the buys table. The other one, X78, is executing a stored procedure which has only one reference to buys, this statement: update buys set budget=budget+remainder where id=buynum; So the timeline is this X71 X78 ----------------------- ---------------------- insert buy955 reference insert buy955 reference insert buy955 reference update buy955 (blocks) insert buy955 reference (blocks) Here is the state of the system. I have edited it somewhat. The pg_locks table actually shows five waiters; the three that I've deleted started much later than when the deadlock happened, so I've deleted all the pg_locks rows that belonged to their transactions. I've also deleted all pg_locks rows for X71 and X78 that referred to relations that the other transaction had no lock on. And I've deleted empty or constant columns, and the relation oid column. ====================================================================== 27478 | insert into LedgerDetail (ledger,strategy,advId,campaignId,buyId,creativeId,siteId,providerType,adspotId,checkbookId,count,dr,cr,dr_usd,cr_usd,time) values('imp',',61144,',533,676,955,4076,2,5,656,31275,14,0,110936,0,142142,1163530800) | 2006-11-14 19:22:43.760703+00 | 2006-11-14 19:22:29.342683+00 99056 | FETCH 1 FROM "PgSQL_0CDD618C" | 2006-11-14 19:22:43.748025+00 | 2006-11-13 19:21:40.038948+00 2006-11-14 11:22:43 PST[99056]LOG: statement: DECLARE "PgSQL_0CDD618C" CURSOR FOR select repayBuyBudget(955, 31275, fromusd(ecc2usd(7790), 'EUR')) relation | locktype | page | tuple | transactionid | transaction | pid | mode | granted ---------------------------+---------------+-------+-------+---------------+--------------+-------+------------------+--------- buys | relation | | | | 101953371 | 27478 | AccessShareLock | t buys | relation | | | | 101953371 | 27478 | RowShareLock | t buys | tuple | 38 | 11 | | 101953371 | 27478 | ShareLock | f buys_pkey | relation | | | | 101953371 | 27478 | AccessShareLock | t | transactionid | | | 101953371 | 101954678 | 99056 | ShareLock | f buys | relation | | | | 101954678 | 99056 | AccessShareLock | t buys | relation | | | | 101954678 | 99056 | RowExclusiveLock | t buys | tuple | 38 | 11 | | 101954678 | 99056 | ExclusiveLock | t buys_pkey | relation | | | | 101954678 | 99056 | AccessShareLock | t buys_pkey | relation | | | | 101954678 | 99056 | RowExclusiveLock | t ====================================================================== That scenario seems quite simple, but I can't reproduce the deadlock with this seemingly-identical sequence. The "Sn" prompts show the sequence of statements done from psql in two terminal sessions. S1 => create table t1(f1 int primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for table "t1" CREATE TABLE S2 => create table t2(f1 int references t1); CREATE TABLE S3 => insert into t1 values(1); INSERT 0 1 S4 => begin; BEGIN S7 => update t1 set f1=1; (blocks) S5 => begin; BEGIN S6 => insert into t2 values(1); INSERT 0 1 S8 => insert into t2 values(1); INSERT 0 1 S8 is the statement that seems analagous to the one that blocks in the real system, but it doesn't here. Apart from that test being different somehow that I can't figure out, situation on the real system seems to suggest that I need to either lock all the reference tables at the beginning of X71 (uncomfortable), or have it commit after each insert (unacceptable). Thanks in advance for your help.
pgsql-general by date: