Re: BUG #5443: Undetected deadlock situation - Mailing list pgsql-bugs

From Claudio Freire
Subject Re: BUG #5443: Undetected deadlock situation
Date
Msg-id 1275586454.24950.8.camel@klauss.livra.local
Whole thread Raw
In response to Re: BUG #5443: Undetected deadlock situation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Fri, 2010-04-30 at 11:50 -0400, Tom Lane wrote:

> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> > Eliminating null columns and mangling column headers for length, I
> > get this:
>
> > locktype    |  tranid  | virtualx |  pid  |      mode     | gr
> > transactionid | 39773877 | 63/15761 | 11157 | ShareLock     | f
> > transactionid | 39773877 | 4/10902  |  6421 | ExclusiveLock | t
>
> > So it looks like two locks on the same transaction ID by different
> > transactions.  How does that happen?
>
> That's perfectly normal --- it indicates that pid 11157 is waiting for
> a row-level lock that's currently held by pid 6421.  We translate
> row-level locking delays into waits for XID locks in order to limit
> the number of distinct locks that have to be remembered in the shared
> lock table.  (We'd soon blow out shared memory if per-row lock data had
> to be kept there.)
>
> As Peter stated, there's no evidence of an actual problem in this
> bug report.  I'd go looking for clients sitting with open
> transactions...


It may be that indeed there isn't a deadlock, but an abnormal
performance drop.

I couldn't get a gdb trace before I just had to work around the issue
since it happens in a production server, and when it does it means
trouble for our app.

The workaround is to break the transaction into a SELECT for the rows to
be updated, followed by individual updates batched in transactions of a
couple hundred.

With that pattern, everything works as expected. The SELECT takes a
sweet time (but doesn't block any other transaction), and the updates go
pretty fast thanks to HOT.

I didn't try a SELECT FOR UPDATE followed by all the updates in a single
transaction yet, that would result in the same behavior as the massive
update, but perhaps without the "deadlock" (or performance drop).

What I did do is analyze server load during the events, and as I
suspected, disk activity during the "deadlocks" seems to suggest a
vacuuming taking place. Although there was no autovacuum entry in
pg_stat_activity every time I checked, disk activity precisely matches
the case when autovacuum decides to vacuum a big table.

That's about as much information I can give. We've worked around the
issue successfully and it hasn't happened since. Even if it is not a
proper deadlock, the performance drop is unacceptable. I've done massive
updates before, and that performance drop was not expected (more than
one day for updating 30k rows on a table with a couple indices).

pgsql-bugs by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: BUG #5488: pg_dump does not quote column names -> pg_restore may fail when upgrading
Next
From: Claudio Freire
Date:
Subject: Re: BUG #5443: Undetected deadlock situation