Re: ALTER TABLE deadlock with concurrent INSERT - Mailing list pgsql-hackers

From Noah Misch
Subject Re: ALTER TABLE deadlock with concurrent INSERT
Date
Msg-id 20110304073642.GA953@tornado.leadboat.com
Whole thread Raw
In response to ALTER TABLE deadlock with concurrent INSERT  (Joe Conway <mail@joeconway.com>)
Responses Re: ALTER TABLE deadlock with concurrent INSERT
List pgsql-hackers
On Wed, Mar 02, 2011 at 12:25:16PM -0800, Joe Conway wrote:
> I'm working with a client on an application upgrade script which
> executes a function to conditionally do an:
> 
>   ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE baz
> 
> If this is run while the application is concurrently doing inserts into
> foo, we are occasionally seeing deadlocks. Aside from the fact that they
> are better off not altering the table amid concurrent inserts, I'm
> trying to understand why this is even able to happen. I expect one to
> block the other, not a deadlock.
> 
> This is 8.4.1 (I know, I know, I have advised strongly that they upgrade
> to 8.4.latest).
> 
> We have not been able to repeat this forcibly. Here is what the log shows:
> ------------------------------
> 2011-02-25 14:38:07 PST [31686]: [1-1] ERROR:  deadlock detected
> 2011-02-25 14:38:07 PST [31686]: [2-1] DETAIL:  Process 31686 waits for
> AccessExclusiveLock on relation 16896 of database 16386; blocked by
> process 31634.
>         Process 31634 waits for RowExclusiveLock on relation 16902 of
> database 16386; blocked by process 31686.
>         Process 31686: SELECT change_column_type('attribute_summary',
> 'sequence_number', 'numeric');
>         Process 31634: insert into attribute_summary (attribute_value,
> sequence_number, attribute_id) values ($1, $2, $3)
> 2011-02-25 14:38:07 PST [31686]: [3-1] HINT:  See server log for query
> details.
> 2011-02-25 14:38:07 PST [31686]: [4-1] CONTEXT:  SQL statement "ALTER
> TABLE attribute_summary ALTER COLUMN sequence_number SET DATA TYPE numeric"
>         PL/pgSQL function "change_column_type" line 18 at EXECUTE statement
> 2011-02-25 14:38:07 PST [31686]: [5-1] STATEMENT:  SELECT
> change_column_type('attribute_summary', 'sequence_number', 'numeric');
> ------------------------------

Does relation 16902 (attribute_summary) have a foreign key constraint over the
sequence_number column, in either direction, with relation 16896?  That would
explain it:

session 1: ALTER TABLE attribute_summary ... <sleeps after relation_openrv in transformAlterTableStmt>
session 2: SELECT 1 FROM rel16896 LIMIT 0;
session 2: SELECT 1 FROM attribute_summary LIMIT 0; <blocks>
session 1: <wakes up; continues ALTER TABLE: deadlock upon locking rel16896>

Off the cuff, I think you could make sure this never deadlocks with a PL/pgSQL
recipe like this:

LOOPBEGIN    LOCK TABLE rel16896;    LOCK TABLE attribute_summary NOWAIT;    EXIT;EXCEPTION WHEN lock_not_available
THENEND;
END LOOP;

Granted, the cure may be worse than the disease.

nm


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: why is max standby delay only 35 minutes?
Next
From: Fujii Masao
Date:
Subject: Re: Sync Rep v19