Thread: BUG #8434: Why does dead lock occur many times ?

BUG #8434: Why does dead lock occur many times ?

From

katsumata.tomonari@po.ntts.co.jp

Date:

04 September 2013, 09:15:07

The following bug has been logged on the website:

Bug reference:      8434
Logged by:          Tomonari Katsumata
Email address:      katsumata.tomonari@po.ntts.co.jp
PostgreSQL version: 9.3rc1
Operating system:   RedHatEnterpriseLinux 6.4(x86_64)
Description:

Hi,


I'm testing PostgreSQL 9.3rc1.
Many times updates and selects for update become dead lock situation.


The reproduce is:
1. initializing data
createdb testdb
psql testdb -c "create table t (col1 int, col2 int, col3 text);"
psql testdb -c "insert into t values (1, 4, 'A');"
psql testdb -c "insert into t values (2, 5, 'B');"
psql testdb -c "insert into t values (3, 6, 'C');"


2. executing updates and selects for update
(run below script)
----
#!/bin/sh


./tx1 > /dev/null &
./tx2 > /dev/null &
./tx3 > /dev/null &


wait
----


tx1 is:
----
#!/bin/sh


while :
do
psql testdb << EOF
BEGIN;
UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
COMMIT;
\q
EOF
done
----


tx2 is:
----
#!/bin/sh


while :
do
psql testdb << EOF
BEGIN;
SELECT col1, col2, col3 FROM t WHERE col1 = 3 AND col2 = 6 FOR UPDATE;
UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
COMMIT;
\q
EOF
done
----


tx3 is:
----
#!/bin/sh


while :
do
psql testdb << EOF
BEGIN;
UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
COMMIT;
\q
EOF
done
----


Then, I got below messages.
--------
2013-09-04 15:25:25 JST 29630 5226d254.73be-1 659102 (pgsql, testdb,
[local], psql) LOG:  00000: process 29630 detected deadlock while waiting
for ShareLock on transaction 659103 after 1000.136 ms
2013-09-04 15:25:25 JST 29630 5226d254.73be-2 659102 (pgsql, testdb,
[local], psql) LOCATION:  ProcSleep, proc.c:1232
2013-09-04 15:25:25 JST 29630 5226d254.73be-3 659102 (pgsql, testdb,
[local], psql) STATEMENT:  UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 =
6;
2013-09-04 15:25:25 JST 29630 5226d254.73be-4 659102 (pgsql, testdb,
[local], psql) ERROR:  40P01: deadlock detected
2013-09-04 15:25:25 JST 29630 5226d254.73be-5 659102 (pgsql, testdb,
[local], psql) DETAIL:  Process 29630 waits for ShareLock on transaction
659103; blocked by process 29631.
        Process 29631 waits for ExclusiveLock on tuple (0,153) of relation
16385 of database 16384; blocked by process 29630.
        Process 29630: UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
        Process 29631: UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
2013-09-04 15:25:25 JST 29630 5226d254.73be-6 659102 (pgsql, testdb,
[local], psql) HINT:  See server log for query details.
2013-09-04 15:25:25 JST 29630 5226d254.73be-7 659102 (pgsql, testdb,
[local], psql) LOCATION:  DeadLockReport, deadlock.c:956
2013-09-04 15:25:25 JST 29630 5226d254.73be-8 659102 (pgsql, testdb,
[local], psql) STATEMENT:  UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 =
6;
--------


I did not get these messages with PostgreSQL 9.2.4.


Why did I get the dead lock situation with PostgreSQL9.3rc1?
degrading? or I'm missing something?


regards,
------------
Tomonari Katsumata

Re: BUG #8434: Why does dead lock occur many times ?

From

OTSUKA Kenji

Date:

19 September 2013, 18:28:32

Hi,

This issue occurred even on 9.3.0.
I took the following information, and I attached them.

    pg_locks.txt          ... pg_locks during deadlock
    pg_stat_activity.txt ... pg_stat_activity during deadlock
    postgresql.log        ... PostgreSQL log (including LOCK_DEBUG log)

The OID of the table is 16459.

I run 3 transactions.
One transaction of them executes SELECT FOR UPDATE and UPDATE a row.
And two transactions of them execute only UPDATE the same line.

The results is that all of UPDATE is waiting.

I changed a little bit how to reproduce.
It is as follows.

- Compiling PostgreSQL
      Add -DLOCK_DEBUG to CFLAGS

- Changing postgresql.conf following parameters
      log_lock_waits = on
      deadlock_timeout = 1min   # for getting information during deadlock
      debug_deadlocks = on
      trace_lock_table = 16459

      log_line_prefix = '%t [%p] %q(%a) '
      logging_collector = on
      log_filename = 'postgresql.log'
      log_min_messages = info
      log_error_verbosity = verbose

- Testing
      1. Initializing data
          Executing createdb.sh
          This creates a table with 2 columns, and insert 1 row.

      2. Running the transactions
          Executing test.sh
          This runs 3 transactions.

          2 transactions of them (tx1) are
            BEGIN;
            UPDATE t SET col2 = 'A' WHERE col1 = 1;
            COMMIT;

          1 transaction of them (tx2) is
            BEGIN;
            SELECT * FROM t WHERE col1 = 1 FOR UPDATE;
            UPDATE t SET col2 = 'A' WHERE col1 = 1;
            COMMIT;

regards,

2013/9/4 <katsumata.tomonari@po.ntts.co.jp>

The following bug has been logged on the website:

Bug reference: 8434
Logged by: Tomonari Katsumata
Email address: katsumata.tomonari@po.ntts.co.jp
PostgreSQL version: 9.3rc1
Operating system: RedHatEnterpriseLinux 6.4(x86_64)
Description:

Hi,

I'm testing PostgreSQL 9.3rc1.
Many times updates and selects for update become dead lock situation.

The reproduce is:
1. initializing data
createdb testdb
psql testdb -c "create table t (col1 int, col2 int, col3 text);"
psql testdb -c "insert into t values (1, 4, 'A');"
psql testdb -c "insert into t values (2, 5, 'B');"
psql testdb -c "insert into t values (3, 6, 'C');"

2. executing updates and selects for update
(run below script)
----
#!/bin/sh

./tx1 > /dev/null &
./tx2 > /dev/null &
./tx3 > /dev/null &

wait
----

tx1 is:
----
#!/bin/sh

while :
do
psql testdb << EOF
BEGIN;
UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
COMMIT;
\q
EOF
done
----

tx2 is:
----
#!/bin/sh

while :
do
psql testdb << EOF
BEGIN;
SELECT col1, col2, col3 FROM t WHERE col1 = 3 AND col2 = 6 FOR UPDATE;
UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
COMMIT;
\q
EOF
done
----

tx3 is:
----
#!/bin/sh

while :
do
psql testdb << EOF
BEGIN;
UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
COMMIT;
\q
EOF
done
----

Then, I got below messages.
--------
2013-09-04 15:25:25 JST 29630 5226d254.73be-1 659102 (pgsql, testdb,
[local], psql) LOG: 00000: process 29630 detected deadlock while waiting
for ShareLock on transaction 659103 after 1000.136 ms
2013-09-04 15:25:25 JST 29630 5226d254.73be-2 659102 (pgsql, testdb,
[local], psql) LOCATION: ProcSleep, proc.c:1232
2013-09-04 15:25:25 JST 29630 5226d254.73be-3 659102 (pgsql, testdb,
[local], psql) STATEMENT: UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 =
6;
2013-09-04 15:25:25 JST 29630 5226d254.73be-4 659102 (pgsql, testdb,
[local], psql) ERROR: 40P01: deadlock detected
2013-09-04 15:25:25 JST 29630 5226d254.73be-5 659102 (pgsql, testdb,
[local], psql) DETAIL: Process 29630 waits for ShareLock on transaction
659103; blocked by process 29631.
Process 29631 waits for ExclusiveLock on tuple (0,153) of relation
16385 of database 16384; blocked by process 29630.
Process 29630: UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
Process 29631: UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
2013-09-04 15:25:25 JST 29630 5226d254.73be-6 659102 (pgsql, testdb,
[local], psql) HINT: See server log for query details.
2013-09-04 15:25:25 JST 29630 5226d254.73be-7 659102 (pgsql, testdb,
[local], psql) LOCATION: DeadLockReport, deadlock.c:956
2013-09-04 15:25:25 JST 29630 5226d254.73be-8 659102 (pgsql, testdb,
[local], psql) STATEMENT: UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 =
6;
--------

I did not get these messages with PostgreSQL 9.2.4.

Why did I get the dead lock situation with PostgreSQL9.3rc1?
degrading? or I'm missing something?

regards,
------------
Tomonari Katsumata

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

--
OTSUKA Kenji

Attachment

Re: BUG #8434: Why does dead lock occur many times ?

From

Tomonari Katsumata

Date:

28 September 2013, 01:39:23

Hi,

Thanks to your info, otsuka-san.

It seems to be related to some changes in
the [Improve concurrency of foreign key locking] commit(*).
(*) commitid : 0ac5ad5134f2769ccbaefec73844f8504c4d6182
Because I could not reproduce "dead lock" before the commit.
And I'm thinking that also this commit is related to BUG#8470.
If so, I hope Alvaro will find the solution for this problem.

Maybe I can't take enough time, but I'll investigate this continuously.
If someone could reproduce this, please share information.

regards,
-------
NTT Software Corporation
Tomonari Katsumata



2013/9/19 OTSUKA Kenji <otsuka.knj@gmail.com>

> Hi,
>
> This issue occurred even on 9.3.0.
> I took the following information, and I attached them.
>
>     pg_locks.txt          ... pg_locks during deadlock
>     pg_stat_activity.txt  ... pg_stat_activity during deadlock
>     postgresql.log        ... PostgreSQL log (including LOCK_DEBUG log)
>
> The OID of the table is 16459.
>
> I run 3 transactions.
> One transaction of them executes SELECT FOR UPDATE and UPDATE a row.
> And two transactions of them execute only UPDATE the same line.
>
> The results is that all of UPDATE is waiting.
>
>
> I changed a little bit how to reproduce.
> It is as follows.
>
>   - Compiling PostgreSQL
>       Add -DLOCK_DEBUG to CFLAGS
>
>   - Changing postgresql.conf following parameters
>       log_lock_waits = on
>       deadlock_timeout = 1min   # for getting information during deadlock
>       debug_deadlocks = on
>       trace_lock_table = 16459
>
>       log_line_prefix = '%t [%p] %q(%a) '
>       logging_collector = on
>       log_filename = 'postgresql.log'
>       log_min_messages = info
>       log_error_verbosity = verbose
>
>   - Testing
>       1. Initializing data
>           Executing createdb.sh
>           This creates a table with 2 columns, and insert 1 row.
>
>       2. Running the transactions
>           Executing test.sh
>           This runs 3 transactions.
>
>           2 transactions of them (tx1) are
>             BEGIN;
>             UPDATE t SET col2 = 'A' WHERE col1 = 1;
>             COMMIT;
>
>           1 transaction of them (tx2) is
>             BEGIN;
>             SELECT * FROM t WHERE col1 = 1 FOR UPDATE;
>             UPDATE t SET col2 = 'A' WHERE col1 = 1;
>             COMMIT;
>
> regards,
>
>
>
> 2013/9/4 <katsumata.tomonari@po.ntts.co.jp>
>
> The following bug has been logged on the website:
>>
>> Bug reference:      8434
>> Logged by:          Tomonari Katsumata
>> Email address:      katsumata.tomonari@po.ntts.co.jp
>> PostgreSQL version: 9.3rc1
>> Operating system:   RedHatEnterpriseLinux 6.4(x86_64)
>> Description:
>>
>> Hi,
>>
>>
>> I'm testing PostgreSQL 9.3rc1.
>> Many times updates and selects for update become dead lock situation.
>>
>>
>> The reproduce is:
>> 1. initializing data
>> createdb testdb
>> psql testdb -c "create table t (col1 int, col2 int, col3 text);"
>> psql testdb -c "insert into t values (1, 4, 'A');"
>> psql testdb -c "insert into t values (2, 5, 'B');"
>> psql testdb -c "insert into t values (3, 6, 'C');"
>>
>>
>> 2. executing updates and selects for update
>> (run below script)
>> ----
>> #!/bin/sh
>>
>>
>> ./tx1 > /dev/null &
>> ./tx2 > /dev/null &
>> ./tx3 > /dev/null &
>>
>>
>> wait
>> ----
>>
>>
>> tx1 is:
>> ----
>> #!/bin/sh
>>
>>
>> while :
>> do
>> psql testdb << EOF
>> BEGIN;
>> UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
>> COMMIT;
>> \q
>> EOF
>> done
>> ----
>>
>>
>> tx2 is:
>> ----
>> #!/bin/sh
>>
>>
>> while :
>> do
>> psql testdb << EOF
>> BEGIN;
>> SELECT col1, col2, col3 FROM t WHERE col1 = 3 AND col2 = 6 FOR UPDATE;
>> UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
>> COMMIT;
>> \q
>> EOF
>> done
>> ----
>>
>>
>> tx3 is:
>> ----
>> #!/bin/sh
>>
>>
>> while :
>> do
>> psql testdb << EOF
>> BEGIN;
>> UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 = 6;
>> COMMIT;
>> \q
>> EOF
>> done
>> ----
>>
>>
>> Then, I got below messages.
>> --------
>> 2013-09-04 15:25:25 JST 29630 5226d254.73be-1 659102 (pgsql, testdb,
>> [local], psql) LOG:  00000: process 29630 detected deadlock while waiting
>> for ShareLock on transaction 659103 after 1000.136 ms
>> 2013-09-04 15:25:25 JST 29630 5226d254.73be-2 659102 (pgsql, testdb,
>> [local], psql) LOCATION:  ProcSleep, proc.c:1232
>> 2013-09-04 15:25:25 JST 29630 5226d254.73be-3 659102 (pgsql, testdb,
>> [local], psql) STATEMENT:  UPDATE t SET col3 = 'c' WHERE col1 = 3 AND
>> col2 =
>> 6;
>> 2013-09-04 15:25:25 JST 29630 5226d254.73be-4 659102 (pgsql, testdb,
>> [local], psql) ERROR:  40P01: deadlock detected
>> 2013-09-04 15:25:25 JST 29630 5226d254.73be-5 659102 (pgsql, testdb,
>> [local], psql) DETAIL:  Process 29630 waits for ShareLock on transaction
>> 659103; blocked by process 29631.
>>         Process 29631 waits for ExclusiveLock on tuple (0,153) of relation
>> 16385 of database 16384; blocked by process 29630.
>>         Process 29630: UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 =
>> 6;
>>         Process 29631: UPDATE t SET col3 = 'c' WHERE col1 = 3 AND col2 =
>> 6;
>> 2013-09-04 15:25:25 JST 29630 5226d254.73be-6 659102 (pgsql, testdb,
>> [local], psql) HINT:  See server log for query details.
>> 2013-09-04 15:25:25 JST 29630 5226d254.73be-7 659102 (pgsql, testdb,
>> [local], psql) LOCATION:  DeadLockReport, deadlock.c:956
>> 2013-09-04 15:25:25 JST 29630 5226d254.73be-8 659102 (pgsql, testdb,
>> [local], psql) STATEMENT:  UPDATE t SET col3 = 'c' WHERE col1 = 3 AND
>> col2 =
>> 6;
>> --------
>>
>>
>> I did not get these messages with PostgreSQL 9.2.4.
>>
>>
>> Why did I get the dead lock situation with PostgreSQL9.3rc1?
>> degrading? or I'm missing something?
>>
>>
>> regards,
>> ------------
>> Tomonari Katsumata
>>
>>
>>
>>
>> --
>> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-bugs
>>
>
>
>
> --
> OTSUKA Kenji
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>
>

Re: BUG #8434: Why does dead lock occur many times ?

From

Alvaro Herrera

Date:

28 September 2013, 01:47:16

Hi,

Sorry I hadn't noticed this thread.

Tomonari Katsumata escribió:

> It seems to be related to some changes in
> the [Improve concurrency of foreign key locking] commit(*).
> (*) commitid : 0ac5ad5134f2769ccbaefec73844f8504c4d6182
> Because I could not reproduce "dead lock" before the commit.

Yes, sounds pretty likely.

> And I'm thinking that also this commit is related to BUG#8470.

Yes, I replied to that thread because I suspect the same.

> If so, I hope Alvaro will find the solution for this problem.

I will take a look at this problem next week.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #8434: Why does dead lock occur many times ?

From

Alvaro Herrera

Date:

04 October 2013, 17:22:25

I spent a lot of time looking at this issue.  Your problem scenario can
be reduced to the following test case:

CREATE TABLE t (a INT);
INSERT INTO t VALUES (1);

The three sessions need to execute the following commands, in this
sequence:

-- Session 1
/* 1 */ BEGIN;
SELECT * FROM t FOR UPDATE;

-- Session 2
/* 2 */ BEGIN;
UPDATE t SET a = 2;        -- blocks waiting on S1

-- Session 3
/* 3 */ BEGIN;
UPDATE t SET a = 3;        -- blocks
-- this is waiting on S1, but since S2 is also waiting on S1 and will acquire a
-- conflicting lock first, this session will only be released after S2 releases
-- its own lock.

Session 1
/* 4 */ UPDATE t SET a = 4;
COMMIT;                -- releases session 2
BEGIN;
SELECT * FROM t FOR UPDATE;    -- blocks waiting on S2 (*)

Session 2:
/* 5 */ COMMIT;                -- releases session 1
BEGIN;
UPDATE t SET a = 5;            -- blocks waiting on S3

Session 1:
/* 6 */ UPDATE t SET a = 6;        -- blocks waiting on S3

At this point, all sessions are blocked on their UPDATE commands, and
deadlock is reported.  (For debugging, it's useful to set a very large
deadlock_timeout and examine pg_locks, etc.)

The point at which things started to go wrong was where the session 1's
SELECT FOR UPDATE with the (*) was allowed to continue after session 2
commits.  In 9.2 that session remains blocked, and instead we release
session 3.

The problem is the handling of following of the update chain during a
lock command; the question is should sessions be blocked waiting for
locks on future versions of the row?  They currently don't block, which
is what leads the SELECT FOR UPDATE to continue; but if we make them
block, other things break.  I initially toyed with the following patch,
which removes the deadlock the above report and also your original test
case:

--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4832,15 +4832,12 @@ l4:
         xmax = HeapTupleHeaderGetRawXmax(mytup.t_data);

         /*
-         * If this tuple is updated and the key has been modified (or
-         * deleted), what we do depends on the status of the updating
-         * transaction: if it's live, we sleep until it finishes; if it has
-         * committed, we have to fail (i.e. return HeapTupleUpdated); if it
-         * aborted, we ignore it. For updates that didn't touch the key, we
-         * can just plough ahead.
+         * If this tuple is updated, what we do depends on the status of the
+         * updating transaction: if it's live, we sleep until it finishes; if
+         * it has committed, we have to fail (i.e. return HeapTupleUpdated); if
+         * it aborted, we ignore it.
          */
-        if (!(old_infomask & HEAP_XMAX_INVALID) &&
-            (mytup.t_data->t_infomask2 & HEAP_KEYS_UPDATED))
+        if (!(old_infomask & HEAP_XMAX_INVALID))
         {
             TransactionId update_xid;


However, the differences in behavior this causes (visible by running the
isolation tests) don't look good to me.  It's quite possible that there
are other bugs elsewhere.

Sadly, I have to look at other things now and I won't have time to
research this further until after next week's minor releases; so
whatever we decide to do, if anything, will not be in 9.3.1.

Thanks for the report.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #8434: Why does dead lock occur many times ?

From

Alvaro Herrera

Date:

27 November 2013, 17:18:14

I think I have figured this out.  This code was being too simplistic in
that it was only checking whether there was a key-update or not; but
that, it seems, is not sufficiently fine-grained.  If we instead think
in terms of an already-acquired MultiXactStatus, and a new
MultiXactStatus corresponding to the lock we're trying to acquire, we
can decide granularly whether each future version of the tuple can be
locked by us or not.  And if not, we can decide whether we need to wait
on the transaction holding the lock, or fail if it already committed.
(There is a funny trick here which is that we represent the held lock
with a MultiXactStatus, even if the lock is only a plain Xid and not a
multi.  This doesn't seem a problem to me.)

So I propose the attached patch.  This doesn't change the behavior
codified in the existing isolation tests, and it fixes my reduction of
your original test case.  (I haven't run your original test case yet,
but I soon will.)

Note: I don't quite like that this patch duplicates some code in
compute_new_xmax_infomask() which determines the MultiXactStatus from
the infomask bits.  Not sure what a good refactoring is, though, because
that code just issues a WARNING when the LOCK_ONLY bit is set and no
other lockmode is set; whereas the new code issues an ERROR.  It seems
hard to justify doing otherwise in either place, though; and doing
something more complicated doesn't seem warranted for such a corner
case.

Note: this patch doesn't apply to master in isolation.  You will need
the patch I mention in
http://www.postgresql.org/message-id/20131125201039.GF6597@eldon.alvh.no-ip.org
even though I haven't posted it yet.  Will do so shortly.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

granular-chain-following.patch

Re: BUG #8434: Why does dead lock occur many times ?

From

Tomonari Katsumata

Date:

28 November 2013, 00:53:08

Hi Alvaro,

Thank you for finding it out!

I'll test it from now, so please
give me a while time.

regards,

(2013/11/28 1:42), Alvaro Herrera wrote:
 > I think I have figured this out.  This code was being too simplistic in
 > that it was only checking whether there was a key-update or not; but
 > that, it seems, is not sufficiently fine-grained.  If we instead think
 > in terms of an already-acquired MultiXactStatus, and a new
 > MultiXactStatus corresponding to the lock we're trying to acquire, we
 > can decide granularly whether each future version of the tuple can be
 > locked by us or not.  And if not, we can decide whether we need to wait
 > on the transaction holding the lock, or fail if it already committed.
 > (There is a funny trick here which is that we represent the held lock
 > with a MultiXactStatus, even if the lock is only a plain Xid and not a
 > multi.  This doesn't seem a problem to me.)
 >
 > So I propose the attached patch.  This doesn't change the behavior
 > codified in the existing isolation tests, and it fixes my reduction of
 > your original test case.  (I haven't run your original test case yet,
 > but I soon will.)
 >
 > Note: I don't quite like that this patch duplicates some code in
 > compute_new_xmax_infomask() which determines the MultiXactStatus from
 > the infomask bits.  Not sure what a good refactoring is, though, because
 > that code just issues a WARNING when the LOCK_ONLY bit is set and no
 > other lockmode is set; whereas the new code issues an ERROR. It seems
 > hard to justify doing otherwise in either place, though; and doing
 > something more complicated doesn't seem warranted for such a corner
 > case.
 >
 > Note: this patch doesn't apply to master in isolation.  You will need
 > the patch I mention in
 >
http://www.postgresql.org/message-id/20131125201039.GF6597@eldon.alvh.no-ip.org
 > even though I haven't posted it yet.  Will do so shortly.
 >

Re: BUG #8434: Why does dead lock occur many times ?

From

Tomonari Katsumata

Date:

28 November 2013, 04:00:53

Hi Alvaro

I applied all patches you told me against 93_STABLE.
But I could not build the source with the patches.

It seems your patch use GetMultiXactIdMembers with four arguments,
but the function requires three arguments.
----
nmembers = GetMultiXactIdMembers(rawxmax, &members, false, false);
----

do you have another patch?

Am I missing something?

(2013/11/28 1:42), Alvaro Herrera wrote:
 > I think I have figured this out.  This code was being too simplistic in
 > that it was only checking whether there was a key-update or not; but
 > that, it seems, is not sufficiently fine-grained.  If we instead think
 > in terms of an already-acquired MultiXactStatus, and a new
 > MultiXactStatus corresponding to the lock we're trying to acquire, we
 > can decide granularly whether each future version of the tuple can be
 > locked by us or not.  And if not, we can decide whether we need to wait
 > on the transaction holding the lock, or fail if it already committed.
 > (There is a funny trick here which is that we represent the held lock
 > with a MultiXactStatus, even if the lock is only a plain Xid and not a
 > multi.  This doesn't seem a problem to me.)
 >
 > So I propose the attached patch.  This doesn't change the behavior
 > codified in the existing isolation tests, and it fixes my reduction of
 > your original test case.  (I haven't run your original test case yet,
 > but I soon will.)
 >
 > Note: I don't quite like that this patch duplicates some code in
 > compute_new_xmax_infomask() which determines the MultiXactStatus from
 > the infomask bits.  Not sure what a good refactoring is, though, because
 > that code just issues a WARNING when the LOCK_ONLY bit is set and no
 > other lockmode is set; whereas the new code issues an ERROR. It seems
 > hard to justify doing otherwise in either place, though; and doing
 > something more complicated doesn't seem warranted for such a corner
 > case.
 >
 > Note: this patch doesn't apply to master in isolation.  You will need
 > the patch I mention in
 >
http://www.postgresql.org/message-id/20131125201039.GF6597@eldon.alvh.no-ip.org
 > even though I haven't posted it yet.  Will do so shortly.
 >
 >

Re: BUG #8434: Why does dead lock occur many times ?

From

Alvaro Herrera

Date:

28 November 2013, 11:22:27

Tomonari Katsumata wrote:
> Hi Alvaro
>
> I applied all patches you told me against 93_STABLE.
> But I could not build the source with the patches.
>
> It seems your patch use GetMultiXactIdMembers with four arguments,
> but the function requires three arguments.
> ----
> nmembers = GetMultiXactIdMembers(rawxmax, &members, false, false);
> ----
>
> do you have another patch?

Sorry, yes.  This is the complete series of patches, although I have not
tried them against 9.3 yet.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: BUG #8434: Why does dead lock occur many times ?

From

Tomonari Katsumata

Date:

29 November 2013, 04:32:42

Hi Alvaro,

 >> do you have another patch?
 >
 > Sorry, yes.  This is the complete series of patches, although I have not
 > tried them against 9.3 yet.
 >
Thank you.
I could apply them against 93_STABLE and build it.
(commitID: d32e8387bac15ad353029c259e832f78d456c7a8)

deadlock did not occur with my test case.
It seems these patches resolve my problem.

If it doesn't have any other problems,
would you commit them for next release(9.3.2)?

FYI:
These patch doesn't resolve bug#8470 yet.
I could reproduce the problem on patched PostgreSQL.

regards,

Re: BUG #8434: Why does dead lock occur many times ?

From

Alvaro Herrera

Date:

29 November 2013, 12:48:08

Tomonari Katsumata wrote:
> Hi Alvaro,
>
> >> do you have another patch?
> >
> > Sorry, yes.  This is the complete series of patches, although I have not
> > tried them against 9.3 yet.
> >
> Thank you.
> I could apply them against 93_STABLE and build it.
> (commitID: d32e8387bac15ad353029c259e832f78d456c7a8)
>
> deadlock did not occur with my test case.
> It seems these patches resolve my problem.
>
> If it doesn't have any other problems,
> would you commit them for next release(9.3.2)?

Yes, it's already committed for 9.3.2.  There are a bunch of other fixes
related to multixact which I intend to commit soon too.

> FYI:
> These patch doesn't resolve bug#8470 yet.
> I could reproduce the problem on patched PostgreSQL.

Yeah, I tested that too, but that problem is not related.  I also tested
it with the fix to another performance problem which I posted as
http://www.postgresql.org/message-id/20131127222435.GC5513@eldon.alvh.no-ip.org
but that's not it either.  It really requires a completely separate
optimization.

Thanks,

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services