Re: Global Deadlock Information - Mailing list pgsql-cluster-hackers

From Satoshi Nagayasu
Subject Re: Global Deadlock Information
Date
Msg-id 1831154951-1265475035-cardhu_decombobulator_blackberry.rim.net-253244861-@bda047.bisx.prodap.on.blackberry
In response to Re: Global Deadlock Information  (Koichi Suzuki <koichi.szk@gmail.com>)
List pgsql-cluster-hackers
I don't think anything special or extra is required for logging the global wait-for graph,
because each sub-transaction is processed (and recovered) as a local transaction on each RM (Resource
Manager)
in a distributed transaction environment.

A running transaction is not yet prepared or committed,
so no special or extra log record is needed when a local transaction is aborted.

-----Original Message-----
From: Koichi Suzuki <koichi.szk@gmail.com>
Date: Sun, 7 Feb 2010 01:23:36
To: Satoshi Nagayasu<satoshi.nagayasu@gmail.com>
Cc: Markus Wanner<markus@bluegap.ch>; <pgsql-cluster-hackers@postgresql.org>
Subject: Re: [pgsql-cluster-hackers] Global Deadlock Information

Hi,

I'm very interested in how long it takes to determine a global
deadlock using the global wait-for graph, and whether global deadlock
detection disturbs other ongoing transactions.

----------
Koichi Suzuki



2010/2/7 Satoshi Nagayasu <satoshi.nagayasu@gmail.com>:
> Hi Markus,
>
> I tried two ways of resolving global deadlocks
> during PostgresForest development.
>
> (1) Use lock_timeout to avoid a global deadlock.
>
> The lock_timeout feature is a very simple way to avoid
> a global deadlock.
>
> I also disagree that "statement_timeout is the way to avoid global
> deadlocks", because statement_timeout also kills
> healthy long-running transactions when it expires.
>
> Some developers (including me!) proposed the lock_timeout
> GUC option.
>
> http://archives.postgresql.org/pgsql-hackers/2004-06/msg00935.php
> http://archives.postgresql.org/pgsql-hackers/2010-01/msg01167.php
>
> I still believe a "lock timeout" feature could help
> resolve global deadlocks in a cluster environment.
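
As an in-process analogy only (this is not PostgreSQL code), the sketch below shows why a lock timeout is enough to break a deadlock cycle while touching only the transaction that is actually waiting. Two threads stand in for transactions and `threading.Lock` objects stand in for database locks. The names `run_demo` and `worker` and the timeout values are assumptions made for the illustration.

```python
import threading

def run_demo(timeout_a=0.2, timeout_b=2.0):
    """Two workers take two locks in opposite order; a lock timeout breaks the cycle."""
    lock_x, lock_y = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    outcome = {}

    def worker(name, first, second, lock_timeout):
        with first:
            # Wait until both workers hold their first lock, so the cycle forms.
            both_hold_first.wait()
            # Wait for the second lock, but only up to lock_timeout seconds.
            if second.acquire(timeout=lock_timeout):
                second.release()
                outcome[name] = "committed"
            else:
                # Give up; leaving the 'with' block releases the first lock.
                outcome[name] = "aborted"

    t1 = threading.Thread(target=worker, args=("T1", lock_x, lock_y, timeout_a))
    t2 = threading.Thread(target=worker, args=("T2", lock_y, lock_x, timeout_b))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return outcome

result = run_demo()
```

The worker with the shorter lock timeout gives up and releases its lock, which lets the other one finish. A worker that never waits on a lock is not affected, however long it runs, which is the difference from a statement-level timeout.
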
>
> (2) Use the global wait-for graph to detect a global deadlock.
>
> I had an experimental implementation that used the global wait-for
> graph to detect and resolve global deadlocks.
>
> http://en.wikipedia.org/wiki/Deadlock#Distributed_deadlock
>
> I used the node (server) identifiers and the pg_locks information
> to build the global wait-for graph, and a kill signal
> (or pg_cancel_backend()?) to abort the victim transaction causing
> the deadlock.
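
A minimal sketch of the idea, assuming each node can report its lock waits as (waiter, holder) pairs of global transaction ids. The input format, the function names, and the "abort the highest id" victim rule are all assumptions for illustration; they are not taken from PostgresForest.

```python
from collections import defaultdict

def build_global_graph(per_node_edges):
    """Merge per-node (waiter, holder) edges into one global wait-for graph."""
    graph = defaultdict(set)
    for edges in per_node_edges.values():
        for waiter, holder in edges:
            graph[waiter].add(holder)
    return graph

def find_cycle(graph):
    """Return one cycle as a list of transaction ids, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = defaultdict(int)
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for nxt in sorted(graph.get(txn, ())):
            if color[nxt] == GREY:
                # Reached a transaction already on the path: that is a cycle.
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(graph):
        if color[txn] == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None

# Neither node sees a cycle locally; the merged graph does.
per_node = {"node1": [(101, 102)], "node2": [(102, 101)]}
cycle = find_cycle(build_global_graph(per_node))
victim = max(cycle) if cycle else None   # hypothetical rule: abort the highest id
```

Each node's own edge list is acyclic here, so a purely local deadlock detector would report nothing. Only the merged graph contains the cycle between transactions 101 and 102.
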
>
> I don't think a callback function is needed to replace
> the current deadlock resolution feature,
> but I agree we need a consensus on how to avoid
> global deadlocks in the cluster.
>
> Thanks,
>
> On 2010/02/06 18:13, Markus Wanner wrote:
>>
>> Hi,
>>
>> I'd like to start a thread for discussion of the second item on the
>> ClusterFeatures [1] list: Global Deadlock Information.
>>
>> IIRC there are two aspects to this item: a) the plain notification of a
>> deadlock and b) some way to control or intercept deadlock resolution.
>>
>> The problem this item seems to address is the potential for deadlocks
>> between transactions on different nodes. Or put another way: between a
>> local transaction and one that's to be applied from a remote node (or
>> even between two remote ones - similar issue, though). To ensure
>> congruency between nodes, they must take the same measures to resolve
>> the deadlock, i.e. abort the same transaction(s).
>>
>> I certainly disagree with the statement on the wiki that the
>> "statement_timeout is the way to avoid global deadlocks", because I
>> don't want to have to wait that long until a deadlock gets resolved.
>> Further it doesn't even guarantee congruency, depending on the
>> implementation of your clustering solution.
>>
>> I fail to see how a plain notification API would help much. After all,
>> this could result in one node reporting that it aborted transaction A to
>> resolve a deadlock while another node reports that it aborted
>> transaction B. You'd end up having to abort two (or more) transactions
>> instead of just one to resolve a conflict.
>>
>> It could become more useful if enabling such a notification turned off
>> the existing deadlock resolver and left the resolution of the deadlock
>> to the clustering solution. I'd call that an interception.
>>
>> Such an interception API should IMO provide a way to register a
>> callback, which replaces the current deadlock resolver. Upon detection
>> of a deadlock, the callback should get a list of transaction ids that
>> are part of the lock cycle. It's then up to that callback to choose one
>> and abort that to resolve the conflict.
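
The shape of such an interception could look roughly like the Python sketch below. No such API exists in PostgreSQL: `register_deadlock_callback`, `on_deadlock_detected`, and `placeholder_resolver` are hypothetical names, and the "abort the highest id" rule assumes transaction ids are globally ordered across nodes.

```python
from typing import Callable, List, Optional

# Hypothetical callback type: receives the transaction ids in the lock cycle
# and returns the id of the transaction to abort.
VictimChooser = Callable[[List[int]], int]

_registered: Optional[VictimChooser] = None

def register_deadlock_callback(chooser: VictimChooser) -> None:
    """Install a chooser that replaces the placeholder resolver below."""
    global _registered
    _registered = chooser

def placeholder_resolver(cycle: List[int]) -> int:
    # Placeholder for a node-local policy: pick whichever id is listed first.
    # The result depends on how each node happens to order the cycle.
    return cycle[0]

def on_deadlock_detected(cycle: List[int]) -> int:
    """Called with the ids in the cycle; returns the victim to abort."""
    chooser = _registered or placeholder_resolver
    victim = chooser(cycle)
    if victim not in cycle:
        raise ValueError("callback must pick a transaction from the cycle")
    return victim

def abort_youngest(cycle: List[int]) -> int:
    # Assumes ids are globally ordered, so every node computes the same max.
    return max(cycle)

# Two nodes see the same cycle, but in a different order.
node_a_view, node_b_view = [7, 3, 5], [3, 5, 7]
before = (on_deadlock_detected(node_a_view), on_deadlock_detected(node_b_view))
register_deadlock_callback(abort_youngest)
after = (on_deadlock_detected(node_a_view), on_deadlock_detected(node_b_view))
```

With the node-local placeholder the two nodes pick different victims. With a deterministic callback they pick the same one, which is the congruency property the clustering solution needs.
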
>>
>> And now, Greg's List:
>> > 1) What feature does this help add from a user perspective?
>>
>> Preventing cluster-wide deadlocks (while maintaining congruency of
>> replicas).
>>
>> > 2) Which replication projects would be expected to see an improvement
>> > from this addition?
>>
>> I suspect all multi-master solutions are affected, certainly Postgres-R
>> would benefit. Single-master ones certainly don't need it.
>>
>> > 3) What makes it difficult to implement?
>>
>> I don't see any real stumbling block. Deciding on an API needs consensus.
>>
>> > 4) Are there any other items on the list this depends on, or that it
>> > is expected to have a significant positive/negative interaction with?
>>
>> Not that I know of.
>>
>> > 5) What replication projects include a feature like this already, or a
>> > prototype of a similar one, that might be used as a proof of concept
>> > or example implementation?
>>
>> Old Postgres-R versions once had such an interception, but it currently
>> lacks a solution for this problem. I don't know of any other project
>> that's already solved this.
>>
>> > 6) Who is already working on it/planning to work on it/needs it for
>> > their related project?
>>
>> I'm not currently working on it and don't plan to do so (at least) until
>> PgCon 2010.
>>
>>
>> Cluster hackers, is this a good summary which covers your needs as well?
>> Something missing?
>>
>> Regards
>>
>> Markus Wanner
>>
>> [1]: feature wish list of cluster hackers:
>> http://wiki.postgresql.org/wiki/ClusterFeatures
>>
>>
>
>
> --
> NAGAYASU Satoshi <satoshi.nagayasu@gmail.com>
>

