Re: Global Deadlock Information - Mailing list pgsql-cluster-hackers
From | Satoshi Nagayasu |
---|---|
Subject | Re: Global Deadlock Information |
Date | |
Msg-id | 1831154951-1265475035-cardhu_decombobulator_blackberry.rim.net-253244861-@bda047.bisx.prodap.on.blackberry |
In response to | Re: Global Deadlock Information (Koichi Suzuki <koichi.szk@gmail.com>) |
List | pgsql-cluster-hackers |
I don't think anything special or extra is required for global wait-for graph
logging, because each sub-transaction should be processed (and recovered) as
a local transaction on each RM (Resource Manager) in a distributed
transaction environment. A running transaction is not yet prepared or
committed, so no additional record is needed when a local transaction is
aborted.

-----Original Message-----
From: Koichi Suzuki <koichi.szk@gmail.com>
Date: Sun, 7 Feb 2010 01:23:36
To: Satoshi Nagayasu<satoshi.nagayasu@gmail.com>
Cc: Markus Wanner<markus@bluegap.ch>; <pgsql-cluster-hackers@postgresql.org>
Subject: Re: [pgsql-cluster-hackers] Global Deadlock Information

Hi,

I'm very interested in how long it takes to determine a global deadlock
using the global wait-for graph, and whether global deadlock detection
disturbs other ongoing transactions.

----------
Koichi Suzuki

2010/2/7 Satoshi Nagayasu <satoshi.nagayasu@gmail.com>:
> Hi Markus,
>
> I tried two ways of resolving global deadlock situations
> in the PostgresForest development.
>
> (1) Use lock_timeout to avoid a global deadlock.
>
> The lock_timeout feature is a very simple way to avoid
> a global deadlock situation.
>
> I also disagree that "statement_timeout is the way to avoid global
> deadlocks", because statement_timeout kills
> healthy long-running transactions when it expires.
>
> Some developers (including me!) proposed the lock_timeout
> GUC option.
>
> http://archives.postgresql.org/pgsql-hackers/2004-06/msg00935.php
> http://archives.postgresql.org/pgsql-hackers/2010-01/msg01167.php
>
> I still believe the "lock timeout" feature could help
> resolve a global deadlock in a cluster environment.
>
> (2) Use the global wait-for graph to detect a global deadlock.
>
> I had an experimental implementation that used the global wait-for
> graph to prevent global deadlocks.
>
> http://en.wikipedia.org/wiki/Deadlock#Distributed_deadlock
>
> I used the node (server) identifiers and the pg_locks information
> to build the global wait-for graph, and the kill signal
> (or pg_cancel_backend()?) to abort a victim transaction causing
> the deadlock.
>
> I don't think the callback function is needed to replace
> the current deadlock resolution feature,
> but I agree we need a consensus on how to avoid
> global deadlock situations in the cluster.
>
> Thanks,
>
> On 2010/02/06 18:13, Markus Wanner wrote:
>>
>> Hi,
>>
>> I'd like to start a thread for discussion of the second item on the
>> ClusterFeatures [1] list: Global Deadlock Information.
>>
>> IIRC there are two aspects to this item: a) the plain notification of a
>> deadlock and b) some way to control or intercept deadlock resolution.
>>
>> The problem this item seems to address is the potential for deadlocks
>> between transactions on different nodes. Or put another way: between a
>> local transaction and one that's to be applied from a remote node (or
>> even between two remote ones - similar issue, though). To ensure
>> congruency between nodes, they must take the same measures to resolve
>> the deadlock, i.e. abort the same transaction(s).
>>
>> I certainly disagree with the statement on the wiki that the
>> "statement_timeout is the way to avoid global deadlocks", because I
>> don't want to have to wait that long until a deadlock gets resolved.
>> Further, it doesn't even guarantee congruency, depending on the
>> implementation of your clustering solution.
>>
>> I fail to see how a plain notification API would help much. After all,
>> this could result in one node notifying that it aborted transaction A to
>> resolve a deadlock while another node notifies that it aborted
>> transaction B. You'd end up having to abort two (or more) transactions
>> instead of just one to resolve a conflict.
>>
>> It could get more useful if enabling such a notification would turn off
>> the existing deadlock resolver and leave the resolution of the deadlock
>> to the clustering solution. I'd call that an interception.
>>
>> Such an interception API should IMO provide a way to register a
>> callback, which replaces the current deadlock resolver. Upon detection
>> of a deadlock, the callback should get a list of transaction ids that
>> are part of the lock cycle. It's then up to that callback to choose one
>> and abort it to resolve the conflict.
>>
>> And now, Greg's List:
>> > 1) What feature does this help add from a user perspective?
>>
>> Preventing cluster-wide deadlocks (while maintaining congruency of
>> replicas).
>>
>> > 2) Which replication projects would be expected to see an improvement
>> > from this addition?
>>
>> I suspect all multi-master solutions are affected, certainly Postgres-R
>> would benefit. Single-master ones certainly don't need it.
>>
>> > 3) What makes it difficult to implement?
>>
>> I don't see any real stumbling block. Deciding on an API needs consensus.
>>
>> > 4) Are there any other items on the list this depends on, or that it
>> > is expected to have a significant positive/negative interaction with?
>>
>> Not that I know of.
>>
>> > 5) What replication projects include a feature like this already, or a
>> > prototype of a similar one, that might be used as a proof of concept
>> > or example implementation?
>>
>> Old Postgres-R versions once had such an interception, but it currently
>> lacks a solution for this problem. I don't know of any other project
>> that's already solved this.
>>
>> > 6) Who is already working on it/planning to work on it/needs it for
>> > their related project?
>>
>> I'm not currently working on it and don't plan to do so (at least) until
>> PgCon 2010.
>>
>>
>> Cluster hackers, is this a good summary which covers your needs as well?
>> Something missing?
>>
>> Regards
>>
>> Markus Wanner
>>
>> [1]: feature wish list of cluster hackers:
>> http://wiki.postgresql.org/wiki/ClusterFeatures
>>
>>
>
>
> --
> NAGAYASU Satoshi <satoshi.nagayasu@gmail.com>
>
> --
> Sent via pgsql-cluster-hackers mailing list
> (pgsql-cluster-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-cluster-hackers
>
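
For readers following the thread, the global wait-for graph idea in point (2) and the callback that picks a victim from the lock cycle can be illustrated with a small sketch. This is not the PostgresForest or Postgres-R code. It assumes each node can report its local lock waits as (waiter, holder) pairs of cluster-wide transaction ids (for example, derived from pg_locks plus a node identifier); the names build_global_graph, find_cycle and choose_victim, and the "highest id loses" rule, are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumption: every transaction has a cluster-wide id (a plain string here).
Edge = Tuple[str, str]  # (waiter, holder): waiter is blocked by holder


def build_global_graph(per_node_edges: Dict[str, List[Edge]]) -> Dict[str, Set[str]]:
    """Merge the wait-for edges reported by every node into one graph."""
    graph: Dict[str, Set[str]] = {}
    for edges in per_node_edges.values():
        for waiter, holder in edges:
            graph.setdefault(waiter, set()).add(holder)
            graph.setdefault(holder, set())
    return graph


def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Depth-first search; return the transactions on one cycle, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {txn: WHITE for txn in graph}
    path: List[str] = []

    def visit(txn: str) -> Optional[List[str]]:
        color[txn] = GREY
        path.append(txn)
        for nxt in sorted(graph[txn]):
            if color[nxt] == GREY:
                # nxt is already on the current path, so the path closes a cycle.
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found is not None:
                    return found
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(graph):
        if color[txn] == WHITE:
            found = visit(txn)
            if found is not None:
                return found
    return None


def choose_victim(cycle: List[str]) -> str:
    """Hypothetical deterministic rule: abort the highest transaction id.

    Any rule works as long as every node applies the same one, so that
    all nodes abort the same transaction.
    """
    return max(cycle)


# T2 waits for T1 on node_a, T1 waits for T2 on node_b.
# Neither node sees a local cycle; the merged graph does.
reports = {"node_a": [("T2", "T1")], "node_b": [("T1", "T2")]}
cycle = find_cycle(build_global_graph(reports))
victim = choose_victim(cycle) if cycle else None
```

Because the victim rule depends only on the members of the cycle, two nodes that detect the same cycle independently would select the same transaction, which is the congruency property Markus asks for. How long the edge collection takes, and whether it disturbs running transactions (Koichi's question), is not addressed by this sketch.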