Thread: Hot Standby and deadlock detection
Greg Stark has requested that I re-add max_standby_delay = -1. I deferred that in favour of relation-specific conflict resolution, though judging by the comments received that seems too major a change.

As discussed in various other posts, in order to re-add the -1 option we need to add deadlock detection. I woke up today with a simplifying assumption and have worked out a solution, the easy parts of which I committed earlier. Part #2 is to make the Startup process do deadlock detection. I attach a WIP patch for comments, since signal handling has been a much-discussed area in recent weeks.

Normal deadlock detection waits for deadlock_timeout before doing the detection. That is a simple performance tuning mechanism which I think is probably unnecessary with hot standby, at least in the first instance.

The way this would work is: if Startup waits on a buffer pin, we immediately send out a request to all backends to cancel themselves if they are holding the required buffer pin && waiting on a lock. We then sleep until max_standby_delay. When max_standby_delay = -1 we only sleep until deadlock_timeout and then check (on the Startup process).

That keeps the signal handler code simple and reduces the number of test cases required to confirm everything is solid.

This patch and the last commit together provide everything we need to re-enable max_standby_delay = -1, so that change is included here also.

--
Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote:
> The way this would work is if Startup waits on a buffer pin we
> immediately send out a request to all backends to cancel themselves if
> they are holding the buffer pin required && waiting on a lock. We then
> sleep until max_standby_delay. When max_standby_delay = -1 we only sleep
> until deadlock timeout and then check (on the Startup process).

Should wake up to check for deadlocks after deadlock_timeout also when max_standby_delay > deadlock_timeout. max_standby_delay could be hours - we want to detect a deadlock sooner than that.

Generally speaking, the max_standby_delay==-1 codepath shouldn't be any different from the max_standby_delay>0 codepath.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, 2010-02-01 at 09:40 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > The way this would work is if Startup waits on a buffer pin we
> > immediately send out a request to all backends to cancel themselves if
> > they are holding the buffer pin required && waiting on a lock. We then
> > sleep until max_standby_delay. When max_standby_delay = -1 we only sleep
> > until deadlock timeout and then check (on the Startup process).
>
> Should wake up to check for deadlocks after deadlock_timeout also when
> max_standby_delay > deadlock_timeout. max_standby_delay could be hours -
> we want to detect a deadlock sooner than that.

The patch does detect deadlocks sooner than that - "immediately", as described above.

The simplified logic is

    if (MaxStandbyDelay == 0)
        immediate time out any buffer pin holders
    else if (MaxStandbyDelay == -1)
        wait for deadlock_timeout then check for deadlockers
    else if (standby_delay > MaxStandbyDelay)
        immediate time out on buffer pin
    else
    {
        immediate(*) check for deadlockers
        wait for remainder of time then time out any buffer pin holders
    }

(*) Doing it this way makes the SIGALRM handler logic easier and less bug-prone. The only difference is a potential performance gain from not running deadlock detection early.

--
Simon Riggs www.2ndQuadrant.com
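
For readers following along, the four-way branch above can be written as a small standalone C function. This is only a sketch of the decision as described in the message, not code from the patch: the enum, its member names, and `choose_pin_action` are hypothetical, and the two arguments are assumed to be in the same time unit.

```c
#include <assert.h>

/* Hypothetical action names for illustration; not PostgreSQL identifiers. */
typedef enum
{
    CANCEL_PIN_HOLDERS_NOW,     /* time out any buffer pin holders at once */
    WAIT_THEN_CHECK_DEADLOCK,   /* sleep deadlock_timeout, then check for deadlockers */
    CHECK_DEADLOCK_THEN_WAIT    /* check for deadlockers now, wait out the
                                 * remaining delay, then time out pin holders */
} StandbyPinAction;

/*
 * Mirrors the simplified logic from the message.
 * max_standby_delay: configured limit; -1 is the "wait forever" setting.
 * standby_delay:     how far the standby has already fallen behind.
 */
StandbyPinAction
choose_pin_action(int max_standby_delay, int standby_delay)
{
    assert(max_standby_delay >= -1);

    if (max_standby_delay == 0)
        return CANCEL_PIN_HOLDERS_NOW;
    else if (max_standby_delay == -1)
        return WAIT_THEN_CHECK_DEADLOCK;
    else if (standby_delay > max_standby_delay)
        return CANCEL_PIN_HOLDERS_NOW;
    else
        return CHECK_DEADLOCK_THEN_WAIT;
}
```

Written this way, it is easy to see that only the -1 branch defers the deadlock check, which is the asymmetry questioned in the next message.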
Simon Riggs wrote:
> On Mon, 2010-02-01 at 09:40 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> The way this would work is if Startup waits on a buffer pin we
>>> immediately send out a request to all backends to cancel themselves if
>>> they are holding the buffer pin required && waiting on a lock. We then
>>> sleep until max_standby_delay. When max_standby_delay = -1 we only sleep
>>> until deadlock timeout and then check (on the Startup process).
>> Should wake up to check for deadlocks after deadlock_timeout also when
>> max_standby_delay > deadlock_timeout. max_standby_delay could be hours -
>> we want to detect a deadlock sooner than that.
>
> The patch does detect deadlocks sooner than that - "immediately", as
> described above.

Umm, so why not run the deadlock check immediately in the max_standby_delay=-1 case as well? Why is that case handled differently from the max_standby_delay>0 case?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, 2010-02-01 at 17:50 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Mon, 2010-02-01 at 09:40 +0200, Heikki Linnakangas wrote:
> >> Simon Riggs wrote:
> >>> The way this would work is if Startup waits on a buffer pin we
> >>> immediately send out a request to all backends to cancel themselves if
> >>> they are holding the buffer pin required && waiting on a lock. We then
> >>> sleep until max_standby_delay. When max_standby_delay = -1 we only sleep
> >>> until deadlock timeout and then check (on the Startup process).
> >> Should wake up to check for deadlocks after deadlock_timeout also when
> >> max_standby_delay > deadlock_timeout. max_standby_delay could be hours -
> >> we want to detect a deadlock sooner than that.
> >
> > The patch does detect deadlocks sooner than that - "immediately", as
> > described above.
>
> Umm, so why not run the deadlock check immediately in
> max_standby_delay=-1 case as well? Why is that case handled differently
> from max_standby_delay>0 case?

Because the code to do that is easy. I'll do the deadlock check immediately and make it even easier.

--
Simon Riggs www.2ndQuadrant.com
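
One way to picture the change agreed here is that the -1 case folds into the general branch: the deadlock check always runs immediately, and -1 differs only in never timing out the pin holders afterwards. The sketch below is an assumption about the shape of that logic, not the committed code; `StandbyPinPlan`, its fields, and `plan_pin_wait` are hypothetical names, and "never cancel afterwards" is taken from the usual meaning of max_standby_delay = -1 rather than from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical plan structure for illustration; not a PostgreSQL type. */
typedef struct
{
    bool check_deadlock_now;            /* ask pin-holding lock waiters to cancel now */
    bool cancel_pin_holders_now;        /* time out all pin holders at once */
    bool cancel_after_remaining_delay;  /* time out pin holders once the delay expires */
} StandbyPinPlan;

/*
 * Assumed revised logic: -1 and >0 share the immediate deadlock check.
 * With -1 there is no later timeout, so Startup just keeps waiting.
 */
StandbyPinPlan
plan_pin_wait(int max_standby_delay, int standby_delay)
{
    StandbyPinPlan plan = {false, false, false};

    assert(max_standby_delay >= -1);

    if (max_standby_delay == 0 ||
        (max_standby_delay > 0 && standby_delay > max_standby_delay))
    {
        /* No time budget left: cancel every pin holder right away. */
        plan.cancel_pin_holders_now = true;
    }
    else
    {
        /* Same first step for -1 and for a positive, unexpired delay. */
        plan.check_deadlock_now = true;
        plan.cancel_after_remaining_delay = (max_standby_delay != -1);
    }
    return plan;
}
```

With this shape the only remaining special-casing for -1 is a single boolean, which matches the goal stated earlier in the thread that the two codepaths should not differ.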
On Mon, 2010-02-01 at 17:50 +0200, Heikki Linnakangas wrote:
> Umm, so why not run the deadlock check immediately in
> max_standby_delay=-1 case as well? Why is that case handled differently
> from max_standby_delay>0 case?

Done, tested, working.

Will commit tomorrow if no further questions or comments.

--
Simon Riggs www.2ndQuadrant.com