Thread: Possible bug in cascaded standby
Hello,
I am experimenting with cascaded standbys and hit a problem that is reproducible with the current HEAD. I haven't tried other branches, and I'm not sure whether my test setup even works on older releases because of the timeline ID issue.
Anyway, I set up a cascaded standby that streams from the first standby, then stopped the original master and promoted the first standby to be the new master. If I then try a smart shutdown of the cascaded standby, it fails after waiting for the walreceiver to terminate. What's worse, the walsender on the first standby gets into an infinite loop, consuming 100% CPU.
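The attached script is not reproduced in this thread. As a rough sketch only, the sequence described above might look like the following dry-run helper. The data directories, the port, and the "fast" stop mode for the original master are assumptions made for illustration, not details from the original setup; the recovery.conf settings shown are the ones used by releases of that era.

```python
"""Sketch of the failover sequence described above (not the attached script)."""
import subprocess
from pathlib import Path
from typing import List

# Hypothetical data directories; adjust for a real setup.
MASTER = Path("/tmp/cascade/master")
STANDBY1 = Path("/tmp/cascade/standby1")   # streams from the master
STANDBY2 = Path("/tmp/cascade/standby2")   # cascaded: streams from standby1


def recovery_conf(upstream_port: int) -> str:
    """Build recovery.conf contents for a streaming standby.

    The port is a placeholder. 'latest' lets the standby follow the
    timeline switch that happens when its upstream is promoted.
    """
    return (
        "standby_mode = 'on'\n"
        f"primary_conninfo = 'host=localhost port={upstream_port}'\n"
        "recovery_target_timeline = 'latest'\n"
    )


def failover_commands() -> List[List[str]]:
    """Return the pg_ctl calls: stop master, promote standby1, smart-stop standby2."""
    return [
        # Assumption: the thread does not say which mode stopped the master.
        ["pg_ctl", "-D", str(MASTER), "stop", "-m", "fast"],
        ["pg_ctl", "-D", str(STANDBY1), "promote"],
        # The smart shutdown that was reported to hang.
        ["pg_ctl", "-D", str(STANDBY2), "stop", "-m", "smart"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(failover_commands())  # dry run: prints the commands only
```

By default this only prints the commands, so it is safe to run without a PostgreSQL installation; passing dry_run=False would execute them against the placeholder directories.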
I tried to investigate this a bit, but haven't made progress worth reporting. I can spend more time, but just wanted to make sure that I'm not trying something which is a known issue or limitation. BTW, this is on my Macbook Pro. Attached is the script that I used to set up the environment. You will need to modify it for your setup though.
Thanks,
Attachment
On Thu, Jun 6, 2013 at 1:03 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> Hello,
>
> I am experimenting with the cascade standby and hit a problem which is
> reproducible with the current HEAD. I haven't tried other branches, but not
> sure if the test setup I am trying even works for older releases because of
> the timeline ID issue.
>
> Anyways, I set up a cascaded standby such that it streams from the first
> standby and then stopped the original master and promoted the first standby
> to be the new master. If I then try to smart shutdown the cascaded standby,
> it fails after waiting for the walreceiver to terminate. What's worse, the
> walsender on the first standby gets into an infinite loop consuming 100%
> CPU.
>
> I tried to investigate this a bit, but haven't made progress worth
> reporting. I can spend more time, but just wanted to make sure that I'm not
> trying something which is a known issue or limitation. BTW, this is on my
> Macbook Pro. Attached is the script that I used to set up the environment.
> You will need to modify it for your setup though.
I was not able to reproduce the problem. Maybe this is the timing problem. Could you share the server log of each server at the time when the problem happened?
Just in case, I attached the server logs which I got when I ran the script to reproduce the problem.
Regards,
--
Fujii Masao
Attachment
On Wed, Jun 5, 2013 at 10:57 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I was not able to reproduce the problem. Maybe this is the timing problem.
Hmm. I can't reproduce this on my Ubuntu box either. I will retry on the Mac machine in the evening. Surprisingly, I could reproduce it very easily on that box. What I observed there is that the walreceiver on the cascaded standby is stuck at walreceiver.c:447, which in turn is waiting indefinitely in the PQgetResult() call at libpqwalreceiver.c:501.
I'll retry and report back if I see the problem on the offending platform.
Thanks,
Pavan
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
> I'll retry and report back if I see the problem on the offending platform.
Just to close out this thread: I can't reproduce this on Mac OS either. I had only done a "make clean" earlier; a "make distclean" did the trick. Sorry for the noise.
Thanks,
Pavan
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee