Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown - Mailing list pgsql-hackers
From | Amit kapila |
---|---|
Subject | Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown |
Date | |
Msg-id | 6C0B27F7206C9E4CA54AE035729E9C382853645A@szxeml509-mbs |
In response to | Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown |
List | pgsql-hackers |
On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote:
On 02.10.2012 10:36, Amit kapila wrote:
> On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote:
>>> So let's think how this should ideally work from a user's point of view.
>>> I think there should be just two settings: walsender_timeout and
>>> walreceiver_timeout. walsender_timeout specifies how long a walsender
>>> will keep a connection open if it doesn't hear from the walreceiver, and
>>> walreceiver_timeout is the same for walreceiver. The system should
>>> The Ping/Pong messages don't necessarily need to be new message types,
>>> we can use the message types we currently have, perhaps with an
>>> additional flag attached to them, to request the other side to reply
>>> immediately.
>
>> Can't we make the decision to send reply immediately based on message type, because these message types will be unique.
>
>> To clarify my understanding,
>> 1. the heartbeat message from walsender side will be keepalive message ('k') and from walreceiver side it will be HotStandby feedback message ('h').
>> 2. the reply message from walreceiver side will be current reply message ('r').
> Yep. I wonder why need separate message types for Hot Standby Feedback
> 'h' and Reply 'r', though. Seems it would be simpler to have just one
> message type that includes all the fields from both messages.

I have moved the contents of the Hot Standby Feedback message 'h' into the Reply message 'r', and now use 'h' for heartbeat purposes.

>> 3. currently there is no reply kind of message from walsender, so do we need to introduce one new message for it or can use some existing message only?
>> if new, do we need to send any additional information along with it, for existing messages can we use keepalive message itself as reply message but with an additional byte
>> to indicate it is reply?
> Hmm, I think I'd prefer to use the existing Keepalive message 'k', with an additional flag.

Okay, I have done it in the patch. Thank you for the suggestions.
I have addressed your suggestions in the patch attached to this mail. The following changes were made to support replication timeouts in the sender as well as in the receiver:

1. A new configuration parameter, wal_receiver_timeout, is added to detect a timeout in the receiver task.
2. The existing parameter replication_timeout is renamed to wal_sender_timeout.
3. The PrimaryKeepaliveMessage structure is modified to add one more field indicating whether the keep-alive is of type 'r' (i.e. reply) or 'h' (i.e. heartbeat).
4. The sender now sends a keep-alive message to the standby if it has been idle for at least half of wal_sender_timeout. In this case it sends a keep-alive of type 'h'.
5. Once the standby receives such a keep-alive, it sends an immediate reply to the primary to indicate that the connection is alive.
6. The Reply message (which sends the WAL offsets) and the Feedback message (which sends the oldest transaction ID) are merged into a single Reply message. The StandbyReplyMessage structure therefore gains two fields, xmin and epoch, and the StandbyHSFeedbackMessage structure loses its xmin and epoch fields (as these moved to StandbyReplyMessage).
7. Because of the change in step 6, once the receiver task receives data from the primary, it sends only the Reply message.
8. The same Reply message is sent in step 5 and in step 7, but in step 5 the reply is sent immediately, whereas in step 7 it is sent only if wal_receiver_status_interval has elapsed (this part is the same as before).
9. Similarly to the sender, if the receiver finds itself idle for at least half of the configured wal_receiver_timeout, it sends the hot-standby heartbeat. This heartbeat has been modified to carry only sendTime.
10. Once the sender task receives a heartbeat message from the standby, it sends back a reply immediately. In this case a keep-alive message of type 'r' is sent.
11. If no message is received from the standby within wal_sender_timeout, the sender task treats it as a network break.
12. If no message is received from the primary within wal_receiver_timeout, the receiver task treats it as a network break.

With Regards,
Amit Kapila.
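The idle-time rules in steps 4, 9, 11 and 12 apply symmetrically on both sides, so they can be sketched as one decision function. This is a minimal sketch under stated assumptions: `idle_check`, `IdleCheckResult` and the `heartbeat_pending` flag (used so a heartbeat is not resent on every loop iteration) are hypothetical, times are taken from a monotonic millisecond clock, and a timeout of 0 is assumed to disable the check, as it does for replication_timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    IDLE_CHECK_NOTHING,        /* peer heard from recently enough */
    IDLE_CHECK_SEND_HEARTBEAT, /* idle >= timeout/2: ask the peer to reply */
    IDLE_CHECK_CONN_BROKEN     /* idle >= timeout: treat as a network break */
} IdleCheckResult;

/*
 * now_ms, last_recv_ms: current time and time of the last message received
 *                       from the peer (monotonic clock assumed).
 * timeout_ms:           wal_sender_timeout on the primary,
 *                       wal_receiver_timeout on the standby; 0 disables.
 * heartbeat_pending:    a heartbeat was already sent and awaits its reply
 *                       (hypothetical bookkeeping, not from the patch).
 */
IdleCheckResult
idle_check(int64_t now_ms, int64_t last_recv_ms, int64_t timeout_ms,
           bool heartbeat_pending)
{
    if (timeout_ms <= 0)
        return IDLE_CHECK_NOTHING;      /* timeout mechanism disabled */

    assert(now_ms >= last_recv_ms);     /* relies on a monotonic clock */
    int64_t idle_ms = now_ms - last_recv_ms;

    if (idle_ms >= timeout_ms)
        return IDLE_CHECK_CONN_BROKEN;  /* steps 11 and 12 */
    if (idle_ms >= timeout_ms / 2 && !heartbeat_pending)
        return IDLE_CHECK_SEND_HEARTBEAT; /* steps 4 and 9 */
    return IDLE_CHECK_NOTHING;
}
```

Sending the heartbeat at half the timeout leaves the peer the other half to answer before the connection is declared broken, so a healthy but quiet link is not mistaken for a network failure.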