Thread: BUG #7534: walreceiver takes long time to detect n/w breakdown
The following bug has been logged on the website: Bug reference: 7534 Logged by: Amit Kapila Email address: amit.kapila@huawei.com PostgreSQL version: 9.2.0 Operating system: Suse 10 Description: = 1. Both master and standby machine are connected normally, = 2. then you use the command: ifconfig ip down; make the network card of master and standby down, = Observation = master can detect connect abnormal, but the standby can't detect connect abnormal and show a connected channel long time. =
On Wed, Sep 12, 2012 at 1:54 PM, <amit.kapila@huawei.com> wrote: > The following bug has been logged on the website: > > Bug reference: 7534 > Logged by: Amit Kapila > Email address: amit.kapila@huawei.com > PostgreSQL version: 9.2.0 > Operating system: Suse 10 > Description: > > 1. Both master and standby machine are connected normally, > 2. then you use the command: ifconfig ip down; make the network card of > master and standby down, > > Observation > master can detect connect abnormal, but the standby can't detect connect > abnormal and show a connected channel long time. The master will detect it quicker, because it will get an error when it tries to send something. But the standby should detect it either when sending the feedback message (what's your wal_receiver_status_interval set to?) or when ythe kernel does (have you configured the tcp keepalive on the slave somehow?) Oh, and what do you actually mean by "long time"? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Wed, Sep 12, 2012 at 8:54 PM, <amit.kapila@huawei.com> wrote: > The following bug has been logged on the website: > > Bug reference: 7534 > Logged by: Amit Kapila > Email address: amit.kapila@huawei.com > PostgreSQL version: 9.2.0 > Operating system: Suse 10 > Description: > > 1. Both master and standby machine are connected normally, > 2. then you use the command: ifconfig ip down; make the network card of > master and standby down, > > Observation > master can detect connect abnormal, but the standby can't detect connect > abnormal and show a connected channel long time. What about setting keepalives_xxx libpq parameters? http://www.postgresql.org/docs/devel/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS Keepalives are not a perfect solution for the termination of connection, but it would help to a certain extent. If you need something like walreceiver-version of replication_timeout, such feature has not been implemented yet. Please feel free to implement that! Regards, -- Fujii Masao
On Wednesday, September 12, 2012 10:12 PM Magnus Hagander wrote: On Wed, Sep 12, 2012 at 1:54 PM, <amit.kapila@huawei.com> wrote: >> The following bug has been logged on the website: > >> Bug reference: 7534 >> Logged by: Amit Kapila >> Email address: amit.kapila@huawei.com >> PostgreSQL version: 9.2.0 >> Operating system: Suse 10 >> Description: > >> 1. Both master and standby machine are connected normally, >> 2. then you use the command: ifconfig ip down; make the network card of >> master and standby down, > >> Observation >> master can detect connect abnormal, but the standby can't detect connect >> abnormal and show a connected channel long time. > The master will detect it quicker, because it will get an error when > it tries to send something. > But the standby should detect it either when sending the feedback > message (what's your wal_receiver_status_interval set to?) or when > ythe kernel does (have you configured the tcp keepalive on the slave > somehow?) wal_receiver_status_interval - 10s (we have not changed this. Used as default). We have tried by using tcp keepalive as well, it might not be able to detect as receiver is anyway trying to send Receiver status. It fails during send socket call from XLogWalRcvSendReply() after calling the same many times as internally might be in send() until the sockets internal buffer is full, it keeps accumulating even if other side recv has not received the data. Also in walsender, it is failing to replication_timeout parameter not due to send failure. So in my opinion, the full-proof solution would be to have mechanism (replication_timeout) similar to walsender in walreceiver. > Oh, and what do you actually mean by "long time"? 15-20 mins. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Wednesday, September 12, 2012 10:15 PM Fujii Masao On Wed, Sep 12, 2012 at 8:54 PM, <amit.kapila@huawei.com> wrote: >> The following bug has been logged on the website: >> >> Bug reference: 7534 >> Logged by: Amit Kapila >> Email address: amit.kapila@huawei.com >> PostgreSQL version: 9.2.0 >> Operating system: Suse 10 >> Description: > >> 1. Both master and standby machine are connected normally, >> 2. then you use the command: ifconfig ip down; make the network card of >> master and standby down, > >> Observation >> master can detect connect abnormal, but the standby can't detect connect >> abnormal and show a connected channel long time. > What about setting keepalives_xxx libpq parameters? > http://www.postgresql.org/docs/devel/static/libpq-connect.html#LIBPQ-PARAMKE YWORDS > Keepalives are not a perfect solution for the termination of connection, but > it would help to a certain extent. We have tried by enabling keepalive, but it didn't worked maybe because walreceiver is trying to send reveiver status. It fails in sending that after many attempts of same. > If you need something like walreceiver-version of replication_timeout, such feature has not been implemented yet. > Please feel free to implement that! I would like to implement such feature for walreceiver, but there is one confusion that whether to use same configuration parameter(replication_timeout) for walrecevier as for master or introduce a new configuration parameter (receiver_replication_timeout). The only point in having different timeout parameters for walsender and walreceiver is for the case of standby which has both walsender and walreceiver to send logs to cascaded standby, in such case somebody might want to have different timeout parameters for walsender and walreceiver.OTOH it will create confusion to have too many parameters. My opinion is to have one timeout parameter for both walsender and walrecevier. Let me know your suggestion/opinion about same. Note- I am marking cc to pgsql-hackers, as it will be a feature request. With Regards, Amit Kapila.
On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila <amit.kapila@huawei.com> wrote: > On Wednesday, September 12, 2012 10:15 PM Fujii Masao > On Wed, Sep 12, 2012 at 8:54 PM, <amit.kapila@huawei.com> wrote: >>> The following bug has been logged on the website: >>> >>> Bug reference: 7534 >>> Logged by: Amit Kapila >>> Email address: amit.kapila@huawei.com >>> PostgreSQL version: 9.2.0 >>> Operating system: Suse 10 >>> Description: >> >>> 1. Both master and standby machine are connected normally, >>> 2. then you use the command: ifconfig ip down; make the network card of >>> master and standby down, >> >>> Observation >>> master can detect connect abnormal, but the standby can't detect connect >>> abnormal and show a connected channel long time. > >> What about setting keepalives_xxx libpq parameters? >> > http://www.postgresql.org/docs/devel/static/libpq-connect.html#LIBPQ-PARAMKE > YWORDS > >> Keepalives are not a perfect solution for the termination of connection, > but >> it would help to a certain extent. > > We have tried by enabling keepalive, but it didn't worked maybe because > walreceiver is trying to send reveiver status. > It fails in sending that after many attempts of same. > >> If you need something like walreceiver-version of replication_timeout, > such feature has not been implemented yet. >> Please feel free to implement that! > > I would like to implement such feature for walreceiver, but there is one > confusion that whether to use > same configuration parameter(replication_timeout) for walrecevier as for > master or introduce a new > configuration parameter (receiver_replication_timeout). I like the latter. I believe some users want to set the different timeout values, for example, in the case where the master and standby servers are placed in the same room, but cascaded standby is placed in other continent. Regards, -- Fujii Masao
On Thursday, September 13, 2012 10:57 PM Fujii Masao On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila <amit.kapila@huawei.com> wrote: > On Wednesday, September 12, 2012 10:15 PM Fujii Masao > On Wed, Sep 12, 2012 at 8:54 PM, <amit.kapila@huawei.com> wrote: >>>> The following bug has been logged on the website: >>> >>>> Bug reference: 7534 >>>> Logged by: Amit Kapila >>>> Email address: amit.kapila@huawei.com >>>> PostgreSQL version: 9.2.0 >>>> Operating system: Suse 10 >>>> Description: >> >>>> 1. Both master and standby machine are connected normally, >>>> 2. then you use the command: ifconfig ip down; make the network card of >>>> master and standby down, >> >>>> Observation >>>> master can detect connect abnormal, but the standby can't detect connect >>>> abnormal and show a connected channel long time. > > >> I would like to implement such feature for walreceiver, but there is one >> confusion that whether to use >> same configuration parameter(replication_timeout) for walrecevier as for >> master or introduce a new >> configuration parameter (receiver_replication_timeout). >I like the latter. I believe some users want to set the different >timeout values, >for example, in the case where the master and standby servers are placed in >the same room, but cascaded standby is placed in other continent. Thank you for your suggestion. I have implemented as per your suggestion to have separate timeout parameter for walreceiver. The main changes are: 1. Introduce a new configuration parameter wal_receiver_replication_timeout for walreceiver. 2. In function WalReceiverMain(), check if there is no communication till wal_receiver_replication_timeout, exit the walreceiver. This is same as walsender functionality. As this is a feature, So I am uploading the attached patch in coming CommitFest. Suggestions/Comments? With Regards, Amit Kapila.
Attachment
On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila <amit.kapila@huawei.com> wrote: > > On Thursday, September 13, 2012 10:57 PM Fujii Masao > On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila <amit.kapila@huawei.com> wrote: >> On Wednesday, September 12, 2012 10:15 PM Fujii Masao >> On Wed, Sep 12, 2012 at 8:54 PM, <amit.kapila@huawei.com> wrote: >>>>> The following bug has been logged on the website: >>>> >>>>> Bug reference: 7534 >>>>> Logged by: Amit Kapila >>>>> Email address: amit.kapila@huawei.com >>>>> PostgreSQL version: 9.2.0 >>>>> Operating system: Suse 10 >>>>> Description: >>> >>>>> 1. Both master and standby machine are connected normally, >>>>> 2. then you use the command: ifconfig ip down; make the network card of >>>>> master and standby down, >>> >>>>> Observation >>>>> master can detect connect abnormal, but the standby can't detect connect >>>>> abnormal and show a connected channel long time. >> >> >>> I would like to implement such feature for walreceiver, but there is one >>> confusion that whether to use >>> same configuration parameter(replication_timeout) for walrecevier as for >>> master or introduce a new >>> configuration parameter (receiver_replication_timeout). > >>I like the latter. I believe some users want to set the different >>timeout values, >>for example, in the case where the master and standby servers are placed in >>the same room, but cascaded standby is placed in other continent. > > Thank you for your suggestion. I have implemented as per your suggestion to have separate timeout parameter for walreceiver. > The main changes are: > 1. Introduce a new configuration parameter wal_receiver_replication_timeout for walreceiver. > 2. In function WalReceiverMain(), check if there is no communication till wal_receiver_replication_timeout, exit the walreceiver. > This is same as walsender functionality. > > As this is a feature, So I am uploading the attached patch in coming CommitFest. > > Suggestions/Comments? You also need to change walsender so that it periodically sends the heartbeat message, like walreceiver does each wal_receiver_status_interval. Otherwise, walreceiver will detect the timeout wrongly whenever there is no traffic in the master. Regards, -- Fujii Masao
On Saturday, September 15, 2012 11:27 AM Fujii Masao wrote: On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila <amit.kapila@huawei.com> wrote: > > On Thursday, September 13, 2012 10:57 PM Fujii Masao > On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila <amit.kapila@huawei.com> wrote: >> On Wednesday, September 12, 2012 10:15 PM Fujii Masao >> On Wed, Sep 12, 2012 at 8:54 PM, <amit.kapila@huawei.com> wrote: >>>>> The following bug has been logged on the website: >>>> I would like to implement such feature for walreceiver, but there is one >>>> confusion that whether to use >>>> same configuration parameter(replication_timeout) for walrecevier as for >>>> master or introduce a new >>>> configuration parameter (receiver_replication_timeout). > >>>I like the latter. I believe some users want to set the different >>>timeout values, >>>for example, in the case where the master and standby servers are placed in >>>the same room, but cascaded standby is placed in other continent. > >> Thank you for your suggestion. I have implemented as per your suggestion to have separate timeout parameter for walreceiver. >> The main changes are: >> 1. Introduce a new configuration parameter wal_receiver_replication_timeout for walreceiver. >> 2. In function WalReceiverMain(), check if there is no communication till wal_receiver_replication_timeout, exit the walreceiver. >> This is same as walsender functionality. > >> As this is a feature, So I am uploading the attached patch in coming CommitFest. > >> Suggestions/Comments? > You also need to change walsender so that it periodically sends the heartbeat > message, like walreceiver does each wal_receiver_status_interval. Otherwise, > walreceiver will detect the timeout wrongly whenever there is no traffic in the > master. Doesn't current keepalive message from walsender will suffice that need? With Regards, Amit Kapila.
On Sat, Sep 15, 2012 at 4:26 PM, Amit kapila <amit.kapila@huawei.com> wrote: > On Saturday, September 15, 2012 11:27 AM Fujii Masao wrote: > On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila <amit.kapila@huawei.com> wrote: >> >> On Thursday, September 13, 2012 10:57 PM Fujii Masao >> On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila <amit.kapila@huawei.com> wrote: >>> On Wednesday, September 12, 2012 10:15 PM Fujii Masao >>> On Wed, Sep 12, 2012 at 8:54 PM, <amit.kapila@huawei.com> wrote: >>>>>> The following bug has been logged on the website: > >>>>> I would like to implement such feature for walreceiver, but there is one >>>>> confusion that whether to use >>>>> same configuration parameter(replication_timeout) for walrecevier as for >>>>> master or introduce a new >>>>> configuration parameter (receiver_replication_timeout). >> >>>>I like the latter. I believe some users want to set the different >>>>timeout values, >>>>for example, in the case where the master and standby servers are placed in >>>>the same room, but cascaded standby is placed in other continent. >> >>> Thank you for your suggestion. I have implemented as per your suggestion to have separate timeout parameter for walreceiver. >>> The main changes are: >>> 1. Introduce a new configuration parameter wal_receiver_replication_timeout for walreceiver. >>> 2. In function WalReceiverMain(), check if there is no communication till wal_receiver_replication_timeout, exit thewalreceiver. >>> This is same as walsender functionality. >> >>> As this is a feature, So I am uploading the attached patch in coming CommitFest. >> >>> Suggestions/Comments? > >> You also need to change walsender so that it periodically sends the heartbeat >> message, like walreceiver does each wal_receiver_status_interval. Otherwise, >> walreceiver will detect the timeout wrongly whenever there is no traffic in the >> master. > > Doesn't current keepalive message from walsender will suffice that need? No. Though the keepalive interval should be smaller than the timeout, IIRC there is no way to specify the keepalive interval now. Regards, -- Fujii Masao
On Sunday, September 16, 2012 12:14 AM Fujii Masao wrote: On Sat, Sep 15, 2012 at 4:26 PM, Amit kapila <amit.kapila@huawei.com> wrote: > On Saturday, September 15, 2012 11:27 AM Fujii Masao wrote: > On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila <amit.kapila@huawei.com> wrote: >> >> On Thursday, September 13, 2012 10:57 PM Fujii Masao >> On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila <amit.kapila@huawei.com> wrote: >>> On Wednesday, September 12, 2012 10:15 PM Fujii Masao >>> On Wed, Sep 12, 2012 at 8:54 PM, <amit.kapila@huawei.com> wrote: >>>>>>> The following bug has been logged on the website: > >>>>>> I would like to implement such feature for walreceiver, but there is one >>>>>> confusion that whether to use >>>>>> same configuration parameter(replication_timeout) for walrecevier as for >>>>>> master or introduce a new >>>>>> configuration parameter (receiver_replication_timeout). >> >>>>>I like the latter. I believe some users want to set the different >>>>>timeout values, >>>>>for example, in the case where the master and standby servers are placed in >>>>>the same room, but cascaded standby is placed in other continent. >> >>>> Thank you for your suggestion. I have implemented as per your suggestion to have separate timeout parameter for walreceiver. >>>> The main changes are: >>>> 1. Introduce a new configuration parameter wal_receiver_replication_timeout for walreceiver. >>>> 2. In function WalReceiverMain(), check if there is no communication till wal_receiver_replication_timeout, exit thewalreceiver. >>> > This is same as walsender functionality. >> >>>> As this is a feature, So I am uploading the attached patch in coming CommitFest. >> >>>> Suggestions/Comments? > >>> You also need to change walsender so that it periodically sends the heartbeat >>> message, like walreceiver does each wal_receiver_status_interval. Otherwise, >>> walreceiver will detect the timeout wrongly whenever there is no traffic in the >>> master. > >> Doesn't current keepalive message from walsender will suffice that need? >No. Though the keepalive interval should be smaller than the timeout, >IIRC there is >no way to specify the keepalive interval now. Currently AFAICS in the code on idle system, it should send keepalive after 10s which is hardcoded value as sleeptime. You are right that if its not configurable, and somebody configures replication_timeout as value lower than 10s then thelogic will fail. So is it okay if a new config parameter similar to wal_receiver_status_interval be added and map it directly to sleeptimein the current code. There will be no need for any new heartbeat message, existing keepalive will sufice that purpose. With Regards, Amit Kapila.
On Sunday, September 16, 2012 12:14 AM Fujii Masao wrote: On Sat, Sep 15, 2012 at 4:26 PM, Amit kapila <amit.kapila@huawei.com> wrote: > On Saturday, September 15, 2012 11:27 AM Fujii Masao wrote: > On Fri, Sep 14, 2012 at 10:01 PM, Amit kapila <amit.kapila@huawei.com> wrote: >> >> On Thursday, September 13, 2012 10:57 PM Fujii Masao >> On Thu, Sep 13, 2012 at 1:22 PM, Amit Kapila <amit.kapila@huawei.com> wrote: >>> On Wednesday, September 12, 2012 10:15 PM Fujii Masao >>> On Wed, Sep 12, 2012 at 8:54 PM, <amit.kapila@huawei.com> wrote: >>>>>>> The following bug has been logged on the website: > >>>>>> I would like to implement such feature for walreceiver, but there is one >>>>>> confusion that whether to use >>>>>> same configuration parameter(replication_timeout) for walrecevier as for >>>>>> master or introduce a new >>>>>> configuration parameter (receiver_replication_timeout). >> >>>>>I like the latter. I believe some users want to set the different >>>>>timeout values, >>>>>for example, in the case where the master and standby servers are placed in >>>>>the same room, but cascaded standby is placed in other continent. >> >>>> Thank you for your suggestion. I have implemented as per your suggestion to have separate timeout parameter for walreceiver. >>>> The main changes are: >>>> 1. Introduce a new configuration parameter wal_receiver_replication_timeout for walreceiver. >>>> 2. In function WalReceiverMain(), check if there is no communication till wal_receiver_replication_timeout, exit the walreceiver. >>>> This is same as walsender functionality. >> >>>> As this is a feature, So I am uploading the attached patch in coming CommitFest. >> >>>> Suggestions/Comments? > >>> You also need to change walsender so that it periodically sends the heartbeat >>> message, like walreceiver does each wal_receiver_status_interval. Otherwise, >>> walreceiver will detect the timeout wrongly whenever there is no traffic in the >>> master. > >> Doesn't current keepalive message from walsender will suffice that need? > No. Though the keepalive interval should be smaller than the timeout, > IIRC there is > no way to specify the keepalive interval now. To define the behavior correctly, according to me there are 2 options now: Approach-1 : Document that both(sender and receiver) the timeout parameters should be greater than wal_receiver_status_interval. If both are greater, then I think it might never timeout due to Idle. Approach-2 : Provide a variable wal_send_status_interval, such that if this is 0, then the current behavior would prevail and if its non-zero then KeepAlive message would be send maximum after that time. The modified code of WALSendLoop will be as follows: TimestampTz timeout = 0; long sleeptime = 10000; /* 10 s */ int wakeEvents; /* sleeptime should be equal to wal send interval if it is not zero otherwise default as 10 sec*/ if (wal_send_status_interval > 0) { sleeptime = wal_send_status_interval; } wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_SOCKET_READABLE| WL_TIMEOUT; if (pq_is_send_pending()) wakeEvents |= WL_SOCKET_WRITEABLE; else if (wal_send_status_interval > 0) { WalSndKeepalive(output_message); /* Try to flush pending output to the client */ if (pq_flush_if_writable() != 0) break; } /* Determine time until replication timeout */ if (replication_timeout > 0) { timeout = TimestampTzPlusMilliseconds(last_reply_timestamp, replication_timeout); if (wal_send_status_interval <= 0) { sleeptime = 1 + (replication_timeout / 10); } } /* Sleep until something happens or replication timeout */ WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents, MyProcPort->sock, sleeptime); /* * Check for replication timeout. Note we ignore the corner case * possibility that the client replied just as we reached the * timeout ... he's supposed to reply *before* that. */ if (replication_timeout > 0 && GetCurrentTimestamp()>= timeout) { /* * Since typically expiration of replication timeout means * communication problem, we don't send the error message to * the standby. */ ereport(COMMERROR, (errmsg("terminating walsender process due to replication timeout"))); break; } } Which way you think is better or you have any other idea to handle. With Regards, Amit Kapila.
On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila <amit.kapila@huawei.com> wrote: > To define the behavior correctly, according to me there are 2 options now: > > Approach-1 : > Document that both(sender and receiver) the timeout parameters should be > greater than wal_receiver_status_interval. > If both are greater, then I think it might never timeout due to Idle. In this approach, keepalive messages are sent each wal_receiver_status_interval? > Approach-2 : > Provide a variable wal_send_status_interval, such that if this is 0, then > the current behavior would prevail and if its non-zero then KeepAlive > message would be send maximum after that time. > The modified code of WALSendLoop will be as follows: <snip> > Which way you think is better or you have any other idea to handle. I think #2 is better because it's more intuitive to a user. Regards, -- Fujii Masao
On Tuesday, September 18, 2012 6:03 PM Fujii Masao wrote: On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila <amit.kapila@huawei.com> wrote: >> To define the behavior correctly, according to me there are 2 options now: > >> Approach-1 : >> Document that both(sender and receiver) the timeout parameters should be >> greater than wal_receiver_status_interval. >> If both are greater, then I think it might never timeout due to Idle. > In this approach, keepalive messages are sent each wal_receiver_status_interval? wal_receiver_status_interval or sleeptime whichever is smaller. >> Approach-2 : >> Provide a variable wal_send_status_interval, such that if this is 0, then >> the current behavior would prevail and if its non-zero then KeepAlive >> message would be send maximum after that time. >> The modified code of WALSendLoop will be as follows: <snip> >> Which way you think is better or you have any other idea to handle. > I think #2 is better because it's more intuitive to a user. I shall update the Patch as per Approach-2 and upload the same. With Regards, Amit Kapila.
On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote: On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila <amit.kapila@huawei.com> wrote: >> Approach-2 : >> Provide a variable wal_send_status_interval, such that if this is 0, then >> the current behavior would prevail and if its non-zero then KeepAlive >> message would be send maximum after that time. >> The modified code of WALSendLoop will be as follows: <snip> >> Which way you think is better or you have any other idea to handle. >I think #2 is better because it's more intuitive to a user. Please find a patch attached for implementation of Approach-2. With Regards, Amit Kapila.
Attachment
On 21.09.2012 14:18, Amit kapila wrote: > On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote: > On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila<amit.kapila@huawei.com> wrote: > >>> Approach-2 : >>> Provide a variable wal_send_status_interval, such that if this is 0, then >>> the current behavior would prevail and if its non-zero then KeepAlive >>> message would be send maximum after that time. >>> The modified code of WALSendLoop will be as follows: > > <snip> >>> Which way you think is better or you have any other idea to handle. > >> I think #2 is better because it's more intuitive to a user. > > Please find a patch attached for implementation of Approach-2. Hmm, I think we need to step back a bit. I've never liked the way replication_timeout works, where it's the user's responsibility to set wal_receiver_status_interval < replication_timeout. It's not very user-friendly. I'd rather not copy that same design to this walreceiver timeout. If there's two different timeouts like that, it's even worse, because it's easy to confuse the two. So let's think how this should ideally work from a user's point of view. I think there should be just two settings: walsender_timeout and walreceiver_timeout. walsender_timeout specifies how long a walsender will keep a connection open if it doesn't hear from the walreceiver, and walreceiver_timeout is the same for walreceiver. The system should figure out itself how often to send keepalive messages so that those timeouts are not reached. In walsender, after half of walsender_timeout has elapsed and we haven't received anything from the client, the walsender process should send a "ping" message to the client. Whenever the client receives a Ping, it replies. The walreceiver does the same; when half of walreceiver_timeout has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip resets the timer in both ends, regardless of which side initiated it, so if e.g walsender_timeout < walreceiver_timeout, the client will never have to initiate a Ping message, because walsender will always reach the walsender_timeout/2 point first and initiate the heartbeat message. The Ping/Pong messages don't necessarily need to be new message types, we can use the message types we currently have, perhaps with an additional flag attached to them, to request the other side to reply immediately. - Heikki
On Mon, Oct 1, 2012 at 6:38 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > Hmm, I think we need to step back a bit. I've never liked the way > replication_timeout works, where it's the user's responsibility to set > wal_receiver_status_interval < replication_timeout. It's not very > user-friendly. I'd rather not copy that same design to this walreceiver > timeout. If there's two different timeouts like that, it's even worse, > because it's easy to confuse the two. I agree, but also note that wal_receiver_status_interval serves another user-visible purpose as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Oct 1, 2012 at 7:38 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > Hmm, I think we need to step back a bit. I've never liked the way > replication_timeout works, where it's the user's responsibility to set > wal_receiver_status_interval < replication_timeout. It's not very > user-friendly. I'd rather not copy that same design to this walreceiver > timeout. If there's two different timeouts like that, it's even worse, > because it's easy to confuse the two. Agreed. I'd like to specify the replication timeout like we do TCP keepalives, i.e., what about introducing something like following parameters? walsender_keepalives_idle walsender_keepalives_interval walsender_keeaplives_count walreceiver_keepalives_idle walreceiver_keepalives_interval walreceiver_keepalives_count I believe many users are basically familiar with TCP keepalives and how to specify it. So I think that this approach would be intuitive to users. Also this approach includes your proposal. If you specify walsender_keepalives_idle = walsender_timeout / 2 walsender_keepalives_interval = -1 (disable; Ping is never sent again if there is no reply after first Ping is sent) walsender_keepalives_count = 1 the replication timeout works as you proposed. But of course the downside of this approach is that the number of parameter for replication timeout is increased from two (replication_timeout and wal_receiver_status_interval) to six, and those parameters are confusingly similar to existing tcp_keepalives parameters, which might cause another confusion to users. One idea to solve this problem is to use existing tcp_keepalives paramters values for the replication timeout. Regards, -- Fujii Masao
On Mon, Oct 1, 2012 at 12:57 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > I believe many users are basically familiar with TCP keepalives and how to > specify it. So I think that this approach would be intuitive to users. My experience is that many users are unfamiliar with TCP keepalives and that when given the options they tend to do it wrong. I think a simpler system would be better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of lun oct 01 21:02:54 -0300 2012: > On Mon, Oct 1, 2012 at 12:57 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > > I believe many users are basically familiar with TCP keepalives and how to > > specify it. So I think that this approach would be intuitive to users. > > My experience is that many users are unfamiliar with TCP keepalives > and that when given the options they tend to do it wrong. I think a > simpler system would be better. +1 -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote: On 21.09.2012 14:18, Amit kapila wrote: > On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote: > On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila<amit.kapila@huawei.com> wrote: > >>>> Approach-2 : >>>> Provide a variable wal_send_status_interval, such that if this is 0, then >>>> the current behavior would prevail and if its non-zero then KeepAlive >>>> message would be send maximum after that time. >>>> The modified code of WALSendLoop will be as follows: > > <snip> >>>> Which way you think is better or you have any other idea to handle. > >>> I think #2 is better because it's more intuitive to a user. > >> Please find a patch attached for implementation of Approach-2. >So let's think how this should ideally work from a user's point of view. >I think there should be just two settings: walsender_timeout and >walreceiver_timeout. walsender_timeout specifies how long a walsender >will keep a connection open if it doesn't hear from the walreceiver, and >walreceiver_timeout is the same for walreceiver. The system should >figure out itself how often to send keepalive messages so that those >timeouts are not reached. By this it implies that we should remove wal_receiver_status_interval. Currently it is also used incase of reply message of data sent by sender which contains till what point receiver has flushed. So if we remove thisvariable receiver might start sending that message sonner than required. Is that okay behavior? >In walsender, after half of walsender_timeout has elapsed and we haven't >received anything from the client, the walsender process should send a >"ping" message to the client. Whenever the client receives a Ping, it >replies. The walreceiver does the same; when half of walreceiver_timeout >has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip >resets the timer in both ends, regardless of which side initiated it, so >if e.g walsender_timeout < walreceiver_timeout, the client will never >have to initiate a Ping message, because walsender will always reach the >walsender_timeout/2 point first and initiate the heartbeat message. Just to clarify, walsender should reset timer after it gets reply from receiver of the message it sent. walreceiver should reset timer after sending reply for heartbeat message. Similar to above timers will be reset when receiver sent the heartbeat message. >The Ping/Pong messages don't necessarily need to be new message types, >we can use the message types we currently have, perhaps with an >additional flag attached to them, to request the other side to reply >immediately. Can't we make the decision to send reply immediately based on message type, because these message types will be unique. To clarify my understanding, 1. the heartbeat message from walsender side will be keepalive message ('k') and from walreceiver side it will be Hot Standbyfeedback message ('h'). 2. the reply message from walreceiver side will be current reply message ('r'). 3. currently there is no reply kind of message from walsender, so do we need to introduce one new message for it or can usesome existing message only? if new, do we need to send any additional information along with it, for existing messagescan we use keepalive message it self as reply message but with an additional byte to indicate it is reply? With Regards, Amit Kapila.
On Monday, October 01, 2012 8:36 PM Robert Haas wrote: On Mon, Oct 1, 2012 at 6:38 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: >> Hmm, I think we need to step back a bit. I've never liked the way >> replication_timeout works, where it's the user's responsibility to set >> wal_receiver_status_interval < replication_timeout. It's not very >> user-friendly. I'd rather not copy that same design to this walreceiver >> timeout. If there's two different timeouts like that, it's even worse, >> because it's easy to confuse the two. > I agree, but also note that wal_receiver_status_interval serves > another user-visible purpose as well. By above do you mean to say that wal_receiver_status_interval is used for reply of data sent by server to indicate till whatpoint receiver has flushed data or something else? With Regards, Amit Kapila.
On 02.10.2012 10:36, Amit kapila wrote: > On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote: >> So let's think how this should ideally work from a user's point of view. >> I think there should be just two settings: walsender_timeout and >> walreceiver_timeout. walsender_timeout specifies how long a walsender >> will keep a connection open if it doesn't hear from the walreceiver, and >> walreceiver_timeout is the same for walreceiver. The system should >> figure out itself how often to send keepalive messages so that those >> timeouts are not reached. > > By this it implies that we should remove wal_receiver_status_interval. Currently it is also used > incase of reply message of data sent by sender which contains till what point receiver has flushed. So if we remove thisvariable > receiver might start sending that message sonner than required. > Is that okay behavior? I guess we should keep that setting, then, so that you can get status updates more often than would be required for heartbeat purposes. >> In walsender, after half of walsender_timeout has elapsed and we haven't >> received anything from the client, the walsender process should send a >> "ping" message to the client. Whenever the client receives a Ping, it >> replies. The walreceiver does the same; when half of walreceiver_timeout >> has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip >> resets the timer in both ends, regardless of which side initiated it, so >> if e.g walsender_timeout< walreceiver_timeout, the client will never >> have to initiate a Ping message, because walsender will always reach the >> walsender_timeout/2 point first and initiate the heartbeat message. > > Just to clarify, walsender should reset timer after it gets reply from receiver of the message it sent. Right. > walreceiver should reset timer after sending reply for heartbeat message.> Similar to above timers will be reset when receiversent the heartbeat message. walreceiver should reset the timer when it *receives* any message from walsender. If it sends the reply right away, I guess that's the same thing, but I'd phrase it so that it's the reception of a message from the other end that resets the timer. >> The Ping/Pong messages don't necessarily need to be new message types, >> we can use the message types we currently have, perhaps with an >> additional flag attached to them, to request the other side to reply >> immediately. > > Can't we make the decision to send reply immediately based on message type, because these message types will be unique. > > To clarify my understanding, > 1. the heartbeat message from walsender side will be keepalive message ('k') and from walreceiver side it will be Hot Standbyfeedback message ('h'). > 2. the reply message from walreceiver side will be current reply message ('r'). Yep. I wonder why need separate message types for Hot Standby Feedback 'h' and Reply 'r', though. Seems it would be simpler to have just one messasge type that includes all the fields from both messages. > 3. currently there is no reply kind of message from walsender, so do we need to introduce one new message for it or canuse some existing message only? > if new, do we need to send any additional information along with it, for existing messages can we use keepalive messageit self as reply message but with an additional byte > to indicate it is reply? Hmm, I think I'd prefer to use the existing Keepalive message 'k', with an additional flag. - Heikki
On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote: On 02.10.2012 10:36, Amit kapila wrote: > On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote: >>> So let's think how this should ideally work from a user's point of view. >>> I think there should be just two settings: walsender_timeout and >>> walreceiver_timeout. walsender_timeout specifies how long a walsender >>> will keep a connection open if it doesn't hear from the walreceiver, and >>> walreceiver_timeout is the same for walreceiver. The system should >>> The Ping/Pong messages don't necessarily need to be new message types, >>> we can use the message types we currently have, perhaps with an >>> additional flag attached to them, to request the other side to reply >>> immediately. > >> Can't we make the decision to send reply immediately based on message type, because these message types will be unique. > >> To clarify my understanding, >> 1. the heartbeat message from walsender side will be keepalive message ('k') and from walreceiver side it will be HotStandby feedback message ('h'). >> 2. the reply message from walreceiver side will be current reply message ('r'). > Yep. I wonder why need separate message types for Hot Standby Feedback > 'h' and Reply 'r', though. Seems it would be simpler to have just one > messasge type that includes all the fields from both messages. moved the contents for Hot Standby Feedback 'h' to Reply 'r' and use 'h' for heart-beat purpose. >> 3. currently there is no reply kind of message from walsender, so do we need to introduce one new message for it or canuse some existing message only? >> if new, do we need to send any additional information along with it, for existing messages can we use keepalive messageit self as reply message but with an additional byte >> to indicate it is reply? > Hmm, I think I'd prefer to use the existing Keepalive message 'k', with an additional flag. Okay. I have done it in Patch. Thank you for suggestions. I have addressed your suggestions in patch attached with this mail. Following changes are done to support replication timeout in sender as well as receiver: 1. One new configuration parameter wal_receiver_timeout is added to detect timeout at receiver task. 2. Existing parameter replication_timeout is renamed to wal_sender_timeout. 3. Now PrimaryKeepaliveMessage structure is modified to add one more field to indicate whether keep-alive is of type 'r'(i.e. reply) or 'h' (i.e. heart-beat). 4. Now the keep-alive message from sender will be sent to standby if it was idle for more than or equal to half of wal_sender_timeout. In this case it will send keep-alive of type 'h'. 5. Once the standby receiver a keep-alive, it needs to send an immediate reply to primary to indicate connection is alive. 6. Now Reply message to send wal offset and Feedback message to send oldest transaction are merged into single Reply message. So now the structure StandbyReplyMessage is changed to add two more fields as xmin and epoch. Also StandbyHSFeedbackMessage structure is changed to remove xmin and epoch fields (as these are moved to StandbyReplyMessage). 7. Because of changes as in step-6, once receiver task receives some data from primary then it will only send Reply Message. 8. Same Reply message is sent in step-5 and step-7 but incase of step-5, then reply is sent immediately but incase of step-7,reply is sent if wal_receiver_status_interval has lapsed (this part is same as earlier). 9. Similar to sender, if receiver finds itself idle for more than or equal to half of configured wal_receiver_timeout, thenit will send the hot-standby heartbeat. This heart-beat has been modified to send only sendTime. 10. Once sender task receiver heart-beat message from standby then it sends back the reply immediately. In this keep-alivemessage is sent of type 'r'. 11. If even after wal_sender_timeout no message received from standby then it will be considered as network break at sendertask. 12. If even after wal_receiver_timeout no message received from primary then it will be considered as network break at receivertask. With Regards, Amit Kapila.
Attachment
> -----Original Message----- > From: pgsql-bugs-owner@postgresql.org [mailto:pgsql-bugs- > owner@postgresql.org] On Behalf Of Amit kapila > Sent: Thursday, October 04, 2012 3:43 PM > To: Heikki Linnakangas > Cc: Fujii Masao; pgsql-bugs@postgresql.org; pgsql-hackers@postgresql.org > Subject: Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w > breakdown > > On Tuesday, October 02, 2012 1:56 PM Heikki Linnakangas wrote: > On 02.10.2012 10:36, Amit kapila wrote: > > On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote: > >>> So let's think how this should ideally work from a user's point of > view. > >>> I think there should be just two settings: walsender_timeout and > >>> walreceiver_timeout. walsender_timeout specifies how long a > >>> walsender will keep a connection open if it doesn't hear from the > > Thank you for suggestions. > I have addressed your suggestions in patch attached with this mail. > > Following changes are done to support replication timeout in sender as > well as receiver: Testing Done for the Patch -------------------------------- 1. Verified the value of new configuration parameter and changed configuration parameter using the show command (using Show of specific parameter as well as show all). 2. Verified the new configuration parameter in --describe-config. 3. Verified the existing parameter replication_timeout's new name in --describe-config. 4. Start primary and standby node with default timeout, leave it for sometime in idle situation. It should not error out due to network break error. 5. a. Start primary and standby node with default timeout, bring down the network. b. Both sender and receiver should be able to detect network break-down almost at same time. c. Once the network is up again, connection should get re-established successfully. 5. a. Start primary and standby node with wal_sender_timeout less than wal_receiver_timeout, bring down the network. b. Sender should be able to detect network break-down before receiver task. c. Once the network is up again, connection should get re-established successfully. 6. a. Start primary and standby node with wal_receiver_timeout less than wal_sender_timeout, bring down the network. b. Receiver should be able to detect network break-down before sender task. c. Once the network is up again, connection should get re-established successfully. 7. a. In 5th test case, change the value of wal_receiver_status_interval to more than wal_receiver_timeout and hence more than wal_sender_timeout. b. Then bring down the network down. c. Sendertask should be able to detect network break-down once wal_sender_timeout has lapsed. d. Once the network is up again, connection should get re-established successfully. Intent of this test is to check there is no dependency of wal_sender_timeout on wal_receiver_status_interval for detection of Network break. All the above tests are passed. With Regards, Amit Kapila.
On Thu, Oct 4, 2012 at 6:12 AM, Amit kapila <amit.kapila@huawei.com> wrote: > 1. One new configuration parameter wal_receiver_timeout is added to detect timeout at receiver task. > 2. Existing parameter replication_timeout is renamed to wal_sender_timeout. -1 from me on a backward compatibility break here. I don't know what else to call the new GUC (replication_server_timeout?) but I'm not excited about breaking existing conf files, nor do I particularly like the proposed new names. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Monday, October 08, 2012 7:38 PM Robert Haas wrote: > On Thu, Oct 4, 2012 at 6:12 AM, Amit kapila <amit.kapila@huawei.com> > wrote: > > 1. One new configuration parameter wal_receiver_timeout is added to > detect timeout at receiver task. > > 2. Existing parameter replication_timeout is renamed to > wal_sender_timeout. > > -1 from me on a backward compatibility break here. I don't know what > else to call the new GUC (replication_server_timeout?) but I'm not > excited about breaking existing conf files, nor do I particularly like > the proposed new names. How about following: 1. replication_client_timeout -- shouldn't it be client as new configuration is for wal receiver 2. replication_standby_timeout If we introduce a new parameter for wal receiver, wouldn't replication_timeout be confusing for user? With Regards, Amit Kapila.
On Mon, Oct 8, 2012 at 10:42 AM, Amit Kapila <amit.kapila@huawei.com> wrote: > How about following: > 1. replication_client_timeout -- shouldn't it be client as new configuration > is for wal receiver > 2. replication_standby_timeout ISTM that the client and the standby are the same thing. > If we introduce a new parameter for wal receiver, wouldn't > replication_timeout be confusing for user? Maybe. I actually don't think that I understand what problem we're trying to solve here. If the connection between the master and the standby is lost, shouldn't the standby realize that it's no longer receiving keepalives from the master and terminate the connection? I thought I had tested this at some point and it was working, so either it's subsequently gotten broken again or the scenario you're talking about is different in some way that I don't currently understand. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tuesday, October 09, 2012 6:00 PM Robert Haas wrote: > On Mon, Oct 8, 2012 at 10:42 AM, Amit Kapila <amit.kapila@huawei.com> > wrote: > > How about following: > > 1. replication_client_timeout -- shouldn't it be client as new > configuration > > is for wal receiver > > 2. replication_standby_timeout > > ISTM that the client and the standby are the same thing. Yeah same, but may be one (replication_standby_timeout) can be more easily understandable by user. > > If we introduce a new parameter for wal receiver, wouldn't > > replication_timeout be confusing for user? > > Maybe. > I actually don't think that I understand what problem we're > trying to solve here. If the connection between the master and the > standby is lost, shouldn't the standby realize that it's no longer > receiving keepalives from the master and terminate the connection? For wal receiver keepalives are also like one kind of message, so the behavior is such that when it checks that it doesn't receive any message, it tries to send reply/feedback message to master after an interval of wal_receiver_status_interval. So after every wal_receiver_status_interval, wal receiver sends a reply, but still the socket send doesn't fail. It fails only after many send calls as internally might be in send(), until the sockets internal buffer is full, it keeps accumulating even if other side recv has not received the data. So that's the reason we decided to introduce a timeout parameter in wal receiver similar to what we have currently in walsender. > I > thought I had tested this at some point and it was working, so either > it's subsequently gotten broken again or the scenario you're talking > about is different in some way that I don't currently understand. Standby takes quite longer around 15 minutes to detect whereas master is able to detect quite sooner in 2-3 mins and master also mainly detects due to timeout functionality in wal sender. With Regards, Amit Kapila.
On 04.10.2012 13:12, Amit kapila wrote: > Following changes are done to support replication timeout in sender as well as receiver: > > 1. One new configuration parameter wal_receiver_timeout is added to detect timeout at receiver task. > 2. Existing parameter replication_timeout is renamed to wal_sender_timeout. Ok. The other option would be to have just one GUC, I'm open to bikeshedding on this one. On one hand, there's no reason the timeouts have to the same, so it would be nice to have separate settings, but on the other hand, I can't imagine a case where a single setting wouldn't work just as well. > 3. Now PrimaryKeepaliveMessage structure is modified to add one more field to indicate whether keep-alive is of type 'r'(i.e. > reply) or 'h' (i.e. heart-beat). > 4. Now the keep-alive message from sender will be sent to standby if it was idle for more than or equal to half of wal_sender_timeout. > In this case it will send keep-alive of type 'h'. > 5. Once the standby receiver a keep-alive, it needs to send an immediate reply to primary to indicate connection is alive. > 6. Now Reply message to send wal offset and Feedback message to send oldest transaction are merged into single Reply message. > So now the structure StandbyReplyMessage is changed to add two more fields as xmin and epoch. Also StandbyHSFeedbackMessage > structure is changed to remove xmin and epoch fields (as these are moved to StandbyReplyMessage). > 7. Because of changes as in step-6, once receiver task receives some data from primary then it will only send Reply Message. Oh I see. That's not what I meant by combining the keep-alive and hs feedback messages, I imagined that the hearbeats would *also* use the same message type. Ie. there would be only a single message type from standby to primary, used for: 1. updating the receive/apply pointer 2. HS feedback 3. for pinging the server when wal_receiver_timeout is approaching 4. to reply to to pings from the server. Since we didn't quite achieve that, it seems best leave out this merging of reply and HS feedback message types, to keep the patch small. We might still want to do that, but better do that as a separate patch. > 8. Same Reply message is sent in step-5 and step-7 but incase of step-5, then reply is sent immediately but incase of step-7,reply is sent > if wal_receiver_status_interval has lapsed (this part is same as earlier). > 9. Similar to sender, if receiver finds itself idle for more than or equal to half of configured wal_receiver_timeout,then it will send the > hot-standby heartbeat. This heart-beat has been modified to send only sendTime. > 10. Once sender task receiver heart-beat message from standby then it sends back the reply immediately. In this keep-alivemessage is > sent of type 'r'. > 11. If even after wal_sender_timeout no message received from standby then it will be considered as network break at sendertask. > 12. If even after wal_receiver_timeout no message received from primary then it will be considered as network break atreceiver task. Attached is an updated patch. I reverted the merging of message types and fixed a bunch of cosmetic issues. There was one bug: in the main loop of walreceiver, you send the "ping" message on every wakeup after enough time has passed since last reception. That means that if the server doesn't reply promptly, you send a new ping message every 100 ms (NAPTIME_PER_CYCLE), until it gets a reply. Walsender had the same issue, but it was not quite as sever there because the naptime was longer. Fixed that. How does this look now? - Heikki
Attachment
On Wednesday, October 10, 2012 9:15 PM Heikki Linnakangas wrote: > On 04.10.2012 13:12, Amit kapila wrote: > > Following changes are done to support replication timeout in sender as > well as receiver: > > > > 1. One new configuration parameter wal_receiver_timeout is added to > detect timeout at receiver task. > > 2. Existing parameter replication_timeout is renamed to > wal_sender_timeout. > > Ok. The other option would be to have just one GUC, I'm open to > bikeshedding on this one. On one hand, there's no reason the timeouts > have to the same, so it would be nice to have separate settings, but on > the other hand, I can't imagine a case where a single setting wouldn't > work just as well. I think for below case, they are required to be separate: 1. M1 (Master), S1 (Standby 1), S2 (Standby 2) 2. S1 is standby for M1, and S2 is standby for S1. Basically a simple case of cascaded replication 3. M1 and S1 are on local network but S2 is placed at geographically different location. (what I want to say is n/w between M1-S1 is of good speed and S1-S2 is very slow) 4. In above case, user might want to configure different timeouts for sender and receiver on S1. > Attached is an updated patch. I reverted the merging of message types > and fixed a bunch of cosmetic issues. There was one bug: in the main > loop of walreceiver, you send the "ping" message on every wakeup after > enough time has passed since last reception. That means that if the > server doesn't reply promptly, you send a new ping message every 100 ms > (NAPTIME_PER_CYCLE), until it gets a reply. Walsender had the same > issue, but it was not quite as sever there because the naptime was > longer. Fixed that. Thanks. > > How does this look now? The Patch is fine and test results are also fine. With Regards, Amit Kapila.
On 11.10.2012 13:17, Amit Kapila wrote: >> How does this look now? > > The Patch is fine and test results are also fine. Ok, thanks. Committed. - Heikki
On Thursday, October 11, 2012 8:22 PM Heikki Linnakangas wrote: On 11.10.2012 13:17, Amit Kapila wrote: >>> How does this look now? > >> The Patch is fine and test results are also fine. >Ok, thanks. Committed. Thank you very much. With Regards, Amit Kapila.
On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 11.10.2012 13:17, Amit Kapila wrote: >>> >>> How does this look now? >> >> >> The Patch is fine and test results are also fine. > > > Ok, thanks. Committed. I found one typo. The attached patch fixes that typo. ISTM you need to update the protocol.sgml because you added the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage. Is it worth adding the same mechanism (send back the reply immediately if walsender request a reply) into pg_basebackup and pg_receivexlog? Regards, -- Fujii Masao
Attachment
On 13.10.2012 19:35, Fujii Masao wrote: > On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> Ok, thanks. Committed. > > I found one typo. The attached patch fixes that typo. Thanks, fixed. > ISTM you need to update the protocol.sgml because you added > the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage. Oh, I didn't remember that we've documented the specific structs that we pass around. It's quite bogus anyway to explain the messages the way we do currently, as they are actually dependent on the underlying architecture's endianess and padding. I think we should refactor the protocol to not transmit raw structs, but use pq_sentint and friends to construct the messages. This was discussed earlier (see http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com), I think there's consensus that 9.3 would be a good time to do that as we changed the XLogRecPtr format anyway. I'll look into doing that.. > Is it worth adding the same mechanism (send back the reply immediately > if walsender request a reply) into pg_basebackup and pg_receivexlog? Good catch. Yes, they should be taught about this too. I'll look into doing that too. - Heikki
On 15.10.2012 13:13, Heikki Linnakangas wrote: > On 13.10.2012 19:35, Fujii Masao wrote: >> ISTM you need to update the protocol.sgml because you added >> the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage. > > Oh, I didn't remember that we've documented the specific structs that we > pass around. It's quite bogus anyway to explain the messages the way we > do currently, as they are actually dependent on the underlying > architecture's endianess and padding. I think we should refactor the > protocol to not transmit raw structs, but use pq_sentint and friends to > construct the messages. This was discussed earlier (see > http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com), > I think there's consensus that 9.3 would be a good time to do that as we > changed the XLogRecPtr format anyway. This is what I came up with. The replication protocol is now architecture-independent. The WAL format itself is still architecture-independent, of course, but this is useful if you want to e.g use pg_receivexlog to back up a server that runs on a different platform. I chose the int64 format to transmit timestamps, even when compiled with --disable-integer-datetimes. Please review if you have the time.. - Heikki
Attachment
On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 15.10.2012 13:13, Heikki Linnakangas wrote: >> >> On 13.10.2012 19:35, Fujii Masao wrote: >>> >>> ISTM you need to update the protocol.sgml because you added >>> the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage. >> >> >> Oh, I didn't remember that we've documented the specific structs that we >> pass around. It's quite bogus anyway to explain the messages the way we >> do currently, as they are actually dependent on the underlying >> architecture's endianess and padding. I think we should refactor the >> protocol to not transmit raw structs, but use pq_sentint and friends to >> construct the messages. This was discussed earlier (see >> >> http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com), >> I think there's consensus that 9.3 would be a good time to do that as we >> changed the XLogRecPtr format anyway. > > > This is what I came up with. The replication protocol is now > architecture-independent. The WAL format itself is still > architecture-independent, of course, but this is useful if you want to e.g > use pg_receivexlog to back up a server that runs on a different platform. > > I chose the int64 format to transmit timestamps, even when compiled with > --disable-integer-datetimes. > > Please review if you have the time.. Thanks for the patch! When I ran pg_receivexlog, I encountered the following error. $ pg_receivexlog -D hoge pg_receivexlog: unexpected termination of replication stream: ERROR: no data left in message pg_basebackup -X stream caused the same error. $ pg_basebackup -D hoge -X stream -c fast pg_basebackup: could not send feedback packet: no COPY in progress pg_basebackup: child process exited with error 1 In walreceiver.c, tmpbuf is allocated for every XLogWalRcvProcessMsg() call. It should be allocated just once and continue to be used till end, to reduce palloc overhead? + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(int64); + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(char); These should be macro, to avoid calculation overhead? + /* Construct the the message and send it. */ + resetStringInfo(&reply_message); + pq_sendbyte(&reply_message, 'h'); + pq_sendint(&reply_message, xmin, 4); + pq_sendint(&reply_message, nextEpoch, 4); + walrcv_send(reply_message.data, reply_message.len); You seem to have forgotten to send the sendTime. Regards, -- Fujii Masao
On 15.10.2012 19:31, Fujii Masao wrote: > On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> On 15.10.2012 13:13, Heikki Linnakangas wrote: >>> >>> Oh, I didn't remember that we've documented the specific structs that we >>> pass around. It's quite bogus anyway to explain the messages the way we >>> do currently, as they are actually dependent on the underlying >>> architecture's endianess and padding. I think we should refactor the >>> protocol to not transmit raw structs, but use pq_sentint and friends to >>> construct the messages. This was discussed earlier (see >>> >>> http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com), >>> I think there's consensus that 9.3 would be a good time to do that as we >>> changed the XLogRecPtr format anyway. >> >> >> This is what I came up with. The replication protocol is now >> architecture-independent. The WAL format itself is still >> architecture-independent, of course, but this is useful if you want to e.g >> use pg_receivexlog to back up a server that runs on a different platform. >> >> I chose the int64 format to transmit timestamps, even when compiled with >> --disable-integer-datetimes. >> >> Please review if you have the time.. > > Thanks for the patch! > > When I ran pg_receivexlog, I encountered the following error. Yeah, clearly I didn't test this near enough... I fixed the bugs you bumped into, new version attached. > + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(int64); > + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(char); > > These should be macro, to avoid calculation overhead? The compiler will calculate this at compilation time, it's going to be a constant at runtime. - Heikki
Attachment
On Tue, Oct 16, 2012 at 9:31 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 15.10.2012 19:31, Fujii Masao wrote: >> >> On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas >> <hlinnakangas@vmware.com> wrote: >>> >>> On 15.10.2012 13:13, Heikki Linnakangas wrote: >>>> >>>> >>>> Oh, I didn't remember that we've documented the specific structs that we >>>> pass around. It's quite bogus anyway to explain the messages the way we >>>> do currently, as they are actually dependent on the underlying >>>> architecture's endianess and padding. I think we should refactor the >>>> protocol to not transmit raw structs, but use pq_sentint and friends to >>>> construct the messages. This was discussed earlier (see >>>> >>>> >>>> http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com), >>>> I think there's consensus that 9.3 would be a good time to do that as we >>>> changed the XLogRecPtr format anyway. >>> >>> >>> >>> This is what I came up with. The replication protocol is now >>> architecture-independent. The WAL format itself is still >>> architecture-independent, of course, but this is useful if you want to >>> e.g >>> use pg_receivexlog to back up a server that runs on a different platform. >>> >>> I chose the int64 format to transmit timestamps, even when compiled with >>> --disable-integer-datetimes. >>> >>> Please review if you have the time.. >> >> >> Thanks for the patch! >> >> When I ran pg_receivexlog, I encountered the following error. > > > Yeah, clearly I didn't test this near enough... > > I fixed the bugs you bumped into, new version attached. Thanks for updating the patch! We should remove the check of integer_datetime by pg_basebackup background process and pg_receivexlog? Currently, they always check it, and then if its setting value is not the same between a client and server, they fail. Thanks to the patch, ISTM this check is no longer required. + pq_sendint64(&reply_message, GetCurrentIntegerTimestamp()); In XLogWalRcvSendReply() and XLogWalRcvSendHSFeedback(), GetCurrentTimestamp() is called twice. I think that we can skip the latter call if integer-datetime is enabled because the return value of GetCurrentTimestamp() and GetCurrentIntegerTimestamp() is in the same format. It's worth reducing the number of GetCurrentTimestamp() calls, I think. elog(DEBUG2, "sending write %X/%X flush %X/%X apply %X/%X", - (uint32) (reply_message.write >> 32), (uint32) reply_message.write, - (uint32) (reply_message.flush >> 32), (uint32) reply_message.flush, - (uint32) (reply_message.apply >> 32), (uint32) reply_message.apply); + (uint32) (writePtr >> 32), (uint32) writePtr, + (uint32) (flushPtr >> 32), (uint32) flushPtr, + (uint32) (applyPtr >> 32), (uint32) applyPtr); elog(DEBUG2, "write %X/%X flush %X/%X apply %X/%X", - (uint32) (reply.write >> 32), (uint32) reply.write, - (uint32) (reply.flush >> 32), (uint32) reply.flush, - (uint32) (reply.apply >> 32), (uint32) reply.apply); + (uint32) (writePtr >> 32), (uint32) writePtr, + (uint32) (flushPtr >> 32), (uint32) flushPtr, + (uint32) (applyPtr >> 32), (uint32) applyPtr); Isn't it worth logging not only WAL location but also the replyRequested flag in these debug message? The remaining of the patch looks good to me. >> + hdrlen = sizeof(int64) + sizeof(int64) + >> sizeof(int64); >> + hdrlen = sizeof(int64) + sizeof(int64) + >> sizeof(char); >> >> These should be macro, to avoid calculation overhead? > > > The compiler will calculate this at compilation time, it's going to be a > constant at runtime. Yes, you're right. Regards, -- Fujii Masao