Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date
Msg-id CAHGQGwE73DhFYbgN_8rCz1mpmJwjg4vkTrEBzm=X0MGOjW=wjQ@mail.gmail.com
Whole thread Raw
In response to Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Tue, Oct 16, 2012 at 9:31 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 15.10.2012 19:31, Fujii Masao wrote:
>>
>> On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
>> <hlinnakangas@vmware.com>  wrote:
>>>
>>> On 15.10.2012 13:13, Heikki Linnakangas wrote:
>>>>
>>>>
>>>> Oh, I didn't remember that we've documented the specific structs that we
>>>> pass around. It's quite bogus anyway to explain the messages the way we
>>>> do currently, as they are actually dependent on the underlying
>>>> architecture's endianess and padding. I think we should refactor the
>>>> protocol to not transmit raw structs, but use pq_sentint and friends to
>>>> construct the messages. This was discussed earlier (see
>>>>
>>>>
>>>> http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com),
>>>> I think there's consensus that 9.3 would be a good time to do that as we
>>>> changed the XLogRecPtr format anyway.
>>>
>>>
>>>
>>> This is what I came up with. The replication protocol is now
>>> architecture-independent. The WAL format itself is still
>>> architecture-independent, of course, but this is useful if you want to
>>> e.g
>>> use pg_receivexlog to back up a server that runs on a different platform.
>>>
>>> I chose the int64 format to transmit timestamps, even when compiled with
>>> --disable-integer-datetimes.
>>>
>>> Please review if you have the time..
>>
>>
>> Thanks for the patch!
>>
>> When I ran pg_receivexlog, I encountered the following error.
>
>
> Yeah, clearly I didn't test this near enough...
>
> I fixed the bugs you bumped into, new version attached.

Thanks for updating the patch!

We should remove the check of integer_datetime by pg_basebackup
background process and pg_receivexlog? Currently, they always check
it, and then if its setting value is not the same between a client and
server, they fail. Thanks to the patch, ISTM this check is no longer
required.

+    pq_sendint64(&reply_message, GetCurrentIntegerTimestamp());

In XLogWalRcvSendReply() and XLogWalRcvSendHSFeedback(),
GetCurrentTimestamp() is called twice. I think that we can skip the
latter call if integer-datetime is enabled because the return value of
GetCurrentTimestamp() and GetCurrentIntegerTimestamp() is in the
same format. It's worth reducing the number of GetCurrentTimestamp()
calls, I think.
    elog(DEBUG2, "sending write %X/%X flush %X/%X apply %X/%X",
-         (uint32) (reply_message.write >> 32), (uint32) reply_message.write,
-         (uint32) (reply_message.flush >> 32), (uint32) reply_message.flush,
-         (uint32) (reply_message.apply >> 32), (uint32) reply_message.apply);
+         (uint32) (writePtr >> 32), (uint32) writePtr,
+         (uint32) (flushPtr >> 32), (uint32) flushPtr,
+         (uint32) (applyPtr >> 32), (uint32) applyPtr);
    elog(DEBUG2, "write %X/%X flush %X/%X apply %X/%X",
-         (uint32) (reply.write >> 32), (uint32) reply.write,
-         (uint32) (reply.flush >> 32), (uint32) reply.flush,
-         (uint32) (reply.apply >> 32), (uint32) reply.apply);
+         (uint32) (writePtr >> 32), (uint32) writePtr,
+         (uint32) (flushPtr >> 32), (uint32) flushPtr,
+         (uint32) (applyPtr >> 32), (uint32) applyPtr);

Isn't it worth logging not only WAL location but also the replyRequested
flag in these debug message?

The remaining of the patch looks good to me.

>> +                               hdrlen = sizeof(int64) + sizeof(int64) +
>> sizeof(int64);
>> +                               hdrlen = sizeof(int64) + sizeof(int64) +
>> sizeof(char);
>>
>> These should be macro, to avoid calculation overhead?
>
>
> The compiler will calculate this at compilation time, it's going to be a
> constant at runtime.

Yes, you're right.

Regards,

-- 
Fujii Masao



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Deprecations in authentication
Next
From: Daniel Farina
Date:
Subject: Re: xlog filename formatting functions in recovery