Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
Date
Msg-id CAHGQGwF9NQuqLm5GJKmEvQwxFkHQ=e2zXs4NC5zjbeoyvTustw@mail.gmail.com
In response to Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
List pgsql-hackers
On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 15.10.2012 13:13, Heikki Linnakangas wrote:
>>
>> On 13.10.2012 19:35, Fujii Masao wrote:
>>>
>>> ISTM you need to update the protocol.sgml because you added
>>> the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.
>>
>>
>> Oh, I didn't remember that we've documented the specific structs that we
>> pass around. It's quite bogus anyway to explain the messages the way we
>> do currently, as they are actually dependent on the underlying
>> architecture's endianness and padding. I think we should refactor the
>> protocol to not transmit raw structs, but use pq_sendint and friends to
>> construct the messages. This was discussed earlier (see
>>
>> http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com),
>> I think there's consensus that 9.3 would be a good time to do that as we
>> changed the XLogRecPtr format anyway.
>
>
> This is what I came up with. The replication protocol is now
> architecture-independent. The WAL format itself is still
> architecture-dependent, of course, but this is useful if you want to e.g.
> use pg_receivexlog to back up a server that runs on a different platform.
>
> I chose the int64 format to transmit timestamps, even when compiled with
> --disable-integer-datetimes.
>
> Please review if you have the time..

Thanks for the patch!

When I ran pg_receivexlog, I encountered the following error.

$ pg_receivexlog -D hoge
pg_receivexlog: unexpected termination of replication stream: ERROR: no data left in message

pg_basebackup -X stream caused the same error.

$ pg_basebackup -D hoge -X stream -c fast
pg_basebackup: could not send feedback packet: no COPY in progress
pg_basebackup: child process exited with error 1

In walreceiver.c, tmpbuf is allocated on every XLogWalRcvProcessMsg() call.
Shouldn't it be allocated just once and reused until the end, to reduce
palloc overhead?
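
A minimal sketch of the allocate-once idea, in plain C rather than the backend's palloc machinery. ScratchBuf, scratch_reserve() and process_msg() are hypothetical names used only for illustration and are not part of walreceiver.c; the point is just that the buffer grows when needed and is otherwise reused between calls.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical long-lived scratch buffer, kept across message calls. */
typedef struct
{
    char   *data;   /* NULL until the first reservation */
    size_t  cap;    /* current capacity in bytes */
} ScratchBuf;

/*
 * Make sure the buffer can hold at least 'need' bytes.  Memory is only
 * (re)allocated when the current capacity is too small, so repeated calls
 * with similar sizes cost no allocation at all.
 */
char *
scratch_reserve(ScratchBuf *b, size_t need)
{
    if (need == 0)
        need = 1;               /* always hand back a valid pointer */

    if (need > b->cap)
    {
        size_t  newcap = b->cap ? b->cap : 64;
        char   *p;

        while (newcap < need)
            newcap *= 2;

        p = realloc(b->data, newcap);
        if (p == NULL)
            return NULL;        /* old buffer is still valid */

        b->data = p;
        b->cap = newcap;
    }
    return b->data;
}

/* Hypothetical per-message handler: copies the payload into the buffer. */
int
process_msg(ScratchBuf *b, const char *payload, size_t len)
{
    char *tmp = scratch_reserve(b, len);

    if (tmp == NULL)
        return -1;
    memcpy(tmp, payload, len);
    return 0;
}
```

With a ScratchBuf initialised to {NULL, 0}, the first process_msg() call allocates and later calls of the same or smaller size reuse the same memory.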

+                hdrlen = sizeof(int64) + sizeof(int64) + sizeof(int64);
+                hdrlen = sizeof(int64) + sizeof(int64) + sizeof(char);

Shouldn't these be macros, to avoid the calculation overhead?
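
One way the two lengths could be named, as a sketch only. The macro names, the header_len() helper, and the mapping of the first length to a 'w' message and the second to a 'k' message are assumptions for illustration; the patch excerpt above does not say which message each line belongs to. Note that sizeof applied to fixed-size types is a compile-time constant in C, so the macros mainly buy readability.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the two header lengths quoted from the patch. */
#define WAL_DATA_HDRLEN \
    (sizeof(int64_t) + sizeof(int64_t) + sizeof(int64_t))
#define KEEPALIVE_HDRLEN \
    (sizeof(int64_t) + sizeof(int64_t) + sizeof(char))

/* Both expressions are integer constants, checked here at compile time. */
_Static_assert(WAL_DATA_HDRLEN == 24, "three 8-byte fields");
_Static_assert(KEEPALIVE_HDRLEN == 17, "two 8-byte fields plus one byte");

/*
 * Return the expected header length for a message type byte, or 0 for an
 * unknown type.  The 'w' and 'k' assignments are assumed, not taken from
 * the patch.
 */
size_t
header_len(char msgtype)
{
    switch (msgtype)
    {
        case 'w':
            return WAL_DATA_HDRLEN;
        case 'k':
            return KEEPALIVE_HDRLEN;
        default:
            return 0;
    }
}
```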

+    /* Construct the message and send it. */
+    resetStringInfo(&reply_message);
+    pq_sendbyte(&reply_message, 'h');
+    pq_sendint(&reply_message, xmin, 4);
+    pq_sendint(&reply_message, nextEpoch, 4);
+    walrcv_send(reply_message.data, reply_message.len);

You seem to have forgotten to send the sendTime.
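
For illustration, a self-contained sketch of a feedback message that does carry the send time, written in network byte order so it does not depend on the sender's endianness or struct padding. The helpers put_u32_be(), put_u64_be() and build_feedback() are hypothetical stand-ins, not the pq_send* functions, and the field order (type byte, send time, xmin, epoch) is an assumption rather than something the excerpt above confirms.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value most-significant byte first; returns bytes written. */
size_t
put_u32_be(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char) (v >> 24);
    out[1] = (unsigned char) (v >> 16);
    out[2] = (unsigned char) (v >> 8);
    out[3] = (unsigned char) v;
    return 4;
}

/* Write a 64-bit value most-significant byte first; returns bytes written. */
size_t
put_u64_be(unsigned char *out, uint64_t v)
{
    size_t n = put_u32_be(out, (uint32_t) (v >> 32));

    n += put_u32_be(out + n, (uint32_t) v);
    return n;
}

/*
 * Build a feedback message into 'out', which must have room for 17 bytes.
 * Assumed layout: 'h', int64 send time, 4-byte xmin, 4-byte epoch.
 * Returns the number of bytes written.
 */
size_t
build_feedback(unsigned char *out, int64_t send_time,
               uint32_t xmin, uint32_t epoch)
{
    size_t n = 0;

    out[n++] = 'h';
    n += put_u64_be(out + n, (uint64_t) send_time);   /* the missing field */
    n += put_u32_be(out + n, xmin);
    n += put_u32_be(out + n, epoch);
    return n;
}
```

A receiver that reads the fields in the same order sees the same values on any platform, which is the property the protocol change is after.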

Regards,

-- 
Fujii Masao


