Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
Date
Msg-id 3f0b79eb0907090016t38841368v45b916c9e57b1fe7@mail.gmail.com
In response to Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
Hi,

On Tue, Jul 7, 2009 at 8:51 PM, Fujii Masao<masao.fujii@gmail.com> wrote:
> http://archives.postgresql.org/message-id/4951108A.5040608@enterprisedb.com
>> I don't think we need or should
>> allow running regular queries before entering "replication mode". the
>> backend should become a walsender process directly after authentication.
>
> I changed the protocol according to your suggestion.
> Here is the current protocol:

Just for the record, I'd like to explain how Heikki's protocol
corresponds to mine.

> ReplicationStart (B)
>    Byte1('l'): Identifies the message as a replication-start indicator.
>    Int32(17): Length of message contents in bytes, including self.
>    Int32: The timeline ID
>    Int32: The start log file of replication
>    Int32: The start byte offset of replication

This corresponds to "StartReplication <begin>". But it is sent
from the primary to the standby, whereas "StartReplication" is sent
in the opposite direction. So, in the current design, the primary
determines the WAL streaming start position, which is the head of
the XLOG file following the one that walsender switched from.

> XLogData (B)
>    Byte1('w'): Identifies the message as XLOG records.
>    Int32: Length of message contents in bytes, including self.
>    Int8: Flag bits indicating how the records should be treated.
>    Int32: The log file number of the records.
>    Int32: The byte offset of the records.
>    Byte n: The XLOG records.

This corresponds to "WALRange <begin> <end> <data>". But
XLogData omits <begin> to reduce the wire traffic, because it
can be calculated from <end> and the length of the records.
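
To illustrate that calculation, here is a minimal sketch (not taken from
the patch) that parses an XLogData message and recovers <begin>. It
assumes network byte order, unsigned integer fields, a length that counts
itself but not the type byte, and records that do not cross a log file
boundary. The name parse_xlogdata is hypothetical.

```python
import struct

# Assumed layout, read off the XLogData description above:
# type byte, length, flag bits, log file number, end byte offset.
_HEADER = struct.Struct("!cIBII")


def parse_xlogdata(msg: bytes):
    """Return (flags, log_file, begin_offset, end_offset, records)."""
    msg_type, length, flags, log_file, end_offset = _HEADER.unpack_from(msg)
    if msg_type != b"w":
        raise ValueError("not an XLogData message")
    # Assumption: the length counts itself but not the type byte.
    records = msg[_HEADER.size:1 + length]
    if len(records) != length - (_HEADER.size - 1):
        raise ValueError("truncated XLogData message")
    # <begin> is not on the wire; derive it from <end> and the record length.
    begin_offset = end_offset - len(records)
    if begin_offset < 0:
        raise ValueError("records cross a log file boundary; not handled here")
    return flags, log_file, begin_offset, end_offset, records
```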

> XLogResponse (F)
>    Byte1('r'):  Identifies the message as ACK for XLOG records.
>    Int32: Length of message contents in bytes, including self.
>    Int8: Flag bits indicating how the records were treated.
>    Int32: The log file number of the records.
>    Int32: The byte offset of the records.

This corresponds to "ReplicatedUpTo <end>". They are almost
the same.
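
As a rough sketch of what the standby side might send, the following
builds and parses an XLogResponse message. The byte order (network),
unsigned fields, and the function names are assumptions for illustration.
The flag bits are treated as an opaque value, since their meaning is not
spelled out here.

```python
import struct

# Assumed layout, read off the XLogResponse description above:
# type byte, length, flag bits, log file number, byte offset.
_RESPONSE = struct.Struct("!cIBII")


def build_xlog_response(flags: int, log_file: int, offset: int) -> bytes:
    """Encode an ACK for records replicated up to (log_file, offset)."""
    # Assumption: the length counts itself but not the type byte.
    length = _RESPONSE.size - 1
    return _RESPONSE.pack(b"r", length, flags, log_file, offset)


def parse_xlog_response(msg: bytes):
    """Decode an ACK and return (flags, log_file, offset)."""
    msg_type, length, flags, log_file, offset = _RESPONSE.unpack(msg)
    if msg_type != b"r" or length != _RESPONSE.size - 1:
        raise ValueError("not a well-formed XLogResponse message")
    return flags, log_file, offset
```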

> If there is a missing XLOG file which is required for recovery, the
> startup process connects to the primary as a normal client, and
> receives the binary contents of the file by using the following SQL.
> This has nothing to do with the above protocol. So, the transfer of
> missing file and synchronous XLOG streaming are performed
> concurrently.
>
> COPY (SELECT pg_read_xlogfile('filename', true)) TO STDOUT WITH BINARY

This corresponds to "RequestWAL <begin> <end>". Since the
XLOG file written to the standby has to be recoverable, I use the
filename instead of XLogRecPtr here, and make the primary send
the whole file. Also, this filename can name not only an XLOG file
but also a history file.
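
For reference, here is a small sketch of how a WAL position maps to the
file names that could be requested this way. It follows the usual naming
scheme (three 8-digit hex fields for an XLOG file, one field plus
".history" for a history file) and assumes the default 16 MB segment
size. The helper names are hypothetical.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default segment size (16 MB)


def xlog_file_name(timeline: int, log_file: int, byte_offset: int) -> str:
    """Name of the XLOG segment file containing the given position."""
    seg = byte_offset // XLOG_SEG_SIZE
    return "%08X%08X%08X" % (timeline, log_file, seg)


def history_file_name(timeline: int) -> str:
    """Name of the timeline history file for the given timeline ID."""
    return "%08X.history" % timeline
```

Requesting by name means the primary never has to cut a file at an
arbitrary record boundary; it always ships a complete file.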

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

