Thread: Synch Rep v5

Synch Rep v5

From
"Fujii Masao"
Date:
Hi,

I've attached the updated version of the Synch Rep patch (v5) to the wiki.
The "User Overview" description has also been updated.
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Synch_Rep

On Thu, Dec 18, 2008 at 9:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> 4. sleeping
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00438.php
>>
>> I'm looking for the better idea. How should we resolve that problem?
>> Only reduce the timeout of pq_wait to 100ms? Get rid of
>> SA_RESTART only during pq_wait as follows?
>>
>>    remove SA_RESTART
>>    pq_wait()
>>    add SA_RESTART
>
> Not sure, will consider. Ask others as well.

I don't have a better idea yet. For now (v5), I have only reduced the
timeout of pq_wait to 100ms. Is this sufficient? Do you have any good ideas?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Rep v5

From
Simon Riggs
Date:
On Sat, 2009-01-10 at 19:16 +0900, Fujii Masao wrote:

> I attached the updated version of Synch Rep patch (v5) on wiki.
> The description of "User Overview" is also already updated.
> http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Synch_Rep

Looks good on initial read of Wiki. Few minor comments:

The advantage of doing things this way is that the base backup can take
place any way the user chooses, so potentially much faster than using a
single session.

I notice we use the same settings for keepalives. We may need that to be
a second set of parameters.

Progress reporting will be easier with HS, so you are right to leave
that alone.

Don't understand: "Completely automated catching up; User has to carry
out some procedure manually for making the standby catch up."

Multiple standby is still possible, but just using old file based
mechanisms. We would need to be careful about use of %R in that case.

I believe the max delay is 2* wal_sender_delay.

I like the way recovery_trigger_file avoids changing pg_standby, but I
guess we still need to plug that gap also one day. But does patch 10
also have the other mechanism?

-- Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: Synch Rep v5

From
Simon Riggs
Date:
On Sat, 2009-01-10 at 19:16 +0900, Fujii Masao wrote:

> On Thu, Dec 18, 2008 at 9:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> 4. sleeping
> >> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00438.php
> >>
> >> I'm looking for the better idea. How should we resolve that problem?
> >> Only reduce the timeout of pq_wait to 100ms? Get rid of
> >> SA_RESTART only during pq_wait as follows?
> >>
> >>    remove SA_RESTART
> >>    pq_wait()
> >>    add SA_RESTART
> >
> > Not sure, will consider. Ask others as well.
> 
> I've not got an idea yet. Now (v5), I only reduce the timeout of
> pq_wait to 100ms. Is this sufficient? Do you have any good idea?

To be honest I didn't follow that part of the discussion.

My preferred approach, mentioned earlier in the summer, was to use a
mechanism very similar to LWlocks. A proc queue with semaphores. Minimum
delay, no need for signals. The process doing the wakeup can walk up the
queue until it finds somebody whose wait-for-LSN is higher than has just
been sent/written. Doing it this way also gives us group commit when
synch rep is not enabled.
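
The queue idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch: the class and method names (LsnWaitQueue, enqueue, wake_up_to) are hypothetical, LSNs are modelled as plain integers, and threading.Event stands in for a per-process semaphore.

```python
import bisect
import threading


class LsnWaitQueue:
    """Waiters kept in order of the LSN they wait for (hypothetical names)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = []  # sorted list of (wait_lsn, seq, event)
        self._seq = 0       # tie-breaker so Event objects are never compared

    def enqueue(self, wait_lsn):
        """Register a waiter; the caller would then block on the event."""
        event = threading.Event()  # stand-in for a per-process semaphore
        with self._lock:
            bisect.insort(self._waiters, (wait_lsn, self._seq, event))
            self._seq += 1
        return event

    def wake_up_to(self, done_lsn):
        """Wake each waiter covered by done_lsn; stop at the first one
        whose wait-for-LSN is higher than what has been sent/written."""
        woken = []
        with self._lock:
            while self._waiters and self._waiters[0][0] <= done_lsn:
                wait_lsn, _, event = self._waiters.pop(0)
                event.set()
                woken.append(wait_lsn)
        return woken
```

With waiters at LSN 100, 200 and 300, a single wake_up_to(200) call releases the first two and leaves the third queued, which is the batching effect that would give group commit.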

-- Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: Synch Rep v5

From
"Fujii Masao"
Date:
Hi,

On Sat, Jan 10, 2009 at 10:42 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Sat, 2009-01-10 at 19:16 +0900, Fujii Masao wrote:
>
>> On Thu, Dec 18, 2008 at 9:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> >> 4. sleeping
>> >> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00438.php
>> >>
>> >> I'm looking for the better idea. How should we resolve that problem?
>> >> Only reduce the timeout of pq_wait to 100ms? Get rid of
>> >> SA_RESTART only during pq_wait as follows?
>> >>
>> >>    remove SA_RESTART
>> >>    pq_wait()
>> >>    add SA_RESTART
>> >
>> > Not sure, will consider. Ask others as well.
>>
>> I've not got an idea yet. Now (v5), I only reduce the timeout of
>> pq_wait to 100ms. Is this sufficient? Do you have any good idea?
>
> To be honest I didn't follow that part of the discussion.
>
> My preferred approach, mentioned earlier in the summer, was to use a
> mechanism very similar to LWlocks. A proc queue with semaphores. Minimum
> delay, no need for signals. The process doing the wakeup can walk up the
> queue until it finds somebody whose wait-for-LSN is higher than has just
> been sent/written. Doing it this way also gives us group commit when
> synch rep is not enabled.

Yes, using semaphores for the communication was also my first approach.
The problem with that approach is that walsender cannot wait for both a signal
from backends and the response from walreceiver concurrently, because
waiting on a semaphore is blocking. So I use signals for the communication.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Rep v5

From
"Fujii Masao"
Date:
Hi,

Thanks for your comments!

On Sat, Jan 10, 2009 at 10:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I notice we use the same settings for keepalives. We may need that to be
> a second set of parameters.

Or should we make walreceiver execute "SET tcp_keepalives_xxx TO yyy"
before starting replication if such settings are specified in recovery.conf?

> Don't understand: "Completely automated catching up; User has to carry
> out some procedure manually for making the standby catch up."

Oh sorry, this description is not correct; the standby can catch up with the
primary automatically if the archive area is shared between the two servers.
In fact, xlogs generated before replication are shipped by the archiver, and
xlogs generated during replication are shipped by walsender.

I also updated the figures about flow of xlogs. Please check it.
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Architecture_Design

> Multiple standby is still possible, but just using old file based
> mechanisms. We would need to be careful about use of %R in that case.

Yes. Synch Rep can work fine with existing warm-standby mechanism.

> I believe the max delay is 2* wal_sender_delay.

In the async replication case, walsender tries to send the xlogs once per
wal_sender_delay, and receives the response from the standby on
demand. So I think the max delay is wal_sender_delay. Am I missing
something?

> I like the way recovery_trigger_file avoids changing pg_standby, but I
> guess we still need to plug that gap also one day. But does patch 10
> also have the other mechanism?

As you imply, the current synch-rep no longer needs the change to
pg_standby, so I'll drop that patch from the synch-rep patchset.
Of course, the patch is still useful for the existing warm-standby. Should
I add it to the commitfest for 8.5?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Rep v5

From
Simon Riggs
Date:
On Sun, 2009-01-11 at 17:19 +0900, Fujii Masao wrote:

> Thanks for your comments!
> 
> On Sat, Jan 10, 2009 at 10:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > I notice we use the same settings for keepalives. We may need that to be
> > a second set of parameters.
> 
> Or, we should make walreceiver execute "SET tcp_keepalives_xxx TO yyy"
> before starting replication if such settings are specified in recovery.conf?

Sounds reasonable, if maybe not ideal.

> > Don't understand: "Completely automated catching up; User has to carry
> > out some procedure manually for making the standby catch up."
> 
> Oh sorry, this description is not correct; the standby can catch up with the
> primary automatically if archive area is shared between those two servers.
> In fact, xlogs generated before / during replication are shipped by
> archiver / walsender, respectively.
> 
> I also updated the figures about flow of xlogs. Please check it.
> http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Architecture_Design

Can't see anything different!

> > Multiple standby is still possible, but just using old file based
> > mechanisms. We would need to be careful about use of %R in that case.
> 
> Yes. Synch Rep can work fine with existing warm-standby mechanism.

If we want multiple standby servers they wouldn't both be able to trim
files from the archive. So we would need to change pg_standby so it
records the %R from multiple servers on the archive and only trimmed the
max of those %R values.

> > I believe the max delay is 2* wal_sender_delay.
> 
> In async replication case, walsender tries to send the xlogs once per
> wal_sender_delay, and receives the response from the standby on
> demand. So, I think that max delay is wal_sender_delay. Am I missing
> something?

Sending takes time as well, so it is send_time + delay at least.

> > I like the way recovery_trigger_file avoids changing pg_standby, but I
> > guess we still need to plug that gap also one day. But does patch 10
> > also have the other mechanism?
> 
> As you imply, current synch-rep has already not needed the change
> of pg_standby, so I'll get rid of the patch from synch-rep patchset.
> Of course, this patch is still useful for existing warm-standby. I should
> add this patch to commitfest for 8.5?

May as well leave it in, so people can use it with 8.3.

-- Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: Synch Rep v5

From
Simon Riggs
Date:
On Sun, 2009-01-11 at 15:11 +0900, Fujii Masao wrote:

> Yes, using semaphores for the communication is also my first approach.
> The problem of this approach is that walsender cannot wait for both
> signal from backends and the response from walreceiver concurrently,
> because wait-for-semaphore is blocking at least. So, I use signal for
> the communication.

IIUC: In sync mode backend sends signal to walsender, then adds itself
to wait queue on semaphore. walsender responds to signal, sends more WAL
then waits for response. When response comes it then wakes backends on
the semaphore. In async mode, no signal is sent and we do not wait, we
just allow the walsender to wake up periodically and send.

Does it release waiters as soon as possible, or does it always respond
to new requests for sending? i.e. which has priority - responding to
waiters who need to be woken or responding to new requests to send?

-- Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: Synch Rep v5

From
"Fujii Masao"
Date:
Hi,

On Mon, Jan 12, 2009 at 1:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > Multiple standby is still possible, but just using old file based
>> > mechanisms. We would need to be careful about use of %R in that case.
>>
>> Yes. Synch Rep can work fine with existing warm-standby mechanism.
>
> If we want multiple standby servers they wouldn't both be able to trim
> files from the archive. So we would need to change pg_standby so it
> records the %R from multiple servers on the archive and only trimmed the
> max of those %R values.

s/max/min ?
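
To illustrate why it has to be the minimum, here is a small sketch. It is not taken from pg_standby: the function names are hypothetical, and it assumes all segments are on the same timeline so that the fixed-width hexadecimal WAL file names sort in WAL order as plain strings.

```python
def oldest_needed_segment(restart_files):
    """Return the smallest %R value reported by any standby.

    restart_files maps a standby name (hypothetical) to the WAL file
    name that standby reported as %R.
    """
    if not restart_files:
        raise ValueError("no standby has reported a restart file yet")
    return min(restart_files.values())


def removable_segments(archived, restart_files):
    """List archived segments that every standby has moved past."""
    boundary = oldest_needed_segment(restart_files)
    # Only files strictly older than the slowest standby's %R are safe.
    return sorted(name for name in archived if name < boundary)
```

If one standby reports ...05 and another reports ...04, trimming up to the max would delete ...04, which the slower standby still needs for a restart; trimming up to the min removes only the files older than ...04.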

>> > I believe the max delay is 2* wal_sender_delay.
>>
>> In async replication case, walsender tries to send the xlogs once per
>> wal_sender_delay, and receives the response from the standby on
>> demand. So, I think that max delay is wal_sender_delay. Am I missing
>> something?
>
> Sending takes time as well, so it is send_time + delay at least.

Right. I'll correct the doc to give an exact description.

Similarly, the documentation of "synchronous_commit" says

-----------
When off, there can be a delay between when success is reported to
the client and when the transaction is really guaranteed to be safe
against a server crash. (The maximum delay is three times wal_writer_delay.)
-----------

so write_time should also be added to that maximum delay.


>> > I like the way recovery_trigger_file avoids changing pg_standby, but I
>> > guess we still need to plug that gap also one day. But does patch 10
>> > also have the other mechanism?
>>
>> As you imply, current synch-rep has already not needed the change
>> of pg_standby, so I'll get rid of the patch from synch-rep patchset.
>> Of course, this patch is still useful for existing warm-standby. I should
>> add this patch to commitfest for 8.5?
>
> May as well leave it in, so people can use it with 8.3.

I'd like this patch to be reviewed and committed for 8.4 even if synch-rep
is postponed to 8.5, because it is very useful for the existing warm-standby
mechanism, and warm-standby would still be the only built-in replication
solution in 8.4.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Rep v5

From
Simon Riggs
Date:
On Tue, 2009-01-13 at 16:39 +0900, Fujii Masao wrote:
> > May as well leave it in, so people can use it with 8.3.
> 
> I'd like this patch to be reviewed and committed for 8.4 even if
> synch-rep is postponed to 8.5. Because this is very useful for existing
> warm-standby mechanism, and only warm-standby would be still a built-in
> replication solution in 8.4.

Yes, no worries. 

-- Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support



Re: Synch Rep v5

From
"Fujii Masao"
Date:
Hi,

On Mon, Jan 12, 2009 at 1:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On Sun, 2009-01-11 at 15:11 +0900, Fujii Masao wrote:
>
>> Yes, using semaphores for the communication is also my first approach.
>> The problem of this approach is that walsender cannot wait for both
>> signal from backends and the response from walreceiver concurrently,
>> because wait-for-semaphore is blocking at least. So, I use signal for
>> the communication.
>
> IIUC: In sync mode backend sends signal to walsender, then adds itself
> to wait queue on semaphore. walsender responds to signal, sends more WAL
> then waits for response. When response comes it then wakes backends on
> the semaphore. In async mode, no signal is sent and we do not wait, we
> just allow the walsender to wake up periodically and send.

I'd like walsender not to wait for the response in blocking mode, because
then the network delay would directly interfere with transaction processing.
In my design, walsender waits for the response in non-blocking mode and tries
to send the next requested WAL if the response has not arrived yet. So
walsender needs to wait for a signal from a backend and a response from the
standby concurrently.

The problem is how walsender should wait for the signal and the response
concurrently. If we use select/poll for it, walsender cannot respond to the
signal immediately on some platforms. If we don't use them (which means
looping without sleeping), CPU utilization would jump.

> Does it release waiters as soon as possible, or does it always respond
> to new requests for sending? i.e. which has priority - responding to
> waiters who need to be woken or responding to new requests to send?

Responding to waiters should have higher priority, for stability of response
time. In fact, if both the signal from a backend and the response from the
standby have arrived, walsender reads the response and wakes the waiters
up first.
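
One iteration of such a loop could look roughly like the sketch below. It is a simplification under stated assumptions, not the patch's code: walsender_step, send_requested and events are hypothetical names, the flag stands in for whatever the signal handler would set, and the 100 ms value mirrors the pq_wait timeout used in v5.

```python
import select
import socket

RESPONSE_TIMEOUT = 0.1  # seconds; mirrors the 100 ms timeout used in v5


def walsender_step(standby_sock, send_requested, events):
    """Run one loop iteration and record what happened in `events`.

    send_requested stands in for a flag set by a signal handler
    (an assumption). Returns True if a send request was consumed.
    """
    # Do not sleep at all when a send request is already pending;
    # otherwise wait for a response, but never longer than the timeout,
    # so that a request arriving meanwhile is noticed soon.
    timeout = 0 if send_requested else RESPONSE_TIMEOUT
    readable, _, _ = select.select([standby_sock], [], [], timeout)

    # Responses are handled first: waking waiters has priority.
    if readable:
        ack = standby_sock.recv(4096)
        events.append(("wake_waiters", ack))

    # Only then is the next requested WAL sent.
    if send_requested:
        events.append(("send_wal", None))
        return True
    return False
```

The bounded timeout is what keeps the worst-case reaction to a signal at about 100 ms on platforms where the signal does not interrupt select/poll, at the cost of a few idle wakeups per second.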

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Synch Rep v5

From
"Fujii Masao"
Date:
Hi,

On Sat, Jan 10, 2009 at 7:16 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I attached the updated version of Synch Rep patch (v5) on wiki.
> The description of "User Overview" is also already updated.
> http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Synch_Rep

Attached is the updated version of the synch-rep patch (v0114). I adjusted
the patch against the latest head, fixed some bugs and wrote some
documentation.

I've also attached this patch to the wiki. The wiki has the README of the
patch and the user overview description.
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Version_History

If you are interested in synch-rep, please try testing or reviewing it.
Any comments and feedback are welcome!

Though I'm not sure whether synch-rep will be postponed to 8.5, I'll at least
work it up to a good place to leave off. Next, I'll try to remove the
file-based log-shipping part from synch-rep, as requested by some hackers.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
