Thread: New replication mode: write

New replication mode: write

From

Fujii Masao

Date:

13 January 2012, 06:41:28

Hi,

http://archives.postgresql.org/message-id/AANLkTilgyL3Y1jkDVHX02433COq7JLmqicsqmOsbuyA1%40mail.gmail.com

Previously I proposed the replication mode "recv" on the above thread,
but it has not been committed yet. Now I'd like to propose that mode again
because it's useful for reducing the overhead of synchronous replication.
The attached patch implements that mode.

If you choose that mode, a transaction waits for its WAL to be write()'d
on the standby, IOW, it waits until the standby has saved the WAL in memory
(handed to the OS with write() but not yet fsync'ed). This provides a lower
level of durability than the current synchronous replication does (where a
transaction waits for its WAL to be flushed to disk). However, it's a
practically useful setting because it can decrease the response time of the
transaction, and it causes no data loss unless both the master and the
standby crash and the database of the master gets corrupted at the same time.

In the patch, you can choose that mode by setting synchronous_commit to write.
I renamed the mode from "recv" to "write" on the basis of its actual behavior.

I measured how much "write" mode improves performance in synchronous
replication. Here are the results:

synchronous_commit = on
tps = 424.510843 (including connections establishing)
tps = 420.767883 (including connections establishing)
tps = 419.715658 (including connections establishing)
tps = 428.810001 (including connections establishing)
tps = 337.341445 (including connections establishing)

synchronous_commit = write
tps = 550.752712 (including connections establishing)
tps = 407.104036 (including connections establishing)
tps = 455.576190 (including connections establishing)
tps = 453.548672 (including connections establishing)
tps = 555.171325 (including connections establishing)
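As a quick summary of the five runs above, a small helper can average each setting. The function names are introduced here for illustration only. The means come out at roughly 406 tps for "on" and 484 tps for "write", i.e. about 19% higher throughput; given the spread (the "write" runs range from about 407 to 555 tps, and one "on" run dropped to 337 tps), this should be read as indicative rather than precise.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n throughput samples (tps). */
double mean_tps(const double *samples, size_t n)
{
    double sum = 0.0;

    assert(samples != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return n > 0 ? sum / (double) n : 0.0;
}

/* The five pgbench runs reported above for each setting. */
static const double tps_on[] = {
    424.510843, 420.767883, 419.715658, 428.810001, 337.341445
};
static const double tps_write[] = {
    550.752712, 407.104036, 455.576190, 453.548672, 555.171325
};

/* Relative gain of "write" over "on"; 0.19 means about 19% more tps. */
double write_mode_gain(void)
{
    double on = mean_tps(tps_on, sizeof tps_on / sizeof tps_on[0]);
    double wr = mean_tps(tps_write, sizeof tps_write / sizeof tps_write[0]);

    return wr / on - 1.0;
}
```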

I used pgbench (scale factor = 100) as the benchmark and ran the following
command:

    pgbench -c 8 -j 8 -T 60 -M prepared

I always ran CHECKPOINT on both the master and the standby before starting
each pgbench test, to prevent checkpoints from affecting the results.

Thoughts? Comments?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

new_replication_mode_write_v1.patch

Re: New replication mode: write

From

Fujii Masao

Date:

13 January 2012, 11:28:26

On Fri, Jan 13, 2012 at 7:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Jan 13, 2012 at 9:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Fri, Jan 13, 2012 at 7:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>>> Thought? Comments?
>>
>> This is almost exactly the same as my patch series
>> "syncrep_queues.v[1,2].patch" earlier this year. Which I know because
>> I was updating that patch myself last night for 9.2. I'm about half
>> way through doing that, since you and I agreed in Ottawa I would do
>> this. Perhaps it is better if we work together?
>
> I think this comment is mostly pointless. We don't have time to work
> together and there's no real reason to. You know what you're doing, so
> I'll leave you to do it.
>
> Please add the Apply mode.

OK, will do.

> In my patch, the reason I avoided doing WRITE mode (which we had
> previously referred to as RECV) was that no fsync of the WAL contents
> takes place. In that case we are applying changes using un-fsynced WAL
> data and in case of crash this would cause a problem.

My patch does not change the execution order of WAL flush and replay.
WAL records are always replayed after they are flushed by walreceiver,
so such a problem doesn't happen.

But this means that a transaction might need to wait for a WAL flush caused
by a previous transaction even if WRITE mode is chosen. That limits the
performance gain from WRITE mode, and should be improved later, I think.

> I was going to
> make the WalWriter available during recovery to cater for that. Do you
> not think that is no longer necessary?

That's still necessary to improve the performance of sync rep further, I think.
What I'd like to do (maybe in 9.3dev) after supporting WRITE mode is:

* Allow WAL records to be replayed before they are flushed to the disk.
* Add a new GUC parameter specifying whether to allow the standby to defer
  WAL flush. If the parameter is false, walreceiver flushes WAL whenever it
  receives WAL (i.e., the same as the current behavior). If true, walreceiver
  doesn't flush WAL at all. Instead, walwriter, a backend or the startup
  process does that. Walwriter periodically checks whether there is an
  un-flushed WAL file, and flushes it if one exists. When a buffer page is
  written out, the backend or startup process forces a WAL flush up to the
  buffer's LSN.
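The proposed division of labour can be sketched as a tiny state model. Everything here is hypothetical: StandbyWal, wal_flush() and the other functions are illustrative names, the defer_flush flag stands in for the proposed (unnamed) GUC, and "flushing" only advances a pointer where real code would fsync() the WAL file.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogPos;   /* simplified stand-in for an LSN */

/* Hypothetical standby-side WAL state: written vs. durable positions. */
typedef struct
{
    XLogPos write_ptr;   /* WAL received and write()'d by walreceiver */
    XLogPos flush_ptr;   /* WAL known to be durable on disk */
    int     nflushes;    /* number of simulated fsync calls */
} StandbyWal;

/* Simulated fsync of WAL up to 'upto'; never passes what was written. */
static void wal_flush(StandbyWal *w, XLogPos upto)
{
    if (upto > w->write_ptr)
        upto = w->write_ptr;
    if (upto > w->flush_ptr)
    {
        w->flush_ptr = upto;   /* real code would fsync() here */
        w->nflushes++;
    }
}

/* walreceiver: write() incoming WAL; flush at once only if not deferring. */
void walreceiver_receive(StandbyWal *w, XLogPos end, int defer_flush)
{
    if (end > w->write_ptr)
        w->write_ptr = end;
    if (!defer_flush)
        wal_flush(w, w->write_ptr);
}

/* walwriter: periodically flush any written-but-unflushed WAL. */
void walwriter_tick(StandbyWal *w)
{
    wal_flush(w, w->write_ptr);
}

/* backend/startup process: make WAL durable before writing a data page. */
void write_buffer(StandbyWal *w, XLogPos page_lsn)
{
    wal_flush(w, page_lsn);
    assert(w->flush_ptr >= page_lsn);   /* WAL-before-data rule */
}
```

With deferral on, several received chunks can be covered by one flush, which is the reduction in standby I/O described below.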

If the above GUC parameter is set to true (i.e., walreceiver doesn't flush
WAL at all) and WRITE mode is chosen, a transaction doesn't need to wait
for WAL flush on the standby at all. Also, WAL flushes on the standby would
become less frequent, which significantly reduces I/O load. Overall, the
performance of sync rep would improve a great deal.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: New replication mode: write

From

Simon Riggs

Date:

13 January 2012, 11:53:06

On Fri, Jan 13, 2012 at 12:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

>> In my patch, the reason I avoided doing WRITE mode (which we had
>> previously referred to as RECV) was that no fsync of the WAL contents
>> takes place. In that case we are applying changes using un-fsynced WAL
>> data and in case of crash this would cause a problem.
>
> My patch has not changed the execution order of WAL flush and replay.
> WAL records are always replayed after they are flushed by walreceiver.
> So, such a problem doesn't happen.

> But which means that transaction might need to wait for WAL flush caused
> by previous transaction even if WRITE mode is chosen. Which limits the
> performance gain by WRITE mode, and should be improved later, I think.

If the walreceiver still flushes, that is OK.

The latency would be smoother and lower if the walwriter were active.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: New replication mode: write

From

Fujii Masao

Date:

16 January 2012, 11:45:43

On Fri, Jan 13, 2012 at 9:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 7:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Please add the Apply mode.
>
> OK, will do.

Done. Attached is the updated version of the patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

new_replication_mode_v2.patch

Re: New replication mode: write

From

Simon Riggs

Date:

16 January 2012, 15:17:59

On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

> Done. Attached is the updated version of the patch.

Thanks.

I'll review this first, but can't start immediately. Please expect
something back in 2 days.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: New replication mode: write

From

Simon Riggs

Date:

20 January 2012, 10:41:30

On Mon, Jan 16, 2012 at 4:17 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> Done. Attached is the updated version of the patch.
>
> Thanks.
>
> I'll review this first, but can't start immediately. Please expect
> something back in 2 days.

On initial review this looks fine.

I'll do a more thorough hands-on review now and commit if still OK.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: New replication mode: write

From

Simon Riggs

Date:

23 January 2012, 06:58:41

On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

>>> Please add the Apply mode.
>>
>> OK, will do.
>
> Done. Attached is the updated version of the patch.

I notice that the Apply mode isn't fully implemented. I had in mind
that you would add the latch required to respond more quickly when
only the Apply pointer has changed.

Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
was there another reason for not implementing that?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: New replication mode: write

From

Fujii Masao

Date:

23 January 2012, 08:02:51

On Mon, Jan 23, 2012 at 4:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I notice that the Apply mode isn't fully implemented. I had in mind
> that you would add the latch required to respond more quickly when
> only the Apply pointer has changed.
>
> Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
> was there another reason for not implementing that?

I agree that the feature you pointed out is useful for the Apply mode. But
I'm afraid that implementing it is not easy and would make the patch big
and complicated, so I left it out of this first version of the Apply mode.

To make the walreceiver call WaitLatchOrSocket(), we would need to
merge it and libpq_select() into one function. But the former is a backend
function and the latter is a frontend one, and right now I have no good
idea how to merge them cleanly.

If we send back a reply as soon as the Apply pointer changes, I'm afraid
a great many reply messages would be sent, which might cause a performance
problem. This is also one of the reasons why I didn't implement the
quick-response feature. To address this problem, we might need to change
the master so that it sends the Wait pointer to the standby, and change the
standby so that it replies whenever the Apply pointer catches up with the
Wait one. This would reduce the number of useless replies from the standby
about the Apply pointer.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: New replication mode: write

From

Simon Riggs

Date:

23 January 2012, 08:28:41

On Mon, Jan 23, 2012 at 9:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> I agree that the feature you pointed is useful for the Apply mode. But
> I'm afraid that implementing that feature is not easy and would make
> the patch big and complicated, so I didn't implement the Apply mode first.
>
> To make the walreceiver call WaitLatchOrSocket(), we would need to
> merge it and libpq_select() into one function. But the former is the backend
> function and the latter is the frontend one. Now I have no good idea to
> merge them cleanly.

We can wait on the socket wherever it comes from; poll/select doesn't
care how we got the socket.

So we just need a common handler that calls either the walreceiver or the
libpqwalreceiver function, as required, to handle the wakeup.
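A minimal sketch of this idea, assuming a POSIX system: one poll() call waits on both a latch and the replication socket, and reports which one fired. Modelling the latch as the read end of a self-pipe is an assumption made for illustration; wait_latch_or_socket() and the WAKE_* flags are hypothetical names, not PostgreSQL's WaitLatchOrSocket().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Bit flags describing which source woke the waiter. */
enum { WAKE_NONE = 0, WAKE_LATCH = 1, WAKE_SOCKET = 2 };

/*
 * Sleep until the latch is set or the replication socket is readable,
 * or until timeout_ms elapses (-1 waits forever).  The latch is modelled
 * as the read end of a self-pipe: setting it means writing one byte to
 * the write end.  poll() only sees file descriptors, so it does not care
 * that sock_fd was created by libpq in another module.
 */
int wait_latch_or_socket(int latch_fd, int sock_fd, int timeout_ms)
{
    struct pollfd fds[2];
    int result = WAKE_NONE;

    assert(latch_fd >= 0 && sock_fd >= 0);
    fds[0].fd = latch_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = sock_fd;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    if (poll(fds, 2, timeout_ms) <= 0)
        return WAKE_NONE;          /* timeout, or error such as EINTR */

    if (fds[0].revents & POLLIN)
    {
        char buf[16];
        ssize_t n = read(latch_fd, buf, sizeof buf);   /* reset latch */

        (void) n;
        result |= WAKE_LATCH;
    }
    if (fds[1].revents & POLLIN)
        result |= WAKE_SOCKET;     /* caller reads the data via libpq */

    return result;
}
```

The common handler Simon describes would then test the returned bits: on WAKE_LATCH call the walreceiver-side code (for example, send a reply because the Apply pointer moved), and on WAKE_SOCKET call the libpqwalreceiver-side code to read incoming WAL.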


> If we send back the reply as soon as the Apply pointer is changed, I'm
> afraid quite lots of reply messages are sent frequently, which might
> cause performance problem. This is also one of the reasons why I didn't
> implement the quick-response feature. To address this problem, we might
> need to change the master so that it sends the Wait pointer to the standby,
> and change the standby so that it replies whenever the Apply pointer
> catches up with the Wait one. This can reduce the number of useless
> reply from the standby about the Apply pointer.

We send back one reply per incoming message. The incoming messages
don't carry the request state, and checking it has a cost which I don't
think is worth paying, since we only need this info when the link goes
quiet.

When the link goes quiet we still need to send replies if we are in
apply mode, but we only need to send apply messages if the LSN has
changed because of a commit. That will considerably reduce the number
of messages sent, so I don't see a problem.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: New replication mode: write

From

Fujii Masao

Date:

23 January 2012, 09:03:24

On Mon, Jan 23, 2012 at 6:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, Jan 23, 2012 at 9:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> I agree that the feature you pointed is useful for the Apply mode. But
>> I'm afraid that implementing that feature is not easy and would make
>> the patch big and complicated, so I didn't implement the Apply mode first.
>>
>> To make the walreceiver call WaitLatchOrSocket(), we would need to
>> merge it and libpq_select() into one function. But the former is the backend
>> function and the latter is the frontend one. Now I have no good idea to
>> merge them cleanly.
>
> We can wait on the socket wherever it comes from. poll/select doesn't
> care how we got the socket.
>
> So we just need a common handler that calls either
> walreceiver/libpqwalreceiver function as required to handle the
> wakeup.

I'm afraid I could not understand your idea. Could you explain it in
more detail?

>> If we send back the reply as soon as the Apply pointer is changed, I'm
>> afraid quite lots of reply messages are sent frequently, which might
>> cause performance problem. This is also one of the reasons why I didn't
>> implement the quick-response feature. To address this problem, we might
>> need to change the master so that it sends the Wait pointer to the standby,
>> and change the standby so that it replies whenever the Apply pointer
>> catches up with the Wait one. This can reduce the number of useless
>> reply from the standby about the Apply pointer.
>
> We send back one reply per incoming message. The incoming messages
> don't know request state and checking that has a cost which I don't
> think is an appropriate payment since we only need this info when the
> link goes quiet.
>
> When the link goes quiet we still need to send replies if we have
> apply mode, but we only need to send apply messages if the lsn has
> changed because of a commit. That will considerably reduce the
> messages sent so I don't see a problem.

Do you mean changing the meaning of apply_location? Currently it indicates
the end + 1 of the last replayed WAL record, regardless of whether it's
a commit record or not. So too many replies can be sent per incoming
message, because one message might contain many WAL records. Or do you
mean updating apply_location only when a commit record is replayed?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: New replication mode: write

From

Simon Riggs

Date:

23 January 2012, 12:00:50

On Mon, Jan 23, 2012 at 10:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

>>> To make the walreceiver call WaitLatchOrSocket(), we would need to
>>> merge it and libpq_select() into one function. But the former is the backend
>>> function and the latter is the frontend one. Now I have no good idea to
>>> merge them cleanly.
>>
>> We can wait on the socket wherever it comes from. poll/select doesn't
>> care how we got the socket.
>>
>> So we just need a common handler that calls either
>> walreceiver/libpqwalreceiver function as required to handle the
>> wakeup.
>
> I'm afraid I could not understand your idea. Could you explain it in
> more detail?

We either tell libpqwalreceiver about the latch, or we tell
walreceiver about the socket used by libpqwalreceiver.

In either case we share a pointer from one module to another.

>>> If we send back the reply as soon as the Apply pointer is changed, I'm
>>> afraid quite lots of reply messages are sent frequently, which might
>>> cause performance problem. This is also one of the reasons why I didn't
>>> implement the quick-response feature. To address this problem, we might
>>> need to change the master so that it sends the Wait pointer to the standby,
>>> and change the standby so that it replies whenever the Apply pointer
>>> catches up with the Wait one. This can reduce the number of useless
>>> reply from the standby about the Apply pointer.
>>
>> We send back one reply per incoming message. The incoming messages
>> don't know request state and checking that has a cost which I don't
>> think is an appropriate payment since we only need this info when the
>> link goes quiet.
>>
>> When the link goes quiet we still need to send replies if we have
>> apply mode, but we only need to send apply messages if the lsn has
>> changed because of a commit. That will considerably reduce the
>> messages sent so I don't see a problem.
>
> You mean to change the meaning of apply_location? Currently it indicates
> the end + 1 of the last replayed WAL record, regardless of whether it's
> a commit record or not. So too many replies can be sent per incoming
> message because it might contain many WAL records. But you mean to
> change apply_location only when a commit record is replayed?

There is no change to the meaning of apply_location. The only change
is that we send that message only when it carries an updated committed
LSN.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: New replication mode: write

From

Fujii Masao

Date:

24 January 2012, 09:48:08

On Mon, Jan 23, 2012 at 9:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, Jan 23, 2012 at 10:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> I'm afraid I could not understand your idea. Could you explain it in
>> more detail?
>
> We either tell libpqwalreceiver about the latch, or we tell
> walreceiver about the socket used by libpqwalreceiver.
>
> In either case we share a pointer from one module to another.

The former seems difficult because it's not easy to link libpqwalreceiver.so
to the latch. I will consider the latter.

>> You mean to change the meaning of apply_location? Currently it indicates
>> the end + 1 of the last replayed WAL record, regardless of whether it's
>> a commit record or not. So too many replies can be sent per incoming
>> message because it might contain many WAL records. But you mean to
>> change apply_location only when a commit record is replayed?
>
> There is no change to the meaning of apply_location. The only change
> is that we send that message only when it has an updated value of
> committed lsn.

This means that apply_location might report a different location from
pg_last_xlog_replay_location() on the standby, whereas in 9.1 they return
the same. That might confuse users, no?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: New replication mode: write

From

Simon Riggs

Date:

24 January 2012, 10:23:20

On Tue, Jan 24, 2012 at 10:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

>>> I'm afraid I could not understand your idea. Could you explain it in
>>> more detail?
>>
>> We either tell libpqwalreceiver about the latch, or we tell
>> walreceiver about the socket used by libpqwalreceiver.
>>
>> In either case we share a pointer from one module to another.
>
> The former seems difficult because it's not easy to link libpqwalreceiver.so
> to the latch. I will consider about the latter.

Yes, it might be too hard, but let's look.

>>> You mean to change the meaning of apply_location? Currently it indicates
>>> the end + 1 of the last replayed WAL record, regardless of whether it's
>>> a commit record or not. So too many replies can be sent per incoming
>>> message because it might contain many WAL records. But you mean to
>>> change apply_location only when a commit record is replayed?
>>
>> There is no change to the meaning of apply_location. The only change
>> is that we send that message only when it has an updated value of
>> committed lsn.
>
> This means that apply_location might return the different location from
> pg_last_xlog_replay_location() on the standby, though in 9.1 they return
> the same. Which might confuse a user. No?

The two values only match on a quiet system anyway, since both are
moving forwards.

They will still match on a quiet system.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: New replication mode: write

From

Simon Riggs

Date:

24 January 2012, 19:29:03

On Tue, Jan 24, 2012 at 11:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> Yes, it might be too hard, but lets look.

Your committer has timed out.... ;-)

committed write mode only

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: New replication mode: write

From

Fujii Masao

Date:

25 January 2012, 05:34:38

On Wed, Jan 25, 2012 at 5:28 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Jan 24, 2012 at 11:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> Yes, it might be too hard, but lets look.
>
> Your committer has timed out.... ;-)
>
> committed write mode only

Thanks for the commit!

The apply mode is attractive, but I need more time to implement it completely,
and I might not be able to finish it within this CF. So committing only the
write mode is the right decision, I think. If I have time after all the
patches I'm interested in have been committed, I will try the apply mode
again, but maybe for 9.3dev.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center