Thread: New replication mode: write
Hi,

http://archives.postgresql.org/message-id/AANLkTilgyL3Y1jkDVHX02433COq7JLmqicsqmOsbuyA1%40mail.gmail.com

Previously I proposed the replication mode "recv" on the above thread, but it has not been committed yet. Now I'd like to propose that mode again, because it's useful for reducing the overhead of synchronous replication. The attached patch implements that mode.

If you choose that mode, a transaction waits for its WAL to be write()'d on the standby, IOW, waits until the standby has saved the WAL in memory. This provides a lower level of durability than current synchronous replication does (where a transaction waits for its WAL to be flushed to disk). However, it's a practically useful setting because it decreases the response time of the transaction, and it causes no data loss unless the master and the standby crash and the database on the master gets corrupted at the same time.

In the patch, you choose that mode by setting synchronous_commit to write. I renamed the mode from "recv" to "write" on the basis of its actual behavior.

I measured how much "write" mode improves performance in synchronous replication. Here is the result:

synchronous_commit = on
tps = 424.510843 (including connections establishing)
tps = 420.767883 (including connections establishing)
tps = 419.715658 (including connections establishing)
tps = 428.810001 (including connections establishing)
tps = 337.341445 (including connections establishing)

synchronous_commit = write
tps = 550.752712 (including connections establishing)
tps = 407.104036 (including connections establishing)
tps = 455.576190 (including connections establishing)
tps = 453.548672 (including connections establishing)
tps = 555.171325 (including connections establishing)

I used pgbench (scale factor = 100) as the benchmark and ran the following command:

    pgbench -c 8 -j 8 -T 60 -M prepared

I always ran CHECKPOINT on both the master and the standby before starting each pgbench run, to prevent checkpoints from affecting the result of the performance test.

Thoughts? Comments?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
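[A minimal, hypothetical C sketch of the decision this patch adds to the commit path: with synchronous_commit = write, the backend's wait is satisfied by the write position the standby reports, rather than the flush position. The names and the plain uint64 standing in for an LSN below are illustrative only, not the identifiers used in the patch.]

/*
 * Hypothetical sketch (not code from the patch): which standby-reported
 * position must cover the transaction's commit LSN before the committing
 * backend may return.
 */
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t SketchLSN;

typedef enum
{
    SKETCH_WAIT_WRITE,          /* synchronous_commit = write */
    SKETCH_WAIT_FLUSH           /* synchronous_commit = on    */
} SketchWaitMode;

/* positions last reported by the standby in its status messages */
typedef struct
{
    SketchLSN   write_lsn;      /* received and write()'d on the standby */
    SketchLSN   flush_lsn;      /* fsync'd to disk on the standby        */
} SketchStandbyStatus;

/* true once the standby's progress covers the transaction's commit LSN */
static bool
sketch_commit_is_acknowledged(const SketchStandbyStatus *status,
                              SketchLSN commit_lsn,
                              SketchWaitMode mode)
{
    SketchLSN   reached = (mode == SKETCH_WAIT_WRITE)
        ? status->write_lsn
        : status->flush_lsn;

    return reached >= commit_lsn;
}

[In the real server the committing backend sleeps on the sync rep wait queue and is woken by the walsender once the standby's reported position passes the commit record's LSN; the sketch only shows which reported position is compared against it.]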
On Fri, Jan 13, 2012 at 7:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Jan 13, 2012 at 9:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Fri, Jan 13, 2012 at 7:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>>> Thoughts? Comments?
>>
>> This is almost exactly the same as my patch series
>> "syncrep_queues.v[1,2].patch" earlier this year. Which I know because
>> I was updating that patch myself last night for 9.2. I'm about half
>> way through doing that, since you and I agreed in Ottawa I would do
>> this. Perhaps it is better if we work together?
>
> I think this comment is mostly pointless. We don't have time to work
> together and there's no real reason to. You know what you're doing, so
> I'll leave you to do it.
>
> Please add the Apply mode.

OK, will do.

> In my patch, the reason I avoided doing WRITE mode (which we had
> previously referred to as RECV) was that no fsync of the WAL contents
> takes place. In that case we are applying changes using un-fsynced WAL
> data and in case of crash this would cause a problem.

My patch has not changed the execution order of WAL flush and replay.
WAL records are always replayed after they are flushed by walreceiver,
so such a problem doesn't happen. But this means that a transaction might
need to wait for the WAL flush caused by a previous transaction even if
WRITE mode is chosen, which limits the performance gain from WRITE mode
and should be improved later, I think.

> I was going to
> make the WalWriter available during recovery to cater for that. Do you
> not think that is no longer necessary?

That's still necessary to improve the performance of sync rep further,
I think. What I'd like to do (maybe in 9.3dev) after supporting WRITE
mode is:

* Allow WAL records to be replayed before they are flushed to disk.
* Add a new GUC parameter specifying whether the standby is allowed to
  defer WAL flush. If the parameter is false, walreceiver flushes WAL
  whenever it receives WAL (i.e., the same as the current behavior). If
  true, walreceiver doesn't flush WAL at all. Instead, walwriter, a
  backend, or the startup process does that: walwriter periodically
  checks whether there is an un-flushed WAL file and flushes it if one
  exists, and when a buffer page is written out, the backend or startup
  process forces a WAL flush up to the buffer's LSN.

If the above GUC parameter is set to true (i.e., walreceiver doesn't
flush WAL at all) and WRITE mode is chosen, a transaction doesn't need
to wait for WAL flush on the standby at all. Also, the frequency of WAL
flush on the standby would become lower, which significantly reduces
I/O load. All in all, the performance of sync rep would improve very much.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
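[A rough, hypothetical C sketch of the deferred-flush scheme described in the message above — this is not code from the patch, and it exists only to make the proposed division of labour concrete: walreceiver only write()s, a periodic flusher (walwriter) makes the WAL durable, and a backend or the startup process can force a flush up to a given LSN before evicting a dirty buffer. A plain uint64 stands in for XLogRecPtr; segment management, locking, and error handling are all omitted.]

#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

typedef uint64_t SketchLSN;

static int      wal_fd;          /* current WAL segment, opened elsewhere */
static SketchLSN written_upto;   /* write()'d but possibly unflushed      */
static SketchLSN flushed_upto;   /* known durable                         */

/* walreceiver path: append received records without fsync */
static void
sketch_write_wal(const char *buf, size_t len, SketchLSN end_lsn)
{
    (void) write(wal_fd, buf, len);     /* error handling omitted */
    written_upto = end_lsn;
}

/* walwriter path: periodic flush of whatever has accumulated */
static void
sketch_periodic_flush(void)
{
    if (written_upto > flushed_upto)
    {
        (void) fdatasync(wal_fd);
        flushed_upto = written_upto;
    }
}

/* backend/startup path: a dirty page with WAL up to 'upto' is being evicted */
static void
sketch_flush_upto(SketchLSN upto)
{
    if (upto > flushed_upto)
        sketch_periodic_flush();
}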
On Fri, Jan 13, 2012 at 12:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> In my patch, the reason I avoided doing WRITE mode (which we had
>> previously referred to as RECV) was that no fsync of the WAL contents
>> takes place. In that case we are applying changes using un-fsynced WAL
>> data and in case of crash this would cause a problem.
>
> My patch has not changed the execution order of WAL flush and replay.
> WAL records are always replayed after they are flushed by walreceiver,
> so such a problem doesn't happen. But this means that a transaction might
> need to wait for the WAL flush caused by a previous transaction even if
> WRITE mode is chosen, which limits the performance gain from WRITE mode
> and should be improved later, I think.

If the WALreceiver still flushes, that is OK. The latency would be
smoother and lower if the WALwriter were active.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jan 13, 2012 at 9:27 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 7:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Fri, Jan 13, 2012 at 9:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On Fri, Jan 13, 2012 at 7:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>
>>>> Thoughts? Comments?
>>>
>>> This is almost exactly the same as my patch series
>>> "syncrep_queues.v[1,2].patch" earlier this year. Which I know because
>>> I was updating that patch myself last night for 9.2. I'm about half
>>> way through doing that, since you and I agreed in Ottawa I would do
>>> this. Perhaps it is better if we work together?
>>
>> I think this comment is mostly pointless. We don't have time to work
>> together and there's no real reason to. You know what you're doing, so
>> I'll leave you to do it.
>>
>> Please add the Apply mode.
>
> OK, will do.

Done. Attached is the updated version of the patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Done. Attached is the updated version of the patch.

Thanks.

I'll review this first, but can't start immediately. Please expect
something back in 2 days.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 16, 2012 at 4:17 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> Done. Attached is the updated version of the patch.
>
> Thanks.
>
> I'll review this first, but can't start immediately. Please expect
> something back in 2 days.

On initial review this looks fine. I'll do a more thorough hands-on
review now and commit if it still looks OK.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> Please add the Apply mode.
>>
>> OK, will do.
>
> Done. Attached is the updated version of the patch.

I notice that the Apply mode isn't fully implemented. I had in mind
that you would add the latch required to respond more quickly when
only the Apply pointer has changed.

Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
was there another reason for not implementing that?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 23, 2012 at 4:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>>>> Please add the Apply mode.
>>>
>>> OK, will do.
>>
>> Done. Attached is the updated version of the patch.
>
> I notice that the Apply mode isn't fully implemented. I had in mind
> that you would add the latch required to respond more quickly when
> only the Apply pointer has changed.
>
> Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
> was there another reason for not implementing that?

I agree that the feature you point out is useful for the Apply mode.
But I'm afraid that implementing it is not easy and would make the
patch big and complicated, so I didn't implement the Apply mode at first.

To make walreceiver call WaitLatchOrSocket(), we would need to merge it
and libpq_select() into one function. But the former is a backend
function and the latter is a frontend one, and I have no good idea for
merging them cleanly.

If we send back a reply as soon as the Apply pointer changes, I'm afraid
quite a lot of reply messages would be sent, which might cause a
performance problem. This is another reason why I didn't implement the
quick-response feature. To address this problem, we might need to change
the master so that it sends the Wait pointer to the standby, and change
the standby so that it replies whenever the Apply pointer catches up with
the Wait one. This would reduce the number of useless replies from the
standby about the Apply pointer.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Jan 23, 2012 at 9:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jan 23, 2012 at 4:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>>>>> Please add the Apply mode.
>>>>
>>>> OK, will do.
>>>
>>> Done. Attached is the updated version of the patch.
>>
>> I notice that the Apply mode isn't fully implemented. I had in mind
>> that you would add the latch required to respond more quickly when
>> only the Apply pointer has changed.
>>
>> Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
>> was there another reason for not implementing that?
>
> I agree that the feature you point out is useful for the Apply mode.
> But I'm afraid that implementing it is not easy and would make the
> patch big and complicated, so I didn't implement the Apply mode at first.
>
> To make walreceiver call WaitLatchOrSocket(), we would need to merge it
> and libpq_select() into one function. But the former is a backend
> function and the latter is a frontend one, and I have no good idea for
> merging them cleanly.

We can wait on the socket wherever it comes from. poll/select doesn't
care how we got the socket.

So we just need a common handler that calls either the walreceiver or
libpqwalreceiver function, as required, to handle the wakeup.

> If we send back a reply as soon as the Apply pointer changes, I'm afraid
> quite a lot of reply messages would be sent, which might cause a
> performance problem. This is another reason why I didn't implement the
> quick-response feature. To address this problem, we might need to change
> the master so that it sends the Wait pointer to the standby, and change
> the standby so that it replies whenever the Apply pointer catches up with
> the Wait one. This would reduce the number of useless replies from the
> standby about the Apply pointer.

We send back one reply per incoming message. The incoming messages
don't carry request state, and checking that has a cost which I don't
think is an appropriate payment, since we only need this info when the
link goes quiet.

When the link goes quiet we still need to send replies if we are in
apply mode, but we only need to send apply messages if the LSN has
changed because of a commit. That will considerably reduce the
messages sent, so I don't see a problem.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
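[A small, hypothetical C sketch of the throttling Simon describes: the standby still answers every incoming message, but it reports a new apply position only when the last replayed commit record has advanced since the previous reply. A plain uint64 stands in for XLogRecPtr, and the helper name is invented for illustration; this is not code from the patch.]

#include <stdint.h>

typedef uint64_t SketchLSN;

#define InvalidSketchLSN ((SketchLSN) 0)

static SketchLSN last_reported_commit;  /* last apply position sent upstream */

/*
 * Called when building a reply.  'replayed_commit' is the end of the last
 * commit record the startup process has replayed.  Returns the apply
 * position to put in the reply, or InvalidSketchLSN if it has not advanced
 * since the previous reply (so no apply update needs to be sent).
 */
static SketchLSN
sketch_apply_position_to_report(SketchLSN replayed_commit)
{
    if (replayed_commit <= last_reported_commit)
        return InvalidSketchLSN;

    last_reported_commit = replayed_commit;
    return replayed_commit;
}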
On Mon, Jan 23, 2012 at 6:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, Jan 23, 2012 at 9:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Mon, Jan 23, 2012 at 4:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On Mon, Jan 16, 2012 at 12:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>
>>>>>> Please add the Apply mode.
>>>>>
>>>>> OK, will do.
>>>>
>>>> Done. Attached is the updated version of the patch.
>>>
>>> I notice that the Apply mode isn't fully implemented. I had in mind
>>> that you would add the latch required to respond more quickly when
>>> only the Apply pointer has changed.
>>>
>>> Is there a reason not to use WaitLatchOrSocket() in WALReceiver? Or
>>> was there another reason for not implementing that?
>>
>> I agree that the feature you point out is useful for the Apply mode.
>> But I'm afraid that implementing it is not easy and would make the
>> patch big and complicated, so I didn't implement the Apply mode at first.
>>
>> To make walreceiver call WaitLatchOrSocket(), we would need to merge it
>> and libpq_select() into one function. But the former is a backend
>> function and the latter is a frontend one, and I have no good idea for
>> merging them cleanly.
>
> We can wait on the socket wherever it comes from. poll/select doesn't
> care how we got the socket.
>
> So we just need a common handler that calls either the walreceiver or
> libpqwalreceiver function, as required, to handle the wakeup.

I'm afraid I could not understand your idea. Could you explain it in
more detail?

>> If we send back a reply as soon as the Apply pointer changes, I'm afraid
>> quite a lot of reply messages would be sent, which might cause a
>> performance problem. This is another reason why I didn't implement the
>> quick-response feature. To address this problem, we might need to change
>> the master so that it sends the Wait pointer to the standby, and change
>> the standby so that it replies whenever the Apply pointer catches up with
>> the Wait one. This would reduce the number of useless replies from the
>> standby about the Apply pointer.
>
> We send back one reply per incoming message. The incoming messages
> don't carry request state, and checking that has a cost which I don't
> think is an appropriate payment, since we only need this info when the
> link goes quiet.
>
> When the link goes quiet we still need to send replies if we are in
> apply mode, but we only need to send apply messages if the LSN has
> changed because of a commit. That will considerably reduce the
> messages sent, so I don't see a problem.

You mean to change the meaning of apply_location? Currently it indicates
the end + 1 of the last replayed WAL record, regardless of whether it's
a commit record or not. So too many replies can be sent per incoming
message, because one message might contain many WAL records. Or do you
mean to update apply_location only when a commit record is replayed?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Jan 23, 2012 at 10:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> To make walreceiver call WaitLatchOrSocket(), we would need to merge it
>>> and libpq_select() into one function. But the former is a backend
>>> function and the latter is a frontend one, and I have no good idea for
>>> merging them cleanly.
>>
>> We can wait on the socket wherever it comes from. poll/select doesn't
>> care how we got the socket.
>>
>> So we just need a common handler that calls either the walreceiver or
>> libpqwalreceiver function, as required, to handle the wakeup.
>
> I'm afraid I could not understand your idea. Could you explain it in
> more detail?

We either tell libpqwalreceiver about the latch, or we tell
walreceiver about the socket used by libpqwalreceiver.

In either case we share a pointer from one module to another.

>>> If we send back a reply as soon as the Apply pointer changes, I'm afraid
>>> quite a lot of reply messages would be sent, which might cause a
>>> performance problem. This is another reason why I didn't implement the
>>> quick-response feature. To address this problem, we might need to change
>>> the master so that it sends the Wait pointer to the standby, and change
>>> the standby so that it replies whenever the Apply pointer catches up with
>>> the Wait one. This would reduce the number of useless replies from the
>>> standby about the Apply pointer.
>>
>> We send back one reply per incoming message. The incoming messages
>> don't carry request state, and checking that has a cost which I don't
>> think is an appropriate payment, since we only need this info when the
>> link goes quiet.
>>
>> When the link goes quiet we still need to send replies if we are in
>> apply mode, but we only need to send apply messages if the LSN has
>> changed because of a commit. That will considerably reduce the
>> messages sent, so I don't see a problem.
>
> You mean to change the meaning of apply_location? Currently it indicates
> the end + 1 of the last replayed WAL record, regardless of whether it's
> a commit record or not. So too many replies can be sent per incoming
> message, because one message might contain many WAL records. Or do you
> mean to update apply_location only when a commit record is replayed?

There is no change to the meaning of apply_location. The only change
is that we send that message only when it carries an updated value of
the committed LSN.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
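[A hypothetical sketch of the second option — "tell walreceiver about the socket used by libpqwalreceiver": walreceiver asks libpq for the connection's socket with PQsocket() and passes it to WaitLatchOrSocket(), so it wakes up either when WAL arrives on the socket or when another process sets its latch. This assumes the 9.2-style WaitLatchOrSocket(latch, wakeEvents, socket, timeout) interface and a latch owned by walreceiver; the function name is invented and this is not code from the patch.]

#include "postgres.h"

#include "libpq-fe.h"
#include "storage/latch.h"

/* returns true if the socket became readable, false on latch set or timeout */
static bool
sketch_wait_for_wal_or_wakeup(PGconn *conn, Latch *latch, long timeout_ms)
{
    int         rc;

    rc = WaitLatchOrSocket(latch,
                           WL_LATCH_SET | WL_SOCKET_READABLE | WL_TIMEOUT,
                           PQsocket(conn),
                           timeout_ms);

    if (rc & WL_LATCH_SET)
        ResetLatch(latch);

    return (rc & WL_SOCKET_READABLE) != 0;
}

[With something like this in place, the startup process could set the walreceiver latch after replaying a commit record, and walreceiver could send an apply reply promptly instead of waiting for the next incoming message or timeout — which is the quick-response behavior discussed above.]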
On Mon, Jan 23, 2012 at 9:53 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, Jan 23, 2012 at 10:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>>>> To make walreceiver call WaitLatchOrSocket(), we would need to merge it
>>>> and libpq_select() into one function. But the former is a backend
>>>> function and the latter is a frontend one, and I have no good idea for
>>>> merging them cleanly.
>>>
>>> We can wait on the socket wherever it comes from. poll/select doesn't
>>> care how we got the socket.
>>>
>>> So we just need a common handler that calls either the walreceiver or
>>> libpqwalreceiver function, as required, to handle the wakeup.
>>
>> I'm afraid I could not understand your idea. Could you explain it in
>> more detail?
>
> We either tell libpqwalreceiver about the latch, or we tell
> walreceiver about the socket used by libpqwalreceiver.
>
> In either case we share a pointer from one module to another.

The former seems difficult because it's not easy to link
libpqwalreceiver.so to the latch. I will consider the latter.

>>>> If we send back a reply as soon as the Apply pointer changes, I'm afraid
>>>> quite a lot of reply messages would be sent, which might cause a
>>>> performance problem. This is another reason why I didn't implement the
>>>> quick-response feature. To address this problem, we might need to change
>>>> the master so that it sends the Wait pointer to the standby, and change
>>>> the standby so that it replies whenever the Apply pointer catches up with
>>>> the Wait one. This would reduce the number of useless replies from the
>>>> standby about the Apply pointer.
>>>
>>> We send back one reply per incoming message. The incoming messages
>>> don't carry request state, and checking that has a cost which I don't
>>> think is an appropriate payment, since we only need this info when the
>>> link goes quiet.
>>>
>>> When the link goes quiet we still need to send replies if we are in
>>> apply mode, but we only need to send apply messages if the LSN has
>>> changed because of a commit. That will considerably reduce the
>>> messages sent, so I don't see a problem.
>>
>> You mean to change the meaning of apply_location? Currently it indicates
>> the end + 1 of the last replayed WAL record, regardless of whether it's
>> a commit record or not. So too many replies can be sent per incoming
>> message, because one message might contain many WAL records. Or do you
>> mean to update apply_location only when a commit record is replayed?
>
> There is no change to the meaning of apply_location. The only change
> is that we send that message only when it carries an updated value of
> the committed LSN.

This means that apply_location might return a different location from
pg_last_xlog_replay_location() on the standby, though in 9.1 they return
the same value. Which might confuse users. No?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, Jan 24, 2012 at 10:47 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> I'm afraid I could not understand your idea. Could you explain it in
>>> more detail?
>>
>> We either tell libpqwalreceiver about the latch, or we tell
>> walreceiver about the socket used by libpqwalreceiver.
>>
>> In either case we share a pointer from one module to another.
>
> The former seems difficult because it's not easy to link
> libpqwalreceiver.so to the latch. I will consider the latter.

Yes, it might be too hard, but let's look.

>>> You mean to change the meaning of apply_location? Currently it indicates
>>> the end + 1 of the last replayed WAL record, regardless of whether it's
>>> a commit record or not. So too many replies can be sent per incoming
>>> message, because one message might contain many WAL records. Or do you
>>> mean to update apply_location only when a commit record is replayed?
>>
>> There is no change to the meaning of apply_location. The only change
>> is that we send that message only when it carries an updated value of
>> the committed LSN.
>
> This means that apply_location might return a different location from
> pg_last_xlog_replay_location() on the standby, though in 9.1 they return
> the same value. Which might confuse users. No?

The two values only match on a quiet system anyway, since both are
moving forwards. They will still match on a quiet system.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jan 24, 2012 at 11:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Yes, it might be too hard, but let's look.

Your committer has timed out.... ;-)

Committed, write mode only.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 25, 2012 at 5:28 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Jan 24, 2012 at 11:00 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> Yes, it might be too hard, but let's look.
>
> Your committer has timed out.... ;-)
>
> Committed, write mode only.

Thanks for the commit!

The apply mode is attractive, but I need more time to implement it
completely, and I might not be able to finish it within this CF. So
committing only the write mode is the right decision, I think.

If I have time after all of the patches which I'm interested in have
been committed, I will try the apply mode again, but maybe for 9.3dev.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center