Thread: Could synchronous streaming replication really degrade the performance of the primary?
Could synchronous streaming replication really degrade the performance of the primary?
From
"MauMau"
Date:
Hello,

I've heard from some people that synchronous streaming replication has severe performance impact on the primary. They said that the transaction throughput of TPC-C like benchmark (perhaps DBT-2) decreased by 50%. I'm sorry I haven't asked them about their testing environment, because they just gave me their experience. They think that this result is much worse than some commercial database.

I'm surprised. I know that PostgreSQL generates more transaction log than other databases because it logs the entire row for each update operation instead of just the changed columns, and because of full page writes. But I can't (and don't want to) believe that those have such a big negative impact.

Does anyone have any experience of benchmarking synchronous streaming replication under TPC-C or a similar write-heavy workload? Could anybody share performance evaluation results, if you don't mind?

Regards
MauMau
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
Merlin Moncure
Date:
On Wed, May 9, 2012 at 8:06 AM, MauMau <maumau307@gmail.com> wrote:
> Hello,
>
> I've heard from some people that synchronous streaming replication has
> severe performance impact on the primary. They said that the transaction
> throughput of TPC-C like benchmark (perhaps DBT-2) decreased by 50%. I'm
> sorry I haven't asked them about their testing environment, because they
> just gave me their experience. They think that this result is much worse
> than some commercial database.

I can't speak for other databases, but it's only natural to assume that tps must drop. At minimum, you have to add the latency of communication and remote sync operation to your transaction time. For very short transactions this adds up to a lot of extra work relative to the transaction itself.

merlin
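The point about added latency can be put as back-of-the-envelope arithmetic. The sketch below is a minimal model, not PostgreSQL code: the function name and every timing are hypothetical values chosen for illustration, and it assumes a single client that issues one commit after another.

```python
def single_client_tps(local_commit_ms, network_rtt_ms=0.0, remote_sync_ms=0.0):
    """Commits per second for one client issuing commits back to back.

    All arguments are illustrative assumptions, not measured values.
    """
    total_ms = local_commit_ms + network_rtt_ms + remote_sync_ms
    return 1000.0 / total_ms


# Hypothetical timings: 2 ms local commit, 0.5 ms round trip, 2 ms remote sync.
no_replication = single_client_tps(local_commit_ms=2.0)
sync_replication = single_client_tps(2.0, network_rtt_ms=0.5, remote_sync_ms=2.0)

print(f"no replication:   {no_replication:.0f} tps")
print(f"sync replication: {sync_replication:.0f} tps")
```

With these made-up numbers a lone client sees well under half its original rate, which is the effect described for very short transactions. Nothing here says anything about aggregate throughput with many clients.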
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
Robert Klemme
Date:
On Wed, May 9, 2012 at 3:58 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, May 9, 2012 at 8:06 AM, MauMau <maumau307@gmail.com> wrote:
>> I've heard from some people that synchronous streaming replication has
>> severe performance impact on the primary. They said that the transaction
>> throughput of TPC-C like benchmark (perhaps DBT-2) decreased by 50%. I'm
>> sorry I haven't asked them about their testing environment, because they
>> just gave me their experience. They think that this result is much worse
>> than some commercial database.
>
> I can't speak for other databases, but it's only natural to assume
> that tps must drop. At minimum, you have to add the latency of
> communication and remote sync operation to your transaction time. For
> very short transactions this adds up to a lot of extra work relative
> to the transaction itself.

Actually I would expect 50% degradation if both databases run on identical hardware: the second instance needs to do the same work (i.e. write WAL AND ensure it reached the disk) before it can acknowledge.

"When requesting synchronous replication, each commit of a write transaction will wait until confirmation is received that the commit has been written to the transaction log on disk of both the primary and standby server."
http://www.postgresql.org/docs/9.1/static/warm-standby.html#SYNCHRONOUS-REPLICATION

I am not sure whether the replicant can be triggered to commit to disk before the commit to disk on the master has succeeded; if that was the case there would be true serialization => 50%.

This sounds like it could actually be the case (note the "after it commits"):
"When synchronous replication is requested the transaction will wait after it commits until it receives confirmation that the transfer has been successful."
http://wiki.postgresql.org/wiki/Synchronous_replication

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
Claudio Freire
Date:
On Wed, May 9, 2012 at 12:41 PM, Robert Klemme <shortcutter@googlemail.com> wrote:
> I am not sure whether the replicant can be triggered to commit to disk
> before the commit to disk on the master has succeeded; if that was the
> case there would be true serialization => 50%.
>
> This sounds like it could actually be the case (note the "after it commits"):
> "When synchronous replication is requested the transaction will wait
> after it commits until it receives confirmation that the transfer has
> been successful."
> http://wiki.postgresql.org/wiki/Synchronous_replication

That should only happen for very short transactions. IIRC, WAL records can be sent to the slaves before the transaction in the master commits, so bigger transactions would see higher parallelism.
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
Robert Klemme
Date:
On Wed, May 9, 2012 at 5:45 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Wed, May 9, 2012 at 12:41 PM, Robert Klemme
> <shortcutter@googlemail.com> wrote:
>> I am not sure whether the replicant can be triggered to commit to disk
>> before the commit to disk on the master has succeeded; if that was the
>> case there would be true serialization => 50%.
>>
>> This sounds like it could actually be the case (note the "after it commits"):
>> "When synchronous replication is requested the transaction will wait
>> after it commits until it receives confirmation that the transfer has
>> been successful."
>> http://wiki.postgresql.org/wiki/Synchronous_replication
>
> That should only happen for very short transactions.
> IIRC, WAL records can be sent to the slaves before the transaction in
> the master commits, so bigger transactions would see higher
> parallelism.

I considered that as well. But the question is: when are they written to disk in the slave? If they stay in the buffer cache until data is synced to disk, then you only gain a bit of advantage by sending earlier (i.e. the network latency). Assuming a high-bandwidth, low-latency network (which you want to have in this case anyway), that gain is probably not big compared to the time it takes to ensure WAL is written to disk. I do not know implementation details but *if* the server triggers sync only after its own sync has succeeded *then* you basically have serialization and you need to wait twice the time.

For small TX OTOH network overhead might be so large relative to WAL IO (for example with a battery backed cache in the controller) that it shows. Since we do not know the test cases which led to the 50% statement we can probably only speculate. Ultimately each individual setup and workload has to be tested.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
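The two hypotheses in this exchange (standby sync triggered only after the primary's own sync, versus WAL shipped early so both syncs overlap) can be compared with a small model. This is a sketch under stated assumptions: the function names and timings are hypothetical, and it does not claim to describe which behaviour PostgreSQL actually implements.

```python
def wait_serialized(local_flush_ms, rtt_ms, standby_flush_ms):
    """Standby flush starts only after the primary's flush has finished."""
    return local_flush_ms + rtt_ms + standby_flush_ms


def wait_overlapped(local_flush_ms, rtt_ms, standby_flush_ms):
    """WAL is shipped early, so both flushes proceed in parallel."""
    return max(local_flush_ms, rtt_ms + standby_flush_ms)


# Hypothetical scenarios: (label, flush time on each side in ms, round trip in ms).
scenarios = [
    ("plain disks", 4.0, 0.2),
    ("battery-backed cache", 0.1, 0.2),
]

for label, flush_ms, rtt_ms in scenarios:
    serialized = wait_serialized(flush_ms, rtt_ms, flush_ms)
    overlapped = wait_overlapped(flush_ms, rtt_ms, flush_ms)
    print(f"{label}: local only {flush_ms:.1f} ms, "
          f"serialized {serialized:.1f} ms, overlapped {overlapped:.1f} ms")
```

In the first scenario the serialized wait is roughly twice the local flush, which is the "wait twice the time" case. In the second, the flush is so cheap that the network round trip dominates under either hypothesis.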
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
Merlin Moncure
Date:
On Wed, May 9, 2012 at 12:03 PM, Robert Klemme <shortcutter@googlemail.com> wrote:
> On Wed, May 9, 2012 at 5:45 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Wed, May 9, 2012 at 12:41 PM, Robert Klemme
>> <shortcutter@googlemail.com> wrote:
>>> I am not sure whether the replicant can be triggered to commit to disk
>>> before the commit to disk on the master has succeeded; if that was the
>>> case there would be true serialization => 50%.
>>>
>>> This sounds like it could actually be the case (note the "after it commits"):
>>> "When synchronous replication is requested the transaction will wait
>>> after it commits until it receives confirmation that the transfer has
>>> been successful."
>>> http://wiki.postgresql.org/wiki/Synchronous_replication
>>
>> That should only happen for very short transactions.
>> IIRC, WAL records can be sent to the slaves before the transaction in
>> the master commits, so bigger transactions would see higher
>> parallelism.
>
> I considered that as well. But the question is: when are they written
> to disk in the slave? If they are in buffer cache until data is
> synched to disk then you only gain a bit of advantage by earlier
> sending (i.e. network latency). Assuming a high bandwidth and low
> latency network (which you want to have in this case anyway) that gain
> is probably not big compared to the time it takes to ensure WAL is
> written to disk. I do not know implementation details but *if* the
> server triggers sync only after its own sync has succeeded *then* you
> basically have serialization and you need to wait twice the time.
>
> For small TX OTOH network overhead might be so large relative to
> WAL IO (for example with a battery backed cache in the controller)
> that it shows. Since we do not know the test cases which lead to the
> 50% statement we can probably only speculate. Ultimately each
> individual setup and workload has to be tested.

yeah.

note the upcoming 9.2 synchronous_commit=remote_write setting is intended to improve this situation by letting the transaction go a bit earlier -- the slave basically only has to acknowledge receipt of the data.

merlin
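To make the difference between the settings concrete, here is a simplified model of what a commit waits for on the standby under each synchronous_commit level. The level names are real settings; the function, its parameters, and the timings are hypothetical, and the model ignores any overlap between primary and standby work.

```python
def standby_wait_ms(level, rtt_ms, standby_write_ms, standby_flush_ms):
    """Extra time a commit waits on the standby, per synchronous_commit level.

    Simplified model with hypothetical timings; "local" and "off" do not
    wait for the standby at all.
    """
    if level in ("off", "local"):
        return 0.0
    if level == "remote_write":
        # Standby has received the WAL and written it to its OS, not flushed it.
        return rtt_ms + standby_write_ms
    if level == "on":
        # Standby has also flushed the WAL to disk.
        return rtt_ms + standby_write_ms + standby_flush_ms
    raise ValueError(f"unknown synchronous_commit level: {level}")


# Hypothetical timings in milliseconds.
for level in ("local", "remote_write", "on"):
    wait = standby_wait_ms(level, rtt_ms=0.5, standby_write_ms=0.1, standby_flush_ms=4.0)
    print(f"synchronous_commit={level}: waits {wait:.1f} ms for the standby")
```

The gain from remote_write in this model is exactly the standby's flush time, so it matters most where that flush is slow.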
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
"MauMau"
Date:
From: "Merlin Moncure" <mmoncure@gmail.com>
> On Wed, May 9, 2012 at 8:06 AM, MauMau <maumau307@gmail.com> wrote:
>> Hello,
>>
>> I've heard from some people that synchronous streaming replication has
>> severe performance impact on the primary. They said that the transaction
>> throughput of TPC-C like benchmark (perhaps DBT-2) decreased by 50%. I'm
>> sorry I haven't asked them about their testing environment, because they
>> just gave me their experience. They think that this result is much worse
>> than some commercial database.
>
> I can't speak for other databases, but it's only natural to assume
> that tps must drop. At minimum, you have to add the latency of
> communication and remote sync operation to your transaction time. For
> very short transactions this adds up to a lot of extra work relative
> to the transaction itself.

Yes, I understand it is natural for the response time of each transaction to double or more. But I think the throughput drop would be amortized among multiple simultaneous transactions. So, a 50% throughput decrease seems unreasonable.

If this thinking is correct, and someone could kindly share their past performance evaluation results (ideally of DBT-2), I want to say to my acquaintance "hey, community people experience better performance, so you may need to review your configuration."

Regards
MauMau
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
Claudio Freire
Date:
On Wed, May 9, 2012 at 7:34 PM, MauMau <maumau307@gmail.com> wrote:
>> I can't speak for other databases, but it's only natural to assume
>> that tps must drop. At minimum, you have to add the latency of
>> communication and remote sync operation to your transaction time. For
>> very short transactions this adds up to a lot of extra work relative
>> to the transaction itself.
>
>
> Yes, I understand it is natural for the response time of each transaction to
> double or more. But I think the throughput drop would be amortized among
> multiple simultaneous transactions. So, 50% throughput decrease seems
> unreasonable.

I'm pretty sure it depends a lot on the workload. Knowing the methodology used to arrive at those figures is critical. Was the throughput decrease measured against no replication, or asynchronous replication? How many clients were used? What was the workload like? Was it CPU bound? I/O bound? Read-mostly?

We have asynchronous replication in production and throughput has not changed relative to no replication. I cannot see how making it synchronous would change throughput, as it only induces waiting time on the clients, but no extra work. I can only assume the test didn't use enough clients to saturate the hardware under high-latency situations, or clients were somehow experiencing application-specific contention.

I don't know the code, but knowing how synchronous replication works, I would say any such drop under high concurrency would be a bug (contention among waiting processes or something like that) that needs to be fixed.
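The argument that extra commit latency need not reduce throughput once enough clients are connected can be sketched with a simple closed-system model. Everything below is an assumption made for illustration: the function name, the capacity ceiling, and the latencies are hypothetical, and the model ignores contention among waiting backends.

```python
def throughput_tps(clients, commit_latency_ms, capacity_tps):
    """Throughput when each client loops on commit and the server caps total work."""
    offered_tps = clients * 1000.0 / commit_latency_ms
    return min(offered_tps, capacity_tps)


CAPACITY_TPS = 2000.0      # hypothetical ceiling set by the primary's hardware
LATENCY_NO_REP_MS = 2.0    # hypothetical commit latency without replication
LATENCY_SYNC_MS = 4.5      # hypothetical commit latency with sync replication

for clients in (2, 8, 32):
    no_rep = throughput_tps(clients, LATENCY_NO_REP_MS, CAPACITY_TPS)
    sync = throughput_tps(clients, LATENCY_SYNC_MS, CAPACITY_TPS)
    print(f"{clients:2d} clients: no replication {no_rep:.0f} tps, sync rep {sync:.0f} tps")
```

In this model a test with few clients shows a large drop, while a test with enough clients to saturate the primary shows none. That matches the suspicion that the reported benchmark did not run with enough concurrency.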
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
"MauMau"
Date:
From: "Claudio Freire" <klaussfreire@gmail.com>
On Wed, May 9, 2012 at 7:34 PM, MauMau <maumau307@gmail.com> wrote:
>> Yes, I understand it is natural for the response time of each transaction
>> to
>> double or more. But I think the throughput drop would be amortized among
>> multiple simultaneous transactions. So, 50% throughput decrease seems
>> unreasonable.

> I'm pretty sure it depends a lot on the workload. Knowing the
> methodology used that arrived to those figures is critical. Was the
> throughput decrease measured against no replication, or asynchronous
> replication? How many clients were used? What was the workload like?
> Was it CPU bound? I/O bound? Read-mostly?

> We have asynchronous replication in production and throughput has not
> changed relative to no replication. I cannot see how making it
> synchronous would change throughput, as it only induces waiting time on
> the clients, but no extra work. I can only assume the test didn't use
> enough clients to saturate the hardware under high-latency situations,
> or clients were somehow experiencing application-specific contention.

Thank you for your experience and opinion. The workload is a TPC-C-like, write-heavy one: DBT-2. They compared the throughput of the synchronous replication case against that of the no-replication case.

Today, they told me that they ran the test on two virtual machines on a single physical machine. They also used pgpool-II in both cases. In addition, they may have run the applications and pgpool-II on the same virtual machine as the database server.

It sounded to me that resources were so scarce that concurrency was low, or your assumption may be correct. I'll hear more about their environment from them.

BTW it's a pity that I cannot find any case study of the performance of the flagship feature of PostgreSQL 9.0/9.1, streaming replication...

Regards
MauMau
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
Thomas Kellerer
Date:
MauMau, 10.05.2012 13:34:
> Today, they told me that they ran the test on two virtual machines on
> a single physical machine.

Which means that both databases shared the same I/O system (hard disks). Therefore it's not really surprising that the overall performance goes down if you increase the I/O load.

A more realistic test (at least in my opinion) would have been to have two separate computers with two separate I/O systems.
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
"Tomas Vondra"
Date:
On 10 May 2012, 13:34, MauMau wrote:
> The workload is TPC-C-like write-heavy one; DBT-2. They compared the
> throughput of synchronous replication case against that of no replication
> case.
>
> Today, they told me that they ran the test on two virtual machines on a
> single physical machine. They also used pgpool-II in both cases. In
> addition, they may have run the applications and pgpool-II on the same
> virtual machine as the database server.

So they've run a test that is usually I/O bound on a single machine? If they've used the same I/O devices, I'm surprised the degradation was just 50%. If you have a system that can handle X IOPS, and you run two instances there, each will get ~X/2 IOPS. No magic can help here.

Even if they used separate I/O devices, there are probably many things that are shared and can become a bottleneck in a virtualized environment.

The setup is definitely very suspicious.

> It sounded to me that the resource is so scarce that concurrency was low,
> or
> your assumption may be correct. I'll hear more about their environment
> from
> them.
>
> BTW it's a pity that I cannot find any case study of performance of the
> flagship feature of PostgreSQL 9.0/9.1, streaming replication...

There were some nice talks about the performance impact of sync rep, for example this one:

http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/SyncRepDurability.pdf

There's also a video:

http://www.youtube.com/watch?v=XL7j8hTd6R8

Tomas
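The X/2 arithmetic above, written out as a sketch. The function name and the device figure are hypothetical, and an even split between instances is an assumption; real virtualized setups may share less fairly.

```python
def iops_per_instance(device_iops, instances):
    """Even split of a shared device's IOPS budget across instances."""
    if instances < 1:
        raise ValueError("need at least one instance")
    return device_iops / instances


DEVICE_IOPS = 1000.0  # hypothetical capacity of the shared storage

alone = iops_per_instance(DEVICE_IOPS, 1)    # primary only, no replication
shared = iops_per_instance(DEVICE_IOPS, 2)   # primary and standby on one device
drop_pct = 100.0 * (alone - shared) / alone

print(f"primary alone: {alone:.0f} IOPS, with standby on same device: {shared:.0f} IOPS")
print(f"I/O budget lost by the primary: {drop_pct:.0f}%")
```

For an I/O-bound workload, halving the primary's I/O budget would by itself account for a throughput drop of about half, before any replication latency is considered.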
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
Merlin Moncure
Date:
On Wed, May 9, 2012 at 5:34 PM, MauMau <maumau307@gmail.com> wrote:
> Yes, I understand it is natural for the response time of each transaction to
> double or more. But I think the throughput drop would be amortized among
> multiple simultaneous transactions. So, 50% throughput decrease seems
> unreasonable.
>
> If this thinking is correct, and someone could kindly share his/her past
> performance evaluation results (ideally of DBT-2), I want to say to my
> acquaintance "hey, community people experience better performance, so you
> may need to review your configuration."

It seems theoretically possible to interleave the processing on both sides, but a 50% reduction in throughput for latency-bound transactions seems to be broadly advertised as what to reasonably expect for sync rep with 9.1. 9.2 beta is arriving shortly and when it does I suggest experimenting with the new remote_write feature of sync rep over non-production workloads.

merlin
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
"MauMau"
Date:
From: "Tomas Vondra" <tv@fuzzy.cz>
> There were some nice talks about performance impact of sync rep, for
> example this one:
>
>
> http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/SyncRepDurability.pdf
>
> There's also a video:
>
> http://www.youtube.com/watch?v=XL7j8hTd6R8

Thanks. The video is especially interesting. I'll tell my acquaintance to check it, too.

Regards
MauMau
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
Fujii Masao
Date:
On Thu, May 10, 2012 at 8:34 PM, MauMau <maumau307@gmail.com> wrote:
> Today, they told me that they ran the test on two virtual machines on a
> single physical machine. They also used pgpool-II in both cases. In
> addition, they may have run the applications and pgpool-II on the same
> virtual machine as the database server.

So they compared the throughput of one server running on a single machine (the non-replication case) with that of two servers (i.e., master and standby) running on the same single machine (the sync rep case)? The amount of CPU/Mem/IO resource available per server is not the same between those two cases. So ISTM it's very unfair to the sync rep case. In this situation, I would not be surprised to see 50% performance degradation in the sync rep case.

> It sounded to me that the resource is so scarce that concurrency was low, or
> your assumption may be correct. I'll hear more about their environment from
> them.
>
> BTW it's a pity that I cannot find any case study of performance of the
> flagship feature of PostgreSQL 9.0/9.1, streaming replication...

Though I cannot show the details for some reasons, when I measured the performance overhead of sync rep by using pgbench, the throughput overhead was less than 10%. When measuring sync rep, I used two sets of physical machines and storage for the master and standby, and used a 1Gbps network between them.

Regards,

--
Fujii Masao
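For anyone repeating such a comparison, the overhead figure can be computed from two pgbench runs as sketched below. The helper names are hypothetical, the sample output lines and their numbers are made up (they are not the measurements mentioned above), and the parser only assumes that the pgbench report contains a "tps = <number>" line.

```python
import re


def parse_tps(pgbench_output):
    """Return the first 'tps = <number>' value found in pgbench output."""
    match = re.search(r"tps = ([0-9]+(?:\.[0-9]+)?)", pgbench_output)
    if match is None:
        raise ValueError("no tps line found")
    return float(match.group(1))


def overhead_pct(baseline_tps, replicated_tps):
    """Throughput lost relative to the baseline, as a percentage."""
    return 100.0 * (baseline_tps - replicated_tps) / baseline_tps


# Made-up sample report lines, one per run.
baseline_output = "tps = 1000.000000 (including connections establishing)"
sync_output = "tps = 920.000000 (including connections establishing)"

overhead = overhead_pct(parse_tps(baseline_output), parse_tps(sync_output))
print(f"sync rep throughput overhead: {overhead:.1f}%")
```

Both runs should use the same client count, duration, and scale factor, with the standby on separate hardware, so that the only variable is the replication mode.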
Re: Could synchronous streaming replication really degrade the performance of the primary?
From
"MauMau"
Date:
From: "Fujii Masao" <masao.fujii@gmail.com>
> Though I cannot show the detail for some reasons, as far as I measured
> the performance overhead of sync rep by using pgbench, the overhead of
> throughput was less than 10%. When measuring sync rep, I used two
> sets of physical machines and storage for the master and standby, and
> used 1Gbps network between them.

Fujii-san, thanks a million. That's valuable information. An overhead of less than 10% under (perhaps) high concurrency and a write-heavy workload exceeds my expectations. Great!

Though I couldn't contact the testers today, I'll tell them about this next week.

Regards
MauMau