Re: Speed up Clog Access by increasing CLOG buffers - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date:
Msg-id: 4a52a34f-57fa-7bcf-d34c-c15db40f0361@2ndquadrant.com
In response to: Re: Speed up Clog Access by increasing CLOG buffers (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Speed up Clog Access by increasing CLOG buffers
List: pgsql-hackers
On 10/25/2016 06:10 AM, Amit Kapila wrote:
> On Mon, Oct 24, 2016 at 2:48 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> On Fri, Oct 21, 2016 at 7:57 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>> On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
>>> <tomas.vondra@2ndquadrant.com> wrote:
>>>
>>>> In the results you've posted on 10/12, you've mentioned a regression
>>>> with 32 clients, where you got 52k tps on master but only 48k tps with
>>>> the patch (so ~10% difference). I have no idea what scale was used for
>>>> those tests,
>>>
>>> That test was with scale factor 300 on a 4-socket POWER machine. I think
>>> I need to repeat this test with multiple readings to confirm whether it
>>> was a regression or run-to-run variation. I will do that soon and post
>>> the results.
>>
>> As promised, I have rerun my test (3 times), and I did not see any
>> regression.
>
> Thanks Tomas and Dilip for doing detailed performance tests for this
> patch. I would like to summarise the performance testing results.
>
> 1. With an update-intensive workload, we are seeing gains of 23%~192%
> at client counts >=64 with the group_update patch [1].
> 2. With the tpc-b pgbench workload (at 1000 scale factor), we are seeing
> gains of 12% to ~70% at client counts >=64 [2]. Tests are done on an
> 8-socket intel m/c.
> 3. With the pgbench workload (both simple-update and tpc-b at 300 scale
> factor), we are seeing gains of 10% to >50% at client counts >=64 [3].
> Tests are done on an 8-socket intel m/c.
> 4. To see why the patch only helps at higher client counts, we have
> done wait event testing for various workloads [4], [5] and the results
> indicate that at lower client counts, the waits are mostly due to
> transactionid or clientread. At client counts where contention due to
> CLOGControlLock is significant, this patch helps a lot to reduce that
> contention. These tests are done on an 8-socket intel m/c and a
> 4-socket power m/c.
> 5. With the pgbench workload (unlogged tables), we are seeing gains of
> 15% to >300% at client counts >=72 [6].

It's not entirely clear which of the above tests were done on unlogged
tables, and I don't see that in the referenced e-mails. That would be an
interesting thing to mention in the summary, I think.

> There are many more tests done for the proposed patches where the gains
> are either on similar lines as above or are neutral. We do see
> regression in some cases.
>
> 1. When data doesn't fit in shared buffers, there is a regression at
> some client counts [7], but on analysis it has been found that it is
> mainly due to the shift in contention from CLOGControlLock to
> WALWriteLock and/or other locks.

The question is why shifting the lock contention to WALWriteLock should
cause such a significant performance drop, particularly when the test was
done on unlogged tables. Or, if that's the case, what makes the
performance drop less problematic / acceptable.

FWIW I plan to run the same test with logged tables - if it shows a
similar regression, I'll be much more worried, because that's a fairly
typical scenario (logged tables, data set > shared buffers), and we
surely can't just go and break that.

> 2. We do see in some cases that the granular_locking and no_content_lock
> patches have shown a significant increase in contention on
> CLOGControlLock. I have already shared my analysis for the same
> upthread [8].

I do agree that in some cases this significantly reduces contention on
the CLogControlLock. I do however think that currently the performance
gains are limited almost exclusively to cases on unlogged tables, and
some logged+async cases. On logged tables it usually looks like this
(i.e. a modest increase for high client counts at the expense of
significantly higher variability):

http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64

or like this (i.e. only partial recovery for the drop above 36 clients):

http://tvondra.bitbucket.org/#pgbench-3000-logged-async-skip-64

And of course, there are cases like this:

http://tvondra.bitbucket.org/#dilip-300-logged-async

I'd really like to understand why the patched results behave that
differently depending on client count.

>> Attached is the latest group update clog patch.

How is that different from the previous versions?

> In last commit fest, the patch was returned with feedback to evaluate
> the cases where it can show a win, and I think the above results indicate
> that the patch has significant benefit on various workloads. What I
> think is pending at this stage is that either one of the committers or
> the reviewers of this patch needs to provide feedback on my analysis
> [8] for the cases where the patches are not showing a win.
>
> Thoughts?

I do agree the patch(es) significantly reduce CLogControlLock contention,
although with WAL logging enabled (which is what matters for most
production deployments) it pretty much only shifts the contention to a
different lock (so the immediate performance benefit is 0).

Which raises the question why to commit this patch now, before we have a
patch addressing the WAL locks. I realize this is a chicken-and-egg
problem, but my worry is that the increased WALWriteLock contention will
cause regressions in current workloads.

BTW I've run some tests with the number of clog buffers increased to 512,
and the results seem fairly positive. Compare for example these two
results:

http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip
http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip-clog-512

The first one is with the default 128 buffers, the other one is with 512
buffers. The impact on master is pretty obvious - for 72 clients the tps
jumps from 160k to 197k, and for higher client counts it gives us about
+50k tps (typically an increase from ~80k to ~130k tps). And the tps
variability is significantly reduced.
For the other workload, the results are less convincing, though:

http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
http://tvondra.bitbucket.org/#dilip-300-unlogged-sync-clog-512

Interesting that the master adopts the zig-zag pattern, but shifted.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services