Re: Scaling shared buffer eviction - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Scaling shared buffer eviction
Msg-id 20141001185439.GD7158@awork2.anarazel.de
In response to Re: Scaling shared buffer eviction  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Scaling shared buffer eviction
List pgsql-hackers
On 2014-09-25 16:50:44 +0200, Andres Freund wrote:
> On 2014-09-25 10:44:40 -0400, Robert Haas wrote:
> > On Thu, Sep 25, 2014 at 10:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > > On Thu, Sep 25, 2014 at 10:24 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > >> On 2014-09-25 10:22:47 -0400, Robert Haas wrote:
> > >>> On Thu, Sep 25, 2014 at 10:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > >>> > That leads me to wonder: have you measured different, lower numbers of
> > >>> > buffer mapping locks? 128 locks is, if we align them properly as we
> > >>> > should, 8KB of memory. Common L1 cache sizes are around 32k...
> > >>>
> > >>> Amit has some results upthread showing 64 being good, but not as good
> > >>> as 128.  I haven't verified that myself, but have no reason to doubt
> > >>> it.
> > >>
> > >> How about you push the spinlock change and I crosscheck the partition
> > >> number on a multi-socket x86 machine? Seems worthwhile to make sure that
> > >> it doesn't cause problems on x86. I seriously doubt it will, but ...
> > >
> > > OK.
> > 
> > Another thought is that we should test what impact your atomics-based
> > lwlocks have on this.
> 
> Yes, I'd planned to test that as well. I think that it will noticeably
> reduce the need to increase the number of partitions for workloads that
> fit into shared_buffers. But it won't do much about exclusive
> acquisitions of the buffer mapping locks. So I think there's independent
> benefit in increasing the number.
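
(As a quick cross-check of the 8KB figure quoted above: assuming each
buffer mapping lock is padded out to its own 64-byte cache line, the
lock array alone comes to the following sizes.)

    # Back-of-the-envelope only; the 64-byte line size and the
    # one-line-per-lock padding are assumptions.
    for n in 16 64 128; do
        echo "$n partitions -> $((n * 64)) bytes"
    done
    # 16 -> 1024, 64 -> 4096, 128 -> 8192 bytes, vs. ~32kB of L1 cache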

Here we go.

Postgres was configured with:

    -c shared_buffers=8GB \
    -c log_line_prefix="[%m %p] " \
    -c log_min_messages=debug1 \
    -p 5440 \
    -c checkpoint_segments=600 \
    -c max_connections=200

Each individual measurement (#TPS) is the result of a
pgbench -h /tmp/ -p 5440 postgres -n -M prepared -c $clients -j $clients -S -T 10
run; each row in the tables below shows three such runs.
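
A minimal driver along these lines could produce the three numbers per
row (a sketch, not the script actually used; the client counts mirror
the tables below, and the awk parsing assumes pgbench's
"tps = ... (including connections establishing)" output line):

    #!/bin/sh
    # Hypothetical driver: three 10-second read-only runs per client count.
    for clients in 1 16 32 64 96 128; do
        for run in 1 2 3; do
            pgbench -h /tmp/ -p 5440 postgres -n -M prepared \
                -c "$clients" -j "$clients" -S -T 10 |
                awk -v c="$clients" '/tps.*including/ { print c, $3 }'
        done
    done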

Master is as of ef8863844bb0b0dab7b92c5f278302a42b4bf05a.

First, a scale 200 run. That fits entirely into shared_buffers:

#scale          #client         #partitions             #TPS
200             1               16                      8353.547724     8145.296655     8263.295459
200             16              16                      171014.763118   193971.091518   133992.128348
200             32              16                      259119.988034   234619.421322   201879.618322
200             64              16                      178909.038670   179425.091562   181391.354613
200             96              16                      141402.895201   138392.705402   137216.416951
200             128             16                      125643.089677   124465.288860   122527.209125

(Other runs were stricken here; they were distorted by some concurrent
activity, but showed nothing interesting.)

So there's quite some variation here. Not very surprising given the
short runtimes, but still.

Looking at a profile, nearly all the contention is around
GetSnapshotData(). That might hide the interesting scalability effects
of the partition number, so I next tried my rwlock-contention branch.
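
(The profiler invocation isn't stated here; on Linux such a profile
would typically be collected with something like the following, run
while pgbench is active.)

    # Hypothetical: system-wide profile with call graphs for ~10 seconds
    perf record -a -g -- sleep 10
    perf report --sort symbol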

#scale          #client         #partitions             #TPS
200             1               1                       8540.390223     8285.628397     8497.022656
200             16              1                       136875.484896   164302.769380   172053.413980
200             32              1                       308624.650724   240502.019046   260825.231470
200             64              1                       453004.188676   406226.943046   406973.325822
200             96              1                       442608.459701   450185.431848   445549.710907
200             128             1                       487138.077973   496233.594356   457877.992783

200             1               16                      9477.217454     8181.098317     8457.276961
200             16              16                      154224.573476   170238.637315   182941.035416
200             32              16                      302230.215403   285124.708236   265917.729628
200             64              16                      405151.647136   443473.797835   456072.782722
200             96              16                      443360.377281   457164.981119   474049.685940
200             128             16                      490616.257063   458273.380238   466429.948417

200             1               64                      8410.981874     11554.708966    8359.294710
200             16              64                      139378.312883   168398.919590   166184.744944
200             32              64                      288657.701012   283588.901083   302241.706222
200             64              64                      424838.919754   416926.779367   436848.292520
200             96              64                      462352.017671   446384.114441   483332.592663
200             128             64                      471578.594596   488862.395621   466692.726385

200             1               128                     8350.274549     8140.699687     8305.975703
200             16              128                     144553.966808   154711.927715   202437.837908
200             32              128                     290193.349170   213242.292597   261016.779185
200             64              128                     413792.389493   431267.716855   456587.450294
200             96              128                     490459.212833   456375.442210   496430.996055
200             128             128                     470067.179360   464513.801884   483485.000502

Not much there either.

So, on to the next scale, 1000. That doesn't fit into s_b anymore.
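
(For reference, a scale-1000 data set is roughly 15GB, i.e. comfortably
larger than the 8GB of shared_buffers. It would have been created with
something along these lines, though the exact initialization options
aren't stated here.)

    # Hypothetical initialization of the larger data set
    pgbench -i -s 1000 -h /tmp/ -p 5440 postgres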

master:
#scale          #client         #partitions             #TPS
1000            1               1                       7378.370717     7110.988121     7164.977746
1000            16              1                       66439.037413    85151.814130    85047.296626
1000            32              1                       71505.487093    75687.291060    69803.895496
1000            64              1                       42148.071099    41934.631603    43253.528849
1000            96              1                       33760.812746    33969.800564    33598.640121
1000            128             1                       30382.414165    30047.284982    30144.576494

1000            1               16                      7228.883843     9479.793813     7217.657145
1000            16              16                      105203.710528   112375.187471   110919.986283
1000            32              16                      146294.286762   145391.938025   144620.709764
1000            64              16                      134411.772164   134536.943367   136196.793573
1000            96              16                      107626.878208   105289.783922   96480.468107
1000            128             16                      92597.909379    86128.040557    92417.727720

1000            1               64                      7130.392436     12801.641683    7019.999330
1000            16              64                      120180.196384   125319.373819   126137.930478
1000            32              64                      181876.697461   190578.106760   189412.973015
1000            64              64                      216233.590299   222561.774501   225802.194056
1000            96              64                      171928.358031   165922.395721   168283.712990
1000            128             64                      139303.139631   137564.877450   141534.449640

1000            1               128                     8215.702354     7209.520152     7026.888706
1000            16              128                     116196.740200   123018.284948   127045.761518
1000            32              128                     183391.488566   185428.757458   185732.926794
1000            64              128                     218547.133675   218096.002473   208679.436158
1000            96              128                     155209.830821   156327.200412   157542.582637
1000            128             128                     131127.769076   132084.933955   124706.336737

rwlock:
#scale          #client         #partitions             #TPS
1000            1               1                       7377.270393     7494.260136     7207.898866
1000            16              1                       79289.755569    88032.480145    86810.772569
1000            32              1                       83006.336151    88961.964680    88508.832253
1000            64              1                       44135.036648    46582.727314    45119.421278
1000            96              1                       35036.174438    35687.025568    35469.127697
1000            128             1                       30597.870830    30782.335225    30342.454439

1000            1               16                      7114.602838     7265.863826     7205.225737
1000            16              16                      128507.292054   131868.678603   124507.097065
1000            32              16                      212779.122153   185666.608338   210714.373254
1000            64              16                      239776.079534   239923.393293   242476.922423
1000            96              16                      169240.934839   166021.430680   169187.643644
1000            128             16                      136601.409985   139340.961857   141731.068752

1000            1               64                      13271.722885    11348.028311    12531.188689
1000            16              64                      129074.053482   125334.720264   125140.499619
1000            32              64                      198405.463848   196605.923684   198354.818005
1000            64              64                      250463.474112   249543.622897   251517.159399
1000            96              64                      251715.751133   254168.028451   251502.783058
1000            128             64                      243596.368933   234671.592026   239123.259642

1000            1               128                     7376.371403     7301.077478     7240.526379
1000            16              128                     127992.070372   133537.637394   123382.418747
1000            32              128                     185807.703422   194303.674428   184919.586634
1000            64              128                     270233.496350   271576.483715   262281.662510
1000            96              128                     266023.529574   272484.352878   271921.597420
1000            128             128                     260004.301457   266710.469926   263713.245868


Based on this I think we can fairly conclude that increasing the number
of partitions is quite a win on larger x86 machines too, independent of
the rwlock patch, although that patch does move the contention points
around to some degree.
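
(The partition count is a compile-time constant, NUM_BUFFER_PARTITIONS
in src/include/storage/lwlock.h in trees of this vintage, so each
#partitions column implies a rebuild; hypothetically, something like:)

    # Hypothetical: bump the compile-time partition count and rebuild
    sed -i 's/#define NUM_BUFFER_PARTITIONS.*/#define NUM_BUFFER_PARTITIONS 128/' \
        src/include/storage/lwlock.h
    make -s install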

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


