Re: Proposal of tunable fix for scalability of 8.4 - Mailing list pgsql-performance

From Jignesh K. Shah
Subject Re: Proposal of tunable fix for scalability of 8.4
Date
Msg-id 49BA8CF6.6090604@sun.com
Whole thread Raw
In response to Re: Proposal of tunable fix for scalability of 8.4  ("Jignesh K. Shah" <J.K.Shah@Sun.COM>)
Responses Re: Proposal of tunable fix for scalability of 8.4
Re: Proposal of tunable fix for scalability of 8.4
List pgsql-performance

Now with a modified Fix (not the original one that I proposed but
something that works like a heart valve : Opens and shuts to minimum
default way thus  controlling how many waiters are waked up )

Time:Users:throughput: Reponse
60: 8: Medium Throughput: 7774.000 Avg Medium Resp: 0.004
120: 16: Medium Throughput: 16874.000 Avg Medium Resp: 0.004
180: 24: Medium Throughput: 25159.000 Avg Medium Resp: 0.004
240: 32: Medium Throughput: 33216.000 Avg Medium Resp: 0.005
300: 40: Medium Throughput: 42418.000 Avg Medium Resp: 0.005
360: 48: Medium Throughput: 49655.000 Avg Medium Resp: 0.005
420: 56: Medium Throughput: 58149.000 Avg Medium Resp: 0.005
480: 64: Medium Throughput: 66558.000 Avg Medium Resp: 0.005
540: 72: Medium Throughput: 74474.000 Avg Medium Resp: 0.005
600: 80: Medium Throughput: 82241.000 Avg Medium Resp: 0.005
660: 88: Medium Throughput: 90336.000 Avg Medium Resp: 0.005
720: 96: Medium Throughput: 99101.000 Avg Medium Resp: 0.006
780: 104: Medium Throughput: 106028.000 Avg Medium Resp: 0.006
840: 112: Medium Throughput: 113196.000 Avg Medium Resp: 0.006
900: 120: Medium Throughput: 119174.000 Avg Medium Resp: 0.006
960: 128: Medium Throughput: 129408.000 Avg Medium Resp: 0.006
1020: 136: Medium Throughput: 134433.000 Avg Medium Resp: 0.007
1080: 144: Medium Throughput: 143121.000 Avg Medium Resp: 0.007
1140: 152: Medium Throughput: 144603.000 Avg Medium Resp: 0.007
1200: 160: Medium Throughput: 148604.000 Avg Medium Resp: 0.008
1260: 168: Medium Throughput: 150274.000 Avg Medium Resp: 0.009
1320: 176: Medium Throughput: 150581.000 Avg Medium Resp: 0.010
1380: 184: Medium Throughput: 146912.000 Avg Medium Resp: 0.012
1440: 192: Medium Throughput: 143945.000 Avg Medium Resp: 0.013
1500: 200: Medium Throughput: 144029.000 Avg Medium Resp: 0.015
1560: 208: Medium Throughput: 143468.000 Avg Medium Resp: 0.016
1620: 216: Medium Throughput: 144367.000 Avg Medium Resp: 0.017
1680: 224: Medium Throughput: 148340.000 Avg Medium Resp: 0.017
1740: 232: Medium Throughput: 148842.000 Avg Medium Resp: 0.018
1800: 240: Medium Throughput: 149533.000 Avg Medium Resp: 0.019
1860: 248: Medium Throughput: 152334.000 Avg Medium Resp: 0.019
1920: 256: Medium Throughput: 151521.000 Avg Medium Resp: 0.020
1980: 264: Medium Throughput: 148961.000 Avg Medium Resp: 0.022
2040: 272: Medium Throughput: 151270.000 Avg Medium Resp: 0.022
2100: 280: Medium Throughput: 149783.000 Avg Medium Resp: 0.024
2160: 288: Medium Throughput: 151743.000 Avg Medium Resp: 0.024
2220: 296: Medium Throughput: 155190.000 Avg Medium Resp: 0.026
2280: 304: Medium Throughput: 150955.000 Avg Medium Resp: 0.027
2340: 312: Medium Throughput: 147118.000 Avg Medium Resp: 0.029
2400: 320: Medium Throughput: 152768.000 Avg Medium Resp: 0.029
2460: 328: Medium Throughput: 161044.000 Avg Medium Resp: 0.028
2520: 336: Medium Throughput: 157926.000 Avg Medium Resp: 0.029
2580: 344: Medium Throughput: 161005.000 Avg Medium Resp: 0.029
2640: 352: Medium Throughput: 167274.000 Avg Medium Resp: 0.029
2700: 360: Medium Throughput: 168253.000 Avg Medium Resp: 0.031


With final vmstats improving but still far from 100%
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 sd   in   sy   cs us
sy id
 38 0 0 46052840 39345096 0 11 0 0 0  0  0  0  0  0  0 134137 290703
303518 40 14 45
 43 0 0 45656456 38882912 23 77 0 0 0 0  0  0  0  0  0 135820 272899
300749 40 15 45
 38 0 0 45650488 38816984 23 80 0 0 0 0  0  0  0  0  0 135009 272767
300192 39 15 46
 47 0 0 46020792 39187688 0 5 0 0  0  0  0  0  0  0  0 140473 285445
312826 40 14 46
 24 0 0 46143984 39326848 9 61 0 0 0  0  0  0  0  0  0 146194 308590
328241 40 15 45
 37 0 0 45465256 38757000 22 74 0 0 0 0  0  0  0  0  0 136835 293971
301433 38 14 48
 35 0 0 46017544 39308072 12 61 0 0 0 0  0  0  0  0  0 142749 312355
320592 42 15 43
 36 0 0 45456000 38744688 11 24 0 0 0 0  0  0  0  0  0 143566 303461
317683 41 15 43
 23 0 0 46007408 39291312 2 22 0 0 0  0  0  0  0  0  0 140246 300061
316663 42 15 43
 20 0 0 46029656 39281704 10 25 0 0 0 0  0  0  0  0  0 147787 291825
326387 43 15 42
 24 0 0 46131016 39288528 2 21 0 0 0  0  0  0  0  0  0 150796 310697
335791 43 15 42
 20 0 0 46109448 39269392 16 67 0 0 0 0  0  0  0  0  0 150075 315517
332881 43 16 41
 30 0 0 45540928 38710376 9 27 0 0 0  0  0  0  0  0  0 155214 316448
341472 43 16 40
 14 0 0 45987496 39270016 0 5 0 0  0  0  0  0  0  0  0 155028 333711
344207 44 16 40
 25 0 0 45981136 39263008 0 10 0 0 0  0  0  0  0  0  0 153968 327343
343776 45 16 39
 54 0 0 46062984 39259936 0 7 0 0  0  0  0  0  0  0  0 153721 315839
344732 45 16 39
 42 0 0 46099704 39252920 0 15 0 0 0  0  0  0  0  0  0 154629 323125
348798 45 16 39
 54 0 0 46068944 39230808 0 8 0 0  0  0  0  0  0  0  0 157166 340265
354135 46 17 37

But the real winner shows up in lockstat where it seems to indicate that
stress on Waiting from ProcArrayLock is relieved (thought shifting
somewhere else which is how lock works):

# ./84_lwlock.d 8042

               Lock Id            Mode           State           Count
          WALWriteLock       Exclusive        Acquired               1
            XidGenLock       Exclusive         Waiting               3
       CLogControlLock          Shared         Waiting              11
         ProcArrayLock          Shared         Waiting              39
       CLogControlLock       Exclusive         Waiting              52
         WALInsertLock       Exclusive         Waiting              73
       CLogControlLock          Shared        Acquired              91
         ProcArrayLock       Exclusive        Acquired              96
            XidGenLock       Exclusive        Acquired              96
         ProcArrayLock       Exclusive         Waiting             121
       CLogControlLock       Exclusive        Acquired             199
         WALInsertLock       Exclusive        Acquired             310
   FirstBufMappingLock          Shared        Acquired             408
      FirstLockMgrLock          Shared        Acquired             618
         ProcArrayLock          Shared        Acquired             746
        SInvalReadLock          Shared        Acquired            1542

               Lock Id            Mode           State   Combined Time (ns)
         WALInsertLock                        Acquired               118673
       CLogControlLock                        Acquired               172130
   FirstBufMappingLock                        Acquired               177196
          WALWriteLock       Exclusive        Acquired               208403
            XidGenLock       Exclusive         Waiting               325989
      FirstLockMgrLock                        Acquired              2667351
         ProcArrayLock                        Acquired              8179335
            XidGenLock       Exclusive        Acquired              8896177
       CLogControlLock          Shared         Waiting              9680401
       CLogControlLock       Exclusive         Waiting             19105179
       CLogControlLock       Exclusive        Acquired             27484249
        SInvalReadLock                        Acquired             43026960
   FirstBufMappingLock       Exclusive        Acquired             45232906
         ProcArrayLock          Shared         Waiting             46741660
         WALInsertLock       Exclusive         Waiting             50912148
      FirstLockMgrLock       Exclusive        Acquired             58789829
         WALInsertLock       Exclusive        Acquired             86653791
         ProcArrayLock       Exclusive         Waiting            213980787
         ProcArrayLock       Exclusive        Acquired            270028367
        SInvalReadLock       Exclusive        Acquired            303044735




SET minf mjf xcal   intr  ithr  csw   icsw   migr  smtx   srw syscl  usr
sys  wt idl sze
  0    1   0 147238 159453 8806 370676 89236 71258 98435    0 380008
47  17   0  35  64
  0    6   0 132463 143446 7975 331685 80847 64746 86578    0 329315
44  16   0  41  64
  0   16   0 146655 158621 8987 366866 90756 69953 93786    0 349346
49  17   0  34  64
  0   18   0 151326 163492 8992 377634 92860 72406 98968    4 365121
49  17   0  33  64
  0    2   0 142914 154169 8243 352104 81385 69598 91260    0 340887
42  16   0  42  64
  0   16   0 156755 168962 9080 386475 93072 74775 101465    0 379250
47  18   0  36  64
  0    1   0 152807 165134 8880 379521 90671 75073 99692    0 380412
48  18   0  35  64
  0    1   0 134778 146041 8122 339137 79888 66633 89220    0 342600
43  16   0  41  64
  0   16   0 153014 164789 8834 376117 93000 72743 97644    0 371792
48  18   0  35  64


Not sure what SInvalReadLock does.. need to read up on that..


-Jignesh

>
> 1200: 160: Medium Throughput: 130487.000 Avg Medium Resp: 0.011
> 1260: 168: Medium Throughput: 123368.000 Avg Medium Resp: 0.013
> 1320: 176: Medium Throughput: 134649.000 Avg Medium Resp: 0.012
> 1380: 184: Medium Throughput: 136272.000 Avg Medium Resp: 0.013
>
>
> kthr      memory            page            disk          faults      cpu
> r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 sd   in   sy   cs
> us sy id
> 3 0 0 44710008 39603320 0 135 0 0 0  0  0  0  0  0  0 110564 145678
> 211567 33 10 57
> 5 0 0 44663368 39595008 0 6 0  0  0  0  0  0  0  0  0 108891 143083
> 208389 33 10 58
> 3 0 0 44753496 39593824 0 132 0 0 0  0  0  0  0  0  0 109922 126865
> 209869 33 9 57
> 4 0 0 44788368 39588528 0 7 0  0  0  0  0  0  0  0  0 108680 129073
> 208068 33 10 57
> 2 0 0 44767920 39570592 0 147 0 0 0  0  0  0  0  0  0 106671 142403
> 204724 33 10 58
> 4 0 0 44762656 39563256 0 11 0 0  0  0  0  0  0  0  0 106185 130328
> 204551 34 10 57
> 2 0 0 44674584 39560912 0 148 0 0 0  0  0  0  0  0  0 104757 139147
> 201448 32 10 58
> 1 0 0 44619824 39551024 0 9 0  0  0  0  0  0  0  0  0 103653 142125
> 199896 32 10 58
> 2 0 0 44622480 39552432 0 141 0 0 0  0  0  0  0  0  0 101373 134547
> 195553 32 9 58
> 1 0 0 44739936 39552312 0 11 0 0  0  0  0  0  0  0  0 102932 121742
> 198205 33 9 58
>
>
> And lock stats are as follows at about 280 users sampling for a single
> backend process:
> # ./84_lwlock.d 29405
>
>               Lock Id            Mode           State           Count
>          WALWriteLock       Exclusive        Acquired               1
>            XidGenLock       Exclusive         Waiting               1
>       CLogControlLock          Shared         Waiting               3
>         ProcArrayLock          Shared         Waiting               7
>       CLogControlLock       Exclusive         Waiting               9
>         WALInsertLock       Exclusive         Waiting              45
>       CLogControlLock          Shared        Acquired              52
>         ProcArrayLock       Exclusive         Waiting              61
>            XidGenLock       Exclusive        Acquired              96
>         ProcArrayLock       Exclusive        Acquired              97
>       CLogControlLock       Exclusive        Acquired             152
>         WALInsertLock       Exclusive        Acquired             302
>         ProcArrayLock          Shared        Acquired             729
>      FirstLockMgrLock          Shared        Acquired             812
>   FirstBufMappingLock          Shared        Acquired             857
>        SInvalReadLock          Shared        Acquired            1551
>
>               Lock Id            Mode           State   Combined Time
> (ns)
>         WALInsertLock                        Acquired
> 89909
>            XidGenLock       Exclusive         Waiting
> 101488
>          WALWriteLock       Exclusive        Acquired
> 140563
>       CLogControlLock          Shared         Waiting
> 354756
>   FirstBufMappingLock                        Acquired
> 471438
>      FirstLockMgrLock                        Acquired
> 2907141
>            XidGenLock       Exclusive        Acquired
> 7450934
>       CLogControlLock       Exclusive         Waiting
> 11094716
>         ProcArrayLock                        Acquired
> 15495229
>         WALInsertLock       Exclusive         Waiting
> 20801169
>       CLogControlLock       Exclusive        Acquired
> 21339264
>        SInvalReadLock                        Acquired
> 24309991
>      FirstLockMgrLock       Exclusive        Acquired
> 39904071
>   FirstBufMappingLock       Exclusive        Acquired
> 40826435
>         ProcArrayLock          Shared         Waiting
> 86352947
>         WALInsertLock       Exclusive        Acquired
> 89336432
>        SInvalReadLock       Exclusive        Acquired
> 252574515
>         ProcArrayLock       Exclusive        Acquired
> 315064347
>         ProcArrayLock       Exclusive         Waiting
> 847806215
>
> mpstat outputs is too much so I am doing aggegation by procesor set
> which is all 64 cpus
>
> -bash-3.2$ mpstat -a 10
>
> SET minf mjf xcal   intr  ithr  csw    icsw  migr  smtx  srw syscl
> usr sys  wt idl sze
>  0  370   0 118649 127575 7595 244456 43931 62166 8700    0 158929
> 38  11   0  50  64
>  0  167   0 119668 128704 7644 246389 43287 62357 8816    0 161006
> 38  11   0  51  64
>  0   27   0 109461 117433 6997 224514 38562 56446 8171    0 148322
> 34  10   0  56  64
>  0    2   0 122368 131549 7871 250237 39620 61478 9082    0 165995
> 36  11   0  52  64
>  0    0   0 122025 131380 7973 249429 37292 59863 8922    0 166319
> 35  11   0  54  64
>
> (quick overview of columns )
>     SET    Processor set
>     minf          minor faults
>     mjf           major faults
>     xcal          inter-processor cross-calls
>     intr          interrupts
>     ithr          interrupts  as  threads  (not  counting  clock
>                   interrupt)
>     csw           context switches
>     icsw          involuntary context switches
>     migr          thread migrations (to another processor)
>     smtx          spins on mutexes (lock not acquired  on  first
>                   try)
>     srw           spins  on  readers/writer  locks   (lock   not
>                   acquired on first try)
>     syscl         system calls
>     usr           percent user time
>     sys           percent system time
>     wt            the I/O wait time is no longer calculated as a
>                   percentage  of  CPU  time,  and this statistic
>                   will always return zero.
>     idl           percent idle time
>     sze           number of processors in the requested  proces-
>                   sor set
>
>
> -Jignesh
>
>

--
Jignesh Shah           http://blogs.sun.com/jkshah
The New Sun Microsystems,Inc   http://sun.com/postgresql


pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Proposal of tunable fix for scalability of 8.4
Next
From: Scott Carey
Date:
Subject: Re: Proposal of tunable fix for scalability of 8.4