ProcArrayLock (The Saga continues) - Mailing list pgsql-performance

From Jignesh K. Shah
Subject ProcArrayLock (The Saga continues)
Msg-id 483F294A.9040108@sun.com
List pgsql-performance
Based on feedback after the sessions, I did a few more tests which might
be useful to share.

One suggestion was to get each client to do more work and to reduce
the number of clients. The igen benchmark is flexible, so I removed
all think time from it and repeated the test until scalability stopped.
(This was done with CVS downloaded yesterday.)

Note that with no think time, each client can be about 75% CPU busy
from what I observed. Running it, I found that client scaling now
saturates at about 60 clients (compared to 500 in the original test).
The peak throughput was at about 50 users (using synchronous_commit=off).

Here is the interesting DTrace lock output (lock id, lock mode, and
time in ns spent waiting for the lock in a 10-sec snapshot). I am just
showing the top few, in ascending order:

With fewer than 20 users, WALInsert is at the top:
52 Exclusive  721950129
4  Exclusive  768537190
46 Exclusive  842063837
7  Exclusive 1031851713

With 35 Users:
52 Exclusive  2599074739
4  Exclusive  2647927574
46 Exclusive  2789581991
7  Exclusive  3220008691

At the peak at about 50 users that I saw earlier (PEAK Throughput):
46 Exclusive  3669210393
4  Exclusive  6024966938
52 Exclusive  6529168107
7  Exclusive  9408290367

With about 60 users, where the throughput actually starts to drop:
41 Exclusive   4570660567
52 Exclusive  10706741643
46 Exclusive  13152005125
 4 Exclusive  13550187806
 7 Exclusive  22146882562


With about 100 users (throughput below the peak value):
42 Exclusive    4238582775
46 Exclusive    6773515243
7  Exclusive    7467346038
52 Exclusive    9846216440
4  Shared      22528501166
4  Exclusive  223043774037

So it seems that when both the Shared and Exclusive wait times for
ProcArrayLock are the top two, the system is basically saturated in
terms of the throughput it can handle.
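
To put the wait times above in perspective, here is a rough Python sketch
that divides each wait time by the 10-second snapshot length to estimate
how many backends were waiting on a lock on average. It assumes the
reported times are summed across all backends (which values larger than
10 s imply); the function name avg_waiters and the user-count labels are
made up for illustration.

```python
# Wait times (ns) copied from the snapshots above; labels are approximate
# user counts as described in the message.
SNAPSHOT_SECONDS = 10  # each DTrace snapshot covered 10 seconds

WAIT_NS = [
    ("35 users", [(52, "Exclusive", 2599074739), (4, "Exclusive", 2647927574),
                  (46, "Exclusive", 2789581991), (7, "Exclusive", 3220008691)]),
    ("~50 users", [(46, "Exclusive", 3669210393), (4, "Exclusive", 6024966938),
                   (52, "Exclusive", 6529168107), (7, "Exclusive", 9408290367)]),
    ("~60 users", [(41, "Exclusive", 4570660567), (52, "Exclusive", 10706741643),
                   (46, "Exclusive", 13152005125), (4, "Exclusive", 13550187806),
                   (7, "Exclusive", 22146882562)]),
    ("~100 users", [(42, "Exclusive", 4238582775), (46, "Exclusive", 6773515243),
                    (7, "Exclusive", 7467346038), (52, "Exclusive", 9846216440),
                    (4, "Shared", 22528501166), (4, "Exclusive", 223043774037)]),
]


def avg_waiters(wait_ns, window_s=SNAPSHOT_SECONDS):
    """Assumption: wait_ns is the total wait summed over all backends,
    so dividing by the window length gives the average number waiting."""
    return wait_ns / (window_s * 1e9)


for label, rows in WAIT_NS:
    # Print the heaviest waits first for each snapshot.
    for lock_id, mode, ns in sorted(rows, key=lambda r: r[2], reverse=True):
        print(f"{label:>10}  lock {lock_id:>2} {mode:<9} "
              f"~{avg_waiters(ns):6.2f} waiting on average")
```

Under that assumption, the 100-user snapshot works out to roughly 22
backends waiting on lock 4 in Exclusive mode at any moment, against
roughly 2 for the same lock in Shared mode.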

Optimizing wait queues will help improve Shared, which might help
Exclusive a bit, but eventually Exclusive for ProcArrayLock will limit
scaling with as few as 60-70 users.


Lock hold times are below (though taken from a different run).
With 30 users:

             Lock Id            Mode   Combined Time (ns)
             1616992       Exclusive           1199791629
                   4       Exclusive           1399371867
                  34       Exclusive           1426153620
             1616978       Exclusive           1528327035
             1616990       Exclusive           1546374298
             1616988       Exclusive           1553461559
                   5       Exclusive           2477558484

With 50+ users
             Lock Id            Mode   Combined Time (ns)
                   4       Exclusive           1438509198
             1616992       Exclusive           1450973466
             1616978       Exclusive           1505626978
             1616990       Exclusive           1850432217
             1616988       Exclusive           2033226225
                  34       Exclusive           2098542547
                   5       Exclusive           3280151374

With 100 users

             Lock Id            Mode   Combined Time (ns)
             1616992       Exclusive           1206516505
             1616988       Exclusive           1486704087
             1616990       Exclusive           1521900997
                  34       Exclusive           1532815803
             1616978       Exclusive           1541986895
                   5       Exclusive           2179043424
                   5                           2395098279

(Why was 5 printing with a blank mode??)
Rerunning it with a slight variation of the script:


             Lock Id            Mode   Combined Time (ns)
             1616996               0           1167708953
                  36               0           1291958451
                   5      4299305160           1344486968
                   4               0           1347557908
             1616978               0           1377931882
                  34               0           1724752938
                   5               0           2079012548

It looks like the trend of 4's hold time is similar to the previous
ones, though the new kid is 5 with a mode other than 0 or 1. I am not
sure if that is causing problems. What mode is "4299305160" for lock 5
(SInvalLock)? Anyway, at this point the wait time for 4 increases to a
point where the database is not scaling anymore.
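
As a rough illustration of the two observations above, this Python
sketch compares how much lock 4's hold time and Exclusive wait time vary
across the tables in this message, and checks the odd mode value against
the two modes the script is expected to print (0 and 1). The helper
names are made up, and the hold and wait figures come from different
runs, so treat the comparison as indicative only.

```python
# Values (ns) copied from the tables in this message.
LOCK4_HOLD_NS = {
    "30 users": 1399371867,
    "50+ users": 1438509198,
    "100 users (rerun)": 1347557908,
}
LOCK4_EXCL_WAIT_NS = {
    "35 users": 2647927574,
    "~50 users": 6024966938,
    "~60 users": 13550187806,
    "~100 users": 223043774037,
}

EXPECTED_MODES = {0, 1}  # the only mode values the message expects to see


def spread(values):
    """Ratio of the largest value to the smallest."""
    values = list(values)
    return max(values) / min(values)


def is_expected_mode(mode):
    """True if the printed mode is one of the expected values."""
    return mode in EXPECTED_MODES


hold_spread = spread(LOCK4_HOLD_NS.values())
wait_spread = spread(LOCK4_EXCL_WAIT_NS.values())
print(f"lock 4 hold time varies by {hold_spread:.2f}x across runs")
print(f"lock 4 exclusive wait time varies by {wait_spread:.1f}x across runs")

suspect_mode = 4299305160  # the value printed for lock 5
print("expected mode:", is_expected_mode(suspect_mode))
print("fits in 32 bits:", suspect_mode <= 0xFFFFFFFF)
print("as hex:", hex(suspect_mode))
```

The hold time for lock 4 stays within a few percent while its wait time
grows by well over an order of magnitude. The odd mode value is too
large for a 32-bit integer, which may mean the script printed something
other than the mode argument for that entry; that is a guess, not a
diagnosis.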

Any thoughts?


-Jignesh



