Thread: Why are we waiting?

Why are we waiting?

From
Simon Riggs
Date:
We've got various facilities now for looking at LWLock waits, but I'd
like to have more information about the *reasons* for lock waits.

I know its possible to get backtraces in Dtrace at various tracepoints
but that can be fairly hard to interpret.

I'm thinking of adding an extra parameter onto every call to
LockBuffer() and LWLockAcquire() to explain the reason for the lock
request.

Then when we analyze lock waits we can apportion the lock wait reason
correctly and determine not just where the waits are happening, but
*why* the waits are happening. Activity Based Costing the beanies call
it.

This will be especially helpful with transitory events that might (or
might not) occur fairly frequently, yet the point of contention moves
around within the server. There's a few hotspot reasons that move
around, plus I'm certain there's one or two we have no idea about.

This won't help much for highly contended locks where a lock queue of
100 might have 99 fast lock holders and 1 slow one. But it will help for
the transitory locking where a lock is frequently not held, yet we want
to explain what was happening when the lock *was* held.

Reason would be an enum value and reporting would probably be covered by
LWLOCK_STATS.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: Why are we waiting?

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> I'm thinking of adding an extra parameter onto every call to
> LockBuffer() and LWLockAcquire() to explain the reason for the lock
> request.

Maybe I'm missing something, but I don't see what this would buy us,
except for being able to track which call site resulted in a wait;
which can already be measured with dtrace, no?

I'm hesitant to decorate such widely-used functions with extra tracing
information.  You'd be breaking every third-party module and pending
patch that uses either function, and imposing some performance penalty
(probably not much, but it's hard to be sure) into the indefinite
future, for a performance measurement need that might be fleeting.

Basically I'd rather try to attack the problem with dtrace ...
        regards, tom lane


Re: Why are we waiting?

From
Simon Riggs
Date:
On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote:

> Basically I'd rather try to attack the problem with dtrace ...

OK. I'll switch to Solaris. Or do you something I don't about dtrace on
linux?

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: Why are we waiting?

From
"Heikki Linnakangas"
Date:
Simon Riggs wrote:
> On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote:
> 
>> Basically I'd rather try to attack the problem with dtrace ...
> 
> OK. I'll switch to Solaris. Or do you something I don't about dtrace on
> linux?

One idea would be to add new arguments to LWLockAcquire as you suggest, 
but instead of modifying all call sites, decorate it with a macro that 
passes __FILE__ and __LINE__ as the extra arguments. The good thing 
about that is that it's a relatively small patch and you can easily 
switch it on/off with a #ifdef. And there's no need to push for 
inclusion of that into CVS, because it would be an easy patch to 
maintain yourself.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Why are we waiting?

From
Simon Riggs
Date:
On Tue, 2008-02-05 at 14:14 +0000, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote:
> > 
> >> Basically I'd rather try to attack the problem with dtrace ...
> > 
> > OK. I'll switch to Solaris. Or do you something I don't about dtrace on
> > linux?
> 
> One idea would be to add new arguments to LWLockAcquire as you suggest, 
> but instead of modifying all call sites, decorate it with a macro that 
> passes __FILE__ and __LINE__ as the extra arguments. The good thing 
> about that is that it's a relatively small patch and you can easily 
> switch it on/off with a #ifdef. And there's no need to push for 
> inclusion of that into CVS, because it would be an easy patch to 
> maintain yourself.

Thanks for the idea. It had occurred to me to make a private patch, but
I prefer my patches to be open, so they're easier to discuss results
arising from them.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: Why are we waiting?

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote:
>> Basically I'd rather try to attack the problem with dtrace ...

> OK. I'll switch to Solaris. Or do you something I don't about dtrace on
> linux?

Nope :-(.  The SystemTap guys keep promising support for userspace
probes but it's not there yet.
        regards, tom lane


Re: Why are we waiting?

From
Staale Smedseng
Date:
On Mon, 2008-02-04 at 19:46, Simon Riggs wrote:
> We've got various facilities now for looking at LWLock waits, but I'd
> like to have more information about the *reasons* for lock waits.
> 
> I know its possible to get backtraces in Dtrace at various tracepoints
> but that can be fairly hard to interpret.

Simon,

A couple of guys here at Sun have started looking at trying to improve
the scaling on Sun's T2000 Niagara servers (single core, 32 HW-threads).

We mostly use various DTrace scripts to monitor locking usage. We'd be
happy to share results, probes and DTrace scripts for monitoring if you
like.

So far we've found lock contention (especially for the ProcArrayLock) to
be the likely source for our inability to saturate the CPU with a TPC-C
like OLTP load with >1000 users.

The following shows average time spent in and waiting for exclusive
ProcArrayLock vs the time used in the transaction through commit, (i.e.,
up until the exclusive ProcArrayLock acquire to update the PGPROC
setting transaction no longer running):
 # ./pg-lwlock-procarray.d $(pgrep -n postgres)
             postgres`LWLockAcquire             postgres`ProcArrayEndTransaction+0x10
postgres`CommitTransaction+0xf0            postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60            postgres`exec_execute_message+0x3b4
postgres`PostgresMain+0x13a0            postgres`BackendRun+0x27c             postgres`BackendStartup+0xe0
postgres`ServerLoop+0x1a0            postgres`PostmasterMain+0xcbc             postgres`main+0x1d8
postgres`_start+0x108             23 avg lwlock acquire service time [ns]                 193051989 transaction-commit
[count]                                 23 transaction-start [count]                                   23 avg
transactiontime [ns]                             12763079
 

The stack trace shows that the only time the lock is acquired
exclusively is from the call to ProcArrayEndTransaction() in
CommitTransaction().

Also, an interesting observation is that the hot locks seem to have
changed from v8.2 to v8.3, making the ProcArrayLock more contended. See
the following outputs:

PostgreSQL 8.2 (32-bit):
 -bash-3.00# ./825_lwlock_wait.d
            Lock Id            Mode           Count   FirstLockMgrLock          Shared               2
ProcArrayLock         Shared             209         XidGenLock       Exclusive            1030         XidGenLock
   Shared            1215      WALInsertLock       Exclusive            3942    CLogControlLock          Shared
  4113      ProcArrayLock       Exclusive            6929       WALWriteLock       Exclusive           17155
CLogControlLock      Exclusive          128182
 
            Lock Id   Combined Time (ns)   FirstLockMgrLock                24705      WALInsertLock
79644210        XidGenLock            179886846      ProcArrayLock           1486950738       WALWriteLock
18425400504   CLogControlLock        1507388036453
 



PostgreSQL 8.3 (64-bit):
 -bash-3.00# ./83_lwlock_wait.d
            Lock Id            Mode           Count         SInvalLock       Exclusive               1
WALWriteLock      Exclusive               1         SInvalLock          Shared              20    CLogControlLock
  Shared             534    CLogControlLock       Exclusive             845         XidGenLock       Exclusive
 1140      ProcArrayLock          Shared            1821      WALInsertLock       Exclusive           17043
ProcArrayLock      Exclusive           49762
 
            Lock Id            Mode   Combined Time (ns)         SInvalLock       Exclusive                17216
SInvalLock          Shared               531129       WALWriteLock       Exclusive              2003064
CLogControlLock         Shared             61325819    CLogControlLock       Exclusive             73509195
XidGenLock      Exclusive            929477929      WALInsertLock       Exclusive          17941476941
ProcArrayLock         Shared          31114676303      ProcArrayLock       Exclusive         888356047549
 



Regards,
Staale Smedseng
Database Technology Group, Sun Microsystems




Re: Why are we waiting?

From
Simon Riggs
Date:
On Wed, 2008-02-06 at 15:30 +0100, Staale Smedseng wrote:
> On Mon, 2008-02-04 at 19:46, Simon Riggs wrote:
> > We've got various facilities now for looking at LWLock waits, but I'd
> > like to have more information about the *reasons* for lock waits.
> > 
> > I know its possible to get backtraces in Dtrace at various tracepoints
> > but that can be fairly hard to interpret.

Thanks for this Staale.

>              Lock Id            Mode           Count
>        ProcArrayLock          Shared            1821
>        ProcArrayLock       Exclusive           49762
> 
>              Lock Id            Mode   Combined Time (ns)
>        ProcArrayLock          Shared          31114676303
>        ProcArrayLock       Exclusive         888356047549

Which looks like a mean service time of 17ms for X lock waiters.

What is the frequency distribution of lock wait time on ProcArrayLock?
Does the distribution vary over time?

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: Why are we waiting?

From
Gregory Stark
Date:
"Staale Smedseng" <Staale.Smedseng@Sun.COM> writes:

> The stack trace shows that the only time the lock is acquired
> exclusively is from the call to ProcArrayEndTransaction() in
> CommitTransaction().

I'm not sure but I think that's only true in 8.3. As I understood it in 8.2
transaction start also needed the exclusive lock.

> Also, an interesting observation is that the hot locks seem to have
> changed from v8.2 to v8.3, making the ProcArrayLock more contended. See
> the following outputs:
>
> PostgreSQL 8.2 (32-bit):
>...
> PostgreSQL 8.3 (64-bit):
>...

I'm not sure 32-bit and 64-bit cases are going to be directly comparable. We
could have a problem with cache line aliasing on only one or the other for
example.

But that is a pretty striking difference. Does the 8.3 run complete more
transactions in that time?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!


Re: Why are we waiting?

From
Staale Smedseng
Date:
> What is the frequency distribution of lock wait time on ProcArrayLock?

See below for wait time distributions for ProcArrayLock (both shared and
exclusive). The time measured is from entry into LWLockAcquire() until
return. I've recorded the same data in two different resolutions (ms,
and us for the lower part of the distribution). The DTrace script is at
the bottom.

These results are for 1000 TPC-C like clients, and measured over the
1000 PostgreSQL processes over a period of 10 seconds.

> Does the distribution vary over time?

Hmm... I will have to get back to you on that one.

Staale

CPU     ID                    FUNCTION:NAME 6  71245                      :tick-10sec 
 Total LW_EXCLUSIVE                                            25178 Total Transaction Starts
          25679 Total LW_SHARED                                              107211 LW_SHARED [ms]
                       value  ------------- Distribution ------------- count                < 0 |
                 0                      0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   100565                10 |
                              1                     20 |                                         0
30|                                         0                     40 |                                         0
            50 |                                         0                     60 |
   0                     70 |                                         6                     80 |
                304                   90 |@                                        1370                 100 |@
                             2685                 110 |@                                        1731
120|                                         307                  130 |                                         13
            140 |                                         0                    150 |
    0                    160 |                                         5                    170 |
                 4                    180 |                                         26                   190 |
                              36                   200 |                                         24
210|                                         61                   220 |                                         49
            230 |                                         15                   240 |
    0                    250 |                                         0                    260 |
                 0                    270 |                                         0                    280 |
                              2                    290 |                                         2
300|                                         0                    310 |                                         2
            320 |                                         1                    330 |
    1                    340 |                                         0                    350 |
                 0                    360 |                                         0                    370 |
                              0                    380 |                                         0
390|                                         0                    400 |                                         1
            410 |                                         0        
 
 LW_EXCLUSIVE [ms]                                           value  ------------- Distribution ------------- count
         < 0 |                                         0                      0 |@@
 1565                  10 |                                         0                     20 |
              0                     30 |                                         0                     40 |
                           0                     50 |                                         0                     60
|                                        0                     70 |                                         16
         80 |@                                        894                   90 |@@@@@@@
4108                 100 |@@@@@@@@@@@@@                            8090                 110 |@@@@@@@@
             4863                 120 |@                                        595                  130 |
                          64                   140 |                                         14                   150 |
                                       0                    160 |                                         4
      170 |                                         93                   180 |
262                 190 |@                                        549                  200 |@
            854                  210 |@@                                       976                  220 |@
                         770                  230 |                                         268                  240 |
                                      40                   250 |                                         2
     260 |                                         9                    270 |
29                  280 |                                         46                   290 |
            94                   300 |                                         134                  310 |
                         169                  320 |                                         164                  330 |
                                      146                  340 |                                         94
     350 |                                         13                   360 |                                         2
                  370 |                                         8                    380 |
          15                   390 |                                         17                   400 |
                       20                   410 |                                         37                   420 |
                                    27                   430 |                                         32
   440 |                                         29                   450 |                                         8
                460 |                                         4                    470 |
        2                    480 |                                         1                    490 |
                     3                 >= 500 |                                         48       
 
 LW_SHARED [us]                                              value  ------------- Distribution ------------- count
           0 |                                         0                     10 |
 39                    20 |@@@@@@@@@@@@                             31038                 30 |@@@@@@@@@@@@@@@@@@@@@@@@@
              66010                 40 |@                                        2409                  50 |
                           103                   60 |                                         145                   70
|                                        85                    80 |                                         68
         90 |                                         57                   100 |
40                   110 |                                         33                   120 |
             37                   130 |                                         24                   140 |
                          15                   150 |                                         35                   160 |
                                       26                   170 |                                         35
      180 |                                         24                   190 |
16                  200 |                                         12                   210 |
            13                   220 |                                         13                   230 |
                         8                    240 |                                         9                    250 |
                                      3                    260 |                                         4
     270 |                                         3                    280 |                                         2
                  290 |                                         4                    300 |
          3                    310 |                                         3                    320 |
                       3                    330 |                                         2                    340 |
                                    0                    350 |                                         1
   360 |                                         2                    370 |                                         0
                380 |                                         0                    390 |
        2                    400 |                                         1                    410 |
                     2                    420 |                                         0                    430 |
                                  0                    440 |                                         1
 450 |                                         0                    460 |                                         5
              470 |                                         1                    480 |
      1                    490 |                                         4                    500 |
                   6                    510 |                                         1                    520 |
                                1                    530 |                                         0
540|                                         1                    550 |                                         1
            560 |                                         2                    570 |
    0                    580 |                                         0                    590 |
                 0                    600 |                                         0                    610 |
                              1                    620 |                                         0
630|                                         0                    640 |                                         0
            650 |                                         0                    660 |
    0                    670 |                                         1                    680 |
                 0                    690 |                                         2                    700 |
                              1                    710 |                                         1
720|                                         3                    730 |                                         0
            740 |                                         1                    750 |
    0                    760 |                                         0                    770 |
                 0                    780 |                                         0                    790 |
                              0                    800 |                                         0
810|                                         0                    820 |                                         0
            830 |                                         0                    840 |
    0                    850 |                                         0                    860 |
                 1                    870 |                                         0                    880 |
                              0                    890 |                                         0
900|                                         0                    910 |                                         0
            920 |                                         1                    930 |
    0                    940 |                                         0                    950 |
                 2                    960 |                                         0                    970 |
                              0                    980 |                                         1
990|                                         0                >= 1000 |@@@                                      6843

 
 LW_EXCLUSIVE [us]                                           value  ------------- Distribution ------------- count
          10 |                                         0                     20 |@
 357                   30 |@@                                       1166                  40 |
              35                    50 |                                         1                     60 |
                           1                     70 |                                         1                     80
|                                        1                     90 |                                         0
        100 |                                         0                    110 |
0                    120 |                                         0                    130 |
             1                    140 |                                         0                    150 |
                          0                    160 |                                         0                    170 |
                                       0                    180 |                                         0
      190 |                                         0                    200 |
0                   210 |                                         0                    220 |
            0                    230 |                                         0                    240 |
                         0                    250 |                                         0                    260 |
                                      0                    270 |                                         0
     280 |                                         0                    290 |                                         0
                  300 |                                         0                    310 |
          0                    320 |                                         0                    330 |
                       0                    340 |                                         0                    350 |
                                    0                    360 |                                         0
   370 |                                         0                    380 |                                         0
                390 |                                         0                    400 |
        0                    410 |                                         0                    420 |
                     0                    430 |                                         0                    440 |
                                  0                    450 |                                         0
 460 |                                         0                    470 |                                         0
              480 |                                         0                    490 |
      0                    500 |                                         0                    510 |
                   0                    520 |                                         0                    530 |
                                0                    540 |                                         0
550|                                         0                    560 |                                         0
            570 |                                         0                    580 |
    0                    590 |                                         0                    600 |
                 0                    610 |                                         0                    620 |
                              0                    630 |                                         0
640|                                         0                    650 |                                         0
            660 |                                         0                    670 |
    0                    680 |                                         0                    690 |
                 0                    700 |                                         0                    710 |
                              0                    720 |                                         0
730|                                         0                    740 |                                         0
            750 |                                         0                    760 |
    0                    770 |                                         0                    780 |
                 0                    790 |                                         0                    800 |
                              0                    810 |                                         0
820|                                         0                    830 |                                         0
            840 |                                         0                    850 |
    0                    860 |                                         0                    870 |
                 0                    880 |                                         0                    890 |
                              0                    900 |                                         0
910|                                         0                    920 |                                         0
            930 |                                         0                    940 |
    0                    950 |                                         0                    960 |
                 0                    970 |                                         0                    980 |
                              0                    990 |                                         0                >=
1000|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   23615    
 


--------------------------------------------------------------

#!/usr/sbin/dtrace -s

/* * Show ProcArrayLock wait time ** $Id$*/

BEGIN
{       self->ts = 0;
}

tick-10sec
{       exit(0);
}


/** Accumulate average time spent from start of LWLockAcquire()* until return from LWLockAcquire().*/
postgresql$1:::lwlock-acquire-begin
/ arg0 == 4 /
{       self->ts = timestamp;       self->mode = arg1;
}

postgresql$1:::lwlock-acquire-end
/ self->ts /
{       @acqcount[self->mode==0?"Total LW_EXCLUSIVE":"Total LW_SHARED"]            = count();
       /* whole frequency distribution 0-500ms */       @wholedist[self->mode==0?"LW_EXCLUSIVE [ms]":"LW_SHARED [ms]"]
         = lquantize((timestamp - self->ts)/1000000, 0, 500, 10);
 
       /* the lower part of the distribution in detail 0-1000us */       @lowdist[self->mode==0?"LW_EXCLUSIVE
[us]":"LW_SHARED[us]"]            = lquantize((timestamp - self->ts)/1000, 0, 1000, 10);
 
       self->ts = 0;
}

postgresql$1:::transaction-start
{       @acqcount["Total Transaction Starts"] = count();
}




Re: Why are we waiting?

From
Simon Riggs
Date:
On Wed, 2008-02-06 at 18:44 +0100, Staale Smedseng wrote:
> > What is the frequency distribution of lock wait time on ProcArrayLock?
> 
> See below for wait time distributions for ProcArrayLock (both shared and
> exclusive). The time measured is from entry into LWLockAcquire() until
> return. I've recorded the same data in two different resolutions (ms,
> and us for the lower part of the distribution). The DTrace script is at
> the bottom.
> 
> These results are for 1000 TPC-C like clients, and measured over the
> 1000 PostgreSQL processes over a period of 10 seconds.

Thanks! Interesting resonance pattern.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: Why are we waiting?

From
Staale Smedseng
Date:
> I'm not sure 32-bit and 64-bit cases are going to be directly  
> comparable. We
> could have a problem with cache line aliasing on only one or the  
> other for
> example.

Agreed, this is likely comparing apples and oranges. I'll see if I can  
get a one-to-one comparison done (these were the numbers I had close  
by when writing the email).

> But that is a pretty striking difference. Does the 8.3 run complete  
> more
> transactions in that time?

I'll make sure to include those numbers as well. :)

Staale


Re: Why are we waiting?

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Staale Smedseng" <Staale.Smedseng@Sun.COM> writes:
>> Also, an interesting observation is that the hot locks seem to have
>> changed from v8.2 to v8.3, making the ProcArrayLock more contended. See
>> the following outputs:
>> 
>> PostgreSQL 8.2 (32-bit):
>> ...
>> PostgreSQL 8.3 (64-bit):
>> ...

> I'm not sure 32-bit and 64-bit cases are going to be directly comparable. We
> could have a problem with cache line aliasing on only one or the other for
> example.

Yeah, I find these numbers highly dubious.  AFAIR we didn't do anything
that would have reduced CLogControlLock contention, and we definitely
did work to reduce ProcArrayLock contention, so the claimed results seem
directly opposite to expectation.  I am wondering if the waits are being
attributed to the right locks --- I remember such an error in a previous
set of dtrace results, and some of the other details such as claiming
shared lock delays but no exclusive lock delays for FirstLockMgrLock
seem less than credible as well.
        regards, tom lane


Re: Why are we waiting?

From
Simon Riggs
Date:
On Wed, 2008-02-06 at 13:55 -0500, Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
> > "Staale Smedseng" <Staale.Smedseng@Sun.COM> writes:
> >> Also, an interesting observation is that the hot locks seem to have
> >> changed from v8.2 to v8.3, making the ProcArrayLock more contended. See
> >> the following outputs:
> >> 
> >> PostgreSQL 8.2 (32-bit):
> >> ...
> >> PostgreSQL 8.3 (64-bit):
> >> ...
> 
> > I'm not sure 32-bit and 64-bit cases are going to be directly comparable. We
> > could have a problem with cache line aliasing on only one or the other for
> > example.
> 
> Yeah, I find these numbers highly dubious.  AFAIR we didn't do anything
> that would have reduced CLogControlLock contention, and we definitely
> did work to reduce ProcArrayLock contention, so the claimed results seem
> directly opposite to expectation.  I am wondering if the waits are being
> attributed to the right locks --- I remember such an error in a previous
> set of dtrace results, and some of the other details such as claiming
> shared lock delays but no exclusive lock delays for FirstLockMgrLock
> seem less than credible as well.

There were only 2 lock delays for FirstLockMgrLock in SHARED mode, so it
seems believable that there were 0 lock delays in EXCLUSIVE mode.

I assumed that Staale is running with clog buffers tweaked? Can you
confirm Staale?

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: Why are we waiting?

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> There were only 2 lock delays for FirstLockMgrLock in SHARED mode, so it
> seems believable that there were 0 lock delays in EXCLUSIVE mode.

Not really, considering the extremely limited use of LW_SHARED in lock.c
(GetLockConflicts is used only by CREATE INDEX CONCURRENTLY, and
GetLockStatusData only by the pg_locks view).  For the type of benchmark
that I gather this is, there should be *zero* LW_SHARED acquisitions at
all.  And even if there are some, they could only be blocking against
the (undoubtedly much more frequent) LW_EXCLUSIVE acquisitions; it's not
very credible that there is zero contention among the LW_EXCLUSIVE locks
yet a few shared acquirers manage to get burnt.
        regards, tom lane


Re: Why are we waiting?

From
Simon Riggs
Date:
On Wed, 2008-02-06 at 14:42 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > There were only 2 lock delays for FirstLockMgrLock in SHARED mode, so it
> > seems believable that there were 0 lock delays in EXCLUSIVE mode.
> 
> Not really, considering the extremely limited use of LW_SHARED in lock.c
> (GetLockConflicts is used only by CREATE INDEX CONCURRENTLY, and
> GetLockStatusData only by the pg_locks view).  For the type of benchmark
> that I gather this is, there should be *zero* LW_SHARED acquisitions at
> all.  And even if there are some, they could only be blocking against
> the (undoubtedly much more frequent) LW_EXCLUSIVE acquisitions; it's not
> very credible that there is zero contention among the LW_EXCLUSIVE locks
> yet a few shared acquirers manage to get burnt.

...but the total wait time on those lock waits was 24 microseconds. I
hardly call that burnt.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: Why are we waiting?

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Wed, 2008-02-06 at 14:42 -0500, Tom Lane wrote:
>> Not really, considering the extremely limited use of LW_SHARED in lock.c
>> (GetLockConflicts is used only by CREATE INDEX CONCURRENTLY, and
>> GetLockStatusData only by the pg_locks view).  For the type of benchmark
>> that I gather this is, there should be *zero* LW_SHARED acquisitions at
>> all.  And even if there are some, they could only be blocking against
>> the (undoubtedly much more frequent) LW_EXCLUSIVE acquisitions; it's not
>> very credible that there is zero contention among the LW_EXCLUSIVE locks
>> yet a few shared acquirers manage to get burnt.

> ...but the total wait time on those lock waits was 24 microseconds. I
> hardly call that burnt.

What you are failing to grasp is that the data is simply not credible
(unless perhaps Staale fesses up that his benchmark includes a whole lot
of pg_locks monitoring, in which case I'd want to see it redone without
anyway).
        regards, tom lane


Re: Why are we waiting?

From
Staale Smedseng
Date:
On Wed, 2008-02-06 at 19:55, Tom Lane wrote:
> I am wondering if the waits are being
> attributed to the right locks --- I remember such an error in a previous
> set of dtrace results, and some of the other details such as claiming
> shared lock delays but no exclusive lock delays for FirstLockMgrLock
> seem less than credible as well.

Good catch. We've checked the DTrace scripts against the respective
versions of lwlock.h, and the FirstLockMgrLock is off (this is actually
the results for FirstBufMappingLock). 

However, this is the last lock in the enum that we trace, the other
lower-numbered lock enums were correctly mapped. (In particular the
ProcArrayLock which we've been puzzled by.)

And also, there was no clog buffer twaking or pg_lock monitoring done
when this benchmark was run, AFAIK.

We'll redo the benchmarks and post new scripts and results.

Staale




Re: Why are we waiting?

From
Tom Lane
Date:
Staale Smedseng <Staale.Smedseng@Sun.COM> writes:
> Good catch. We've checked the DTrace scripts against the respective
> versions of lwlock.h, and the FirstLockMgrLock is off (this is actually
> the results for FirstBufMappingLock). 

> However, this is the last lock in the enum that we trace, the other
> lower-numbered lock enums were correctly mapped. (In particular the
> ProcArrayLock which we've been puzzled by.)

Hmm, do you mean that you're just neglecting to collect any stats about
all the dynamically assigned locks?  That doesn't seem like it's going
to offer a very good view of what's going on.  I think the most useful
stats would include aggregate wait data for all the lockmgr locks,
all the bufmapping locks, all the buffer content locks, and all the
buffer I/O locks.  We'd like to think that contention for those would
be low because of the partitioning ... but the point of measuring is
to find out, not just hope.
        regards, tom lane


Re: Why are we waiting?

From
Simon Riggs
Date:
On Thu, 2008-02-07 at 16:29 +0100, Staale Smedseng wrote:
> On Wed, 2008-02-06 at 19:55, Tom Lane wrote:
> > I am wondering if the waits are being
> > attributed to the right locks --- I remember such an error in a previous
> > set of dtrace results, and some of the other details such as claiming
> > shared lock delays but no exclusive lock delays for FirstLockMgrLock
> > seem less than credible as well.
> 
> Good catch. We've checked the DTrace scripts against the respective
> versions of lwlock.h, and the FirstLockMgrLock is off (this is actually
> the results for FirstBufMappingLock). 

I just realised you are using a lookup to get the text for the name of
the lock. You used the same lookup table for both releases?

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: Why are we waiting?

From
"Jignesh K. Shah"
Date:
I dont think my earlier message got through..

We use separate lookup tables for 825 and 83 based on the respective 
lwlock.h for that version.


-Jignesh


Simon Riggs wrote:
> On Thu, 2008-02-07 at 16:29 +0100, Staale Smedseng wrote:
>   
>> On Wed, 2008-02-06 at 19:55, Tom Lane wrote:
>>     
>>> I am wondering if the waits are being
>>> attributed to the right locks --- I remember such an error in a previous
>>> set of dtrace results, and some of the other details such as claiming
>>> shared lock delays but no exclusive lock delays for FirstLockMgrLock
>>> seem less than credible as well.
>>>       
>> Good catch. We've checked the DTrace scripts against the respective
>> versions of lwlock.h, and the FirstLockMgrLock is off (this is actually
>> the results for FirstBufMappingLock). 
>>     
>
> I just realised you are using a lookup to get the text for the name of
> the lock. You used the same lookup table for both releases?
>
>   


Re: Why are we waiting?

From
Staale Smedseng
Date:
On Thu, 2008-02-07 at 18:12, Simon Riggs wrote:

> I just realised you are using a lookup to get the text for the name of
> the lock. You used the same lookup table for both releases?

Oh, it wasn't quite that bad. :-) The two DTrace scripts had been
revised to correspond with the two different declarations of LWLockId in
lwlock.h (for 8.2.5 and 8.3 respectively). But somehow the value for the
last lock in our lookup table corresponding to
 FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,

in the enum got turned into n+1 instead of n+NUM_BUFFER_PARTITIONS. The
other locks should have been output correctly, however.

But as Tom pointed out, the dynamic locks were not in the equation. So
now we're measuring all lock waits instead of assuming. :-)

Staale




Re: Why are we waiting?

From
"Jignesh K. Shah"
Date:
Last try for the script/results (truncating less significant portions of 
output which are too big)



Staale Smedseng wrote:
>
> her locks should have been output correctly, however.
>
> But as Tom pointed out, the dynamic locks were not in the equation. So
> now we're measuring all lock waits instead of assuming. :-)
>
> Staale
>
>
>   


Here are some recent runs on 4-quadcore xeons  using PostgreSQL 8.3.0 
with 10-second runs

Hope it gets through this time

-Jignesh

for about 500users :          Lock Id            Mode           Count               64          Shared             754
            53       Exclusive             846               43       Exclusive            1379               11
 Shared            2038                3       Exclusive            2376               47       Exclusive
2900               4          Shared            3194               42       Exclusive            3389               50
    Exclusive            3689               41       Exclusive            5280               11       Exclusive
  6560               48       Exclusive           10837                7       Exclusive           20599
44      Exclusive           29193               46       Exclusive           30299                4       Exclusive
     66303
 
          Lock Id            Mode   Combined Time (ns)               43       Exclusive           3440644758
  47       Exclusive           6392480847               50       Exclusive           7199215298               42
Exclusive          7865126413               11       Exclusive           8503940437                4          Shared
      9142071412               61          Shared          10613733376               41       Exclusive
14380504651              48       Exclusive          16631996177                7       Exclusive          17485716438
            46       Exclusive          61584135053               44       Exclusive          68889040292
4       Exclusive         177458486936
 



For about 700 Users:

          Lock Id            Mode           Count                3       Exclusive            2873                4
    Shared            3436               47       Exclusive            3555               11          Shared
4098              50       Exclusive            4392               42       Exclusive            4573               11
    Exclusive            6209               41       Exclusive            7552               48       Exclusive
 12335                7       Exclusive           22469               44       Exclusive           36987
46      Exclusive           38467                4       Exclusive           81453
 
          Lock Id            Mode   Combined Time (ns)                7       Exclusive          12033761450
  43       Exclusive          12217953092               50       Exclusive          15454797539               47
Exclusive         15684068953               42       Exclusive          19365161311                4          Shared
     27038955376               48       Exclusive          32990318986               41       Exclusive
48350689916              46       Exclusive         193727946721               44       Exclusive         212025745038
             4       Exclusive         713263386624
 

At 1000 users

          Lock Id            Mode           Count               59          Shared           10475               58
    Shared           11062               57          Shared           13726                7       Exclusive
18548              44       Exclusive           29714               46       Exclusive           33886                4
     Exclusive           74773               11       Exclusive           79596
 
          Lock Id            Mode   Combined Time (ns)               43       Exclusive           9067359926
  47       Exclusive          12259067669               50       Exclusive          13239372833               42
Exclusive         16534292830                7       Exclusive          23505490039               48       Exclusive
     24991948402               41       Exclusive          33874749560               11          Shared
43963854482               4          Shared          64098606143               63          Shared         130988696365
            64          Shared         137865936811               61          Shared         140978086498
59         Shared         161661023016               62          Shared         163857754020               58
Shared        167946299990               44       Exclusive         220719773416               57          Shared
 245170386594               46       Exclusive         326272389569               11       Exclusive
1312009781095               4       Exclusive        2403450038504
 



4 is ProcArray..
ClogControlLock is not completely out but I dont see it consistently 
like ProcArrayLock.  It would show up in few cases if I continue taking 
different snapshots of the output (maybe when Checkpoint kicks in or 
something , I dont know..)

          Lock Id            Mode           Count                3       Exclusive             160                7
 Exclusive             396               11          Shared             828               44       Exclusive
991               46       Exclusive            1093                4       Exclusive            1550               62
       Shared           20332               57          Shared           21244               64          Shared
 21492               63          Shared           21791               61          Shared           22132
58         Shared           23747               60          Shared           26230               11       Exclusive
    158168
 
          Lock Id            Mode   Combined Time (ns)            64864          Shared           3743818402
10830         Shared           3835685469            89688          Shared           4164484448            84354
 Shared           4268747714              870          Shared           4376161411            17078          Shared
     4712788740               11          Shared          17718901870               62          Shared
331016914134              57          Shared         340929880655               64          Shared         355793689545
             63          Shared         362464073193               61          Shared         362561852786
58          Shared         388597307969               60          Shared         430378269549               11
Exclusive       2641836280591
 



Now the script


# cat 83_all_wait.d
#!/usr/sbin/dtrace -qs

/*
* Usage      : 83_all_wait.d
* Description: Show number and total time of LWLock waits by lock id and 
mode
* Author     : Jignesh Shah
*/

dtrace:::BEGIN
{     lckmode[0] = "Exclusive";     lckmode[1] = "Shared";
/* Only for reference not used... Refer to src/include/storage/lwlock.h */     lckid[0] = "BufFreelistLock";
lckid[1]= "ShmemIndexLock";     lckid[2] = "OidGenLock";     lckid[3] = "XidGenLock";     lckid[4] = "ProcArrayLock";
 lckid[5] = "SInvalLock";     lckid[6] = "FreeSpaceLock";     lckid[7] = "WALInsertLock";     lckid[8] =
"WALWriteLock";    lckid[9] = "ControlFileLock";     lckid[10] = "CheckpointLock";     lckid[11] = "CLogControlLock";
 lckid[12] = "SubtransControlLock";     lckid[13] = "MultiXactGenLock";     lckid[14] = "MultiXactOffsetControlLock";
 lckid[15] = "MultiXactMemberControlLock";     lckid[16] = "RelCacheInitLock";     lckid[17] = "BgWriterCommLock";
lckid[18]= "TwoPhaseStateLock";     lckid[19] = "TablespaceCreateLock";     lckid[20] = "BtreeVacuumLock";
lckid[21]= "AddinShmemInitLock";     lckid[22] = "AutovacuumLock";     lckid[23] = "AutovacuumScheduleLock";
lckid[24]= "SyncScanLock";     lckid[25] = "FirstBufMappingLock";     lckid[26] = "FirstLockMgrLock";
 

}

postgresql*:::lwlock-startwait
{     self->ts[arg0]=timestamp;     @count[arg0, lckmode[arg1]] = count();
}

postgresql*:::lwlock-endwait
/self->ts[arg0]/
{ @time[arg0 ,lckmode[arg1]] = sum (timestamp - self->ts[arg0]); self->ts[arg0]=0;
}

dtrace:::END {     printf("\n%20s %15s %15s\n", "Lock Id", "Mode", "Count");     printa("%20d %15s %@15d\n",@count);
     printf("\n%20s %15s %20s\n", "Lock Id","Mode", "Combined Time (ns)");     printa("%20d %15s %@20d\n",@time);
}

tick-10sec
{     exit(0);
}










Re: Why are we waiting?

From
Gregory Stark
Date:
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:

> for about 500users :
> For about 700 Users:
> At 1000 users

This is a tangent but are these actual Postgres processes? What's the logic
behind trying to run a 1,000 processes on a box with 16 cpus? They're all just
going to be queuing up for i/o requests or sitting runnable waiting for a
timeslice.

Was this with your patch to raise the size of the clog lru?

What is MaxBackends actually set to for the runs. Is it the same even when
you're actually running with fewer backends?


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about
EnterpriseDB'sPostgreSQL training!
 


Re: Why are we waiting?

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> This is a tangent but are these actual Postgres processes? What's the logic
> behind trying to run a 1,000 processes on a box with 16 cpus?

We should certainly be careful about trying to eliminate contention in
this scenario at the cost of making things slower in more normal cases,
but it seems interesting to stress the system just to see what happens.

> Was this with your patch to raise the size of the clog lru?

That's an important question.

> What is MaxBackends actually set to for the runs.

That I think is not.  I'm fairly sure there are no performance-relevant
paths in which cost is driven by MaxBackends rather than the actual
current number of live backends.  Certainly nothing in or around the
ProcArray would act that way.
        regards, tom lane


Re: Why are we waiting?

From
"Jignesh K. Shah"
Date:

Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>   
>> This is a tangent but are these actual Postgres processes? What's the logic
>> behind trying to run a 1,000 processes on a box with 16 cpus?
>>     
>
> We should certainly be careful about trying to eliminate contention in
> this scenario at the cost of making things slower in more normal cases,
> but it seems interesting to stress the system just to see what happens.
>
>   
>> Was this with your patch to raise the size of the clog lru?
>>     
>
> That's an important question.
>
>   
>> What is MaxBackends actually set to for the runs.
>>     
>
> That I think is not.  I'm fairly sure there are no performance-relevant
> paths in which cost is driven by MaxBackends rather than the actual
> current number of live backends.  Certainly nothing in or around the
> ProcArray would act that way.
>
>   
>             regards, tom lane
>   


I guess I was not clear.. It was PostgreSQL 8.3.0 (with no source code 
change)
I had compiled it 64-bit with DTRACE enabled.
max-backend was set to 1500 But I dont think that causes any thing to 
work slow. But yes the connections are "pre-opened" in the sense when 
500 users are actively doing work there are about 1006 postgresql 
processes running.

Yes I think I am taking the database to the extreme. But generally there 
is some THINK time of 50ms involved so there are time slices available 
for other users. Yes Commercial DB can also do pretty well on such 
systems so its not unrealistic to expect that PostgreSQL cannot perform 
here.

The old idea of stress testing it is to prove that it can go beyond 
these 16cores infact our target is about 64-cores soon.

Regards,
Jignesh