Thread: LWLOCK_STATS

LWLOCK_STATS

From
Robert Haas
Date:
It's been a while since I did any testing with LWLOCK_STATS defined,
so I thought it might be about time to do that again and see how
things look.  Here's how they looked back in July:

http://archives.postgresql.org/pgsql-hackers/2011-07/msg01373.php
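
(For anyone who wants to reproduce this: LWLOCK_STATS is a compile-time
option, so you have to rebuild with the symbol defined - if memory
serves, something like the following is enough, and each backend then
dumps its per-lock counters to the server log as it exits.)

/* add to src/include/pg_config_manual.h, or pass -DLWLOCK_STATS in CPPFLAGS */
#define LWLOCK_STATS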

Here are the results from a test I ran today on latest sources, again
on Nate Boley's machine.  Five-minute pgbench run, scale factor 100,
permanent tables, my usual config settings.  Somewhat depressingly,
virtually all of the interesting activity still centers around the
same three locks that were problematic back then, which means that -
although overall performance has improved quite a bit - we've not
really delivered any knockout punches.  Here are the perpetrators:

lwlock 4: shacq 26160717 exacq 2690379 blk 1129763
lwlock 11: shacq 97074202 exacq 2699639 blk 1482737
lwlock 7: shacq 0 exacq 16522284 blk 2926957
grand total: shacq 225919534 exacq 77179954 blk 6218570

There is some change in how the contention is distributed.  Taking the
number of times a request for each lock blocked as a percentage of the
total number of lock acquisitions that blocked, we get this:

WALInsertLock - July 2011: 35%, January 2012: 47%
CLogControlLock - July 2011: 23%, January 2012: 24%
ProcArrayLock - July 2011: 32%, January 2012: 18%
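
(Those percentages follow directly from the blk counts above, assuming
the lock numbering of this era, where lwlock 4 is ProcArrayLock, lwlock
7 is WALInsertLock, and lwlock 11 is CLogControlLock:

WALInsertLock   (lwlock 7):  2926957 / 6218570 ~ 47%
CLogControlLock (lwlock 11): 1482737 / 6218570 ~ 24%
ProcArrayLock   (lwlock 4):  1129763 / 6218570 ~ 18%)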

Since there's been some change to the test configuration over the last
six months, this has to be taken with a grain of salt, but in broad
strokes it makes sense given what's been done - ProcArrayLock
contention is down significantly (due to Pavan's patch, and followup
tweaks), and the other locks are under correspondingly more pressure.
We've done enough work on CLogControlLock (today's change, and Simon's
prior patch to wake up the WAL writer more aggressively and thus allow
hint bits to be set sooner) to allow it to keep pace, so it's only up
slightly, but we haven't done anything about WALInsertLock and it's
therefore grown from just over a third of the blocks to almost half.

The top locks in terms of number of shared acquisitions are
CLogControlLock, which accounts for about 43% of the shared lock
acquisitions in the system all by itself, followed by ProcArrayLock,
which accounts for another 12%.  The buffer mapping locks make up
another 23% in total, with the busiest one having about 3.5x the
traffic of the least busy one. Even when these shared acquisitions are
mostly uncontended at the lwlock level, the spinlock can still be a
contention point, and thus these are possible future targets for
further reducing our synchronization overhead despite the fact that
(on this test) there's not much blocking on, say, the buffer mapping
locks.
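
(To spell out why those mostly-uncontended shared acquisitions still
cost something: every LWLockAcquire, shared or exclusive, goes through
the lock's embedded spinlock before it can bump the shared count.
Roughly - abridged from memory rather than quoted verbatim from
lwlock.c - the shared-mode fast path looks like this:

SpinLockAcquire(&lock->mutex);      /* every caller serializes here */
if (lock->exclusive == 0)
{
    lock->shared++;                 /* granted immediately */
    mustwait = false;
}
else
    mustwait = true;                /* queue on the wait list and sleep */
SpinLockRelease(&lock->mutex);

so two backends repeatedly taking the same lock in shared mode still
bounce the same cache line back and forth even when neither ever
blocks.)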

Note that this data set fits in shared buffers; on a larger test case, there
would be much more blocking on the buffer mapping locks (and
presumably BufFreelistLock would be a big problem, too).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: LWLOCK_STATS

From
Jeff Janes
Date:
On Fri, Jan 6, 2012 at 2:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> It's been a while since I did any testing with LWLOCK_STATS defined,
> so I thought it might be about time to do that again and see how
> things look.  Here's how they looked back in July:
>
> http://archives.postgresql.org/pgsql-hackers/2011-07/msg01373.php
>
> Here are the results from a test I ran today on latest sources, again
> on Nate Boley's machine.  Five-minute pgbench run, scale factor 100,
> permanent tables, my usual config settings.

What was the tps and/or number of transactions?

I assume this was -c80 -j40?

Also, do you know what percent of CPU time was spent idle during the test?

If very little time is spent sleeping on lwlocks (i.e. CPU time
near 100%), it doesn't matter much how that waiting is distributed.

Also, was there a big difference in tps between LWLOCK_STATS defined
and not defined (i.e. the overhead of doing the accounting)?

> Somewhat depressingly,
> virtually all of the interesting activity still centers around the
> same three locks that were problematic back then, which means that -
> although overall performance has improved quite a bit - we've not
> really delivered any knockout punches.  Here are the perpetrators:

I don't think that is depressing at all.  Certain locks need to exist
to protect certain things, and a benchmark which tests those things is
going to take those locks rather than some other set of locks.  X
times faster is still X times faster, even if the bottleneck hasn't
moved to some other part of the code.


> ....but we haven't done anything about WALInsertLock and it's
> therefore grown from just over a third of the blocks to almost half.

But not all blocks are for the same length of time.  Do we know how
much time is spent blocking?  I've seen some code around that tries to
instrument that, but on my machine at the time it added a lot of
overhead so it couldn't be used effectively.  I can try to dig it up
and update it to git-head if you want to try it and aren't already
aware of it.  (My code was based on something I found somewhere in
this list.)

Also, I assume this is without the recent "Moving more work outside
WALInsertLock" applied?

Cheers,

Jeff


Re: LWLOCK_STATS

From
Robert Haas
Date:
On Fri, Jan 6, 2012 at 9:29 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Jan 6, 2012 at 2:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It's been a while since I did any testing with LWLOCK_STATS defined,
>> so I thought it might be about time to do that again and see how
>> things look.  Here's how they looked back in July:
>>
>> http://archives.postgresql.org/pgsql-hackers/2011-07/msg01373.php
>>
>> Here are the results from a test I ran today on latest sources, again
>> on Nate Boley's machine.  Five-minute pgbench run, scale factor 100,
>> permanent tables, my usual config settings.
>
> What was the tps and/or number of transactions?
>
> I assume this was -c80 -j40?

Sorry.  -c 32 -j 32.  tps was 9-10k, but I don't take it too seriously
with LWLOCK_STATS defined, because it has some performance impact.

> Also, do you know what percent of CPU time was spent idle during the test?

Sorry, no.

> If very little time is spent sleeping on lwlocks (i.e. CPU time
> near 100%), it doesn't matter much how that waiting is distributed.

Well, there is clearly a pretty big impact, because unlogged
tables are much faster than regular tables.  See for example:

http://archives.postgresql.org/pgsql-hackers/2011-12/msg00095.php

...where the comparable results on slightly older sources are:

8 CLOG buffers, permanent tables: tps = 10025.079556 (including
connections establishing)
32 CLOG buffers, permanent tables: tps = 11247.358762 (including
connections establishing)
8 CLOG buffers, unlogged tables: tps = 16999.301828 (including
connections establishing)
32 CLOG buffers, unlogged tables: tps = 19653.023856 (including
connections establishing)

As of today, you get 32 CLOG buffers without patching the source code.
That test was also done before commits
d573e239f03506920938bf0be56c868d9c3416da and
0d76b60db4684d3487223b003833828fe9655fe2, which further optimized
ProcArrayLock.
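
(The 32-buffer figure comes from the recent change that sizes the CLOG
SLRU off shared_buffers instead of using a hard-coded 8.  If I'm
remembering the committed formula correctly, it is roughly

    CLOGShmemBuffers() = Min(32, Max(4, NBuffers / 512))

so any reasonably large shared_buffers setting now gets the full 32
buffers.)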

> Also, was there a big difference in tps between LWLOCK_STATS defined
> and not defined (i.e. the overhead of doing the accounting)?

Yes, see notes above.

>> Somewhat depressingly,
>> virtually all of the interesting activity still centers around the
>> same three locks that were problematic back then, which means that -
>> although overall performance has improved quite a bit - we've not
>> really delivered any knockout punches.  Here are the perpetrators:
>
> I don't think that is depressing at all.  Certain locks need to exist
> to protect certain things, and a benchmark which tests those things is
> going to take those locks rather than some other set of locks.  X
> times faster is still X times faster, even if the bottleneck hasn't
> moved to some other part of the code.

True.  What I don't like is: I think we've really only pushed the
bottleneck out a few cores.  Throw a 64-core machine at it and we're
going to have all these same problems over again.  I'd like to find
solutions that change the dynamic in a more fundamental way, so that
we buy a little more.  But I'm not going to complain too much; the
performance gains we've gotten with these techniques are obviously
quite substantial, even though they're not a total solution.

>> ....but we haven't done anything about WALInsertLock and it's
>> therefore grown from just over a third of the blocks to almost half.
>
> But not all blocks are for the same length of time.  Do we know how
> much time is spent blocking?  I've seen some code around that tries to
> instrument that, but on my machine at the time it added a lot of
> overhead so it couldn't be used effectively.  I can try to dig it up
> and update it to git-head if you want to try it and aren't already
> aware of it.  (My code was based on something I found somewhere in
> this list.)

I haven't tried it for reasons of overhead, but I'm aware of the problem.

> Also, I assume this is without the recent "Moving more work outside
> WALInsertLock" applied?

Right.  If we can get that done for 9.2, we'll be cooking with gas -
on my tests that was a big improvement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: LWLOCK_STATS

From
Heikki Linnakangas
Date:
On 07.01.2012 00:24, Robert Haas wrote:
> It's been a while since I did any testing with LWLOCK_STATS defined,
> so I thought it might be about time to do that again and see how
> things look.  Here's how they looked back in July:
>
> http://archives.postgresql.org/pgsql-hackers/2011-07/msg01373.php

Interesting.

A couple of weeks ago I wrote a little patch that's similar to 
LWLOCK_STATS, but it prints out % of wallclock time that is spent 
acquiring, releasing, or waiting for a lock. I find that more useful 
than the counters.

Here's the patch, I hope it's useful to others. It uses timer_create() 
and timer_settime(), so it probably won't work on all platforms, and 
requires linking with -lrt.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: LWLOCK_STATS

From
Heikki Linnakangas
Date:
On 07.01.2012 09:58, Heikki Linnakangas wrote:
> Here's the patch,

*sigh*, and here's the forgotten attachment.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: LWLOCK_STATS

From
Simon Riggs
Date:
On Fri, Jan 6, 2012 at 10:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:

>  Five-minute pgbench run, scale factor 100,
> permanent tables, my usual config settings.  Somewhat depressingly,
> virtually all of the interesting activity still centers around the
> same three locks

We've seen clear evidence that the performance profile changes over
time, with certain artifacts becoming more prominent.

Running exactly the same tests repeatedly is useful to derive the
historical perspectives, but we need a wider spread of tests to be
certain that the work done is generally applicable.

I'd be interested to see results from a 30 minute run, focusing on
what happens in minutes 10-30, if you have time.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: LWLOCK_STATS

From
"Satoshi Nagayasu / Uptime Technologies, LLC."
Date:
2012/01/07 16:58, Heikki Linnakangas wrote:
> On 07.01.2012 00:24, Robert Haas wrote:
>> It's been a while since I did any testing with LWLOCK_STATS defined,
>> so I thought it might be about time to do that again and see how
>> things look. Here's how they looked back in July:
>>
>> http://archives.postgresql.org/pgsql-hackers/2011-07/msg01373.php
>
> Interesting.
>
> A couple of weeks ago I wrote a little patch that's similar to LWLOCK_STATS, but it prints out % of wallclock time
> that is spent acquiring, releasing, or waiting for a lock. I find that more useful than the counters.
>
> Here's the patch, I hope it's useful to others. It uses timer_create() and timer_settime(), so it probably won't work
> on all platforms, and requires linking with -lrt.

I have just written up a systemtap script to observe
lock statistics.

It shows acquire counts, wait counts and
total waiting time for each lwlock every 5 seconds.

Screenshot here:
http://twitpic.com/83p2cz

Is this useful for pg developers?

-- 
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp


Re: LWLOCK_STATS

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> A couple of weeks ago I wrote a little patch that's similar to 
> LWLOCK_STATS, but it prints out % of wallclock time that is spent 
> acquiring, releasing, or waiting for a lock. I find that more useful 
> than the counters.

I would think that the measurement overhead required to obtain two
wall-clock values for every LWLock touch would be so high as to render
any results from this quite suspect.
        regards, tom lane


Re: LWLOCK_STATS

From
Heikki Linnakangas
Date:
On 07.01.2012 19:18, Tom Lane wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
>> A couple of weeks ago I wrote a little patch that's similar to
>> LWLOCK_STATS, but it prints out % of wallclock time that is spent
>> acquiring, releasing, or waiting for a lock. I find that more useful
>> than the counters.
>
> I would think that the measurement overhead required to obtain two
> wall-clock values for every LWLock touch would be so high as to render
> any results from this quite suspect.

It's based on sampling. The timer calls a callback every X ms, which 
checks if it's waiting for any lock at that moment, and bumps a counter 
if so. In LWLockAcquire/Release you just set/reset a global status variable.
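
For anyone who wants to play with the idea before digging up the
attachment, here's a minimal sketch of that sampling scheme - not the
actual patch, all names below are made up, and it only covers the
"waiting" part, not acquire/release time:

/* Hypothetical sketch of sampling-based lwlock wait profiling.
 * Needs -lrt on older glibc; POSIX timers aren't available everywhere. */
#include <signal.h>
#include <string.h>
#include <time.h>

#define SAMPLE_INTERVAL_MS  1
#define NUM_TRACKED_LOCKS   128

static volatile sig_atomic_t current_wait_lockid = -1; /* set around the sleep */
static long wait_samples[NUM_TRACKED_LOCKS];
static long total_samples;

static void
sample_callback(union sigval sv)
{
    int lockid = current_wait_lockid;

    (void) sv;
    total_samples++;
    if (lockid >= 0 && lockid < NUM_TRACKED_LOCKS)
        wait_samples[lockid]++;     /* blocked on this lock right now */
}

static void
start_wait_sampler(void)
{
    struct sigevent   sev;
    struct itimerspec its;
    timer_t           timerid;

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_THREAD;           /* callback runs in a helper thread */
    sev.sigev_notify_function = sample_callback;
    timer_create(CLOCK_MONOTONIC, &sev, &timerid);

    its.it_value.tv_sec = 0;
    its.it_value.tv_nsec = SAMPLE_INTERVAL_MS * 1000000L;
    its.it_interval = its.it_value;            /* re-fire every interval */
    timer_settime(timerid, 0, &its, NULL);
}

/* In LWLockAcquire, set current_wait_lockid to the lock id just before
 * sleeping and back to -1 after waking; at exit, wait_samples[i] /
 * total_samples approximates the fraction of wallclock time spent
 * waiting on lock i. */

The per-lock cost is just one store on each side of the sleep, which is
why this stays cheap compared to reading the clock twice per
acquisition.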

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: LWLOCK_STATS

From
Robert Haas
Date:
On Sat, Jan 7, 2012 at 5:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Jan 6, 2012 at 10:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>  Five-minute pgbench run, scale factor 100,
>> permanent tables, my usual config settings.  Somewhat depressingly,
>> virtually all of the interesting activity still centers around the
>> same three locks
>
> We've seen clear evidence that the performance profile changes over
> time, with certain artifacts becoming more prominent.
>
> Running exactly the same tests repeatedly is useful to derive the
> historical perspectives, but we need a wider spread of tests to be
> certain that the work done is generally applicable.
>
> I'd be interested to see results from a 30 minute run, focusing on
> what happens in minutes 10-30, if you have time.

Yeah, that seems like a good test to run.  I do have time, but Nate
Boley's test machine is currently otherwise occupied, so I can't run
that test just now.  I will run it when/if an opportunity presents
itself...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: LWLOCK_STATS

From
Jim Nasby
Date:
On Jan 6, 2012, at 8:40 PM, Robert Haas wrote:
>>>  Somewhat depressingly,
>>> virtually all of the interesting activity still centers around the
>>> same three locks that were problematic back then, which means that -
>>> although overall performance has improved quite a bit - we've not
>>> really delivered any knockout punches.  Here are the perpetrators:
>>
>> I don't think that is depressing at all.  Certain locks need to exist
>> to protect certain things, and a benchmark which tests those things is
>> going to take those locks rather than some other set of locks.  X
>> times faster is still X times faster, even if the bottleneck hasn't
>> moved to some other part of the code.
>
> True.  What I don't like is: I think we've really only pushed the
> bottleneck out a few cores.  Throw a 64-core machine at it and we're
> going to have all these same problems over again.  I'd like to find
> solutions that change the dynamic in a more fundamental way, so that
> we buy a little more.  But I'm not going to complain too much; the
> performance gains we've gotten with these techniques are obviously
> quite substantial, even though they're not a total solution.

IIRC, pg_bench is *extremely* write-heavy. There's probably not that many systems that operate that way. I suspect that
most OLTP systems read more than they write, and some probably have as much as a 10-1 ratio.

So... it might be interesting to run a more balanced pg_bench as well...
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net




Re: LWLOCK_STATS

From
Robert Haas
Date:
On Mon, Jan 9, 2012 at 7:24 PM, Jim Nasby <jim@nasby.net> wrote:
> On Jan 6, 2012, at 8:40 PM, Robert Haas wrote:
>>>>  Somewhat depressingly,
>>>> virtually all of the interesting activity still centers around the
>>>> same three locks that were problematic back then, which means that -
>>>> although overall performance has improved quite a bit - we've not
>>>> really delivered any knockout punches.  Here are the perpetrators:
>>>
>>> I don't think that is depressing at all.  Certain locks need to exist
>>> to protect certain things, and a benchmark which tests those things is
>>> going to take those locks rather than some other set of locks.  X
>>> times faster is still X times faster, even if the bottleneck hasn't
>>> moved to some other part of the code.
>>
>> True.  What I don't like is: I think we've really only pushed the
>> bottleneck out a few cores.  Throw a 64-core machine at it and we're
>> going to have all these same problems over again.  I'd like to find
>> solutions that change the dynamic in a more fundamental way, so that
>> we buy a little more.  But I'm not going to complain too much; the
>> performance gains we've gotten with these techniques are obviously
>> quite substantial, even though they're not a total solution.
>
> IIRC, pg_bench is *extremely* write-heavy. There's probably not that many systems that operate that way. I suspect
> that most OLTP systems read more than they write, and some probably have as much as a 10-1 ratio.
>
> So... it might be interesting to run a more balanced pg_bench as well...

Yeah, maybe.  I've run read-only tests as well, and the tps rate is
~10x higher.  So, certainly, you're going to do better if you have
more read-only transactions relative to write transactions.  Not sure
what the exact shape of the curve is, but...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: LWLOCK_STATS

From
Simon Riggs
Date:
On Tue, Jan 10, 2012 at 12:24 AM, Jim Nasby <jim@nasby.net> wrote:

> IIRC, pg_bench is *extremely* write-heavy. There's probably not that many systems that operate that way. I suspect
> that most OLTP systems read more than they write, and some probably have as much as a 10-1 ratio.

IMHO the main PostgreSQL design objective is doing a flexible, general
purpose 100% write workload. Which is why Hot Standby and
LISTEN/NOTIFY are so important as mechanisms for offloading read
traffic to other places, so we can scale the total solution beyond 1
node without giving up the power of SQL.

So benchmarking write-heavy workloads and separately benchmarking
read-only workloads is more representative.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: LWLOCK_STATS

From
Merlin Moncure
Date:
On Tue, Jan 10, 2012 at 3:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> So benchmarking write-heavy workloads and separately benchmarking
> read-only workloads is more representative.

Absolutely.  High write activity applications are much more difficult
to optimize with simple tricks like client side caching.  Also,
storage is finally moving out of the dark ages so that high write
transaction rate servers are no longer necessarily i/o bound on
reasonable hardware.

merlin


Re: LWLOCK_STATS

From
Jim Nasby
Date:
On Jan 10, 2012, at 3:16 AM, Simon Riggs wrote:
> On Tue, Jan 10, 2012 at 12:24 AM, Jim Nasby <jim@nasby.net> wrote:
>> IIRC, pg_bench is *extremely* write-heavy. There's probably not that many systems that operate that way. I suspect
>> that most OLTP systems read more than they write, and some probably have as much as a 10-1 ratio.
>
> IMHO the main PostgreSQL design objective is doing a flexible, general
> purpose 100% write workload. Which is why Hot Standby and
> LISTEN/NOTIFY are so important as mechanisms for offloading read
> traffic to other places, so we can scale the total solution beyond 1
> node without giving up the power of SQL.

There's a problem with that theory though... in an actual OLTP system it can be extremely difficult to effectively
split read and write workloads unless you've got some really easy way to know that you're not reading data that was just
modified. I realize that there are caching and some other tricks that can help here, but AFAICT they all have some
pretty significant drawbacks that can easily limit where they can be used.
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net