Thread: User concurrency thresholding: where do I look?
Folks,

I've run into this a number of times with various PostgreSQL users, so we tested it at Sun. What seems to be happening is that at some specific number of connections, average throughput drops 30% and response time quadruples or worse. The threshold seems to vary per machine; I've seen it variously at 95, 1050, 1700, or 2800 connections. Tinkering with postgresql.conf parameters doesn't seem to affect this threshold.

As an example of this behavior:

Users  Txn/User  Resp. Time
  50    105.38      0.01
 100    113.05      0.01
 150    114.05      0.01
 200    113.51      0.01
 250    113.38      0.01
 300    112.14      0.01
 350    112.26      0.01
 400    111.43      0.01
 450    110.72      0.01
 500    110.44      0.01
 550    109.36      0.01
 600    107.01      0.02
 650    105.71      0.02
 700    106.95      0.02
 750    107.69      0.02
 800    106.78      0.02
 850    108.59      0.02
 900    106.03      0.02
 950    106.13      0.02
1000     64.58      0.15
1050     52.32      0.23
1100     49.79      0.25

Tinkering with shared_buffers has had no effect on this thresholding (the above was with 3 GB to 6 GB of shared_buffers). Any ideas on where we should look for the source of the bottleneck?

-- Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus wrote:
> Folks,
>
>  650    105.71      0.02
>  700    106.95      0.02
>  750    107.69      0.02
>  800    106.78      0.02
>  850    108.59      0.02
>  900    106.03      0.02
>  950    106.13      0.02
> 1000     64.58      0.15
> 1050     52.32      0.23
> 1100     49.79      0.25
>
> Tinkering with shared_buffers has had no effect on this thresholding (the above
> was with 3 GB to 6 GB of shared_buffers). Any ideas on where we should look
> for the source of the bottleneck?

I have seen this as well. I always chalked it up to PG having to manage so many connections, but there is some interesting evidence to review:

The amount of memory "each" connection takes up. Consider 4-11 MB per connection, depending on various things like the number of prepared queries.

The number of CPUs. Obviously 500 connections over 4 CPUs isn't the same as 500 connections over 8 CPUs.

That number of connections generally means a higher velocity, and a higher velocity means more checkpoint segments. Wrong settings for your checkpoint segments, bgwriter and checkpoints will cause you to start falling down.

I would also note that our experience is that PG falls down a little higher, more toward 2500 connections last time I checked, but this was likely on different hardware.

Joshua D. Drake

-- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Josh Berkus wrote:
> Folks,
>
> I've run into this a number of times with various PostgreSQL users, so we
> tested it at Sun. What seems to be happening is that at some specific number
> of connections, average throughput drops 30% and response time quadruples or
> worse. The threshold seems to vary per machine; I've seen it variously at 95,
> 1050, 1700, or 2800 connections. Tinkering with postgresql.conf parameters
> doesn't seem to affect this threshold.
>
> As an example of this behavior:
>
> Users  Txn/User  Resp. Time
>   50    105.38      0.01
>  100    113.05      0.01
>  150    114.05      0.01
>  200    113.51      0.01
>  250    113.38      0.01
>  300    112.14      0.01
>  350    112.26      0.01
>  400    111.43      0.01
>  450    110.72      0.01
>  500    110.44      0.01
>  550    109.36      0.01
>  600    107.01      0.02
>  650    105.71      0.02
>  700    106.95      0.02
>  750    107.69      0.02
>  800    106.78      0.02
>  850    108.59      0.02
>  900    106.03      0.02
>  950    106.13      0.02
> 1000     64.58      0.15
> 1050     52.32      0.23
> 1100     49.79      0.25
>
> Tinkering with shared_buffers has had no effect on this thresholding (the above
> was with 3 GB to 6 GB of shared_buffers). Any ideas on where we should look
> for the source of the bottleneck?

Have you messed with max_connections and/or max_locks_per_transaction while testing this? The lock table is sized to max_locks_per_xact times max_connections, and shared memory hash tables get slower when they are full. Of course, the saturation point would depend on the avg number of locks acquired per user, which would explain why you are seeing a lower number for some users and higher for others (simpler/more complex queries).

This is just a guess though. No profiling or measuring at all, really.

-- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC "How amazing is that? I call it a night and come back to find that a bug has been identified and patched while I sleep." (Robert Davidson) http://archives.postgresql.org/pgsql-sql/2006-03/msg00378.php
Alvaro, > Have you messed with max_connections and/or max_locks_per_transaction > while testing this? The lock table is sized to max_locks_per_xact times > max_connections, and shared memory hash tables get slower when they are > full. Of course, the saturation point would depend on the avg number of > locks acquired per user, which would explain why you are seeing a lower > number for some users and higher for others (simpler/more complex > queries). That's an interesting thought. Let me check lock counts and see if this is possibly the case. -- Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus <josh@agliodbs.com> writes: > Alvaro, >> Have you messed with max_connections and/or max_locks_per_transaction >> while testing this? The lock table is sized to max_locks_per_xact times >> max_connections, and shared memory hash tables get slower when they are >> full. Of course, the saturation point would depend on the avg number of >> locks acquired per user, which would explain why you are seeing a lower >> number for some users and higher for others (simpler/more complex >> queries). > That's an interesting thought. Let me check lock counts and see if this is > possibly the case. AFAIK you'd get hard failures, not slowdowns, if you ran out of lock space entirely; and the fact that you can continue the curve upwards says that you're not on the edge of running out. However I agree that it's worth experimenting with those two parameters to see if the curve moves around at all. Another resource that might be interesting is the number of open files. Also, have you tried watching vmstat or local equivalent to confirm that the machine's not starting to swap? regards, tom lane
Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > Alvaro,
> >> Have you messed with max_connections and/or max_locks_per_transaction
> >> while testing this? The lock table is sized to max_locks_per_xact times
> >> max_connections, and shared memory hash tables get slower when they are
> >> full. Of course, the saturation point would depend on the avg number of
> >> locks acquired per user, which would explain why you are seeing a lower
> >> number for some users and higher for others (simpler/more complex
> >> queries).
>
> > That's an interesting thought. Let me check lock counts and see if this is
> > possibly the case.
>
> AFAIK you'd get hard failures, not slowdowns, if you ran out of lock
> space entirely;

Well, if there still is shared memory available, the lock hash can continue to grow, but it would slow down according to this comment in ShmemInitHash:

 * max_size is the estimated maximum number of hashtable entries. This is
 * not a hard limit, but the access efficiency will degrade if it is
 * exceeded substantially (since it's used to compute directory size and
 * the hash table buckets will get overfull).

For the lock hash tables this max_size is (MaxBackends + max_prepared_xacts) * max_locks_per_xact.

So maybe this does not make much sense in normal operation, thus not applicable to what Josh Berkus is reporting.

However I was talking to Josh Drake yesterday and he told me that pg_dump was spending some significant amount of time in LOCK TABLE when there are lots of tables (say 300k).

-- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
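For reference, the sizing Alvaro describes corresponds roughly to this fragment of src/backend/storage/lmgr/lock.c (paraphrased from 8.2-era sources; exact code varies by version):

    /* Estimated number of entries the shared lock table must hold. */
    #define NLOCKENTS() \
        mul_size(max_locks_per_xact, add_size(MaxBackends, max_prepared_xacts))

    /*
     * In InitLocks(), roughly: the hash table may keep growing past
     * max_table_size while shared memory lasts, but lookups degrade.
     */
    max_table_size  = NLOCKENTS();
    init_table_size = max_table_size / 2;
    LockMethodLockHash = ShmemInitHash("LOCK hash",
                                       init_table_size,
                                       max_table_size,
                                       &info,
                                       hash_flags);

With the shipped default of max_locks_per_transaction = 64, that works out to 64 entries per allowed backend, so a short-transaction workload would have to hold an unusually large number of locks per connection to overflow it.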
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> AFAIK you'd get hard failures, not slowdowns, if you ran out of lock >> space entirely; > Well, if there still is shared memory available, the lock hash can > continue to grow, but it would slow down according to this comment in > ShmemInitHash: Right, but there's not an enormous amount of headroom in shared memory beyond the intended size of the hash tables. I'd think that you'd start seeing hard failures not very far beyond the point at which performance impacts became visible. Of course this is all speculation; I quite agree with varying the table-size parameters to see if it makes a difference. Josh, what sort of workload is being tested here --- read-mostly, write-mostly, a mixture? > However I was talking to Josh Drake yesterday and he told me that > pg_dump was spending some significant amount of time in LOCK TABLE when > there are lots of tables (say 300k). I wouldn't be too surprised if there's some O(N^2) effects when a single transaction holds that many locks, because of the linked-list proclock data structures. This would not be relevant to Josh's case though. regards, tom lane
Alvaro Herrera wrote: > Tom Lane wrote: >> Josh Berkus <josh@agliodbs.com> writes: > So maybe this does not make much sense in normal operation, thus not > applicable to what Josh Berkus is reporting. > > However I was talking to Josh Drake yesterday and he told me that > pg_dump was spending some significant amount of time in LOCK TABLE when > there are lots of tables (say 300k). Less, 128k Joshua D. Drake >
On Thu, 19 Jul 2007, Josh Berkus wrote: > What seems to be happening is that at some specific number of > connections average throughput drops 30% and response time quadruples or > worse. Could you characterize what each connection is doing and how you're generating the load? I don't know how productive speculating about the cause here will be until there's a test script available so other people can see where the tipping point is on their system. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Tom, all: > Also, have you tried watching vmstat or local equivalent to confirm that > the machine's not starting to swap? We're not swapping. > Josh, what sort of workload is being tested here --- read-mostly, > write-mostly, a mixture? It's a TPCC-like workload, so heavy single-row updates, and the updates/inserts are what's being measured. For that matter, when I've seen this before it was with heavy-write workloads and we were measuring the number of updates/inserts and not the number of reads. -- Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus <josh@agliodbs.com> writes: >> Josh, what sort of workload is being tested here --- read-mostly, >> write-mostly, a mixture? > It's a TPCC-like workload, so heavy single-row updates, and the > updates/inserts are what's being measured. For that matter, when I've seen > this before it was with heavy-write workloads and we were measuring the > number of updates/inserts and not the number of reads. Well, if the load is a lot of short writing transactions then you'd expect the throughput to depend on how fast stuff can be pushed down to WAL. What have you got wal_buffers set to? Are you using a commit delay? What's the I/O system anyway (any BB write cache on the WAL disk?) and what wal sync method are you using? While I'm asking questions, exactly what were the data columns you presented? Txn/User doesn't make much sense to me, and I'm not sure what "response time" you were measuring either. regards, tom lane
Tom,

> Well, if the load is a lot of short writing transactions then you'd
> expect the throughput to depend on how fast stuff can be pushed down to
> WAL. What have you got wal_buffers set to? Are you using a commit
> delay? What's the I/O system anyway (any BB write cache on the WAL
> disk?) and what wal sync method are you using?

You know, I think Jignesh needs to be on this list so I can stop relaying questions on a workload I didn't design. Let me get him.

-- Josh Berkus PostgreSQL @ Sun San Francisco
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Josh Berkus <josh@agliodbs.com> writes: > >> That's an interesting thought. Let me check lock counts and see if this is >> possibly the case. > > AFAIK you'd get hard failures, not slowdowns, if you ran out of lock > space entirely I assume you've checked the server logs and are sure that you aren't in fact getting errors. I could, for example, envision a situation where a fraction of the transactions are getting some error and those transactions are therefore not being counted against the txn/s result. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Thu, 19 Jul 2007, Josh Berkus wrote:
> It's a TPCC-like workload, so heavy single-row updates, and the
> updates/inserts are what's being measured.

There's so much going on with a TPC-C kind of workload. Has anyone ever looked into quantifying scaling for more fundamental operations? There are so many places a complicated workload could get caught up that starting there is hard. I've found it's helpful to see the breaking points for simpler operations, then compare how things change as each new transaction element is introduced.

As an example, take a look at the MySQL SysBench tool: http://sysbench.sourceforge.net/docs/

Specifically their "oltp" tests. Note how you can get a handle on how simple selects scale, then simple inserts, then updates, and so on. The only thing I can think of that they missed is testing a trivial operation that doesn't even touch the buffer cache ('SELECT 1'?), which would let you quantify just general connection-scaling issues.

It seems to me that you could narrow the list of possible causes here much more quickly if you had a good handle on the upper concurrency of lower-level operations.

[Note: it's possible to run SysBench against a PG database, but the code is very immature. Last time I tried there were plenty of crashes and there seemed to be some transaction wrapping issues that caused deadlocks with some tests.]

-- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
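As a rough illustration of that "trivial operation" idea, a bare-bones libpq harness along these lines can show how pure connection overhead grows with N. This is only a sketch: the conninfo string and connection count are placeholders, and it runs the queries sequentially from one process, whereas a real test would drive the connections concurrently from many client processes.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include "libpq-fe.h"

    int
    main(int argc, char **argv)
    {
        int      nconns = (argc > 1) ? atoi(argv[1]) : 100;  /* number of backends to open */
        PGconn **conns = malloc(nconns * sizeof(PGconn *));
        struct timeval start, stop;
        int      i;

        /* Open all connections first, so the timing below covers only the queries. */
        for (i = 0; i < nconns; i++)
        {
            conns[i] = PQconnectdb("dbname=test");           /* hypothetical conninfo */
            if (PQstatus(conns[i]) != CONNECTION_OK)
            {
                fprintf(stderr, "connection %d failed: %s", i, PQerrorMessage(conns[i]));
                return 1;
            }
        }

        /* One trivial query per connection: no buffer-cache or WAL work involved. */
        gettimeofday(&start, NULL);
        for (i = 0; i < nconns; i++)
        {
            PGresult *res = PQexec(conns[i], "SELECT 1");
            PQclear(res);
        }
        gettimeofday(&stop, NULL);

        printf("%d connections, one SELECT 1 each: %.3f ms\n", nconns,
               (stop.tv_sec - start.tv_sec) * 1000.0 +
               (stop.tv_usec - start.tv_usec) / 1000.0);

        for (i = 0; i < nconns; i++)
            PQfinish(conns[i]);
        free(conns);
        return 0;
    }

Something like "cc scale.c -o scale -lpq" (with the libpq headers and library on the search paths, e.g. via pg_config --includedir and --libdir) builds it; running it at increasing N brackets where plain connection handling starts to hurt, independently of any TPC-C machinery.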
Greg, > There's so much going on with a TPC-C kind of workload. Has anyone ever > looked into quantifying scaling for more fundamental operations? There > are so many places a complicated workload could get caught up that > starting there is hard. I've found it's helpful to see the breaking > points for simpler operations, then compare how things change as each > new transaction element is introduced. ... eagerly awaiting Michael Doilson's PgUnitTest .... --Josh
Awww Josh, I was just enjoying the chat on the picket fence! :-)

Anyway, the workload is mixed (reads, writes) with simple to medium queries. The workload is known to scale well. But in order to provide substantial input I am still trying to eliminate things that can bottleneck. Currently I have eliminated CPU (plenty free), RAM (there is 48 GB of RAM in this server for a 32-bit postgresql instance) and storage I/O (used the free RAM for a /tmp database to eliminate I/O), and am still trying to eliminate any network bottlenecks before saying for sure we have a problem in PostgreSQL. So yes, till that final thing is confirmed (the network, which can very well be the case), the problem could be somewhere else. However, the thing that worries me is the big drop, instead of throughput remaining constant out there.

Anyway, more on this within a day or two once I add more network NICs between the systems to eliminate network problems (even though stats don't show them as problems right now) and also reduce malloc lock penalties, if any.

As for the other questions:

max_locks_per_transaction is set to the default (10, I believe); increasing it still seems to degrade the overall throughput number.

max_connections is set to 1500 for now, till I get decent scaling up to 1400-1500 users.

There are no hard failures reported anywhere. The log_min_duration_statement output does show that queries are now slowing down and taking longer.

The OS is not swapping, and I/O is also eliminated by putting the whole database on /tmp.

So while I finish adding more network connections between the two systems (need to get cards), do enjoy the following URL :-)

http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070703-00073.html

Of course all of postgresql.conf still remains from the old test, so no flames on that one again :-)

Regards,
Jignesh

Josh Berkus wrote:
> Tom,
>
>> Well, if the load is a lot of short writing transactions then you'd
>> expect the throughput to depend on how fast stuff can be pushed down to
>> WAL. What have you got wal_buffers set to? Are you using a commit
>> delay? What's the I/O system anyway (any BB write cache on the WAL
>> disk?) and what wal sync method are you using?
>
> You know, I think Jignesh needs to be on this list so I can stop relaying
> questions on a workload I didn't design. Let me get him.
I forgot to add one more piece of information: I also tried the same test with 64-bit PostgreSQL and 6 GB of shared_buffers, and the results are the same; it drops around the same point, which to me sounds like a bottleneck.

More later.

-Jignesh

Jignesh K. Shah wrote:
> Awww Josh, I was just enjoying the chat on the picket fence! :-)
>
> Anyway, the workload is mixed (reads, writes) with simple to medium
> queries. The workload is known to scale well. But in order to provide
> substantial input I am still trying to eliminate things that can
> bottleneck. Currently I have eliminated CPU (plenty free), RAM (there is
> 48 GB of RAM in this server for a 32-bit postgresql instance) and storage
> I/O (used the free RAM for a /tmp database to eliminate I/O), and am
> still trying to eliminate any network bottlenecks before saying for sure
> we have a problem in PostgreSQL. So yes, till that final thing is
> confirmed (the network, which can very well be the case), the problem
> could be somewhere else. However, the thing that worries me is the big
> drop, instead of throughput remaining constant out there.
>
> Anyway, more on this within a day or two once I add more network NICs
> between the systems to eliminate network problems (even though stats
> don't show them as problems right now) and also reduce malloc lock
> penalties, if any.
>
> As for the other questions:
>
> max_locks_per_transaction is set to the default (10, I believe);
> increasing it still seems to degrade the overall throughput number.
>
> max_connections is set to 1500 for now, till I get decent scaling up to
> 1400-1500 users.
>
> There are no hard failures reported anywhere. The
> log_min_duration_statement output does show that queries are now slowing
> down and taking longer.
>
> The OS is not swapping, and I/O is also eliminated by putting the whole
> database on /tmp.
>
> So while I finish adding more network connections between the two
> systems (need to get cards), do enjoy the following URL :-)
>
> http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070703-00073.html
>
> Of course all of postgresql.conf still remains from the old test, so no
> flames on that one again :-)
>
> Regards,
> Jignesh
>
> Josh Berkus wrote:
>> Tom,
>>
>>> Well, if the load is a lot of short writing transactions then you'd
>>> expect the throughput to depend on how fast stuff can be pushed down to
>>> WAL. What have you got wal_buffers set to? Are you using a commit
>>> delay? What's the I/O system anyway (any BB write cache on the WAL
>>> disk?) and what wal sync method are you using?
>>
>> You know, I think Jignesh needs to be on this list so I can stop
>> relaying questions on a workload I didn't design. Let me get him.
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > There are no hard failures reported anywhere. Log min durations does > show that queries are now slowing down and taking longer. > OS is not swapping and also eliminated IO by putting the whole database > on /tmp Hmm. Do you see any evidence of a context swap storm (ie, a drastic increase in the context swaps/second reading reported by vmstat)? regards, tom lane
Yes, I did see an increase in context switches and CPU migrations at that point using mpstat.

Regards,
Jignesh

Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> There are no hard failures reported anywhere. Log min durations does
>> show that queries are now slowing down and taking longer.
>> OS is not swapping and also eliminated IO by putting the whole database
>> on /tmp
>
> Hmm. Do you see any evidence of a context swap storm (ie, a drastic
> increase in the context swaps/second reading reported by vmstat)?
>
> regards, tom lane
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > Yes I did see increase in context switches and CPU migrations at that > point using mpstat. So follow that up --- try to determine which lock is being contended for. There's some very crude code in the sources that you can enable with -DLWLOCK_STATS, but probably DTrace would be a better tool. regards, tom lane
Tom Lane wrote: > "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: >> Yes I did see increase in context switches and CPU migrations at that >> point using mpstat. > > So follow that up --- try to determine which lock is being contended > for. There's some very crude code in the sources that you can enable > with -DLWLOCK_STATS, but probably DTrace would be a better tool. > > regards, tom lane Using plockstat -A -s 5 -p $pid on bgwriter: doesnt report anything On one of the many connections: This one is hard to read easily By default, plockstat monitors all lock con- tention events, gathers frequency and timing data about those events, and displays the data in decreasing frequency order, so that the most common events appear first. ^Cbash-3.00# plockstat -A -s 5 -p 6401 ^C Mutex hold ------------------------------------------------------------------------------- Count nsec Lock Caller 59 186888 0x10059e280 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 16384 | | 1 libumem.so.1`process_free+0x12c 32768 |@@@@@ | 14 postgres`AllocSetDelete+0x98 65536 |@@ | 5 postgres`MemoryContextDelete+0x78 131072 | | 0 postgres`CommitTransaction+0x240 262144 |@@@@@@@@@@@@@@@ | 39 ------------------------------------------------------------------------------- Count nsec Lock Caller 530 12226 0x10059e280 libumem.so.1`umem_cache_alloc+0x200 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@ | 338 libumem.so.1`umem_cache_alloc+0x200 8192 |@ | 24 libumem.so.1`umem_alloc+0x5c 16384 |@ | 37 libumem.so.1`malloc+0x40 32768 |@@@@@ | 131 postgres`AllocSetAlloc+0x1c4 ------------------------------------------------------------------------------- Count nsec Lock Caller 324 10214 0x100578030 libumem.so.1`vmem_xfree+0x164 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@ | 192 libumem.so.1`vmem_xfree+0x164 8192 |@@@@ | 56 libumem.so.1`process_free+0x12c 16384 |@ | 26 postgres`AllocSetDelete+0x98 32768 |@@@ | 50 postgres`MemoryContextDelete+0x78 ------------------------------------------------------------------------------- Count nsec Lock Caller 161 13585 0x10059e280 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@ | 118 libumem.so.1`process_free+0x12c 8192 | | 4 postgres`AllocSetDelete+0x98 16384 |@ | 10 postgres`MemoryContextDelete+0x78 32768 |@@@ | 24 postgres`PortalDrop+0x160 65536 | | 3 131072 | | 0 262144 | | 2 ------------------------------------------------------------------------------- Count nsec Lock Caller 326 6081 libumem.so.1`vmem0+0xc38 libumem.so.1`vmem_xalloc+0x630 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@ | 170 libumem.so.1`vmem_xalloc+0x630 8192 |@@@@@@@@@@@ | 155 libumem.so.1`vmem_alloc+0x1f8 16384 | | 1 libumem.so.1`vmem_xalloc+0x524 libumem.so.1`vmem_alloc+0x1f8 ------------------------------------------------------------------------------- Count nsec Lock Caller 326 5867 libumem.so.1`vmem0+0x30 libumem.so.1`vmem_alloc+0x248 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@ | 185 libumem.so.1`vmem_alloc+0x248 8192 |@@@@@@@@@@ | 141 libumem.so.1`vmem_sbrk_alloc+0x30 libumem.so.1`vmem_xalloc+0x524 libumem.so.1`vmem_alloc+0x1f8 ------------------------------------------------------------------------------- Count nsec Lock Caller 318 5873 0x100578030 libumem.so.1`vmem_alloc+0x1d0 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@ | 228 libumem.so.1`vmem_alloc+0x1d0 8192 |@@@@@ | 78 libumem.so.1`umem_alloc+0xec 16384 | | 6 libumem.so.1`malloc+0x40 
32768 | | 6 postgres`AllocSetAlloc+0x1c4 ------------------------------------------------------------------------------- Count nsec Lock Caller 326 5591 0x100578030 libumem.so.1`vmem_xalloc+0x630 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@ | 213 libumem.so.1`vmem_xalloc+0x630 8192 |@@@@@@@@ | 112 libumem.so.1`vmem_alloc+0x1f8 16384 | | 0 libumem.so.1`umem_alloc+0xec 32768 | | 1 libumem.so.1`malloc+0x40 ------------------------------------------------------------------------------- Count nsec Lock Caller 324 5208 libumem.so.1`vmem0+0xc38 libumem.so.1`vmem_xfree+0x164 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@ | 236 libumem.so.1`vmem_xfree+0x164 8192 |@@@@@@ | 88 libumem.so.1`process_free+0x12c postgres`AllocSetDelete+0x98 postgres`MemoryContextDelete+0x78 ------------------------------------------------------------------------------- Count nsec Lock Caller 326 4108 libumem.so.1`vmem0+0xc38 libumem.so.1`vmem_alloc+0x1d0 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@@ | 325 libumem.so.1`vmem_alloc+0x1d0 8192 | | 1 libumem.so.1`vmem_xalloc+0x524 libumem.so.1`vmem_alloc+0x1f8 libumem.so.1`umem_alloc+0xec ------------------------------------------------------------------------------- Count nsec Lock Caller 326 4108 0x100578030 libumem.so.1`vmem_xalloc+0x50c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@@ | 325 libumem.so.1`vmem_xalloc+0x50c 8192 | | 1 libumem.so.1`vmem_alloc+0x1f8 libumem.so.1`umem_alloc+0xec libumem.so.1`malloc+0x40 ------------------------------------------------------------------------------- Count nsec Lock Caller 326 4096 libumem.so.1`vmem0+0xc38 libumem.so.1`vmem_xalloc+0x50c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 326 libumem.so.1`vmem_xalloc+0x50c libumem.so.1`vmem_alloc+0x1f8 libumem.so.1`vmem_xalloc+0x524 libumem.so.1`vmem_alloc+0x1f8 ------------------------------------------------------------------------------- Count nsec Lock Caller 240 5444 libumem.so.1`vmem0+0x30 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@ | 167 libumem.so.1`process_free+0x12c 8192 |@@@@@@@ | 72 postgres`AllocSetDelete+0x98 16384 | | 0 postgres`MemoryContextDelete+0x78 32768 | | 1 postgres`ExecutorEnd+0x40 ------------------------------------------------------------------------------- Count nsec Lock Caller 123 9057 0x10059e1d0 libumem.so.1`umem_depot_alloc+0xb8 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@ | 60 libumem.so.1`umem_depot_alloc+0xb8 8192 |@@@@ | 24 libumem.so.1`umem_cache_free+0xc4 16384 |@@@@@@@ | 37 libumem.so.1`process_free+0x12c 32768 | | 2 postgres`AllocSetDelete+0x98 ------------------------------------------------------------------------------- Count nsec Lock Caller 200 4935 0x10059e280 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@ | 185 libumem.so.1`process_free+0x12c 8192 | | 4 postgres`AllocSetDelete+0x98 16384 |@ | 10 postgres`MemoryContextDelete+0x78 32768 | | 1 postgres`ExecutorEnd+0x40 ------------------------------------------------------------------------------- Count nsec Lock Caller 164 5219 0x100595700 libumem.so.1`umem_cache_alloc+0x200 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@ | 121 libumem.so.1`umem_cache_alloc+0x200 8192 |@@@@@@ | 42 libumem.so.1`umem_alloc+0x5c 16384 | | 1 libumem.so.1`malloc+0x40 postgres`AllocSetAlloc+0x1c4 
------------------------------------------------------------------------------- Count nsec Lock Caller 122 6748 0x10059e1d0 libumem.so.1`umem_depot_alloc+0xb8 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@ | 43 libumem.so.1`umem_depot_alloc+0xb8 8192 |@@@@@@@@@@@@@@@ | 79 libumem.so.1`umem_cache_alloc+0xa0 libumem.so.1`umem_alloc+0x5c libumem.so.1`malloc+0x40 ------------------------------------------------------------------------------- Count nsec Lock Caller 163 4146 0x100595700 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@@ | 161 libumem.so.1`process_free+0x12c 8192 | | 2 postgres`AllocSetDelete+0x98 postgres`MemoryContextDelete+0x78 postgres`PortalDrop+0x160 ------------------------------------------------------------------------------- Count nsec Lock Caller 50 12615 0x10059e280 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@ | 28 libumem.so.1`process_free+0x12c 8192 |@ | 3 postgres`AllocSetDelete+0x98 16384 |@@@ | 8 postgres`MemoryContextDelete+0x78 32768 |@@@@@ | 11 postgres`FreeExecutorState+0x6c ------------------------------------------------------------------------------- Count nsec Lock Caller 123 4096 0x10059e1d0 libumem.so.1`umem_cache_free+0xec nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 123 libumem.so.1`umem_cache_free+0xec libumem.so.1`process_free+0x12c postgres`AllocSetDelete+0x98 postgres`MemoryContextDelete+0x78 ------------------------------------------------------------------------------- Count nsec Lock Caller 122 4096 0x10059e1d0 libumem.so.1`umem_cache_alloc+0xc4 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 122 libumem.so.1`umem_cache_alloc+0xc4 libumem.so.1`umem_alloc+0x5c libumem.so.1`malloc+0x40 postgres`AllocSetAlloc+0x1c4 ------------------------------------------------------------------------------- Count nsec Lock Caller 37 7970 libumem.so.1`vmem0+0x30 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@ | 2 libumem.so.1`process_free+0x12c 8192 |@@@@@@@@@@@@@@@@@@@@@@ | 35 postgres`AllocSetDelete+0x98 postgres`MemoryContextDelete+0x78 postgres`exec_parse_message+0x130 ------------------------------------------------------------------------------- Count nsec Lock Caller 37 5867 0x10059e280 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@ | 33 libumem.so.1`process_free+0x12c 8192 |@ | 2 postgres`AllocSetDelete+0x98 16384 | | 0 postgres`MemoryContextDelete+0x78 32768 |@ | 2 postgres`exec_parse_message+0x130 ------------------------------------------------------------------------------- Count nsec Lock Caller 39 4516 libumem.so.1`vmem0+0x30 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@ | 35 libumem.so.1`process_free+0x12c 8192 |@@ | 4 postgres`AllocSetDelete+0x98 postgres`MemoryContextDelete+0x78 postgres`PortalDrop+0x160 ------------------------------------------------------------------------------- Count nsec Lock Caller 37 4428 0x10058b700 libumem.so.1`umem_cache_alloc+0x200 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@ | 34 libumem.so.1`umem_cache_alloc+0x200 8192 |@ | 3 libumem.so.1`umem_alloc+0x5c libumem.so.1`malloc+0x40 postgres`base_yy_scan_buffer+0x38 ------------------------------------------------------------------------------- Count nsec Lock Caller 37 4206 0x10058b700 libumem.so.1`process_free+0x12c 
nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@@ | 36 libumem.so.1`process_free+0x12c 8192 | | 1 postgres`scanner_finish+0x50 postgres`raw_parser+0x3c postgres`pg_parse_query+0x54 ------------------------------------------------------------------------------- Count nsec Lock Caller 11 10426 0x10059e280 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@ | 6 libumem.so.1`process_free+0x12c 8192 |@@@@@@ | 3 postgres`AllocSetDelete+0x98 16384 | | 0 postgres`MemoryContextDelete+0x78 32768 |@@@@ | 2 postgres`ExecEndAgg+0x68 ------------------------------------------------------------------------------- Count nsec Lock Caller 8 5120 libumem.so.1`vmem0+0x30 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@ | 6 libumem.so.1`process_free+0x12c 8192 |@@@@@@ | 2 postgres`AllocSetDelete+0x98 postgres`MemoryContextDelete+0x78 postgres`ExecEndSort+0x24 ------------------------------------------------------------------------------- Count nsec Lock Caller 8 4096 0x10059e280 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 8 libumem.so.1`process_free+0x12c postgres`AllocSetDelete+0x98 postgres`MemoryContextDelete+0x78 postgres`ExecEndSort+0x24 ------------------------------------------------------------------------------- Count nsec Lock Caller 8 4096 0x100578030 libumem.so.1`vmem_alloc+0x1d0 nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 8 libumem.so.1`vmem_alloc+0x1d0 libumem.so.1`umem_alloc+0xec libumem.so.1`malloc+0x40 postgres`AllocSetAlloc+0x314 ------------------------------------------------------------------------------- Count nsec Lock Caller 3 4096 0x10059e280 libumem.so.1`process_free+0x12c nsec ---- Time Distribution --- count Stack 4096 |@@@@@@@@@@@@@@@@@@@@@@@@| 3 libumem.so.1`process_free+0x12c postgres`AllocSetDelete+0x98 postgres`MemoryContextDelete+0x78 postgres`tbm_free+0x10 bash-3.00#
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > Tom Lane wrote: >> So follow that up --- try to determine which lock is being contended >> for. There's some very crude code in the sources that you can enable >> with -DLWLOCK_STATS, but probably DTrace would be a better tool. > Using plockstat -A -s 5 -p $pid I don't know what that is, but it doesn't appear to have anything to do with Postgres LWLocks or spinlocks, which are the locks I was thinking of. Try asking Robert Lor about this --- IIRC he had some dtrace probes to work with our locks. regards, tom lane
Sorry. These are Solaris mutex locks used by the postgresql process.

What it's saying is that there are holds/waits in trying to get locks which are taken at the Solaris user-library level, called from the postgresql functions. For example, both of the following functions are hitting the same mutex lock 0x10059e280 in a Solaris library call:

postgres`AllocSetDelete+0x98
postgres`AllocSetAlloc+0x1c4

I need to enable the DTrace probes on my builds.

-Jignesh

Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> Tom Lane wrote:
>>
>>> So follow that up --- try to determine which lock is being contended
>>> for. There's some very crude code in the sources that you can enable
>>> with -DLWLOCK_STATS, but probably DTrace would be a better tool.
>
>> Using plockstat -A -s 5 -p $pid
>
> I don't know what that is, but it doesn't appear to have anything to do
> with Postgres LWLocks or spinlocks, which are the locks I was thinking of.
> Try asking Robert Lor about this --- IIRC he had some dtrace probes to
> work with our locks.
>
> regards, tom lane
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > What its saying is that there are holds/waits in trying to get locks > which are locked at Solaris user library levels called from the > postgresql functions: > For example both the following functions are hitting on the same mutex > lock 0x10059e280 in Solaris Library call: > postgres`AllocSetDelete+0x98 > postgres`AllocSetAlloc+0x1c4 That's a perfect example of the sort of useless overhead that I was complaining of just now in pgsql-patches. Having malloc/free use an internal mutex is necessary in multi-threaded programs, but the backend isn't multi-threaded. And yet, apparently you can't turn that off in Solaris. (Fortunately, the palloc layer is probably insulating us from malloc's performance enough that this isn't a huge deal. But it's annoying.) regards, tom lane
True you cant switch off the locks since libthread has been folded into libc in Solaris 10. Anyway just to give you an idea of the increase in context switching at the break point here are the mpstat (taken at 10 second interval) on this 8-socket Sun Fire V890. The low icsw (Involuntary Context Switches) is about 950-1000 user mark after which a context switch storm starts at users above 1000-1050 mark and drops in total throughput drops about 30% instantaneously.. I will try rebuilding the postgresql with dtrace probes to get more clues. (NOTE you will see 1 cpu (cpuid:22) doing more system work... thats the one doing handling the network interrupts) CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 57 0 27 108 6 4072 98 1749 416 1 7763 47 13 0 40 1 46 0 24 22 6 4198 11 1826 427 0 7547 45 13 0 42 2 42 0 34 104 8 4103 91 1682 424 1 7797 46 13 0 41 3 51 0 22 21 6 4125 10 1734 435 0 7399 45 13 0 43 4 65 0 27 19 6 4015 8 1706 411 0 7292 44 15 0 41 5 54 0 21 21 6 4297 10 1702 464 0 7708 45 13 0 42 6 36 0 16 66 47 4218 12 1713 426 0 7685 47 11 0 42 7 40 0 100 318 206 3699 10 1534 585 0 6851 45 14 0 41 16 41 0 30 87 5 3780 78 1509 401 1 7604 45 13 0 42 17 39 0 24 22 5 3970 12 1631 408 0 7265 44 12 0 44 18 42 0 24 99 5 3829 89 1519 401 1 7343 45 12 0 43 19 39 0 31 78830 5 3588 8 1509 400 0 6629 43 13 0 44 20 22 0 20 19 6 3925 9 1577 419 0 7364 44 12 0 44 21 38 0 31 23 5 3792 13 1566 407 0 7133 45 12 0 44 22 8 0 110 7053 7045 1641 8 728 838 0 2917 16 50 0 33 23 62 0 29 21 5 3985 10 1579 449 0 7368 44 12 0 44 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 13 0 27 123 6 4228 113 1820 433 1 8084 49 13 0 38 1 16 0 63 26 6 4253 15 1875 420 0 7754 47 14 0 39 2 11 0 31 110 8 4178 97 1741 425 1 8095 48 14 0 38 3 8 0 24 20 6 4257 9 1818 444 0 7807 47 13 0 40 4 13 0 54 28 6 4145 17 1774 426 1 7732 46 16 0 38 5 12 0 35 23 6 4412 12 1775 447 0 8249 48 13 0 39 6 8 0 24 38 15 4323 14 1760 422 0 8016 49 11 0 39 7 8 0 120 323 206 3801 15 1599 635 0 7290 47 15 0 38 16 11 0 44 107 5 3896 98 1582 393 1 7997 47 15 0 39 17 15 0 29 24 5 4120 14 1716 416 0 7648 46 13 0 41 18 9 0 35 113 5 3933 103 1594 399 1 7714 47 13 0 40 19 8 0 34 83271 5 3702 12 1564 403 0 7010 45 14 0 41 20 7 0 28 27 6 3997 16 1624 400 0 7676 46 13 0 41 21 8 0 28 25 5 3997 15 1664 402 0 7658 47 12 0 41 22 4 0 97 7741 7731 1586 11 704 906 0 2933 17 51 0 32 23 13 0 28 25 5 4144 15 1658 437 0 7810 47 12 0 41 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 0 0 141 315 6 9262 301 2812 330 0 10905 49 16 0 35 1 1 0 153 199 6 9400 186 2808 312 0 11066 48 16 0 37 2 0 0 140 256 8 8798 242 2592 310 0 10111 47 15 0 38 3 1 0 141 189 6 8803 172 2592 314 0 10171 47 15 0 39 4 0 0 120 214 6 9540 207 2801 322 0 10531 46 17 0 36 5 1 0 152 180 6 8764 161 2564 342 0 9904 47 15 0 38 6 1 0 107 344 148 8180 181 2512 290 0 9314 51 14 0 35 7 0 0 665 443 204 8733 153 2574 404 0 9892 43 21 0 37 16 0 0 113 217 5 6446 201 1975 265 0 7552 45 12 0 44 17 0 0 107 153 5 6568 140 2021 274 0 7586 44 11 0 45 18 0 0 121 215 5 6072 201 1789 276 1 7690 44 12 0 44 19 1 0 102 47142 5 6123 126 1829 262 0 7185 43 12 0 45 20 0 0 102 143 6 6451 129 1939 262 0 7450 43 13 0 44 21 1 0 106 150 5 6538 133 1997 285 0 7425 44 11 0 44 22 0 0 494 5949 5876 3586 73 1040 399 0 4058 26 39 0 34 23 0 0 102 159 5 6393 142 1942 324 0 7226 43 12 0 46 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 0 0 217 441 7 10763 426 3234 363 0 12449 47 18 0 35 1 0 0 210 322 7 11113 309 3273 351 0 12527 46 17 0 37 2 1 0 212 
387 8 10306 370 2977 354 0 11320 45 16 0 38 3 0 0 230 276 7 10332 257 2947 341 0 11901 43 16 0 40 4 0 0 234 306 7 11324 290 3265 352 0 12805 45 18 0 37 5 0 0 212 284 7 10590 262 3042 388 0 11789 44 17 0 39 6 1 0 154 307 48 9583 241 2903 324 0 10564 50 15 0 35 7 0 0 840 535 206 10354 247 3035 428 0 11700 42 22 0 37 16 0 0 169 303 5 7446 286 2250 290 0 8361 42 13 0 45 17 0 0 173 240 5 7640 225 2288 295 0 8674 41 13 0 47 18 0 0 170 289 5 7445 270 2108 286 0 8167 41 12 0 47 19 0 0 176 51118 5 7365 197 2138 288 0 7934 40 13 0 47 20 1 0 172 222 6 7835 204 2323 298 0 8759 40 14 0 46 21 0 0 167 233 5 7749 218 2339 326 0 8264 42 13 0 46 22 0 0 749 6612 6516 4173 97 1166 421 0 4741 23 44 0 33 23 0 0 181 239 6 7709 219 2258 383 0 8402 41 12 0 47 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 0 0 198 439 6 10364 417 3113 327 0 11962 49 17 0 34 1 0 0 210 299 6 10655 282 3135 346 0 12463 47 17 0 36 2 0 0 202 352 8 9960 332 2890 320 0 11261 47 16 0 37 3 0 0 182 276 6 9950 255 2857 334 0 11021 46 16 0 38 4 0 0 200 305 6 10841 286 3127 325 0 12440 48 18 0 35 5 0 0 240 286 6 9983 272 2912 358 0 11450 46 16 0 37 6 0 0 153 323 81 9062 233 2767 300 0 9675 49 18 0 33 7 0 0 850 556 206 10027 271 2910 415 0 11048 43 22 0 35 16 0 0 152 306 5 7261 291 2216 266 0 8055 44 12 0 44 17 0 0 151 236 5 7193 217 2170 283 0 8099 43 12 0 45 18 0 0 170 263 5 7008 246 2009 254 0 7836 43 12 0 46 19 0 0 165 47738 5 6824 197 1989 273 0 7663 42 12 0 46 20 0 0 188 217 6 7496 197 2222 280 0 8435 43 13 0 44 21 0 0 179 248 5 7352 234 2233 309 0 8237 43 12 0 44 22 0 0 813 6041 5963 4006 82 1125 448 0 4442 25 42 0 33 23 0 0 162 241 5 7364 225 2170 355 0 7720 43 11 0 45 Tom Lane wrote: > "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > >> What its saying is that there are holds/waits in trying to get locks >> which are locked at Solaris user library levels called from the >> postgresql functions: >> For example both the following functions are hitting on the same mutex >> lock 0x10059e280 in Solaris Library call: >> postgres`AllocSetDelete+0x98 >> postgres`AllocSetAlloc+0x1c4 >> > > That's a perfect example of the sort of useless overhead that I was > complaining of just now in pgsql-patches. Having malloc/free use > an internal mutex is necessary in multi-threaded programs, but the > backend isn't multi-threaded. And yet, apparently you can't turn > that off in Solaris. > > (Fortunately, the palloc layer is probably insulating us from malloc's > performance enough that this isn't a huge deal. But it's annoying.) > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
Tom Lane wrote:
> Having malloc/free use
> an internal mutex is necessary in multi-threaded programs, but the
> backend isn't multi-threaded.

Hmm... confused. I'm not following why there would then be contention for the mutex. Surely this has to be some other mutex that is in contention, not a heap lock? It'd be handy to see the call stack for the wait state -- if the thing is spending a significant proportion of its time in contention, it should be easy to get that with a simple tool such as pstack or a debugger.
David Boreham <david_list@boreham.org> writes: > Tom Lane wrote: >> Having malloc/free use >> an internal mutex is necessary in multi-threaded programs, but the >> backend isn't multi-threaded. > Hmm...confused. I'm not following why then there is contention for the > mutex. There isn't any contention for that mutex; Jignesh's results merely show that it was taken and released a lot of times. regards, tom lane
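To illustrate the "insulation" Tom described a couple of messages back, here is a toy sketch of the general region-allocator idea (illustration only; this is not PostgreSQL's actual aset.c code): a memory context grabs a sizable block from malloc() occasionally and carves small palloc()-style requests out of it, so libc's internal mutex is touched once per block rather than once per allocation.

    #include <stdlib.h>

    typedef struct Block
    {
        struct Block *next;     /* previously filled blocks */
        char         *freeptr;  /* next free byte in this block */
        char         *endptr;   /* one past the end of this block */
    } Block;

    static Block *current_block = NULL;

    /* Hand out 'size' bytes, touching malloc() only when a fresh block is needed. */
    static void *
    toy_palloc(size_t size)
    {
        void *result;

        size = (size + 7) & ~(size_t) 7;          /* MAXALIGN-style rounding */

        if (current_block == NULL ||
            (size_t) (current_block->endptr - current_block->freeptr) < size)
        {
            size_t  datasize = (size > 8192) ? size : 8192;   /* one malloc per ~8 kB */
            Block  *blk = malloc(sizeof(Block) + datasize);

            if (blk == NULL)
                return NULL;
            blk->next = current_block;
            blk->freeptr = (char *) (blk + 1);
            blk->endptr = blk->freeptr + datasize;
            current_block = blk;
        }

        result = current_block->freeptr;
        current_block->freeptr += size;
        return result;
    }

The real allocator additionally keeps per-size freelists and lets an entire context be reset or deleted at once, which is why the plockstat stacks above show AllocSetAlloc and AllocSetDelete as the places that finally reach malloc/free.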
On Fri, 2007-07-20 at 16:57 -0400, Tom Lane wrote: > "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > > What its saying is that there are holds/waits in trying to get locks > > which are locked at Solaris user library levels called from the > > postgresql functions: > > For example both the following functions are hitting on the same mutex > > lock 0x10059e280 in Solaris Library call: > > postgres`AllocSetDelete+0x98 > > postgres`AllocSetAlloc+0x1c4 > > That's a perfect example of the sort of useless overhead that I was > complaining of just now in pgsql-patches. Having malloc/free use > an internal mutex is necessary in multi-threaded programs, but the > backend isn't multi-threaded. And yet, apparently you can't turn > that off in Solaris. > > (Fortunately, the palloc layer is probably insulating us from malloc's > performance enough that this isn't a huge deal. But it's annoying.) There is one thing that the palloc layer doesn't handle: EState. All other memory contexts have a very well chosen initial allocation that prevents mallocs during low-medium complexity OLTP workloads. EState is about 8300 bytes, so just above the large allocation limit. This means that every time we request an EState, i.e. at least once per statement we need to malloc() and then later free(). Would it be worth a special case in the palloc system to avoid having to repeatedly issue external memory allocation calls? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > EState is about 8300 bytes, What? (gdb) p sizeof(EState) $1 = 112 This is on a 32-bit machine, but even on 64-bit it wouldn't be more than double that. > Would it be worth a special case in the palloc system to avoid having to > repeatedly issue external memory allocation calls? The appropriate hack would be to change the AllocSetContextCreate initial-size parameter for the containing context. But I really have no idea what you're on about. regards, tom lane
On Mon, 2007-07-23 at 10:11 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > EState is about 8300 bytes, > > What? > > (gdb) p sizeof(EState) > $1 = 112 > > This is on a 32-bit machine, but even on 64-bit it wouldn't be more than > double that. > > > Would it be worth a special case in the palloc system to avoid having to > > repeatedly issue external memory allocation calls? > > The appropriate hack would be to change the AllocSetContextCreate > initial-size parameter for the containing context. But I really have > no idea what you're on about. I looked at this last May and my notes say "ExecutorState". I guess that was wrong, but my analysis showed there was a single malloc of 8228 bytes happening once per query during my tests. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > I looked at this last May and my notes say "ExecutorState". I guess that > was wrong, but my analysis showed there was a single malloc of 8228 > bytes happening once per query during my tests. Well, if you can track down where it's coming from, we could certainly hack the containing context's parameters. But EState's not it. regards, tom lane
On Mon, 2007-07-23 at 10:54 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > I looked at this last May and my notes say "ExecutorState". I guess that > > was wrong, but my analysis showed there was a single malloc of 8228 > > bytes happening once per query during my tests. > > Well, if you can track down where it's coming from, we could certainly > hack the containing context's parameters. But EState's not it. Well, I discover there is an allocation of 8232 (inflation...) made once per statement by a memory context called... ExecutorState. Still not sure exactly which allocation this is, but its definitely once per statement on pgbench, which should narrow it down. Plan, query etc? I don't see a way to hack the allocation, since the max chunk size is 8K. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
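For reference, two fragments of the 8.2-era sources (paraphrased; exact code varies by version) show why the initial-size hack doesn't reach this allocation. The ExecutorState context is created in CreateExecutorState() in src/backend/executor/execUtils.c with the stock block sizes, and AllocSetAlloc() in src/backend/utils/mmgr/aset.c sends any request above ALLOC_CHUNK_LIMIT straight to malloc() regardless of those sizes:

    /* execUtils.c: the containing context Tom suggested tuning */
    qcontext = AllocSetContextCreate(CurrentMemoryContext,
                                     "ExecutorState",
                                     ALLOCSET_DEFAULT_MINSIZE,   /* 0 */
                                     ALLOCSET_DEFAULT_INITSIZE,  /* 8 * 1024 */
                                     ALLOCSET_DEFAULT_MAXSIZE);  /* 8 * 1024 * 1024 */

    /* aset.c: oversized requests bypass the context's block pool entirely */
    #define ALLOC_CHUNK_LIMIT   8192

    if (size > ALLOC_CHUNK_LIMIT)
    {
        chunk_size = MAXALIGN(size);
        blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
        block = (AllocBlock) malloc(blksize);
        /* ...this single-chunk block is linked into the context and handed
         * back to libc when the context is reset or deleted... */
    }

So an 8232-byte request exceeds ALLOC_CHUNK_LIMIT and hits malloc() every time it is made, no matter how large the context's initBlockSize is set.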
"Simon Riggs" <simon@2ndquadrant.com> writes: > Well, I discover there is an allocation of 8232 (inflation...) made once > per statement by a memory context called... ExecutorState. Still not > sure exactly which allocation this is, but its definitely once per > statement on pgbench, which should narrow it down. Plan, query etc? Are you working with stock sources? The only allocation exceeding 1K that I can see during pgbench is BTScanOpaqueData, which is 6600 bytes. (Checked by setting a conditional breakpoint on AllocSetAlloc.) The path that allocates a single-chunk block is never taken at all. regards, tom lane
On Mon, 2007-07-23 at 16:48 +0100, Simon Riggs wrote: > On Mon, 2007-07-23 at 10:54 -0400, Tom Lane wrote: > > "Simon Riggs" <simon@2ndquadrant.com> writes: > > > I looked at this last May and my notes say "ExecutorState". I guess that > > > was wrong, but my analysis showed there was a single malloc of 8228 > > > bytes happening once per query during my tests. > > > > Well, if you can track down where it's coming from, we could certainly > > hack the containing context's parameters. But EState's not it. > > Well, I discover there is an allocation of 8232 (inflation...) made once > per statement by a memory context called... ExecutorState. Still not > sure exactly which allocation this is, but its definitely once per > statement on pgbench, which should narrow it down. Plan, query etc? > > I don't see a way to hack the allocation, since the max chunk size is > 8K. It is the allocation of BTScanOpaqueData called from btrescan() in nbtree.c currPos and markPos are defined as BTScanPosData, which is an array of BTScanPosItems. That makes BTScanOpaqueData up to 8232 bytes, which seems wasteful since markPos is only ever used during merge joins. Most of that space isn't even used during merge joins either, we just do that to slightly optimise the speed of the restore during merge joins. Seems like we should allocate the memory when we do the first mark. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
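The structures Simon names look roughly like this (abridged from src/include/access/nbtree.h, 8.2-era sources, with comments trimmed). items[] is dimensioned by MaxIndexTuplesPerPage, around 510 entries on a MAXALIGN-4 build with 8 kB pages at 8 bytes each, so each BTScanPosData comes out near 4 kB and the two copies inside BTScanOpaqueData (currPos and markPos) are what push the whole allocation past 8 kB:

    typedef struct BTScanPosItem   /* what we remember about each match */
    {
        ItemPointerData heapTid;   /* TID of referenced heap item */
        OffsetNumber indexOffset;  /* index item's location within page */
    } BTScanPosItem;

    typedef struct BTScanPosData
    {
        Buffer      buf;           /* if valid, the buffer is pinned */
        BlockNumber nextPage;      /* page to move to next or prev */
        bool        moreLeft;      /* may there be matches to the left? */
        bool        moreRight;     /* ... or to the right? */
        int         firstItem;     /* first valid index in items[] */
        int         lastItem;      /* last valid index in items[] */
        int         itemIndex;     /* current index in items[] */
        BTScanPosItem items[MaxIndexTuplesPerPage];   /* MUST BE LAST */
    } BTScanPosData;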
On Mon, 2007-07-23 at 12:35 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > Well, I discover there is an allocation of 8232 (inflation...) made once
> > per statement by a memory context called... ExecutorState. Still not
> > sure exactly which allocation this is, but its definitely once per
> > statement on pgbench, which should narrow it down. Plan, query etc?
>
> Are you working with stock sources? The only allocation exceeding 1K
> that I can see during pgbench is BTScanOpaqueData, which is 6600 bytes.
> (Checked by setting a conditional breakpoint on AllocSetAlloc.) The
> path that allocates a single-chunk block is never taken at all.

I do have the bitmap patch currently applied, but it doesn't touch that part of the code.

(gdb) p size
$1 = 8232
(gdb) p sizeof(int)
$2 = 4
(gdb) p sizeof(BTScanPosData)
$3 = 4104

Since my notes say I got 8228 last year, seems reasonable.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > currPos and markPos are defined as BTScanPosData, which is an array of > BTScanPosItems. That makes BTScanOpaqueData up to 8232 bytes, which > seems wasteful since markPos is only ever used during merge joins. Most > of that space isn't even used during merge joins either, we just do that > to slightly optimise the speed of the restore during merge joins. Ah. I was seeing it as 6600 bytes on HPPA and 6608 on x86_64, but I forgot that both of those architectures have MAXALIGN = 8. On a MAXALIGN = 4 machine, MaxIndexTuplesPerPage will be significantly larger, leading to larger BTScanPosData. Not sure it's worth fooling with, given that these days almost everyone who's seriously concerned about performance is probably using 64bit hardware. One less malloc cycle per indexscan is never going to be a measurable savings anyway... regards, tom lane
On Mon, 2007-07-23 at 14:19 -0400, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > currPos and markPos are defined as BTScanPosData, which is an array of > > BTScanPosItems. That makes BTScanOpaqueData up to 8232 bytes, which > > seems wasteful since markPos is only ever used during merge joins. Most > > of that space isn't even used during merge joins either, we just do that > > to slightly optimise the speed of the restore during merge joins. > > Ah. I was seeing it as 6600 bytes on HPPA and 6608 on x86_64, but > I forgot that both of those architectures have MAXALIGN = 8. On a > MAXALIGN = 4 machine, MaxIndexTuplesPerPage will be significantly > larger, leading to larger BTScanPosData. > > Not sure it's worth fooling with, given that these days almost everyone > who's seriously concerned about performance is probably using 64bit > hardware. One less malloc cycle per indexscan is never going to be a > measurable savings anyway... Oh sure, I was thinking to avoid Solaris' mutex by avoiding malloc() completely. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Here is how I got the numbers.. I had about 1600 users login into postgresql. Then started the run with 500 users and using DTrace I started tracking Postgresql Locking "as viewed from one user/connection". Echo statements indicate how many users were active at that point and how was throughput performing. All IO is done on /tmp which means on a RAM disk. bash-3.00# echo 500 users - baseline number 500 users bash-3.00# ./3_lwlock_acquires.d 19178 Lock Id Mode Count FirstLockMgrLock Exclusive 1 RelCacheInitLock Exclusive 2 SInvalLock Exclusive 2 WALInsertLock Exclusive 10 BufMappingLock Exclusive 12 CheckpointLock Shared 29 CheckpointStartLock Shared 29 OidGenLock Exclusive 29 XidGenLock Exclusive 29 FirstLockMgrLock Shared 33 CheckpointStartLock Exclusive 78 FreeSpaceLock Exclusive 114 OidGenLock Shared 126 XidGenLock Shared 152 ProcArrayLock Shared 482 Lock Id Combined Time (ns) SInvalLock 29800 RelCacheInitLock 30300 BufMappingLock 168800 FirstLockMgrLock 414300 FreeSpaceLock 1281700 ProcArrayLock 7869900 WALInsertLock 11113200 CheckpointStartLock 13494700 OidGenLock 25719100 XidGenLock 26443300 CheckpointLock 194267800 bash-3.00# echo 600 users - Throughput rising 600 users bash-3.00# ./3_lwlock_acquires.d 19178 Lock Id Mode Count RelCacheInitLock Exclusive 1 SInvalLock Exclusive 1 BufMappingLock Exclusive 2 CLogControlLock Exclusive 2 WALInsertLock Exclusive 11 FirstLockMgrLock Shared 20 CheckpointLock Shared 24 CheckpointStartLock Shared 24 OidGenLock Exclusive 24 XidGenLock Exclusive 24 CheckpointStartLock Exclusive 72 FreeSpaceLock Exclusive 102 OidGenLock Shared 106 XidGenLock Shared 128 ProcArrayLock Shared 394 Lock Id Combined Time (ns) SInvalLock 15600 RelCacheInitLock 15700 BufMappingLock 31000 CLogControlLock 41500 FirstLockMgrLock 289000 FreeSpaceLock 3045400 CheckpointStartLock 7371800 WALInsertLock 9383200 ProcArrayLock 10457900 OidGenLock 20005900 XidGenLock 20331500 CheckpointLock 187067900 bash-3.00# echo 700 users - Throughput rising 700 users bash-3.00# ./3_lwlock_acquires.d 19178 Lock Id Mode Count RelCacheInitLock Exclusive 1 SInvalLock Exclusive 1 BufMappingLock Exclusive 2 WALInsertLock Exclusive 17 CheckpointLock Shared 33 CheckpointStartLock Shared 33 OidGenLock Exclusive 33 XidGenLock Exclusive 33 FirstLockMgrLock Shared 81 CheckpointStartLock Exclusive 87 FreeSpaceLock Exclusive 124 OidGenLock Shared 125 XidGenLock Shared 150 ProcArrayLock Shared 500 Lock Id Combined Time (ns) RelCacheInitLock 15100 SInvalLock 15400 BufMappingLock 47400 FirstLockMgrLock 3021000 FreeSpaceLock 3794300 WALInsertLock 7567300 XidGenLock 18427400 ProcArrayLock 20884000 CheckpointStartLock 24084900 OidGenLock 26399500 CheckpointLock 256549800 bash-3.00# echo 800 users - Throughput rising 800 users bash-3.00# ./3_lwlock_acquires.d 19178 Lock Id Mode Count BufMappingLock Exclusive 1 RelCacheInitLock Exclusive 1 SInvalLock Exclusive 1 WALWriteLock Exclusive 1 WALInsertLock Exclusive 11 CheckpointLock Shared 27 CheckpointStartLock Shared 27 OidGenLock Exclusive 27 XidGenLock Exclusive 27 FirstLockMgrLock Shared 32 CheckpointStartLock Exclusive 73 FreeSpaceLock Exclusive 110 OidGenLock Shared 118 XidGenLock Shared 140 ProcArrayLock Shared 442 Lock Id Combined Time (ns) WALWriteLock 13900 SInvalLock 15000 RelCacheInitLock 15500 BufMappingLock 18600 FirstLockMgrLock 391100 WALInsertLock 3953700 FreeSpaceLock 4801300 CheckpointStartLock 13131800 ProcArrayLock 14480500 OidGenLock 17736500 XidGenLock 21723100 CheckpointLock 206423500 bash-3.00# echo 850 users - SLIGHT DROP in throughput 850 
users bash-3.00# ./3_lwlock_acquires.d 19178 Lock Id Mode Count FirstLockMgrLock Exclusive 1 RelCacheInitLock Exclusive 1 SInvalLock Exclusive 1 WALWriteLock Exclusive 1 BufMappingLock Exclusive 3 WALInsertLock Exclusive 7 CheckpointLock Shared 39 CheckpointStartLock Shared 39 OidGenLock Exclusive 39 XidGenLock Exclusive 39 FirstLockMgrLock Shared 47 CheckpointStartLock Exclusive 113 FreeSpaceLock Exclusive 152 OidGenLock Shared 162 XidGenLock Shared 194 ProcArrayLock Shared 621 Lock Id Combined Time (ns) WALWriteLock 14200 RelCacheInitLock 15100 SInvalLock 15600 BufMappingLock 64100 WALInsertLock 2073200 FirstLockMgrLock 3040300 FreeSpaceLock 7329500 OidGenLock 21619100 CheckpointStartLock 23261300 ProcArrayLock 23917500 XidGenLock 24873100 CheckpointLock 309221200 bash-3.00# echo 900 users - ANOTHER SLIGHT DROP IN THROUGPUT 900 users bash-3.00# ./3_lwlock_acquires.d 19178 Lock Id Mode Count WALWriteLock Exclusive 1 WALInsertLock Exclusive 7 CheckpointStartLock Shared 13 OidGenLock Exclusive 13 CheckpointLock Shared 14 XidGenLock Exclusive 14 FirstLockMgrLock Shared 15 FreeSpaceLock Exclusive 51 OidGenLock Shared 51 XidGenLock Shared 62 CheckpointStartLock Exclusive 170 ProcArrayLock Shared 202 Lock Id Combined Time (ns) WALWriteLock 16800 FirstLockMgrLock 170300 FreeSpaceLock 601500 ProcArrayLock 3971300 WALInsertLock 7757200 OidGenLock 8261900 XidGenLock 18450900 CheckpointStartLock 39155100 CheckpointLock 143751500 bash-3.00# echo 950 users - BIG DROP IN THROUGHPUT 950 users bash-3.00# ./3_lwlock_acquires.d 19178 Lock Id Mode Count WALInsertLock Exclusive 3 FirstLockMgrLock Shared 4 CheckpointLock Shared 7 CheckpointStartLock Shared 7 OidGenLock Exclusive 7 XidGenLock Exclusive 7 FreeSpaceLock Exclusive 29 OidGenLock Shared 30 XidGenLock Shared 36 ProcArrayLock Shared 115 CheckpointStartLock Exclusive 134 Lock Id Combined Time (ns) FirstLockMgrLock 64400 FreeSpaceLock 342300 WALInsertLock 1759600 OidGenLock 4276900 ProcArrayLock 6234300 XidGenLock 6865000 CheckpointStartLock 37590800 CheckpointLock 58994300 bash-3.00# echo 1000 users - STEADY AT PREVIOUS LOW VALUE 1000 users bash-3.00# ./3_lwlock_acquires.d 19178 Lock Id Mode Count BufMappingLock Exclusive 1 RelCacheInitLock Exclusive 1 SInvalLock Exclusive 1 WALInsertLock Exclusive 3 CheckpointLock Shared 9 CheckpointStartLock Shared 9 OidGenLock Exclusive 9 XidGenLock Exclusive 9 FirstLockMgrLock Shared 14 FreeSpaceLock Exclusive 33 OidGenLock Shared 37 XidGenLock Shared 44 CheckpointStartLock Exclusive 122 ProcArrayLock Shared 145 Lock Id Combined Time (ns) RelCacheInitLock 14300 SInvalLock 15600 BufMappingLock 21400 FirstLockMgrLock 184000 FreeSpaceLock 366200 WALInsertLock 1769500 ProcArrayLock 5076500 XidGenLock 5898400 OidGenLock 9244800 CheckpointStartLock 31077500 CheckpointLock 91861900 bash-3.00# echo 1050 users - SMALL INCREASE 1050 users bash-3.00# ./3_lwlock_acquires.d 19178 Lock Id Mode Count BufMappingLock Exclusive 2 WALInsertLock Exclusive 9 CheckpointLock Shared 24 XidGenLock Exclusive 24 CheckpointStartLock Shared 25 OidGenLock Exclusive 25 FirstLockMgrLock Shared 30 FreeSpaceLock Exclusive 100 OidGenLock Shared 107 XidGenLock Shared 129 CheckpointStartLock Exclusive 153 ProcArrayLock Shared 400 Lock Id Combined Time (ns) BufMappingLock 36600 FirstLockMgrLock 420600 FreeSpaceLock 2998400 WALInsertLock 3818300 ProcArrayLock 8986900 OidGenLock 18127200 XidGenLock 18569200 CheckpointStartLock 44795700 CheckpointLock 206488400 bash-3.00# echo 1100 users - SMALL DROP AGAIN 1100 users bash-3.00# ./3_lwlock_acquires.d 
19178 Lock Id Mode Count BufMappingLock Exclusive 1 WALInsertLock Exclusive 6 CheckpointLock Shared 11 XidGenLock Exclusive 11 CheckpointStartLock Shared 12 OidGenLock Exclusive 12 FirstLockMgrLock Shared 24 FreeSpaceLock Exclusive 39 OidGenLock Shared 44 XidGenLock Shared 51 CheckpointStartLock Exclusive 88 ProcArrayLock Shared 171 Lock Id Combined Time (ns) BufMappingLock 19500 FirstLockMgrLock 302700 FreeSpaceLock 511200 ProcArrayLock 5042300 WALInsertLock 5592800 CheckpointStartLock 25009900 OidGenLock 25231600 XidGenLock 108045300 CheckpointLock 379734000 bash-3.00# echo 1150 users - STEADY AT LOW VALUE 1150 users bash-3.00# ./3_lwlock_acquires.d 19178 Lock Id Mode Count WALWriteLock Exclusive 1 WALInsertLock Exclusive 2 CheckpointLock Shared 5 CheckpointStartLock Shared 6 OidGenLock Exclusive 6 XidGenLock Exclusive 6 FirstLockMgrLock Shared 8 FreeSpaceLock Exclusive 21 OidGenLock Shared 26 XidGenLock Shared 31 ProcArrayLock Shared 93 CheckpointStartLock Exclusive 122 Lock Id Combined Time (ns) WALWriteLock 14900 WALInsertLock 116900 FirstLockMgrLock 120600 FreeSpaceLock 2177800 XidGenLock 4899200 ProcArrayLock 20721700 CheckpointStartLock 27805200 CheckpointLock 76369300 OidGenLock 470145800 bash-3.00# echo 1250 users - STEADY AT LOW VALUE 1250 users bash-3.00# ./3_lwlock_acquires.d 19178 Lock Id Mode Count CheckpointLock Shared 2 CheckpointStartLock Shared 2 OidGenLock Exclusive 2 WALInsertLock Exclusive 2 XidGenLock Exclusive 2 FreeSpaceLock Exclusive 9 OidGenLock Shared 10 XidGenLock Shared 12 ProcArrayLock Shared 36 CheckpointStartLock Exclusive 135 Lock Id Combined Time (ns) WALInsertLock 39500 FreeSpaceLock 98600 ProcArrayLock 318800 XidGenLock 1379900 OidGenLock 3437700 CheckpointLock 9565200 CheckpointStartLock 56547900 bash-3.00#
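The D scripts themselves never appear in this thread, so it helps to spell out roughly what they measure. Below is a minimal, hypothetical sketch of a per-backend LWLock acquisition profiler in the spirit of 3_lwlock_acquires.d. The probe names, their argument order (lock id, then mode), and the 10-second window are assumptions based on the --enable-dtrace probes of that era and on how the script is invoked above, not the actual script; lock ids are printed numerically here, whereas the real script translated them through a hand-copied table from lwlock.h (which is where trouble starts later). Note that "Combined Time" in this variant would be the time the lock is held, which matches Tom's reading further down.

#!/usr/sbin/dtrace -qs
/*
 * Hypothetical sketch in the spirit of 3_lwlock_acquires.d.
 * Assumes a PostgreSQL build with --enable-dtrace exposing the probes
 * lwlock-acquire(lockid, mode) and lwlock-release(lockid).
 * $1 is the backend PID passed on the command line.
 */

postgresql$1:::lwlock-acquire
{
        /* count acquisitions per (lock id, mode) and note when we got the lock */
        @acquires[arg0, arg1] = count();
        self->held[arg0] = timestamp;
}

postgresql$1:::lwlock-release
/self->held[arg0]/
{
        /* "Combined Time" here is total time the lock was held */
        @held_ns[arg0] = sum(timestamp - self->held[arg0]);
        self->held[arg0] = 0;
}

tick-10sec
{
        exit(0);
}

dtrace:::END
{
        printf("%25s %10s %8s\n", "Lock Id", "Mode", "Count");
        printa("%25d %10d %@8d\n", @acquires);
        printf("\n%25s %22s\n", "Lock Id", "Combined Time (ns)");
        printa("%25d %@22d\n", @held_ns);
}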
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > Here is how I got the numbers.. > I had about 1600 users login into postgresql. Then started the run with > 500 users and using DTrace I started tracking Postgresql Locking "as > viewed from one user/connection". Echo statements indicate how many > users were active at that point and how was throughput performing. All > IO is done on /tmp which means on a RAM disk. > bash-3.00# echo 500 users - baseline number > 500 users > bash-3.00# ./3_lwlock_acquires.d 19178 > Lock Id Mode Count > FirstLockMgrLock Exclusive 1 > RelCacheInitLock Exclusive 2 > SInvalLock Exclusive 2 > WALInsertLock Exclusive 10 > BufMappingLock Exclusive 12 > CheckpointLock Shared 29 > CheckpointStartLock Shared 29 > OidGenLock Exclusive 29 > XidGenLock Exclusive 29 > FirstLockMgrLock Shared 33 > CheckpointStartLock Exclusive 78 > FreeSpaceLock Exclusive 114 > OidGenLock Shared 126 > XidGenLock Shared 152 > ProcArrayLock Shared 482 I don't think I believe these numbers. For one thing, CheckpointLock is simply not ever taken in shared mode. The ratios of counts for different locks seems pretty improbable too, eg there is no way on earth that the LockMgrLocks are taken more often shared than exclusive (I would expect no shared acquires at all in the sort of test you are running). Not to mention that the absolute number of counts seems way too low. So I think the counting tool is broken. > Lock Id Combined Time (ns) > SInvalLock 29800 > RelCacheInitLock 30300 > BufMappingLock 168800 > FirstLockMgrLock 414300 > FreeSpaceLock 1281700 > ProcArrayLock 7869900 > WALInsertLock 11113200 > CheckpointStartLock 13494700 > OidGenLock 25719100 > XidGenLock 26443300 > CheckpointLock 194267800 Combined time of what exactly? It looks like this must be the total duration the lock is held, at least assuming that the time for CheckpointLock is correctly reported. It'd be much more useful to see the total time spent waiting to acquire the lock. regards, tom lane
The count is only for a 10-second snapshot.. Plus remember there are about 1000 users running so the connection being profiled only gets 0.01 of the period on CPU.. And the count is for that CONNECTION only. Anyway using the lock wait script it shows the real picture as you requested. Here the combined time means time "spent waiting" for the lock. bash-3.00# echo 500 users 500 users bash-3.00# ./4_lwlock_waits.d 20764 Lock Id Mode Count OidGenLock Exclusive 1 CheckpointStartLock Shared 3 OidGenLock Shared 4 WALInsertLock Exclusive 7 FreeSpaceLock Exclusive 8 XidGenLock Exclusive 15 CheckpointStartLock Exclusive 16 Lock Id Combined Time (ns) XidGenLock 3825000 OidGenLock 5307100 WALInsertLock 6317800 FreeSpaceLock 7244100 CheckpointStartLock 22199200 bash-3.00# echo 600 users 600 users bash-3.00# ./4_lwlock_waits.d 20764 Lock Id Mode Count OidGenLock Exclusive 1 WALInsertLock Exclusive 1 CheckpointStartLock Shared 4 CheckpointStartLock Exclusive 11 XidGenLock Exclusive 21 Lock Id Combined Time (ns) OidGenLock 1728200 WALInsertLock 2040700 XidGenLock 19878500 CheckpointStartLock 24156500 bash-3.00# echo 700 users 700 users bash-3.00# ./4_lwlock_waits.d 20764 Lock Id Mode Count OidGenLock Shared 1 XidGenLock Shared 1 CheckpointStartLock Shared 2 WALInsertLock Exclusive 4 CheckpointStartLock Exclusive 6 FreeSpaceLock Exclusive 6 XidGenLock Exclusive 13 Lock Id Combined Time (ns) OidGenLock 1730000 WALInsertLock 7253400 FreeSpaceLock 10977700 CheckpointStartLock 13356800 XidGenLock 38220500 bash-3.00# echo 800 users 800 users bash-3.00# ./4_lwlock_waits.d 20764 Lock Id Mode Count CheckpointStartLock Shared 1 WALInsertLock Exclusive 2 XidGenLock Shared 2 CheckpointStartLock Exclusive 5 FreeSpaceLock Exclusive 8 XidGenLock Exclusive 12 Lock Id Combined Time (ns) WALInsertLock 3746800 FreeSpaceLock 7628900 CheckpointStartLock 11297500 XidGenLock 16649000 bash-3.00# echo 900 users - BIG DROP IN THROUGHPUT OCCURS... 900 users bash-3.00# ./4_lwlock_waits.d 20764 Lock Id Mode Count OidGenLock Exclusive 1 OidGenLock Shared 1 XidGenLock Shared 1 FreeSpaceLock Exclusive 2 WALInsertLock Exclusive 2 CheckpointStartLock Shared 6 XidGenLock Exclusive 12 CheckpointStartLock Exclusive 121 Lock Id Combined Time (ns) OidGenLock 1968100 FreeSpaceLock 2076300 WALInsertLock 2190400 XidGenLock 20259400 CheckpointStartLock 1407294300 bash-3.00# echo 950 users 950 users bash-3.00# ./4_lwlock_waits.d 20764 Lock Id Mode Count OidGenLock Exclusive 1 OidGenLock Shared 2 CheckpointStartLock Shared 3 WALInsertLock Exclusive 4 FreeSpaceLock Exclusive 5 XidGenLock Exclusive 11 CheckpointStartLock Exclusive 50 Lock Id Combined Time (ns) WALInsertLock 5577100 FreeSpaceLock 9115900 XidGenLock 13765800 OidGenLock 50155500 CheckpointStartLock 759872200 bash-3.00# echo 1000 users 1000 users bash-3.00# ./4_lwlock_waits.d 20764 Lock Id Mode Count OidGenLock Shared 1 WALInsertLock Exclusive 1 XidGenLock Exclusive 5 CheckpointStartLock Shared 6 CheckpointStartLock Exclusive 102 Lock Id Combined Time (ns) OidGenLock 21900 WALInsertLock 82500 XidGenLock 3313400 CheckpointStartLock 1448289900 bash-3.00# echo 1050 users 1050 users bash-3.00# ./4_lwlock_waits.d 20764 Lock Id Mode Count FreeSpaceLock Exclusive 1 CheckpointStartLock Shared 3 XidGenLock Exclusive 3 CheckpointStartLock Exclusive 146 Lock Id Combined Time (ns) FreeSpaceLock 18400 XidGenLock 1900900 CheckpointStartLock 2392893700 bash-3.00# -Jignesh Tom Lane wrote: > "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > >> Here is how I got the numbers.. 
>> I had about 1600 users login into postgresql. Then started the run with >> 500 users and using DTrace I started tracking Postgresql Locking "as >> viewed from one user/connection". Echo statements indicate how many >> users were active at that point and how was throughput performing. All >> IO is done on /tmp which means on a RAM disk. >> > > >> bash-3.00# echo 500 users - baseline number >> 500 users >> bash-3.00# ./3_lwlock_acquires.d 19178 >> > I don't think I believe these numbers. For one thing, CheckpointLock > is simply not ever taken in shared mode. The ratios of counts for > different locks seems pretty improbable too, eg there is no way on > earth that the LockMgrLocks are taken more often shared than > exclusive (I would expect no shared acquires at all in the sort of > test you are running). Not to mention that the absolute number of > counts seems way too low. So I think the counting tool is broken. > > > Combined time of what exactly? It looks like this must be the total > duration the lock is held, at least assuming that the time for > CheckpointLock is correctly reported. It'd be much more useful to see > the total time spent waiting to acquire the lock. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
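For reference, the wait-time variant presumably differs only in which probes it instruments. A hedged sketch of the core of a 4_lwlock_waits.d-style script follows, again assuming lwlock-startwait/lwlock-endwait probes carrying lock id and mode as arguments; the surrounding tick/END reporting clauses would be the same as in the sketch above.

postgresql$1:::lwlock-startwait
{
        /* assumption: this probe fires only when the lock could not be
           granted immediately, i.e. the backend is about to block */
        self->wait[arg0] = timestamp;
}

postgresql$1:::lwlock-endwait
/self->wait[arg0]/
{
        @waits[arg0, arg1] = count();
        /* "Combined Time" now means time spent blocked waiting for the lock */
        @wait_ns[arg0]     = sum(timestamp - self->wait[arg0]);
        self->wait[arg0] = 0;
}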
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > The count is only for a 10-second snapshot.. Plus remember there are > about 1000 users running so the connection being profiled only gets > 0.01 of the period on CPU.. And the count is for that CONNECTION only. OK, that explains the low absolute levels of the counts, but if the counts are for a regular backend then there's still a lot of bogosity here. Backends won't be taking the CheckpointLock at all, nor do they take CheckpointStartLock in exclusive mode. The bgwriter would do that but it'd not be taking most of these other locks. So I think the script is mislabeling the locks somehow. Also, elementary statistics should tell you that a sample taken as above is going to have enormous amounts of noise. You should be sampling over a much longer period, say on the order of a minute of runtime, to have numbers that are trustworthy. regards, tom lane
On Thu, 2007-07-26 at 10:29 -0400, Jignesh K. Shah wrote: > The count is only for a 10-second snapshot.. Plus remember there are > about 1000 users running so the connection being profiled only gets > 0.01 of the period on CPU.. And the count is for that CONNECTION only. Is that for one process, or all processes aggregated in some way? > CheckpointStartLock Shared 6 > CheckpointStartLock Exclusive 102 That's definitely whacked. Surely we didn't start 102 checkpoints yet attempt to commit 6 times? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
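The one-process-versus-everything question comes down to which provider instances the probe description matches: the scripts in this thread pin the provider to a single backend PID (Jignesh confirms the counts are for one connection), whereas globbing the provider name would aggregate over every backend on the box. A small, hypothetical counterpoint, with the same assumed probes as above and a one-minute window in line with Tom's sampling advice:

/*
 * Hypothetical all-backends variant: the glob matches every postgresql
 * provider instance and DTrace merges the aggregations across processes.
 * Keying by pid keeps the per-backend breakdown visible.
 */
postgresql*:::lwlock-startwait
{
        self->ts[arg0] = timestamp;
}

postgresql*:::lwlock-endwait
/self->ts[arg0]/
{
        @wait_ns[pid, arg0] = sum(timestamp - self->ts[arg0]);
        self->ts[arg0] = 0;
}

tick-60sec
{
        printa("%8d %8d %@16d\n", @wait_ns);
        exit(0);
}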
I will look for runs with longer samples.. Now the script could have mislabeled lock names.. Anyway digging into the one that seems to increase over time... I did stack profiles on how that increases... and here are some numbers.. For 600-850 Users: that potential mislabeled CheckPointStartLock or LockID==12 comes from various sources where the top source (while system is still doing great) comes from: postgres`LWLockAcquire+0x1c8 postgres`SimpleLruReadPage_ReadOnly+0xc postgres`TransactionIdGetStatus+0x14 postgres`TransactionLogFetch+0x58 postgres`TransactionIdDidCommit+0x4 postgres`HeapTupleSatisfiesSnapshot+0x234 postgres`heap_release_fetch+0x1a8 postgres`index_getnext+0xf4 postgres`IndexNext+0x7c postgres`ExecScan+0x8c postgres`ExecProcNode+0xb4 postgres`ExecutePlan+0xdc postgres`ExecutorRun+0xb0 postgres`PortalRunSelect+0x9c postgres`PortalRun+0x244 postgres`exec_execute_message+0x3a0 postgres`PostgresMain+0x1300 postgres`BackendRun+0x278 postgres`ServerLoop+0x63c postgres`PostmasterMain+0xc40 8202100 postgres`LWLockAcquire+0x1c8 postgres`TransactionIdSetStatus+0x1c postgres`RecordTransactionCommit+0x2a8 postgres`CommitTransaction+0xc8 postgres`CommitTransactionCommand+0x90 postgres`finish_xact_command+0x60 postgres`exec_execute_message+0x3d8 postgres`PostgresMain+0x1300 postgres`BackendRun+0x278 postgres`ServerLoop+0x63c postgres`PostmasterMain+0xc40 postgres`main+0x394 postgres`_start+0x108 30822100 However at 900 Users where the big drop in throughput occurs: It gives a different top "consumer" of time: postgres`LWLockAcquire+0x1c8 postgres`TransactionIdSetStatus+0x1c postgres`RecordTransactionCommit+0x2a8 postgres`CommitTransaction+0xc8 postgres`CommitTransactionCommand+0x90 postgres`finish_xact_command+0x60 postgres`exec_execute_message+0x3d8 postgres`PostgresMain+0x1300 postgres`BackendRun+0x278 postgres`ServerLoop+0x63c postgres`PostmasterMain+0xc40 postgres`main+0x394 postgres`_start+0x108 406601300 postgres`LWLockAcquire+0x1c8 postgres`SimpleLruReadPage+0x1ac postgres`TransactionIdGetStatus+0x14 postgres`TransactionLogFetch+0x58 postgres`TransactionIdDidCommit+0x4 postgres`HeapTupleSatisfiesUpdate+0x360 postgres`heap_lock_tuple+0x27c postgres`ExecutePlan+0x33c postgres`ExecutorRun+0xb0 postgres`PortalRunSelect+0x9c postgres`PortalRun+0x244 postgres`exec_execute_message+0x3a0 postgres`PostgresMain+0x1300 postgres`BackendRun+0x278 postgres`ServerLoop+0x63c postgres`PostmasterMain+0xc40 postgres`main+0x394 postgres`_start+0x108 444523100 postgres`LWLockAcquire+0x1c8 postgres`SimpleLruReadPage+0x1ac postgres`TransactionIdGetStatus+0x14 postgres`TransactionLogFetch+0x58 postgres`TransactionIdDidCommit+0x4 postgres`HeapTupleSatisfiesSnapshot+0x234 postgres`heap_release_fetch+0x1a8 postgres`index_getnext+0xf4 postgres`IndexNext+0x7c postgres`ExecScan+0x8c postgres`ExecProcNode+0xb4 postgres`ExecutePlan+0xdc postgres`ExecutorRun+0xb0 postgres`PortalRunSelect+0x9c postgres`PortalRun+0x244 postgres`exec_execute_message+0x3a0 postgres`PostgresMain+0x1300 postgres`BackendRun+0x278 postgres`ServerLoop+0x63c postgres`PostmasterMain+0xc40 1661300000 Maybe you all will understand more than I do on what it does here.. Looks like IndexNext has a problem at high number of users to me.. but I could be wrong.. -Jignesh Tom Lane wrote: > "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > >> The count is only for a 10-second snapshot.. Plus remember there are >> about 1000 users running so the connection being profiled only gets >> 0.01 of the period on CPU.. And the count is for that CONNECTION only. 
>> > > OK, that explains the low absolute levels of the counts, but if the > counts are for a regular backend then there's still a lot of bogosity > here. Backends won't be taking the CheckpointLock at all, nor do they > take CheckpointStartLock in exclusive mode. The bgwriter would do that > but it'd not be taking most of these other locks. So I think the script > is mislabeling the locks somehow. > > Also, elementary statistics should tell you that a sample taken as above > is going to have enormous amounts of noise. You should be sampling over > a much longer period, say on the order of a minute of runtime, to have > numbers that are trustworthy. > > regards, tom lane >
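The call-path figures above come from charging lock time to the blocked user-level stack. Whether those particular numbers are hold time or wait time, the attribution technique is the same; here is a hypothetical sketch in the spirit of the 6_lwlock_stack.d script used later in the thread, which appears to take a numeric lock id and a backend PID ("./6_lwlock_stack.d 4 7056").

#!/usr/sbin/dtrace -qs
/*
 * Hypothetical stack attribution for a single LWLock: $1 is the lock id
 * to watch, $2 the backend PID.  Wait time is summed per user stack,
 * which is how call paths like CommitTransaction and SimpleLruReadPage
 * get ranked against each other.
 */

postgresql$2:::lwlock-startwait
/arg0 == $1/
{
        self->ts = timestamp;
}

postgresql$2:::lwlock-endwait
/arg0 == $1 && self->ts/
{
        @by_stack[ustack()] = sum(timestamp - self->ts);
        self->ts = 0;
}

dtrace:::END
{
        /* stacks print with their summed wait time, largest last */
        printa(@by_stack);
}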
"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes: > For 600-850 Users: that potential mislabeled CheckPointStartLock or > LockID==12 comes from various sources where the top source (while system > is still doing great) comes from: > postgres`LWLockAcquire+0x1c8 > postgres`SimpleLruReadPage_ReadOnly+0xc > postgres`TransactionIdGetStatus+0x14 That path would be taking CLogControlLock ... so you're off by at least one. Compare the script to src/include/storage/lwlock.h. regards, tom lane
On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote: > However at 900 Users where the big drop in throughput occurs: > It gives a different top "consumer" of time: postgres`LWLockAcquire+0x1c8 > postgres`SimpleLruReadPage+0x1ac > postgres`TransactionIdGetStatus+0x14 > postgres`TransactionLogFetch+0x58 TransactionIdGetStatus doesn't directly call SimpleLruReadPage(). Presumably the compiler has been rearranging things?? Looks like you're out of clog buffers. It seems like the clog buffers aren't big enough to hold clog pages for long enough and the SELECT FOR SHARE processing is leaving lots of additional read locks that are increasing the number of clog requests for older xids. Try the enclosed patch. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com

BEAUTIFUL!!! Using the Patch I can now go upto 1300 users without dropping.. But now it still repeats at 1300-1350 users.. I corrected the Lock Descriptions based on what I got from lwlock.h and retried the whole scalability again with profiling again.. This time it looks like the ProcArrayLock bash-3.00# echo 600 users 600 users bash-3.00# ./4_lwlock_waits.d 7056 Lock Id Mode Count XidGenLock Exclusive 1 CLogControlLock Shared 2 XidGenLock Shared 2 WALWriteLock Exclusive 4 WALInsertLock Exclusive 8 CLogControlLock Exclusive 9 ProcArrayLock Exclusive 9 Lock Id Combined Time (ns) WALWriteLock 2842300 XidGenLock 4951000 CLogControlLock 11151800 WALInsertLock 13035600 ProcArrayLock 20040000 bash-3.00# echo 700 users 700 users bash-3.00# ./4_lwlock_waits.d 7056 Lock Id Mode Count XidGenLock Exclusive 1 WALWriteLock Exclusive 2 XidGenLock Shared 2 CLogControlLock Shared 3 CLogControlLock Exclusive 8 WALInsertLock Exclusive 9 ProcArrayLock Exclusive 22 Lock Id Combined Time (ns) XidGenLock 4093300 WALWriteLock 4914800 WALInsertLock 7389100 ProcArrayLock 10248200 CLogControlLock 11989400 bash-3.00# echo 800 users 800 users bash-3.00# ./4_lwlock_waits.d 7056 Lock Id Mode Count WALWriteLock Exclusive 1 XidGenLock Shared 2 CLogControlLock Shared 3 CLogControlLock Exclusive 7 WALInsertLock Exclusive 7 ProcArrayLock Exclusive 31 Lock Id Combined Time (ns) WALWriteLock 319100 XidGenLock 5388700 WALInsertLock 9901400 CLogControlLock 13465000 ProcArrayLock 42979700 bash-3.00# echo 900 users 900 users bash-3.00# ./4_lwlock_waits.d 7056 Lock Id Mode Count CLogControlLock Shared 1 XidGenLock Exclusive 1 WALWriteLock Exclusive 2 CLogControlLock Exclusive 6 WALInsertLock Exclusive 9 ProcArrayLock Exclusive 25 Lock Id Combined Time (ns) XidGenLock 3197700 WALWriteLock 3887100 CLogControlLock 15774500 WALInsertLock 38268700 ProcArrayLock 162073100 bash-3.00# ./6_lwlock_stack.d 4 7056 Lock Id Mode Count ProcArrayLock Shared 1 ProcArrayLock Exclusive 67 Lock Id Combined Time (ns) ProcArrayLock 216773800 Lock Id Combined Time (ns) postgres`LWLockAcquire+0x1c8 postgres`GetSnapshotData+0x118 postgres`GetTransactionSnapshot+0x5c postgres`PortalStart+0x150 postgres`exec_bind_message+0x81c postgres`PostgresMain+0x12b8 postgres`BackendRun+0x278 postgres`ServerLoop+0x63c postgres`PostmasterMain+0xc40 postgres`main+0x394 postgres`_start+0x108 2779000 postgres`LWLockAcquire+0x1c8 postgres`CommitTransaction+0xe0 postgres`CommitTransactionCommand+0x90 postgres`finish_xact_command+0x60 postgres`exec_execute_message+0x3d8 postgres`PostgresMain+0x1300 postgres`BackendRun+0x278 postgres`ServerLoop+0x63c postgres`PostmasterMain+0xc40 postgres`main+0x394 postgres`_start+0x108 213994800 bash-3.00# echo 1000 users - SLIGHT DROP 1000 users bash-3.00# ./4_lwlock_waits.d 7056 Lock Id Mode Count WALWriteLock Exclusive 1 CLogControlLock Exclusive 2 XidGenLock Shared 2 CLogControlLock Shared 3 WALInsertLock Exclusive 4 ProcArrayLock Exclusive 26 Lock Id Combined Time (ns) WALWriteLock 1807400 XidGenLock 2024000 WALInsertLock 2177500 CLogControlLock 9064700 ProcArrayLock 199216000 bash-3.00# ./6_lwlock_stack.d 4 7056 Lock Id Mode Count ProcArrayLock Shared 3 ProcArrayLock Exclusive 38 Lock Id Combined Time (ns) ProcArrayLock 858238600 Lock Id Combined Time (ns) postgres`LWLockAcquire+0x1c8 postgres`TransactionIdIsInProgress+0x50 postgres`HeapTupleSatisfiesVacuum+0x2ec postgres`_bt_check_unique+0x2a0 postgres`_bt_doinsert+0x98 postgres`btinsert+0x54 postgres`FunctionCall6+0x44 postgres`index_insert+0x90 postgres`ExecInsertIndexTuples+0x1bc 
postgres`ExecUpdate+0x500 postgres`ExecutePlan+0x704 postgres`ExecutorRun+0x60 postgres`PortalRunMulti+0x2a0 postgres`PortalRun+0x310 postgres`exec_execute_message+0x3a0 postgres`PostgresMain+0x1300 postgres`BackendRun+0x278 postgres`ServerLoop+0x63c postgres`PostmasterMain+0xc40 postgres`main+0x394 167100 postgres`LWLockAcquire+0x1c8 postgres`GetSnapshotData+0x118 postgres`GetTransactionSnapshot+0x5c postgres`PortalRunMulti+0x22c postgres`PortalRun+0x310 postgres`exec_execute_message+0x3a0 postgres`PostgresMain+0x1300 postgres`BackendRun+0x278 postgres`ServerLoop+0x63c postgres`PostmasterMain+0xc40 postgres`main+0x394 postgres`_start+0x108 7125900 postgres`LWLockAcquire+0x1c8 postgres`CommitTransaction+0xe0 postgres`CommitTransactionCommand+0x90 postgres`finish_xact_command+0x60 postgres`exec_execute_message+0x3d8 postgres`PostgresMain+0x1300 postgres`BackendRun+0x278 postgres`ServerLoop+0x63c postgres`PostmasterMain+0xc40 postgres`main+0x394 postgres`_start+0x108 850945600 bash-3.00# echo 1100 users - DROP .... 1100 users bash-3.00# ./4_lwlock_waits.d 7056 Lock Id Mode Count CLogControlLock Shared 1 WALWriteLock Exclusive 1 XidGenLock Exclusive 1 ProcArrayLock Shared 2 WALInsertLock Exclusive 2 XidGenLock Shared 2 CLogControlLock Exclusive 4 ProcArrayLock Exclusive 20 Lock Id Combined Time (ns) WALWriteLock 4179500 XidGenLock 6249400 CLogControlLock 20411100 WALInsertLock 29707600 ProcArrayLock 207923700 bash-3.00# ./6_lwlock_stack.d 4 7056 Lock Id Mode Count ProcArrayLock Exclusive 40 Lock Id Combined Time (ns) ProcArrayLock 692242100 Lock Id Combined Time (ns) postgres`LWLockAcquire+0x1c8 postgres`CommitTransaction+0xe0 postgres`CommitTransactionCommand+0x90 postgres`finish_xact_command+0x60 postgres`exec_execute_message+0x3d8 postgres`PostgresMain+0x1300 postgres`BackendRun+0x278 postgres`ServerLoop+0x63c postgres`PostmasterMain+0xc40 postgres`main+0x394 postgres`_start+0x108 692242100 bash-3.00# LockID for ProcArrayLock is 4 or the 5 entry in lwlock.h which seems to indicate it is lwlock.h Any tweaks for that? -Jignesh Simon Riggs wrote: > On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote: > > >> However at 900 Users where the big drop in throughput occurs: >> It gives a different top "consumer" of time: >> > > > postgres`LWLockAcquire+0x1c8 > >> postgres`SimpleLruReadPage+0x1ac >> postgres`TransactionIdGetStatus+0x14 >> postgres`TransactionLogFetch+0x58 >> > > TransactionIdGetStatus doesn't directly call SimpleLruReadPage(). > Presumably the compiler has been rearranging things?? > > Looks like you're out of clog buffers. It seems like the clog buffers > aren't big enough to hold clog pages for long enough and the SELECT FOR > SHARE processing is leaving lots of additional read locks that are > increasing the number of clog requests for older xids. > > Try the enclosed patch. > > > ------------------------------------------------------------------------ > > Index: src/include/access/clog.h > =================================================================== > RCS file: /projects/cvsroot/pgsql/src/include/access/clog.h,v > retrieving revision 1.19 > diff -c -r1.19 clog.h > *** src/include/access/clog.h 5 Jan 2007 22:19:50 -0000 1.19 > --- src/include/access/clog.h 26 Jul 2007 15:44:58 -0000 > *************** > *** 29,35 **** > > > /* Number of SLRU buffers to use for clog */ > ! #define NUM_CLOG_BUFFERS 8 > > > extern void TransactionIdSetStatus(TransactionId xid, XidStatus status); > --- 29,35 ---- > > > /* Number of SLRU buffers to use for clog */ > ! 
#define NUM_CLOG_BUFFERS 64 > > > extern void TransactionIdSetStatus(TransactionId xid, XidStatus status); > > ------------------------------------------------------------------------ > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
Tom Lane wrote: > > That path would be taking CLogControlLock ... so you're off by at least > one. Compare the script to src/include/storage/lwlock.h. > Indeed, the indexing was off by one: BufMappingLock was removed from src/include/storage/lwlock.h between 8.1 and 8.2, and the DTrace script's lock-name table had not been updated to match. Thanks, Robert
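For anyone reproducing this: the scripts can only label locks by translating the numeric lock id through a table copied by hand from lwlock.h, so the table has to be regenerated for the exact server version being traced. A hedged fragment showing the shape of that table, limited to the two ids this thread actually pins down (ProcArrayLock as id 4, per the 6_lwlock_stack.d calls above, and id 12, which the stale table printed as CheckpointStartLock but which the current lwlock.h calls CLogControlLock):

/*
 * Hypothetical fragment of the lock-name table such a script carries.
 * Only the entries confirmed in this thread are shown; the rest must be
 * filled in verbatim from the server's own src/include/storage/lwlock.h,
 * since any drift (like the BufMappingLock removal between 8.1 and 8.2)
 * shifts every later name by one.
 */
dtrace:::BEGIN
{
        lockname[4]  = "ProcArrayLock";    /* confirmed as id 4 above */
        lockname[12] = "CLogControlLock";  /* printed as CheckpointStartLock
                                              by the stale table */
        /* ... remaining ids copied from lwlock.h ... */
}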
On Thu, 2007-07-26 at 15:44 -0400, Jignesh K. Shah wrote: > BEAUTIFUL!!! > > Using the Patch I can now go upto 1300 users without dropping.. But now > it still repeats at 1300-1350 users.. OK, can you try again with 16 and 32 buffers please? We need to know how many is enough and whether this number needs to be variable via a parameter, or just slightly larger by default. Thanks. > I corrected the Lock Descriptions based on what I got from lwlock.h and > retried the whole scalability again with profiling again.. This time it > looks like the ProcArrayLock That's what I would expect with that many users. > Lock Id Mode Count > XidGenLock Exclusive 1 > CLogControlLock Shared 2 > XidGenLock Shared 2 > WALWriteLock Exclusive 4 > WALInsertLock Exclusive 8 > CLogControlLock Exclusive 9 > ProcArrayLock Exclusive 9 ...but as Tom mentioned, we need to do longer runs now so these counts get to somewhere in the hundreds so we have some statistical validity. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Will try 16 and 32 CLOGBUFFER tomorrow: But here is locks data again with about increased time profiling (about 2 minutes) for the connection with about 2000 users: bash-3.00# time ./4_lwlock_waits.d 13583 ^C Lock Id Mode Count ProcArrayLock Shared 5 XidGenLock Exclusive 13 CLogControlLock Shared 14 XidGenLock Shared 15 CLogControlLock Exclusive 21 WALInsertLock Exclusive 77 WALWriteLock Exclusive 175 ProcArrayLock Exclusive 275 Lock Id Combined Time (ns) XidGenLock 194966200 WALInsertLock 517955000 CLogControlLock 679665100 WALWriteLock 2838716200 ProcArrayLock 44181002600 Top Wait time seems to come from the following code path for ProcArrayLock: Lock Id Mode Count ProcArrayLock Exclusive 21 Lock Id Combined Time (ns) ProcArrayLock 5255937500 Lock Id Combined Time (ns) postgres`LWLockAcquire+0x1f0 postgres`CommitTransaction+0x104 postgres`CommitTransactionCommand+0xbc postgres`finish_xact_command+0x78 postgres`exec_execute_message+0x42c postgres`PostgresMain+0x1838 postgres`BackendRun+0x2f8 postgres`ServerLoop+0x680 postgres`PostmasterMain+0xda8 postgres`main+0x3d0 postgres`_start+0x17c 5255937500 Regards, Jignesh Simon Riggs wrote: > On Thu, 2007-07-26 at 15:44 -0400, Jignesh K. Shah wrote: > >> BEAUTIFUL!!! >> >> Using the Patch I can now go upto 1300 users without dropping.. But now >> it still repeats at 1300-1350 users.. >> > > OK, can you try again with 16 and 32 buffers please? We need to know > how many is enough and whether this number needs to be variable via a > parameter, or just slightly larger by default. Thanks. > > >> I corrected the Lock Descriptions based on what I got from lwlock.h and >> retried the whole scalability again with profiling again.. This time it >> looks like the ProcArrayLock >> > > That's what I would expect with that many users. > > >> Lock Id Mode Count >> XidGenLock Exclusive 1 >> CLogControlLock Shared 2 >> XidGenLock Shared 2 >> WALWriteLock Exclusive 4 >> WALInsertLock Exclusive 8 >> CLogControlLock Exclusive 9 >> ProcArrayLock Exclusive 9 >> > > ...but as Tom mentioned, we need to do longer runs now so these counts > get to somewhere in the hundreds so we have some statistical validity. > >
Jignesh K. Shah wrote: > Top Wait time seems to come from the following code path for > ProcArrayLock: > > Lock Id Mode Count > ProcArrayLock Exclusive 21 > > Lock Id Combined Time (ns) > ProcArrayLock 5255937500 > > Lock Id Combined Time (ns) > > > postgres`LWLockAcquire+0x1f0 > postgres`CommitTransaction+0x104 Yeah, ProcArrayLock is pretty contended. I think it would be kinda neat if we could split it up in partitions. This lock is quite particular though. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, 2007-07-27 at 04:58 -0400, Alvaro Herrera wrote: > Jignesh K. Shah wrote: > > > Top Wait time seems to come from the following code path for > > ProcArrayLock: > > > > Lock Id Mode Count > > ProcArrayLock Exclusive 21 > > > > Lock Id Combined Time (ns) > > ProcArrayLock 5255937500 > > > > Lock Id Combined Time (ns) > > > > > > postgres`LWLockAcquire+0x1f0 > > postgres`CommitTransaction+0x104 > > Yeah, ProcArrayLock is pretty contended. I think it would be kinda neat > if we could split it up in partitions. This lock is quite particular > though. Maybe, if we did we should set the partitions according to numbers of users, so lower numbers of users are all in one partition. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote: > Lock Id Combined Time (ns) > XidGenLock 194966200 > WALInsertLock 517955000 > CLogControlLock 679665100 > WALWriteLock 2838716200 > ProcArrayLock 44181002600 Is this the time the lock is held for or the time that we wait for that lock? It would be good to see the break down of time separately for shared and exclusive. Can we have a table like this: LockId,LockMode,SumTimeLockHeld,SumTimeLockWait > Top Wait time seems to come from the following code path for > ProcArrayLock: > > Lock Id Mode Count > ProcArrayLock Exclusive 21 > > Lock Id Combined Time (ns) > ProcArrayLock 5255937500 > > Lock Id Combined Time (ns) > > > postgres`LWLockAcquire+0x1f0 > postgres`CommitTransaction+0x104 > postgres`CommitTransactionCommand+0xbc > postgres`finish_xact_command+0x78 Well thats pretty weird. That code path clearly only happens once per transaction and ought to be fast. The other code paths that take ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData() ought to spend more time holding the lock. Presumably you are running with a fair number of SERIALIZABLE transactions? Are you running with commit_delay > 0? Its possible that the call to CountActiveBackends() is causing pinging of the procarray by other backends while we're trying to read it during CommitTransaction(). If so, try the attached patch. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Yes I can try to breakup the Shared and exclusive time.. Also yes I use commit delays =10, it helps out a lot in reducing IO load.. I will try out the patch soon. -Jignesh Simon Riggs wrote: > On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote: > >> Lock Id Combined Time (ns) >> XidGenLock 194966200 >> WALInsertLock 517955000 >> CLogControlLock 679665100 >> WALWriteLock 2838716200 >> ProcArrayLock 44181002600 >> > > Is this the time the lock is held for or the time that we wait for that > lock? It would be good to see the break down of time separately for > shared and exclusive. > > Can we have a table like this: > LockId,LockMode,SumTimeLockHeld,SumTimeLockWait > > >> Top Wait time seems to come from the following code path for >> ProcArrayLock: >> >> Lock Id Mode Count >> ProcArrayLock Exclusive 21 >> >> Lock Id Combined Time (ns) >> ProcArrayLock 5255937500 >> >> Lock Id Combined Time (ns) >> >> >> postgres`LWLockAcquire+0x1f0 >> postgres`CommitTransaction+0x104 >> postgres`CommitTransactionCommand+0xbc >> postgres`finish_xact_command+0x78 >> > > Well thats pretty weird. That code path clearly only happens once per > transaction and ought to be fast. The other code paths that take > ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData() > ought to spend more time holding the lock. Presumably you are running > with a fair number of SERIALIZABLE transactions? > > Are you running with commit_delay > 0? Its possible that the call to > CountActiveBackends() is causing pinging of the procarray by other > backends while we're trying to read it during CommitTransaction(). If > so, try the attached patch. > >
I tried CLOG Buffers 32 and the performance is as good as 64 bit.. (I havent tried 16 yet though.. ) I am going to try your second patch now.. Also here is the breakup by Mode. The combined time is the total time it waits for all counts. Lock Id Mode Count ProcArrayLock Shared 1 CLogControlLock Exclusive 4 CLogControlLock Shared 4 XidGenLock Shared 4 XidGenLock Exclusive 7 WALInsertLock Exclusive 21 WALWriteLock Exclusive 62 ProcArrayLock Exclusive 79 Lock Id Mode Combined Time (ns) CLogControlLock Exclusive 325200 CLogControlLock Shared 4509200 XidGenLock Exclusive 11839600 ProcArrayLock Shared 40506600 XidGenLock Shared 119013700 WALInsertLock Exclusive 148063100 WALWriteLock Exclusive 347052100 ProcArrayLock Exclusive 1054780600 Here is another one at higher user count 1600: bash-3.00# ./4_lwlock_waits.d 9208 Lock Id Mode Count CLogControlLock Exclusive 1 CLogControlLock Shared 2 XidGenLock Shared 7 WALInsertLock Exclusive 12 WALWriteLock Exclusive 50 ProcArrayLock Exclusive 82 Lock Id Mode Combined Time (ns) CLogControlLock Exclusive 27300 XidGenLock Shared 14689300 CLogControlLock Shared 72664900 WALInsertLock Exclusive 101431300 WALWriteLock Exclusive 534357400 ProcArrayLock Exclusive 4110350300 Now I will try with your second patch. Regards, Jignesh Simon Riggs wrote: > On Thu, 2007-07-26 at 17:17 -0400, Jignesh K. Shah wrote: > >> Lock Id Combined Time (ns) >> XidGenLock 194966200 >> WALInsertLock 517955000 >> CLogControlLock 679665100 >> WALWriteLock 2838716200 >> ProcArrayLock 44181002600 >> > > Is this the time the lock is held for or the time that we wait for that > lock? It would be good to see the break down of time separately for > shared and exclusive. > > Can we have a table like this: > LockId,LockMode,SumTimeLockHeld,SumTimeLockWait > > >> Top Wait time seems to come from the following code path for >> ProcArrayLock: >> >> Lock Id Mode Count >> ProcArrayLock Exclusive 21 >> >> Lock Id Combined Time (ns) >> ProcArrayLock 5255937500 >> >> Lock Id Combined Time (ns) >> >> >> postgres`LWLockAcquire+0x1f0 >> postgres`CommitTransaction+0x104 >> postgres`CommitTransactionCommand+0xbc >> postgres`finish_xact_command+0x78 >> > > Well thats pretty weird. That code path clearly only happens once per > transaction and ought to be fast. The other code paths that take > ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData() > ought to spend more time holding the lock. Presumably you are running > with a fair number of SERIALIZABLE transactions? > > Are you running with commit_delay > 0? Its possible that the call to > CountActiveBackends() is causing pinging of the procarray by other > backends while we're trying to read it during CommitTransaction(). If > so, try the attached patch. > > > ------------------------------------------------------------------------ > > Index: src/backend/access/transam/xact.c > =================================================================== > RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v > retrieving revision 1.245 > diff -c -r1.245 xact.c > *** src/backend/access/transam/xact.c 7 Jun 2007 21:45:58 -0000 1.245 > --- src/backend/access/transam/xact.c 27 Jul 2007 09:09:08 -0000 > *************** > *** 820,827 **** > * are fewer than CommitSiblings other backends with active > * transactions. > */ > ! if (CommitDelay > 0 && enableFsync && > ! 
CountActiveBackends() >= CommitSiblings) > pg_usleep(CommitDelay); > > XLogFlush(recptr); > --- 820,826 ---- > * are fewer than CommitSiblings other backends with active > * transactions. > */ > ! if (CommitDelay > 0 && enableFsync) > pg_usleep(CommitDelay); > > XLogFlush(recptr); > > ------------------------------------------------------------------------ > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster >
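The per-mode breakdown above presumably comes from keying the time aggregation by lock mode as well as lock id. A minimal sketch of that change, assuming (as in the earlier sketches) that arg1 of the lwlock wait probes carries the mode and that LW_EXCLUSIVE and LW_SHARED map to 0 and 1 in that era's lwlock.h:

postgresql$1:::lwlock-endwait
/self->wait[arg0]/
{
        /* assumed mapping: 0 = LW_EXCLUSIVE, 1 = LW_SHARED (verify against lwlock.h) */
        this->mode = (arg1 == 0) ? "Exclusive" : "Shared";
        @waits[arg0, this->mode]   = count();
        @wait_ns[arg0, this->mode] = sum(timestamp - self->wait[arg0]);
        self->wait[arg0] = 0;
}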
Using CLOG Buffers 32 and the commit sibling check patch I still see a drop at 1200-1300 users.. bash-3.00# ./4_lwlock_waits.d 18250 Lock Id Mode Count XidGenLock Shared 1 CLogControlLock Shared 2 ProcArrayLock Shared 2 XidGenLock Exclusive 4 CLogControlLock Exclusive 15 WALInsertLock Exclusive 18 WALWriteLock Exclusive 38 ProcArrayLock Exclusive 77 Lock Id Mode Combined Time (ns) XidGenLock Shared 88700 WALInsertLock Exclusive 69556000 ProcArrayLock Shared 95656800 XidGenLock Exclusive 139634100 CLogControlLock Exclusive 148822200 CLogControlLock Shared 161630000 WALWriteLock Exclusive 332781800 ProcArrayLock Exclusive 5688265500 bash-3.00# ./4_lwlock_waits.d 18599 Lock Id Mode Count ProcArrayLock Shared 2 XidGenLock Exclusive 3 XidGenLock Shared 4 CLogControlLock Shared 5 WALInsertLock Exclusive 10 CLogControlLock Exclusive 21 WALWriteLock Exclusive 28 ProcArrayLock Exclusive 54 Lock Id Mode Combined Time (ns) XidGenLock Exclusive 5688800 WALInsertLock Exclusive 11424700 CLogControlLock Shared 55589100 ProcArrayLock Shared 135220400 WALWriteLock Exclusive 177906900 XidGenLock Shared 524146500 CLogControlLock Exclusive 524563900 ProcArrayLock Exclusive 5828744500 bash-3.00# bash-3.00# ./6_lwlock_stack.d 4 18599 Lock Id Mode Count ProcArrayLock Shared 1 ProcArrayLock Exclusive 52 Lock Id Mode Combined Time (ns) ProcArrayLock Shared 41428300 ProcArrayLock Exclusive 3858386500 Lock Id Combined Time (ns) postgres`LWLockAcquire+0x1f0 postgres`GetSnapshotData+0x120 postgres`GetTransactionSnapshot+0x80 postgres`PortalStart+0x198 postgres`exec_bind_message+0x84c postgres`PostgresMain+0x17f8 postgres`BackendRun+0x2f8 postgres`ServerLoop+0x680 postgres`PostmasterMain+0xda8 postgres`main+0x3d0 postgres`_start+0x17c Shared 41428300 postgres`LWLockAcquire+0x1f0 postgres`CommitTransaction+0x104 postgres`CommitTransactionCommand+0xbc postgres`finish_xact_command+0x78 postgres`exec_execute_message+0x42c postgres`PostgresMain+0x1838 postgres`BackendRun+0x2f8 postgres`ServerLoop+0x680 postgres`PostmasterMain+0xda8 postgres`main+0x3d0 postgres`_start+0x17c Exclusive 3858386500 -Jignesh
With CLOG 16 the drp[s comes at about 1150 users with the following lock stats bash-3.00# ./4_lwlock_waits.d 16404 Lock Id Mode Count ProcArrayLock Shared 2 XidGenLock Exclusive 2 XidGenLock Shared 4 WALInsertLock Exclusive 7 CLogControlLock Shared 8 WALWriteLock Exclusive 46 ProcArrayLock Exclusive 64 CLogControlLock Exclusive 263 Lock Id Mode Combined Time (ns) XidGenLock Exclusive 528300 ProcArrayLock Shared 968800 WALInsertLock Exclusive 4090900 XidGenLock Shared 73987600 WALWriteLock Exclusive 86200700 ProcArrayLock Exclusive 130756000 CLogControlLock Shared 240471000 CLogControlLock Exclusive 4115158500 So I think 32 is a better option for CLogs before ProcArrayLock becomes the bottleneck. Though I havent seen what we can do with ProcArrayLock problem. Regards, Jignesh Jignesh K. Shah wrote: > Using CLOG Buffers 32 and the commit sibling check patch I still see a > drop at 1200-1300 users.. > > > > bash-3.00# ./4_lwlock_waits.d 18250 > > Lock Id Mode Count > XidGenLock Shared 1 > CLogControlLock Shared 2 > ProcArrayLock Shared 2 > XidGenLock Exclusive 4 > CLogControlLock Exclusive 15 > WALInsertLock Exclusive 18 > WALWriteLock Exclusive 38 > ProcArrayLock Exclusive 77 > > Lock Id Mode Combined Time (ns) > XidGenLock Shared 88700 > WALInsertLock Exclusive 69556000 > ProcArrayLock Shared 95656800 > XidGenLock Exclusive 139634100 > CLogControlLock Exclusive 148822200 > CLogControlLock Shared 161630000 > WALWriteLock Exclusive 332781800 > ProcArrayLock Exclusive 5688265500 > > bash-3.00# ./4_lwlock_waits.d 18599 > > Lock Id Mode Count > ProcArrayLock Shared 2 > XidGenLock Exclusive 3 > XidGenLock Shared 4 > CLogControlLock Shared 5 > WALInsertLock Exclusive 10 > CLogControlLock Exclusive 21 > WALWriteLock Exclusive 28 > ProcArrayLock Exclusive 54 > > Lock Id Mode Combined Time (ns) > XidGenLock Exclusive 5688800 > WALInsertLock Exclusive 11424700 > CLogControlLock Shared 55589100 > ProcArrayLock Shared 135220400 > WALWriteLock Exclusive 177906900 > XidGenLock Shared 524146500 > CLogControlLock Exclusive 524563900 > ProcArrayLock Exclusive 5828744500 > > bash-3.00# > bash-3.00# ./6_lwlock_stack.d 4 18599 > > Lock Id Mode Count > ProcArrayLock Shared 1 > ProcArrayLock Exclusive 52 > > Lock Id Mode Combined Time (ns) > ProcArrayLock Shared 41428300 > ProcArrayLock Exclusive 3858386500 > > Lock Id Combined Time (ns) > > > postgres`LWLockAcquire+0x1f0 > postgres`GetSnapshotData+0x120 > postgres`GetTransactionSnapshot+0x80 > postgres`PortalStart+0x198 > postgres`exec_bind_message+0x84c > postgres`PostgresMain+0x17f8 > postgres`BackendRun+0x2f8 > postgres`ServerLoop+0x680 > postgres`PostmasterMain+0xda8 > postgres`main+0x3d0 > postgres`_start+0x17c > Shared 41428300 > > postgres`LWLockAcquire+0x1f0 > postgres`CommitTransaction+0x104 > postgres`CommitTransactionCommand+0xbc > postgres`finish_xact_command+0x78 > postgres`exec_execute_message+0x42c > postgres`PostgresMain+0x1838 > postgres`BackendRun+0x2f8 > postgres`ServerLoop+0x680 > postgres`PostmasterMain+0xda8 > postgres`main+0x3d0 > postgres`_start+0x17c > Exclusive 3858386500 > > > -Jignesh > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings
Simon, > Well thats pretty weird. That code path clearly only happens once per > transaction and ought to be fast. The other code paths that take > ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData() > ought to spend more time holding the lock. Presumably you are running > with a fair number of SERIALIZABLE transactions? Given that this is a TPC-C analog, I'd assume that we are. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Yep quite a bit of transactions .. But the piece that's slow is where it is clearing it up in CommitTransaction(). I am not sure of how ProcArrayLock is designed to work and hence not clear what we are seeing is what we expect. Do we have some design doc on ProcArrayLock to understand its purpose? Thanks. Regards, Jignesh Josh Berkus wrote: > Simon, > > >> Well thats pretty weird. That code path clearly only happens once per >> transaction and ought to be fast. The other code paths that take >> ProcArrayLock like TransactionIdIsInProgress() and GetSnapshotData() >> ought to spend more time holding the lock. Presumably you are running >> with a fair number of SERIALIZABLE transactions? >> > > Given that this is TPCC-analog, I'd assume that we are. > > Jignesh? > >
Hi Simon, This patch seems to work well (both with 32 and 64 value but not with 16 and the default 8). Is there a way we can integrate this in 8.3? This will improve out of box performance quite a bit for high number of users (atleat 30% in my OLTP test) Regards, Jignesh Simon Riggs wrote: > On Thu, 2007-07-26 at 11:27 -0400, Jignesh K. Shah wrote: > > >> However at 900 Users where the big drop in throughput occurs: >> It gives a different top "consumer" of time: >> > > > postgres`LWLockAcquire+0x1c8 > >> postgres`SimpleLruReadPage+0x1ac >> postgres`TransactionIdGetStatus+0x14 >> postgres`TransactionLogFetch+0x58 >> > > TransactionIdGetStatus doesn't directly call SimpleLruReadPage(). > Presumably the compiler has been rearranging things?? > > Looks like you're out of clog buffers. It seems like the clog buffers > aren't big enough to hold clog pages for long enough and the SELECT FOR > SHARE processing is leaving lots of additional read locks that are > increasing the number of clog requests for older xids. > > Try the enclosed patch. > > > ------------------------------------------------------------------------ > > Index: src/include/access/clog.h > =================================================================== > RCS file: /projects/cvsroot/pgsql/src/include/access/clog.h,v > retrieving revision 1.19 > diff -c -r1.19 clog.h > *** src/include/access/clog.h 5 Jan 2007 22:19:50 -0000 1.19 > --- src/include/access/clog.h 26 Jul 2007 15:44:58 -0000 > *************** > *** 29,35 **** > > > /* Number of SLRU buffers to use for clog */ > ! #define NUM_CLOG_BUFFERS 8 > > > extern void TransactionIdSetStatus(TransactionId xid, XidStatus status); > --- 29,35 ---- > > > /* Number of SLRU buffers to use for clog */ > ! #define NUM_CLOG_BUFFERS 64 > > > extern void TransactionIdSetStatus(TransactionId xid, XidStatus status); > > ------------------------------------------------------------------------ > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
On Fri, 2007-08-03 at 16:09 -0400, Jignesh K. Shah wrote: > This patch seems to work well (both with 32 and 64 value but not with 16 > and the default 8). Could you test at 24 please also? Tom has pointed out the additional cost of setting this higher, even in workloads that don't benefit from the I/O-induced contention reduction. > Is there a way we can integrate this in 8.3? I just replied to Josh's thread on -hackers about this. > This will improve out of box performance quite a bit for high number of > users (atleat 30% in my OLTP test) Yes, thats good. Will this have a dramatic effect on a particular benchmark, or for what reason might we need this? Tom has questioned the use case here, so I think it would be good to explain a little more for everyone. Thanks. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
I tried with CLOG 24 also and I got linear performance upto 1250 users after which it started to tank. 32 got us to 1350 users before some other bottleneck overtook it. Based on what Tom said earlier, it might then make sense to make it a tunable with the default of 8 but something one can change for high number of users. Thanks. Regards, Jignesh Simon Riggs wrote: > On Fri, 2007-08-03 at 16:09 -0400, Jignesh K. Shah wrote: > > >> This patch seems to work well (both with 32 and 64 value but not with 16 >> and the default 8). >> > > Could you test at 24 please also? Tom has pointed out the additional > cost of setting this higher, even in workloads that don't benefit from > the I/O-induced contention reduction. > > >> Is there a way we can integrate this in 8.3? >> > > I just replied to Josh's thread on -hackers about this. > > >> This will improve out of box performance quite a bit for high number of >> users (atleat 30% in my OLTP test) >> > > Yes, thats good. Will this have a dramatic effect on a particular > benchmark, or for what reason might we need this? Tom has questioned the > use case here, so I think it would be good to explain a little more for > everyone. Thanks. > >
On Fri, 2007-08-10 at 13:54 -0400, Jignesh K. Shah wrote: > I tried with CLOG 24 also and I got linear performance upto 1250 users > after which it started to tank. 32 got us to 1350 users before some > other bottleneck overtook it. Jignesh, Thanks for testing that. It's not yet clear to everybody why supporting an extra 100 users matters, and it would certainly help your case if you could explain. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com