Thread: Straightforward changes for increased SMP scalability

Straightforward changes for increased SMP scalability

From
"Simon Riggs"
Date:
David Strong presented some excellent results of his SMP scalability
testing at Ottawa in May.
http://www.pgcon.org/2007/schedule/events/16.en.html

There are some easy things we can do to take advantage of those results,
especially the ones that were hardware independent.

The hardware independent results were these two:
- Avoid contention on WALInsertLock (+28% gain)
- Increase NUM_BUFFER_PARTITIONS (+7.7% gain)

Scalability begins to slow down at 8 CPUs on 8.2.4 and David was able to
show good gains even at 8 CPUs with these changes.

Proposals

1. For the first result, I suggest that we introduce some padding into
the shmem structure XLogCtlData to alleviate false sharing that may
exist between holders of WALInsertLock, WALWriteLock and info_lck. The
cost of this will be at most about 200 bytes of shmem, with a low risk
change. The benefits are hard to quantify, but we know this is an area
of high contention and we should do all we can to reduce that.
This hasn't been discussed previously, though we have seen good benefit
from avoiding false sharing in other cases, e.g. LWLOCK padding.
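
To make the padding idea concrete, here is a minimal sketch in C. It is not the actual XLogCtlData layout: the type and field names (InsertState, WriteState, PaddedCtl and so on) are hypothetical, and the 64-byte cache line size is an assumption that varies by hardware. The sketch only shows how rounding each independently locked group of fields up to a full cache line keeps the groups from sharing a line.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size; real hardware varies (64 and 128 bytes are common). */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-ins for state protected by two different locks. */
typedef struct InsertState { unsigned long cur_pos; unsigned long prev_pos; } InsertState;
typedef struct WriteState  { unsigned long written_upto; unsigned long flushed_upto; } WriteState;

/* Unpadded: both groups are small, so they land on the same cache line. */
typedef struct UnpaddedCtl
{
    InsertState insert_state;   /* touched by holders of the insert lock */
    WriteState  write_state;    /* touched by holders of the write lock */
} UnpaddedCtl;

/* Padded: each group is rounded up to a full cache line via a union. */
typedef union PaddedInsert { InsertState s; char pad[CACHE_LINE_SIZE]; } PaddedInsert;
typedef union PaddedWrite  { WriteState  s; char pad[CACHE_LINE_SIZE]; } PaddedWrite;

typedef struct PaddedCtl
{
    PaddedInsert insert_state;
    PaddedWrite  write_state;
} PaddedCtl;

/*
 * Index of the cache line that holds a given byte offset, assuming the
 * enclosing struct itself starts on a cache line boundary.
 */
size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE_SIZE;
}

/* Cache line of write_state in each layout (insert_state is on line 0 in both). */
size_t unpadded_write_line(void)
{
    return cache_line_of(offsetof(UnpaddedCtl, write_state));
}

size_t padded_write_line(void)
{
    return cache_line_of(offsetof(PaddedCtl, write_state));
}
```

In the unpadded layout both groups sit on the same line, so a writer under one lock invalidates the line for holders of the other. In the padded layout they sit on different lines, at the cost of the extra pad bytes.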

2. Increase NUM_BUFFER_PARTITIONS from 16 to 256 (or higher).
This has been discussed previously:
http://archives.postgresql.org/pgsql-hackers/2006-09/msg00967.php
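
As a rough illustration of why the partition count matters, the sketch below maps a hash code to a partition with a simple modulo and counts how many hash codes collide on one partition. The function names and the sample hash values are hypothetical; this is a sketch of the general technique, not the buffer manager's code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a hash code to the partition (and thus the partition lock) covering it.
 * A plain modulo is assumed here for illustration.
 */
unsigned partition_for_hash(uint32_t hashcode, unsigned num_partitions)
{
    assert(num_partitions > 0);
    return hashcode % num_partitions;
}

/*
 * Count how many of the n hash codes fall into the same partition as the
 * first one (the first one itself is included in the count).
 */
unsigned count_sharing_first_partition(const uint32_t *hashes, unsigned n,
                                       unsigned num_partitions)
{
    unsigned first;
    unsigned count = 0;
    unsigned i;

    assert(n > 0);
    first = partition_for_hash(hashes[0], num_partitions);
    for (i = 0; i < n; i++)
    {
        if (partition_for_hash(hashes[i], num_partitions) == first)
            count++;
    }
    return count;
}
```

With more partitions, fewer unrelated hash codes map to the same partition, so fewer backends contend for the same partition lock. The trade-off is that anything that must hold every partition lock at once has more locks to take.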

Both of these changes are simple enough to consider for 8.3

Comments?

--  Simon Riggs EnterpriseDB  http://www.enterprisedb.com



Re: Straightforward changes for increased SMP scalability

From
"Joshua D. Drake"
Date:
Simon Riggs wrote:

> Proposals
> 
> 1. For the first result, I suggest that we introduce some padding into
> the shmem structure XLogCtlData to alleviate false sharing that may
> exist between holders of WALInsertLock, WALWriteLock and info_lck. The
> cost of this will be at most about 200 bytes of shmem, with a low risk
> change. The benefits are hard to quantify, but we know this is an area
> of high contention and we should do all we can to reduce that.
> This hasn't been discussed previously, though we have seen good benefit
> from avoiding false sharing in other cases, e.g. LWLOCK padding.
> 
> 2. Increase NUM_BUFFER_PARTITIONS from 16 to 256 (or higher).
> This has been discussed previously:
> http://archives.postgresql.org/pgsql-hackers/2006-09/msg00967.php
> 
> Both of these changes are simple enough to consider for 8.3
> 
> Comments?

+1 on the idea (I can speak to the technical side). What I can say is 
that it is pretty much known that after 8 cores we slow down. Although 
8.2 is better than any other release in this regard.

Joshua D. Drake


-- 
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Straightforward changes for increased SMP scalability

From
Gregory Stark
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:

> +1 on the idea (I can speak to the technical side). What I can say is that it
> is pretty much known that after 8 cores we slow down. Although 8.2 is better
> than any other release in this regard.

Wait, what benchmarks have you seen where we slow down? 

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Straightforward changes for increased SMP scalability

From
"Joshua D. Drake"
Date:
Gregory Stark wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
> 
>> +1 on the idea (I can speak to the technical side). What I can say is that it
>> is pretty much known that after 8 cores we slow down. Although 8.2 is better
>> than any other release in this regard.
> 
> Wait, what benchmarks have you seen where we slow down? 

The production type. :)

Hmm, maybe that is a bad way to put it. I am not saying we slow down in
the sense that we move slower than before. I mean per-processor performance
goes down. If I have 4 cores, things rock and roll. If I have 8 cores (and
obviously a sufficient workload), things rock and roll louder than with 4 cores.

If I have 16 cores, things are still really loud, but I start to not be
able to tell the difference. The percentage of improvement is much lower.

E.g., 16 cores work and PostgreSQL works great, but it is not nearly as
fantastic with 16 cores as with 8 cores (in terms of percentage gain).



Sincerely,

Joshua D. Drake





Re: Straightforward changes for increased SMP scalability

From
"Strong, David"
Date:
>Simon Riggs wrote:
>
>> Proposals
>>
>> 1. For the first result, I suggest that we introduce some padding into
>> the shmem structure XLogCtlData to alleviate false sharing that may
>> exist between holders of WALInsertLock, WALWriteLock and info_lck. The
>> cost of this will be at most about 200 bytes of shmem, with a low risk
>> change. The benefits are hard to quantify, but we know this is an area
>> of high contention and we should do all we can to reduce that.
>> This hasn't been discussed previously, though we have seen good benefit
>> from avoiding false sharing in other cases, e.g. LWLOCK padding.
>>
>> 2. Increase NUM_BUFFER_PARTITIONS from 16 to 256 (or higher).
>> This has been discussed previously:
>> http://archives.postgresql.org/pgsql-hackers/2006-09/msg00967.php
>>
>> Both of these changes are simple enough to consider for 8.3
>>
>> Comments?
>>
>+1 on the idea (I can speak to the technical side). What I can say is
>that it is pretty much known that after 8 cores we slow down. Although
>8.2 is better than any other release in this regard.
>
>Joshua D. Drake
>
Here's a quick update. We're working on moving the patches we made
against Postgres 8.2.4 to 8.3 to see what is still valid. So far, the
base 8.3 shows ~7% improvement at 8 cores over 8.2.4.

The NUM_BUFFER_PARTITIONS patch is fairly simple. We've noticed gains
with NUM_BUFFER_PARTITIONS set between 256 and 2048, but little to no
gain after 2048, although this might depend on the benchmark and
platform being used. We've measured ~3% gain from the 8.3 base with
NUM_BUFFER_PARTITIONS set to 2048. This might be the way this patch
behaves with 8.3 or we might find that the NUM_BUFFER_PARTITIONS patch
complements patch "X" as the 7.7% number reported for
NUM_BUFFER_PARTITIONS in our presentation had a number of other patches
enabled. This was also running at 20 cores.

We plan to start releasing patches this week for your consideration,
along with their current gains.

David



Re: Straightforward changes for increased SMP scalability

From
Andrew Dunstan
Date:

Joshua D. Drake wrote:
> Gregory Stark wrote:
>> "Joshua D. Drake" <jd@commandprompt.com> writes:
>>
>>> +1 on the idea (I can speak to the technical side). What I can say 
>>> is that it
>>> is pretty much known that after 8 cores we slow down. Although 8.2 
>>> is better
>>> than any other release in this regard.
>>
>> Wait, what benchmarks have you seen where we slow down? 
>
> The production type. :)
>
> Hmm, maybe that is a bad way to put it. I am not saying we slow down
> in the sense that we move slower than before. I mean per-processor
> performance goes down. If I have 4 cores, things rock and roll. If I
> have 8 cores (and obviously a sufficient workload), things rock and roll
> louder than with 4 cores.
>
> If I have 16 cores, things are still really loud, but I start to not be
> able to tell the difference. The percentage of improvement is much lower.
>
> E.g., 16 cores work and PostgreSQL works great, but it is not nearly as
> fantastic with 16 cores as with 8 cores (in terms of percentage gain).
>
>
>
>

That's not the same thing as slowing down, it just means that scaling 
isn't always linear, which isn't surprising.
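
One common way to see why scaling flattens is Amdahl's law: if only a fraction p of the work can run in parallel, the speedup on n cores is 1 / ((1 - p) + p / n). The sketch below is a generic model, not a measurement of PostgreSQL; the function names are hypothetical and any value chosen for p is an assumption.

```c
#include <assert.h>

/* Amdahl's law: speedup on `cores` cores when `parallel_fraction` of the work is parallel. */
double amdahl_speedup(double parallel_fraction, int cores)
{
    assert(cores > 0);
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}

/* Percentage gain predicted by the model when going from `from` cores to `to` cores. */
double percent_gain(double parallel_fraction, int from, int to)
{
    double before = amdahl_speedup(parallel_fraction, from);
    double after = amdahl_speedup(parallel_fraction, to);

    return (after / before - 1.0) * 100.0;
}
```

With an assumed parallel fraction of 0.95, the model predicts roughly a 70% gain going from 4 to 8 cores but only about 54% going from 8 to 16. That is the same shape as the behaviour described above: still faster, but with a smaller percentage gain per doubling.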

cheers

andrew


Re: Straightforward changes for increased SMP scalability

From
"Joshua D. Drake"
Date:
Andrew Dunstan wrote:
> 
> 
> Joshua D. Drake wrote:
>> Gregory Stark wrote:
>>> "Joshua D. Drake" <jd@commandprompt.com> writes:
>>>

>> If I have 16 cores, things are still really loud but I start to not be 
>> able to tell the difference. The percentage of improvement is much lower.
>>
>> E.g., 16 cores work and PostgreSQL works great, but it is not nearly as
>> fantastic with 16 cores as with 8 cores (in terms of percentage gain).
>>

> That's not the same thing as slowing down, it just means that scaling 
> isn't always linear, which isn't surprising.

Right, which is why I reposted. But it also makes what Simon proposes
that much more attractive, *because* it helps with the linear scaling
problem (in theory).

Joshua D. Drake





Re: Straightforward changes for increased SMP scalability

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> 2. Increase NUM_BUFFER_PARTITIONS from 16 to 256 (or higher).

Do you have any evidence to back up such a large increase?

This change is not free; at the very least it will break
contrib/pg_buffercache, which wants to lock all the partitions at once.
lwlock.c was designed on the assumption that only a pretty small number
of LWLocks would ever be held concurrently, and it will fall over.
I don't think fixing this would be as simple as increasing
MAX_SIMUL_LWLOCKS, because some of the algorithms are O(N^2).
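
To illustrate how quadratic cost can arise when many locks are held at once, here is a minimal sketch of a held-locks list kept in a flat array with a linear search on release. It is not the code in lwlock.c: the names (HeldLocks, held_acquire, held_release), the MAX_HELD capacity and the step counter are all hypothetical, and the sketch assumes a release searches from the most recently acquired entry backwards and then closes the gap.

```c
#include <assert.h>

#define MAX_HELD 4096           /* hypothetical capacity for this sketch */

typedef struct HeldLocks
{
    int  ids[MAX_HELD];         /* ids of locks currently held, in acquisition order */
    int  count;                 /* number of locks currently held */
    long steps;                 /* array entries examined or moved, for illustration */
} HeldLocks;

void held_init(HeldLocks *h)
{
    h->count = 0;
    h->steps = 0;
}

void held_acquire(HeldLocks *h, int id)
{
    assert(h->count < MAX_HELD);
    h->ids[h->count++] = id;
}

/* Search backwards from the newest entry, then shift later entries down. */
void held_release(HeldLocks *h, int id)
{
    int i;

    for (i = h->count - 1; i >= 0; i--)
    {
        h->steps++;
        if (h->ids[i] == id)
            break;
    }
    assert(i >= 0);             /* the lock must actually be held */
    h->count--;
    for (; i < h->count; i++)
    {
        h->ids[i] = h->ids[i + 1];
        h->steps++;
    }
}

/* Acquire n locks, release them in acquisition order, and report the work done. */
long steps_release_in_order(int n)
{
    HeldLocks h;
    int i;

    held_init(&h);
    for (i = 0; i < n; i++)
        held_acquire(&h, i);
    for (i = 0; i < n; i++)
        held_release(&h, i);
    return h.steps;
}

/* Same, but releasing in reverse (last acquired, first released). */
long steps_release_reversed(int n)
{
    HeldLocks h;
    int i;

    held_init(&h);
    for (i = 0; i < n; i++)
        held_acquire(&h, i);
    for (i = n - 1; i >= 0; i--)
        held_release(&h, i);
    return h.steps;
}
```

In this sketch, releasing n locks in acquisition order costs n * n steps, while releasing them in reverse order costs only n. Going from 16 to 256 locks held at once would therefore multiply the in-order release cost by 256, not by 16.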

I'd like to see numbers proving that there is useful incremental gain
from going above 32 or 64 partitions, before we start hacking to make
this work.
        regards, tom lane


Re: Straightforward changes for increased SMP scalability

From
"Strong, David"
Date:
Tom,

I'm happy to run some benchmarks to show the improvements with various
NUM_BUFFER_PARTITIONS settings. However, I want to make sure that this
is going to be useful. I can run 16 (base), 32, 64, 128 etc. type
increments, but I'm more concerned about the number of cores to use. Do
you have a suggestion for that? I can run with 1 to 32 cores. I had
planned to run a number of tests at 8 cores, but I can adjust to what
makes sense for the community.

David

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Monday, July 16, 2007 9:10 AM
To: Simon Riggs
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Straightforward changes for increased SMP
scalability

"Simon Riggs" <simon@2ndquadrant.com> writes:
> 2. Increase NUM_BUFFER_PARTITIONS from 16 to 256 (or higher).

Do you have any evidence to back up such a large increase?

This change is not free; at the very least it will break
contrib/pg_buffercache, which wants to lock all the partitions at once.
lwlock.c was designed on the assumption that only a pretty small number
of LWLocks would ever be held concurrently, and it will fall over.
I don't think fixing this would be as simple as increasing
MAX_SIMUL_LWLOCKS, because some of the algorithms are O(N^2).

I'd like to see numbers proving that there is useful incremental gain
from going above 32 or 64 partitions, before we start hacking to make
this work.
        regards, tom lane



Re: Straightforward changes for increased SMP scalability

From
Tom Lane
Date:
"Strong, David" <david.strong@unisys.com> writes:
> I'm happy to run some benchmarks to show the improvements with various
> NUM_BUFFER_PARTITIONS settings. However, I want to make sure that this
> is going to be useful. I can run 16 (base), 32, 64, 128 etc. type
> increments, but I'm more concerned about the number of cores to use. Do
> you have a suggestion for that? I can run with 1 to 32 cores. I had
> planned to run a number of tests at 8 cores, but I can adjust to what
> makes sense for the community.

Presumably the answers will be different.  I'd sort of like to see
several different curves for different numbers of processors, so we
can evaluate reasonably fairly.
        regards, tom lane


Re: Straightforward changes for increased SMP scalability

From
"Strong, David"
Date:
>> I'm happy to run some benchmarks to show the improvements with various
>> NUM_BUFFER_PARTITIONS settings. However, I want to make sure that this
>> is going to be useful. I can run 16 (base), 32, 64, 128 etc. type
>> increments, but I'm more concerned about the number of cores to use. Do
>> you have a suggestion for that? I can run with 1 to 32 cores. I had
>> planned to run a number of tests at 8 cores, but I can adjust to what
>> makes sense for the community.
>
>Presumably the answers will be different.  I'd sort of like to see
>several different curves for different numbers of processors, so we
>can evaluate reasonably fairly.
>
>            regards, tom lane

Tom,

Correct. This is a scalability patch rather than a performance patch,
although each aspect is related. I would expect the gain to be better as
more cores and users are added.

I can run some tests along the following lines:

1. NUM_BUFFER_PARTITIONS sizes of 16, 32, 64, 128, 256, 512, 1024, 2048.

2. Cores set at 1, 2, 4, 8, 16, 24 and 32.
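
The two lists above form a test matrix with one benchmark run per combination. A minimal sketch of enumerating it is shown below; the BenchRun type and build_matrix function are hypothetical names for illustration, and only the partition sizes and core counts come from the lists above.

```c
#include <assert.h>
#include <stddef.h>

/* Values taken from the two lists above. */
static const int partition_sizes[] = {16, 32, 64, 128, 256, 512, 1024, 2048};
static const int core_counts[] = {1, 2, 4, 8, 16, 24, 32};

#define N_PARTITION_SIZES (sizeof(partition_sizes) / sizeof(partition_sizes[0]))
#define N_CORE_COUNTS     (sizeof(core_counts) / sizeof(core_counts[0]))

/* One benchmark configuration: a partition count paired with a core count. */
typedef struct BenchRun
{
    int partitions;
    int cores;
} BenchRun;

/*
 * Fill `out` with every (partitions, cores) combination, grouped by
 * partition size. Returns the number of runs written.
 */
size_t build_matrix(BenchRun *out, size_t capacity)
{
    size_t n = 0;
    size_t p;
    size_t c;

    assert(capacity >= N_PARTITION_SIZES * N_CORE_COUNTS);
    for (p = 0; p < N_PARTITION_SIZES; p++)
    {
        for (c = 0; c < N_CORE_COUNTS; c++)
        {
            out[n].partitions = partition_sizes[p];
            out[n].cores = core_counts[c];
            n++;
        }
    }
    return n;
}
```

Eight partition sizes times seven core counts gives 56 runs, which is enough to plot one curve per core count.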

Does anyone have any comments or suggestions?

David


Re: Straightforward changes for increased SMP scalability

From
Andrew Sullivan
Date:
On Mon, Jul 16, 2007 at 01:23:46PM +0100, Simon Riggs wrote:
> Both of these changes are simple enough to consider for 8.3

I'm in favour of scalability, of course, but are they really simple
enough to put in for 8.3?  I was under the impression that there was
a push on to get the thing shipped, and adding incremental changes
near the end of the cycle strikes me as a possible source of
significant additional surprises (and therefore delays).  I am no
code expert, though; I just wanted to be sure there's consensus on
the simplicity of the changes.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.     --Dennis Ritchie


Re: Straightforward changes for increased SMP scalability

From
"Zeugswetter Andreas ADI SD"
Date:
> The NUM_BUFFER_PARTITIONS patch is fairly simple. We've
> noticed gains with NUM_BUFFER_PARTITIONS set between 256 and
> 2048, but little to no gain after 2048, although this might
> depend on the benchmark and platform being used. We've

Might this also be a padding issue? 2048 partitions seems mighty high.
Other databases seem to cope well with a max of 64 partitions.

Andreas


Re: Straightforward changes for increased SMP scalability

From
Bruce Momjian
Date:
This has been saved for the 8.4 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold


--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Straightforward changes for increased SMP scalability

From
Bruce Momjian
Date:
Add to TODO:

* SMP scalability improvements
 http://archives.postgresql.org/pgsql-hackers/2007-07/msg00439.php




Re: Straightforward changes for increased SMP scalability

From
Bruce Momjian
Date:
Added to TODO:

* SMP scalability improvements
 http://archives.postgresql.org/pgsql-hackers/2007-07/msg00439.php

