Thread: Straightforward changes for increased SMP scalability
David Strong presented some excellent results of his SMP scalability testing at Ottawa in May. http://www.pgcon.org/2007/schedule/events/16.en.html There are some easy things we can do to take advantage of those results, especially the ones that were hardware independent. The hardware independent results were these two: - Avoid contention on WALInsertLock (+28% gain) - Increase NUM_BUFFER_PARTITIONS (+7.7% gain) Scalability begins to slow down at 8 CPUs on 8.2.4 and David was able to show good gains even at 8 CPUs with these changes. Proposals 1. For the first result, I suggest that we introduce some padding into the shmem structure XLogCtlData to alleviate false sharing that may exist between holders of WALInsertLock, WALWriteLock and info_lck. The cost of this will be at most about 200 bytes of shmem, with a low risk change. The benefits are hard to quantify, but we know this is an area of high contention and we should do all we can to reduce that. This hasn't been discussed previously, though we have seen good benefit from avoiding false sharing in other cases, e.g. LWLOCK padding. 2. Increase NUM_BUFFER_PARTITIONS from 16 to 256 (or higher). This has been discussed previously: http://archives.postgresql.org/pgsql-hackers/2006-09/msg00967.php Both of these changes are simple enough to consider for 8.3 Comments? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
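The padding idea in proposal 1 can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the actual XLogCtlData layout: the 64-byte cache line size and every name prefixed with Demo or demo_ are hypothetical. It only shows the general technique of overlaying each lock on a cache-line-sized byte array so that neighbouring locks no longer share a line.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; real hardware may differ. */
#define DEMO_CACHE_LINE_SIZE 64

/* Hypothetical stand-in for a lock word; not PostgreSQL's real lock type. */
typedef struct DemoLock
{
    volatile int held;
} DemoLock;

/* Unpadded layout: the three locks sit next to each other in memory. */
typedef struct DemoCtlUnpadded
{
    DemoLock insert_lock;
    DemoLock write_lock;
    DemoLock info_lck;
} DemoCtlUnpadded;

/* Pad each lock to a full cache line by overlaying it on a byte array. */
typedef union DemoPaddedLock
{
    DemoLock lock;
    char     pad[DEMO_CACHE_LINE_SIZE];
} DemoPaddedLock;

/* Padded layout: each lock gets its own cache-line-sized slot. */
typedef struct DemoCtlPadded
{
    DemoPaddedLock insert_lock;
    DemoPaddedLock write_lock;
    DemoPaddedLock info_lck;
} DemoCtlPadded;

/*
 * True if two field offsets fall in the same cache line, assuming the
 * enclosing struct itself starts on a cache-line boundary.
 */
static int
demo_same_cache_line(size_t off_a, size_t off_b)
{
    return off_a / DEMO_CACHE_LINE_SIZE == off_b / DEMO_CACHE_LINE_SIZE;
}

/* Usage sketch: checks the layout properties described above. */
static void
demo_check_layout(void)
{
    /* Without padding, two locks share a line (false sharing is possible). */
    assert(demo_same_cache_line(offsetof(DemoCtlUnpadded, insert_lock),
                                offsetof(DemoCtlUnpadded, write_lock)));
    /* With padding, they land on different lines. */
    assert(!demo_same_cache_line(offsetof(DemoCtlPadded, insert_lock),
                                 offsetof(DemoCtlPadded, write_lock)));
    /* Three padded slots cost three cache lines of shared memory. */
    assert(sizeof(DemoCtlPadded) == 3 * DEMO_CACHE_LINE_SIZE);
}
```

Under the 64-byte assumption, three padded slots occupy 192 bytes, which is in the same range as the "at most about 200 bytes" estimate above, although the real structure holds more than three fields and may be laid out differently.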
Simon Riggs wrote: > Proposals > > 1. For the first result, I suggest that we introduce some padding into > the shmem structure XLogCtlData to alleviate false sharing that may > exist between holders of WALInsertLock, WALWriteLock and info_lck. The > cost of this will be at most about 200 bytes of shmem, with a low risk > change. The benefits are hard to quantify, but we know this is an area > of high contention and we should do all we can to reduce that. > This hasn't been discussed previously, though we have seen good benefit > from avoiding false sharing in other cases, e.g. LWLOCK padding. > > 2. Increase NUM_BUFFER_PARTITIONS from 16 to 256 (or higher). > This has been discussed previously: > http://archives.postgresql.org/pgsql-hackers/2006-09/msg00967.php > > Both of these changes are simple enough to consider for 8.3 > > Comments? +1 on the idea (I can speak to the technical side). What I can say is that it is pretty much known that after 8 cores we slow down. Although 8.2 is better than any other release in this regard. Joshua D. Drake > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
"Joshua D. Drake" <jd@commandprompt.com> writes: > +1 on the idea (I can speak to the technical side). What I can say is that it > is pretty much known that after 8 cores we slow down. Although 8.2 is better > than any other release in this regard. Wait, what benchmarks have you seen where we slow down? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > "Joshua D. Drake" <jd@commandprompt.com> writes: > >> +1 on the idea (I can speak to the technical side). What I can say is that it >> is pretty much known that after 8 cores we slow down. Although 8.2 is better >> than any other release in this regard. > > Wait, what benchmarks have you seen where we slow down? The production type. :) Hmm, maybe that is a bad way to put it. I am not saying we slow down as in we move slower than before. I mean per-processor performance goes down. If I have 4 cores, things rock and roll. If I have 8 cores (and obviously sufficient workload), things rock and roll louder than 4 cores. If I have 16 cores, things are still really loud, but I start to not be able to tell the difference. The percentage of improvement is much lower. E.g., 16 cores work and PostgreSQL works great, but it is not nearly as fantastic with 16 cores as with 8 cores (in terms of percentage gain). Sincerely, Joshua D. Drake
>Simon Riggs wrote: > >> Proposals >> >> 1. For the first result, I suggest that we introduce some padding into >> the shmem structure XLogCtlData to alleviate false sharing that may >> exist between holders of WALInsertLock, WALWriteLock and info_lck. The >> cost of this will be at most about 200 bytes of shmem, with a low risk >> change. The benefits are hard to quantify, but we know this is an area >> of high contention and we should do all we can to reduce that. >> This hasn't been discussed previously, though we have seen good benefit >> from avoiding false sharing in other cases, e.g. LWLOCK padding. >> >> 2. Increase NUM_BUFFER_PARTITIONS from 16 to 256 (or higher). >> This has been discussed previously: >> http://archives.postgresql.org/pgsql-hackers/2006-09/msg00967.php >> >> Both of these changes are simple enough to consider for 8.3 >> >> Comments? >> >+1 on the idea (I can speak to the technical side). What I can say is >that it is pretty much known that after 8 cores we slow down. Although >8.2 is better than any other release in this regard. > >Joshua D. Drake > Here's a quick update. We're working on moving the patches we made against Postgres 8.2.4 to 8.3 to see what is still valid. So far, the base 8.3 shows ~7% improvement at 8 cores over 8.2.4. The NUM_BUFFER_PARTITIONS patch is fairly simple. We've noticed gains with NUM_BUFFER_PARTITIONS set between 256 and 2048, but little to no gain after 2048, although this might depend on the benchmark and platform being used. We've measured ~3% gain from the 8.3 base with NUM_BUFFER_PARTITIONS set to 2048. This might be the way this patch behaves with 8.3, or we might find that the NUM_BUFFER_PARTITIONS patch complements patch "X", as the 7.7% number reported for NUM_BUFFER_PARTITIONS in our presentation had a number of other patches enabled. That result was also measured at 20 cores. We plan to start releasing patches this week for your consideration, along with their current gains.
David
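To make concrete what the partition count controls, here is a small C sketch. It assumes that a buffer's hash code is mapped to a partition by taking it modulo the partition count; the function names are hypothetical and this is not taken from the PostgreSQL source. Two lookups can only contend for the same partition lock when their hash codes map to the same partition, so raising the count spreads the same hash codes over more locks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a hash code to a partition number.
 * Assumption: partition = hashcode modulo the number of partitions.
 */
static unsigned
demo_partition_for(uint32_t hashcode, unsigned num_partitions)
{
    assert(num_partitions > 0);
    return hashcode % num_partitions;
}

/*
 * Two lookups can contend on a partition lock only if both hash codes
 * map to the same partition.
 */
static int
demo_would_contend(uint32_t hash_a, uint32_t hash_b, unsigned num_partitions)
{
    return demo_partition_for(hash_a, num_partitions) ==
           demo_partition_for(hash_b, num_partitions);
}
```

For example, hash codes 1 and 17 fall into the same partition when there are 16 partitions but into different ones when there are 256. Whether that translates into a measurable gain depends on the workload, which is what the benchmarks discussed in this thread are meant to show.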
Joshua D. Drake wrote: > Gregory Stark wrote: >> "Joshua D. Drake" <jd@commandprompt.com> writes: >> >>> +1 on the idea (I can speak to the technical side). What I can say is that it is pretty much known that after 8 cores we slow down. Although 8.2 is better than any other release in this regard. >> >> Wait, what benchmarks have you seen where we slow down? > > The production type. :) > > Hmm, maybe that is a bad way to put it. I am not saying we slow down as in we move slower than before. I mean per-processor performance goes down. If I have 4 cores, things rock and roll. If I have 8 cores (and obviously sufficient workload), things rock and roll louder than 4 cores. > > If I have 16 cores, things are still really loud, but I start to not be able to tell the difference. The percentage of improvement is much lower. > > E.g., 16 cores work and PostgreSQL works great, but it is not nearly as fantastic with 16 cores as with 8 cores (in terms of percentage gain). That's not the same thing as slowing down; it just means that scaling isn't always linear, which isn't surprising. cheers andrew
Andrew Dunstan wrote: > Joshua D. Drake wrote: >> If I have 16 cores, things are still really loud, but I start to not be able to tell the difference. The percentage of improvement is much lower. >> >> E.g., 16 cores work and PostgreSQL works great, but it is not nearly as fantastic with 16 cores as with 8 cores (in terms of percentage gain). > > That's not the same thing as slowing down; it just means that scaling isn't always linear, which isn't surprising. Right. Which is why I reposted, but it also makes what Simon proposes that much more attractive *because* it helps with the linear-scaling problem (in theory). Joshua D. Drake
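The distinction drawn here, between getting slower and scaling less than linearly, can be put as a simple ratio. The C sketch below is only an illustration with a hypothetical function name: it divides the throughput ratio between two core counts by the ratio of the core counts. A result of 1.0 means linear scaling; anything between 0 and 1 means throughput still rose, but by less than the added cores would suggest.

```c
#include <assert.h>

/*
 * Hypothetical helper: scaling efficiency when going from cores_base to
 * cores_more, given the throughput measured at each core count.
 *   efficiency = (tput_more / tput_base) / (cores_more / cores_base)
 */
static double
demo_scaling_efficiency(double tput_base, int cores_base,
                        double tput_more, int cores_more)
{
    double speedup;
    double core_ratio;

    assert(tput_base > 0.0 && cores_base > 0 && cores_more > 0);

    speedup = tput_more / tput_base;
    core_ratio = (double) cores_more / (double) cores_base;
    return speedup / core_ratio;
}
```

With purely illustrative numbers (not measurements from this thread), a system delivering 100 units of throughput at 8 cores and 150 at 16 cores has an efficiency of 0.75: faster overall, but each added core contributes less than the first eight did.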
"Simon Riggs" <simon@2ndquadrant.com> writes: > 2. Increase NUM_BUFFER_PARTITIONS from 16 to 256 (or higher). Do you have any evidence to back up such a large increase? This change is not free; at the very least it will break contrib/pg_buffercache, which wants to lock all the partitions at once. lwlock.c was designed on the assumption that only a pretty small number of LWLocks would ever be held concurrently, and it will fall over. I don't think fixing this would be as simple as increasing MAX_SIMUL_LWLOCKS, because some of the algorithms are O(N^2). I'd like to see numbers proving that there is useful incremental gain from going above 32 or 64 partitions, before we start hacking to make this work. regards, tom lane
Tom, I'm happy to run some benchmarks to show the improvements with various NUM_BUFFER_PARTITIONS settings. However, I want to make sure that this is going to be useful. I can run 16 (base), 32, 64, 128 etc. type increments, but I'm more concerned about the number of cores to use. Do you have a suggestion for that? I can run with 1 to 32 cores. I had planned to run a number of tests at 8 cores, but I can adjust to what makes sense for the community. David -----Original Message----- From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane Sent: Monday, July 16, 2007 9:10 AM To: Simon Riggs Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Straightforward changes for increased SMP scalability "Simon Riggs" <simon@2ndquadrant.com> writes: > 2. Increase NUM_BUFFER_PARTITIONS from 16 to 256 (or higher). Do you have any evidence to back up such a large increase? This change is not free; at the very least it will break contrib/pg_buffercache, which wants to lock all the partitions at once. lwlock.c was designed on the assumption that only a pretty small number of LWLocks would ever be held concurrently, and it will fall over. I don't think fixing this would be as simple as increasing MAX_SIMUL_LWLOCKS, because some of the algorithms are O(N^2). I'd like to see numbers proving that there is useful incremental gain from going above 32 or 64 partitions, before we start hacking to make this work. regards, tom lane
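Tom's concern about locking every partition at once can be illustrated with a toy model in C. Everything here is a sketch under stated assumptions and not the code in lwlock.c: the names are hypothetical, the cap of 100 simultaneously held locks is an illustrative value, and the release routine (search the held-locks array from the end, then shift the remaining entries down) is just one plausible way a quadratic cost can arise when many locks are held.

```c
#include <assert.h>

/* Assumption: illustrative cap on locks one backend may hold at once. */
#define DEMO_MAX_SIMUL_LOCKS 100

/* Hypothetical per-backend record of currently held locks. */
typedef struct DemoHeldLocks
{
    int num_held;
    int held[DEMO_MAX_SIMUL_LOCKS];
} DemoHeldLocks;

/* Record a lock as held; returns 0 if the fixed-size array is full. */
static int
demo_acquire(DemoHeldLocks *h, int lock_id)
{
    if (h->num_held >= DEMO_MAX_SIMUL_LOCKS)
        return 0;
    h->held[h->num_held++] = lock_id;
    return 1;
}

/* Try to hold one lock per partition; returns how many were acquired. */
static int
demo_lock_all_partitions(DemoHeldLocks *h, int num_partitions)
{
    int i;

    for (i = 0; i < num_partitions; i++)
    {
        if (!demo_acquire(h, i))
            break;
    }
    return i;
}

/*
 * Release one lock: search from the most recently acquired entry backwards,
 * then shift later entries down. Returns the number of array steps taken.
 */
static long
demo_release(DemoHeldLocks *h, int lock_id)
{
    long steps = 0;
    int  i;

    for (i = h->num_held - 1; i >= 0; i--)
    {
        steps++;
        if (h->held[i] == lock_id)
            break;
    }
    if (i < 0)
        return steps;           /* lock was not held */

    h->num_held--;
    for (; i < h->num_held; i++)
    {
        h->held[i] = h->held[i + 1];
        steps++;
    }
    return steps;
}

/* Release locks 0..n-1 in acquisition order; returns total steps. */
static long
demo_release_all_in_order(DemoHeldLocks *h, int n)
{
    long total = 0;
    int  id;

    for (id = 0; id < n; id++)
        total += demo_release(h, id);
    return total;
}
```

In this model, holding all 16 partition locks fits comfortably, but asking for 256 stops at the cap, and releasing n locks in the order they were taken costs n squared array steps in total. That is why simply raising the cap would not be enough on its own.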
"Strong, David" <david.strong@unisys.com> writes: > I'm happy to run some benchmarks to show the improvements with various > NUM_BUFFER_PARTITIONS settings. However, I want to make sure that this > is going to be useful. I can run 16 (base), 32, 64, 128 etc. type > increments, but I'm more concerned about the number of cores to use. Do > you have a suggestion for that? I can run with 1 to 32 cores. I had > planned to run a number of tests at 8 cores, but I can adjust to what > makes sense for the community. Presumably the answers will be different. I'd sort of like to see several different curves for different numbers of processors, so we can evaluate reasonably fairly. regards, tom lane
>> I'm happy to run some benchmarks to show the improvements with various >> NUM_BUFFER_PARTITIONS settings. However, I want to make sure that this >> is going to be useful. I can run 16 (base), 32, 64, 128 etc. type >> increments, but I'm more concerned about the number of cores to use. Do >> you have a suggestion for that? I can run with 1 to 32 cores. I had >> planned to run a number of tests at 8 cores, but I can adjust to what >> makes sense for the community. > >Presumably the answers will be different. I'd sort of like to see >several different curves for different numbers of processors, so we >can evaluate reasonably fairly. > > regards, tom lane Tom, Correct. This is a scalability patch rather than a performance patch, although the two aspects are related. I would expect the gain to be better as more cores and users are added. I can run some tests along the following lines: 1. NUM_BUFFER_PARTITIONS sizes of 16, 32, 64, 128, 256, 512, 1024, 2048. 2. Cores set at 1, 2, 4, 8, 16, 24 and 32. Does anyone have any comments or suggestions? David
On Mon, Jul 16, 2007 at 01:23:46PM +0100, Simon Riggs wrote: > Both of these changes are simple enough to consider for 8.3 I'm in favour of scalability, of course, but are they really simple enough to put in for 8.3? I was under the impression that there was a push on to get the thing shipped, and adding incremental changes near the end of the cycle strikes me as a possible source of significant additional surprises (and therefore delays). I am no code expert, though; I just wanted to be sure there's consensus on the simplicity of the changes. A -- Andrew Sullivan | ajs@crankycanuck.ca This work was visionary and imaginative, and goes to show that visionary and imaginative work need not end up well. --Dennis Ritchie
> The NUM_BUFFER_PARTITIONS patch is fairly simple. We've > noticed gains with NUM_BUFFER_PARTITIONS set between 256 and > 2048, but little to no gain after 2048, although this might > depend on the benchmark and platform being used. We've Might this also be a padding issue, because 2048 partitions seems mighty high ? Other db's seem to cope well with a max of 64 partitions. Andreas
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Add to TODO: * SMP scalability improvements http://archives.postgresql.org/pgsql-hackers/2007-07/msg00439.php -- Bruce Momjian <bruce@momjian.us>
Added to TODO: * SMP scalability improvements http://archives.postgresql.org/pgsql-hackers/2007-07/msg00439.php -- Bruce Momjian <bruce@momjian.us>