Thread: FW: [ADMIN] Shared_buffers and kernel parameters, tuning
> -----Original Message----- > From: Lending, Rune [SMTP:rune.lending@inpoc.com] > Sent: 05 June 2003 10:11 > To: 'pgsql-admin@postgresql.org' > Subject: [ADMIN] Shared_buffers and kernel parameters, tuning > > After days of searching and testing I have come up with this way of > configuring our postgresql 7.2 db. > I have not yet increased my shared_buffer as high as suggested below on > our > prod machine (24-7 high traffic), but after testing on our dev machines > this allows at least the databse to start up. It is very difficult to test > the actual performance since there is a hugh difference in traffic on dev > and prod. > Here is what we have: > > We have a high traffic system with a database described as followed: > > 4 pentium 3 633 cpu's > 3753456 kB RAM (3.5 Gb) > Red Hat Linux 7.2 > postgresql 7.2 > > > What I like to do is: > > /proc/sys/kernel/sem=250 32000 100 500 (after advise from forum/docs) > /proc/sys/kernel/shmmax=1921769472 (RAM / 2 * 1024 - this > piece of math I got from some of oracle's support pages (ooopps) actually > ) > > /proc/sys/kernel/shmall=1921769472 (RAM / 2 * 1024) > > in postgresql.conf: > > shared_buffers = 117248 (shmmax / 2 / 1024 / 8 ) This I got from this > forum. > > > Does this sound right or am I totally out of bounds here? I have, as said > before done this on our dev macine ( a lot smaller machine ), but it would > be nice with some feedback .. > > Thanx in advance for response. > > /rune > >
On 9 Jun 2003 at 11:18, Howard Oblowitz wrote: > > in postgresql.conf: > > > > shared_buffers = 117248 (shmmax / 2 / 1024 / 8 ) This I got from this > > forum. > > > > > > Does this sound right or am I totally out of bounds here? I have, as said > > before done this on our dev macine ( a lot smaller machine ), but it would > > be nice with some feedback .. With that kind of RAM and that kind of shared buffers setting, you must set effective OS cache size so that postgresql can calculate when to flush buffers. While tuning database, it always help to pin down the target first and then try to reach it. If you could let us know what performance you are expecting out of this machine and for what kind of load in terms of concurrent users, database size and usage pattern etc., that would help. HTH Bye Shridhar -- QOTD: "I'm just a boy named 'su'..."
Rune, > > shared_buffers = 117248 (shmmax / 2 / 1024 / 8 ) This I got from this > > forum. > > Does this sound right or am I totally out of bounds here? I have, as said Out of bounds, through no fault of your own .... I'm still working on documentation for this. However, let me qoute the upcoming supplimentary docs: SHARED_BUFFERS Sets the size of Postgres' memory buffer where queries are held before being fed into the Kernel buffer of the host system. It's very important to remember that this is only a holding area, and not the total memory available for the server. As such, resist the urge to set this number to a large portion of your RAM, as this will actually degrade performance on many OSes. Members of the pgsql-performance mailing list have found useful values in the range of 1000-6000, depending on available RAM, database size, and number of concurrent queries. No one has yet reported positive results for any number over 6000. -- Josh Berkus Aglio Database Solutions San Francisco
On 10 Jun 2003 at 8:46, Josh Berkus wrote: > SHARED_BUFFERS > Sets the size of Postgres' memory buffer where queries are held before being > fed into the Kernel buffer of the host system. It's very important to > remember that this is only a holding area, and not the total memory available > for the server. As such, resist the urge to set this number to a large > portion of your RAM, as this will actually degrade performance on many OSes. > Members of the pgsql-performance mailing list have found useful values in the > range of 1000-6000, depending on available RAM, database size, and number of > concurrent queries. No one has yet reported positive results for any number > over 6000. I was planning to document postgresql.conf with little hints, enough to get one started, drawing inspiration from lilo.conf of debian, which is beautiful to say the least.. I haven't find enough time to do that. But I will do it.. But I don't know all the parameters enough. Of course I will post a starter but any input would be welcome. Point is we should be able to say RTFC rather than RTFA as that would get a DBA single place to look at. I agree that no amount of simplicity is enough but still..:-) Bye Shridhar -- Brooke's Law: Whenever a system becomes completely defined, some damn fool discovers something which either abolishes the system or expands it beyond recognition.
Shridhar, > I was planning to document postgresql.conf with little hints, enough to get > one started, drawing inspiration from lilo.conf of debian, which is > beautiful to say the least.. This week, I am: 1) Submiting a patch to re-organize postgresql.conf.sample and "Run-Time Configuration" docs in a more logical order. 2) Finishing up a massive OpenOffice.org spreadsheet full information on each postgresql.conf option, including anecdotal advice from this list. Next week, I will try to turn the spreadsheet into a series of HTML pages for Techdocs. I would be thrilled to have your help on: a) editing + augmenting the spreadsheet contents b) transforming it into HTML pages. -- Josh Berkus Aglio Database Solutions San Francisco
On Tue, Jun 10, 2003 at 21:26:49 +0530, Shridhar Daithankar <shridhar_daithankar@persistent.co.in> wrote: > > Point is we should be able to say RTFC rather than RTFA as that would get a DBA > single place to look at. I agree that no amount of simplicity is enough but > still..:-) I believe there was discussion a couple of months ago that came to a different conclusion. There was concern about having documenation that wasn't in the documentation and have to versions of essentially the same information that both need to be maintained.
On 2003-06-10 08:46:21 -0700, Josh Berkus wrote: > Rune, > > > > shared_buffers = 117248 (shmmax / 2 / 1024 / 8 ) This I got from this > > > forum. > > > Does this sound right or am I totally out of bounds here? I have, as said > > Out of bounds, through no fault of your own .... I'm still working on > documentation for this. However, let me qoute the upcoming supplimentary > docs: > > SHARED_BUFFERS > Sets the size of Postgres' memory buffer where queries are held before being > fed into the Kernel buffer of the host system. It's very important to > remember that this is only a holding area, and not the total memory available > for the server. As such, resist the urge to set this number to a large > portion of your RAM, as this will actually degrade performance on many OSes. > Members of the pgsql-performance mailing list have found useful values in the > range of 1000-6000, depending on available RAM, database size, and number of > concurrent queries. No one has yet reported positive results for any number > over 6000. > We run a dual P3 1GHz server, running Debian Linux (stable), kernel 2.4.20, with a 5-disk (10K rpm) RAID 5 array (ICP Vortex controller) and 4GB RAM, most of which is used for filesystem cache. This server runs Postgresql 7.3.2 exclusively, with a database of roughly 7GB. This database is used for a very busy community website, running an enormous amount of small and simple select/update/insert queries and a number of complex select queries, to search through all kinds of data. This server isn't running postgres that long, and we're still trying to figure out the best configuration parameters for the highest possible performance. Shared_buffers was one of the first things we looked at. We've tested with shared_buffers at 1024, 8192, 32768 and 131072. So far, performance with shared_buffers set at 32768 was the best we could attain. 8192 and 131072 came out roughly equal. 1024 was miserable. (yay, 3 lines in a row starting with the word 'shared_buffers'! ;)) Also, there was a very strong relation between the shared_buffers setting and the amount of cpu time spent in kernelland. Currently, the server spends roughly 20% of it's time in kernelspace (according to vmstat). When shared_buffers was 8192, this went up to about 30%. I don't have any hard performance statistics, we just threw the site live with different settings and watched the load on all servers, and the amount of requests/second our webservers could generate (the bottleneck is the postgresql server, not the webservers). I'm really eager for any useful tips regarding the various cost settings. I've been following this list for months and read through a large portion of the archives, but noone has been able to do more than handwaving around certain numbers, which are close to the defaults anyway. Currently, we have the following settings: shared_buffers = 32768 max_fsm_relations = 100 max_fsm_pages = 100000 sort_mem = 16384 vacuum_mem = 131072 effective_cache_size = 327680 random_page_cost = 1.5 cpu_tuple_cost = 0.005 #cpu_index_tuple_cost = 0.001 (default) #cpu_operator_cost = 0.0025 (default) Halving the cpu_tuple_cost has given a very impressive performance boost (performance roughly doubled). I'm not sure why, because the plans of the large queries I was checking haven't changed as far as I can see, but maybe some smaller queries I didn't bother to check are using a different plan now. Although I was quite sure those smaller queries were all using the correct indexes etc before the change anyway. Just to be absolutely sure: all *_cost parameters only influence the chosen plan, right? There is absolutely nothing else influenced which doesn't show up in an EXPLAIN ANALYZE, right? Regards, Vincent van Leeuwen Media Design - http://www.mediadesign.nl/
Vincent, > This server isn't running postgres that long, and we're still trying to figure > out the best configuration parameters for the highest possible performance. > Shared_buffers was one of the first things we looked at. We've tested with > shared_buffers at 1024, 8192, 32768 and 131072. So far, performance with > shared_buffers set at 32768 was the best we could attain. 8192 and 131072 came > out roughly equal. 1024 was miserable. Cool! This is the first report we've had of a successful higher setting for shared_buffers. I'll need to revise the text. What do people think of: SHARED_BUFFERS Sets the size of Postgres' memory buffer where queries are held before being fed into the Kernel buffer of the host system. It's very important to remember that this is only a holding area, and not the total memory available for the server. As such, resist the urge to set this number to a large portion of your RAM, as this will actually degrade performance on many OSes. Members of the pgsql-performance mailing list have mostly found useful values in the range of 1000-6000, depending on available RAM, database size, and number of concurrent queries. This can go up slightly for servers with a great deal of RAM; the useful maximum on Linux seems to be 6% to 10% of available RAM, with performance degrading at higher settings. Information on other OSes is not yet posted. On multi-purpose servers, of course, the setting should be lowered. > Also, there was a very strong relation between the shared_buffers setting and > the amount of cpu time spent in kernelland. Currently, the server spends > roughly 20% of it's time in kernelspace (according to vmstat). When > shared_buffers was 8192, this went up to about 30%. This makes perfect sense ... less shared_buffers = more kernel_buffers, and vice-versa. > Currently, we have the following settings: > shared_buffers = 32768 > max_fsm_relations = 100 You might wanna increase this; current recommended is 300 just to make sure that you have one for every table. > Just to be absolutely sure: all *_cost parameters only influence the chosen > plan, right? There is absolutely nothing else influenced which doesn't show up > in an EXPLAIN ANALYZE, right? Yes, AFAIK. -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus wrote: > Vincent, > > > This server isn't running postgres that long, and we're still trying to > figure > > out the best configuration parameters for the highest possible performance. > > Shared_buffers was one of the first things we looked at. We've tested with > > shared_buffers at 1024, 8192, 32768 and 131072. So far, performance with > > shared_buffers set at 32768 was the best we could attain. 8192 and 131072 > came > > out roughly equal. 1024 was miserable. > > Cool! This is the first report we've had of a successful higher setting for > shared_buffers. I'll need to revise the text. What do people think of: I have been thinking about shared_buffers, and it seems it is the age-old issue of working set. Traditionally Unix doesn't use working set (though a few do). It just allocates memory proportionally among all processes, with unreferenced pages being paged out first. For PostgreSQL, if your working set is X, if you set your shared buffers to X, you will get optimal performance (assuming there is no memory pressure). If set allocate X/2, you will probably get worse performance. If you allocate X*2, you will also probably get slightly worse performance. Now, let's suppose you can't allocate X shared buffers, because of memory pressure. Suppose you can allocate X/2 shared buffers, and that will leave X/2 kernel buffers. It would be better to allocate X/4 shared buffers, and leave X*3/4 kernel buffers. If you can only allocate X/5 shared buffers, you might be better with X/10 shared buffers because you are going to be doing a lot of I/O, and you need lots of kernel buffers for that. I think that is what people are seeing when modifying shared buffers: X shared buffers is best >X shared buffers is too much overhead and starves kernel <X might be better by not maximizing shared buffers and have more kernel buffers Add to this that it is very hard to estimate working set. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Vincent van Leeuwen <pgsql.spam@vinz.nl> writes: > Halving the cpu_tuple_cost has given a very impressive performance boost > (performance roughly doubled). I'm not sure why, because the plans of the > large queries I was checking haven't changed as far as I can see, but maybe > some smaller queries I didn't bother to check are using a different plan now. That's very curious; I'd expect that parameter to have only marginal effect in the first place (unless you make huge changes in it, of course). It must have changed some plan that you didn't take note of. If you can find it I'd be interested to know. > Just to be absolutely sure: all *_cost parameters only influence the > chosen plan, right? There is absolutely nothing else influenced which > doesn't show up in an EXPLAIN ANALYZE, right? AFAIR, the only one of these parameters that the executor pays any attention to is SORT_MEM; that will determine how soon the runtime code starts to spill tuples to disk in sorts, hash tables, etc. regards, tom lane
On 2003-06-10 15:08:22 -0400, Bruce Momjian wrote: > For PostgreSQL, if your working set is X, if you set your shared buffers > to X, you will get optimal performance (assuming there is no memory > pressure). If set allocate X/2, you will probably get worse > performance. If you allocate X*2, you will also probably get slightly > worse performance. > > Now, let's suppose you can't allocate X shared buffers, because of > memory pressure. Suppose you can allocate X/2 shared buffers, and that > will leave X/2 kernel buffers. It would be better to allocate X/4 > shared buffers, and leave X*3/4 kernel buffers. If you can only > allocate X/5 shared buffers, you might be better with X/10 shared > buffers because you are going to be doing a lot of I/O, and you need > lots of kernel buffers for that. > > I think that is what people are seeing when modifying shared buffers: > > X shared buffers is best > >X shared buffers is too much overhead and starves kernel > <X might be better by not maximizing shared buffers and have > more kernel buffers > > Add to this that it is very hard to estimate working set. > Makes a lot of sense to me. We're doing a lot of I/O on a small part of that 7GB, and the rest is accessed in a more or less random fashion, so 256MB of shared buffers sounds about right. I'll play more with this in the future to see at what setting it performs best. Is there any information available in the system tables or statistics collector that can help determine X? Could PostgreSQL be easily modified to provide more information in this area? Vincent van Leeuwen Media Design - http://www.mediadesign.nl/
On 2003-06-10 15:43:47 -0400, Tom Lane wrote: > Vincent van Leeuwen <pgsql.spam@vinz.nl> writes: > > Halving the cpu_tuple_cost has given a very impressive performance boost > > (performance roughly doubled). I'm not sure why, because the plans of the > > large queries I was checking haven't changed as far as I can see, but maybe > > some smaller queries I didn't bother to check are using a different plan now. > > That's very curious; I'd expect that parameter to have only marginal > effect in the first place (unless you make huge changes in it, of course). > It must have changed some plan that you didn't take note of. If you can > find it I'd be interested to know. > Unfortunately, we're not exactly in the best position to test a lot of things. Our website has been running on MySQL and PHP for the last 3 years, and I've been wanting to switch to PostgreSQL for about the last 2 years. A lot of preparation went in to the change, but once we switched our live site to use PostgreSQL as it's main database we were utterly dissapointed in our own preparations. I knew our website was somewhat optimized for MySQL usage, but looking back I am totally amazed that we were able to squeeze so much performance out of a database that locks entire tables for every update (yes, we used the MyISAM table format). One of the most surprising things we learned was that MySQL was totally bottlenecking on I/O, with a large chunk of CPU unused, and with PostgreSQL it's the other way around. The last couple of weeks have been a nice collection of whacky antics and performance tuning all over the place. The first week everything performed abysmal, and another week later we're close to our original performance again. Ofcourse, the goal is to exceed MySQL's performance by a comfortable margin, but we're not there yet :) So, basically, this server is pushed far harder than it should be. Average system load is at about 4, and there are always 50-200 postgresql threads running during daytime. A new server that will replace this one and which is roughly 2-3 times as fast will be put live in a few weeks, and until that's here this box will have to bear the burden on it's own. > > Just to be absolutely sure: all *_cost parameters only influence the > > chosen plan, right? There is absolutely nothing else influenced which > > doesn't show up in an EXPLAIN ANALYZE, right? > > AFAIR, the only one of these parameters that the executor pays any > attention to is SORT_MEM; that will determine how soon the runtime code > starts to spill tuples to disk in sorts, hash tables, etc. > Current sort_mem setting is based on monitoring the pgsql_tmp directory and concluding that sort_mem needed to be doubled to avoid swapping to disk. It's not as if this box doesn't have enough RAM :) But this means I'll have to look more closely at my query plans, more things are changing than I'm noticing when I tweak various settings. One of the hardest parts is that some queries which should use sequential scans are using indexes and some queries which should use indexes are using sequential scans :) We're currently using some ugly 'set enable_seqscan to off;' hacks in a few places, until everything is tweaked right, but I hope we can remove those as soon as possible. Regards, Vincent van Leeuwen Media Design - http://www.mediadesign.nl/
Vincent van Leeuwen wrote: > On 2003-06-10 15:08:22 -0400, Bruce Momjian wrote: > > For PostgreSQL, if your working set is X, if you set your shared buffers > > to X, you will get optimal performance (assuming there is no memory > > pressure). If set allocate X/2, you will probably get worse > > performance. If you allocate X*2, you will also probably get slightly > > worse performance. > > > > Now, let's suppose you can't allocate X shared buffers, because of > > memory pressure. Suppose you can allocate X/2 shared buffers, and that > > will leave X/2 kernel buffers. It would be better to allocate X/4 > > shared buffers, and leave X*3/4 kernel buffers. If you can only > > allocate X/5 shared buffers, you might be better with X/10 shared > > buffers because you are going to be doing a lot of I/O, and you need > > lots of kernel buffers for that. > > > > I think that is what people are seeing when modifying shared buffers: > > > > X shared buffers is best > > >X shared buffers is too much overhead and starves kernel > > <X might be better by not maximizing shared buffers and have > > more kernel buffers > > > > Add to this that it is very hard to estimate working set. > > > > Makes a lot of sense to me. We're doing a lot of I/O on a small part of that > 7GB, and the rest is accessed in a more or less random fashion, so 256MB of > shared buffers sounds about right. I'll play more with this in the future to > see at what setting it performs best. > > Is there any information available in the system tables or statistics > collector that can help determine X? Could PostgreSQL be easily modified to > provide more information in this area? Estimatinge working set is an old problem. You can look at pgsql_tmp under each database directory for sort mem, but for shared buffers, I am not sure how to know the proper size. Anyone else have an idea? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073