Thread: Re: [GENERAL] PostgreSQL Performance Tuning
Shohab Abdullah wrote:
> Dear,
> We are facing a performance tuning problem while using the PostgreSQL
> database over the network on a Linux OS. Our database consists of more
> than 500 tables with an average of 10K records per table, and an average
> of 20 users access the database simultaneously over the network. Each
> table has indexes and we are querying the database using Hibernate.
> The biggest problem is with insertion, updating and fetching of records,
> i.e. the database performance is very slow. It takes a long time to
> respond in the above scenario. Please advise us on tuning the database.
> I am attaching my *postgresql.conf* file for reference of our current
> configuration.

Have you changed _anything_ from the defaults? The defaults are set so PG will run on as many installations as practical. They are not set for performance - that is specific to your equipment, your data, and how you need to handle the data. Assuming the record sizes aren't huge, that's not a very large data set nor number of users.

Look at these for starters:
http://www.varlena.com/GeneralBits/Tidbits/perf.html
http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html

You might try setting the logging parameters to log queries longer than "x" (where x is in milliseconds - you will have to decide the appropriate value for "too long") and start looking into those first. Make sure that you are running "analyze" if it is not being run by autovacuum. Use "EXPLAIN <your query>" to see how the query is being planned - as a first pass, assume that on any reasonably sized table the words "sequential scan" mean "fix this". Note that you may have to cast variables in a query to match the variable in an index in order for the planner to figure out that it can use the index.

Read the guidelines, then take an educated stab at some settings and see how they work - other than turning off fsync, there's not much in postgresql.conf that will put your data at risk.

Cheers,
Steve
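To make Steve's suggestions concrete, here is a minimal sketch; the table name, column and one-second threshold are invented for the example, and the logging setting can equally be put in postgresql.conf:

    -- Log any statement that runs longer than 1 second (value in milliseconds;
    -- pick whatever "too long" means for your workload).
    SET log_min_duration_statement = 1000;

    -- Refresh planner statistics, then look at how a slow query is planned.
    ANALYZE orders;                          -- "orders" is a hypothetical table
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    -- A "Seq Scan on orders" for a selective lookup like this usually means a
    -- missing index, or a parameter whose type doesn't match the indexed
    -- column (add an explicit cast and re-check the plan).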
Steve Crawford wrote:
> Have you changed _anything_ from the defaults? The defaults are set so
> PG will run on as many installations as practical. They are not set for
> performance - that is specific to your equipment, your data, and how you
> need to handle the data.

Is this really the sensible thing to do? I know we should not encourage the world we're living in even more in the ways of "have the computer do everything for us so that we don't need to have even a clue about what we're doing" ... But, wouldn't it make sense that the configure script determines the amount of physical memory and perhaps even do a HD speed estimate to set up defaults that are closer to a performance-optimized configuration?

Then, perhaps, command switches so that you could specify the type of access you estimate for your system. Perhaps something like:

./configure --db-size=100GB --write-percentage=20 .... etc.

(The write-percentage switch above indicates that we estimate that 20% of the DB activity would be writing to the disk --- there may be other switches to indicate the percentage of queries that are transactions, the percentage of queries that are complex, the percentage that require index usage, etc. etc. etc.) ... And then, based on that, a better set of defaults could be set by the configuration script.

Does this make sense? Or perhaps I'm watching too much science fiction?

Carlos
--
Carlos Moreno <moreno_pg@mochima.com> writes: > ... But, wouldn't it make sense that the configure script > determines the amount of physical memory and perhaps even do a HD > speed estimate to set up defaults that are closer to a > performance-optimized > configuration? No. Most copies of Postgres these days are executed on machines very far away from where the code was built. It's a little bit safer to try to tune things at initdb time ... as indeed we already do. But the fundamental problem remains that we don't know that much about how the installation will be used. For example, the planner configuration parameters turn out to have not that much to do with the absolute speed of your drive, and a whole lot to do with the ratio of the size of your database to the amount of RAM you've got; and the ultimate size of the DB is one thing initdb certainly can't guess. Also, there is an extremely good reason why Postgres will never be set up to try to take over the whole machine by default: most of the developers run multiple postmasters on their machines. regards, tom lane
Tom Lane wrote: > Carlos Moreno <moreno_pg@mochima.com> writes: > >> ... But, wouldn't it make sense that the configure script >> determines the amount of physical memory and perhaps even do a HD >> speed estimate to set up defaults that are closer to a >> performance-optimized >> configuration? >> > > No. Most copies of Postgres these days are executed on machines very > far away from where the code was built. It's a little bit safer to > try to tune things at initdb time ... as indeed we already do. D'oh! Yes, that makes more sense, of course. > But > the fundamental problem remains that we don't know that much about > how the installation will be used. Notice that the second part of my suggestion covers this --- have additional switches to initdb so that the user can tell it about estimates on how the DB will be used: estimated size of the DB, estimated percentage of activity that will involve writing, estimated percentage of activity that will be transactions, percentage that will use indexes, percentage of queries that will be complex, etc. etc. Wouldn't initdb be able to do a better job at coming up with sensible defaults if it counts on this information? Of course, all these parameters would have their own defaults --- the user won't necessarily know or have an accurate estimate for each and every one of them. > Also, there is an extremely good reason why Postgres will never be set > up to try to take over the whole machine by default: most of the > developers run multiple postmasters on their machines. > Wouldn't this be covered by the above suggestion?? One of the switches for the command initdb could allow the user to specify how many instances will be run (I assume you're talking about having different instances listening on different ports for increased concurrency-related benefits?) Does my suggestion make more sense now? Or is it still too unrealistic to make it work properly/safely? Carlos --
On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote: >Notice that the second part of my suggestion covers this --- have >additional >switches to initdb so that the user can tell it about estimates on how >the DB >will be used: estimated size of the DB, estimated percentage of >activity that >will involve writing, estimated percentage of activity that will be >transactions, >percentage that will use indexes, percentage of queries that will be >complex, >etc. etc. If the person knows all that, why wouldn't they know to just change the config parameters? Mike Stone
Carlos Moreno <moreno_pg@mochima.com> writes: > Tom Lane wrote: >> But >> the fundamental problem remains that we don't know that much about >> how the installation will be used. > Notice that the second part of my suggestion covers this --- have > additional switches to initdb That's been proposed and rejected before, too; the main problem being that initdb is frequently a layer or two down from the user (eg, executed by initscripts that can't pass extra arguments through, even assuming they're being invoked by hand in the first place). regards, tom lane
Maybe he's looking for a switch for initdb that would make it interactive and quiz you about your expected usage-- sort of a magic auto-configurator wizard doohicky? I could see that sort of thing being nice for the casual user or newbie who otherwise would have a horribly mis-tuned database. They could instead have only a marginally mis-tuned database :)

On Fri, 2007-04-27 at 10:30 -0400, Michael Stone wrote:
> On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
> >Notice that the second part of my suggestion covers this --- have
> >additional switches to initdb so that the user can tell it about
> >estimates on how the DB will be used: estimated size of the DB,
> >estimated percentage of activity that will involve writing, estimated
> >percentage of activity that will be transactions, percentage that will
> >use indexes, percentage of queries that will be complex, etc. etc.
>
> If the person knows all that, why wouldn't they know to just change the
> config parameters?
>
> Mike Stone
Hello.

Just my 2 cents, and not looking at the technical aspects: setting up PSQL is the weakest point of PSQL, as we have experienced ourselves; once it is running it is great. I can imagine that a lot of people stop after their first trials, after they have experienced the troubles and performance of a standard setup. That is of course a lost user forever. So anything that could be done to get an easier and BETTER setup would strongly enhance PSQL.

My 2 cents.

Henk Sanders

-----Original message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On behalf of Tom Lane
Sent: Friday, 27 April 2007 16:37
To: Carlos Moreno
CC: PostgreSQL Performance
Subject: Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning

Carlos Moreno <moreno_pg@mochima.com> writes:
> Tom Lane wrote:
>> But the fundamental problem remains that we don't know that much about
>> how the installation will be used.

> Notice that the second part of my suggestion covers this --- have
> additional switches to initdb

That's been proposed and rejected before, too; the main problem being that initdb is frequently a layer or two down from the user (eg, executed by initscripts that can't pass extra arguments through, even assuming they're being invoked by hand in the first place).

regards, tom lane
On Fri, Apr 27, 2007 at 07:36:52AM -0700, Mark Lewis wrote:
>Maybe he's looking for a switch for initdb that would make it
>interactive and quiz you about your expected usage-- sort of a magic
>auto-configurator wizard doohicky? I could see that sort of thing being
>nice for the casual user or newbie who otherwise would have a horribly
>mis-tuned database. They could instead have only a marginally mis-tuned
>database :)

However you implement it, anyone who can answer all of those questions is probably capable of reading and understanding the performance section in the manual. It's probably more practical to have a separate script that looks at the running system (ram, disks, pg config, db size, indices, stats, etc.) and makes suggestions--if someone wants to write such a thing.

Mike Stone
On Apr 27, 2007, at 3:30 PM, Michael Stone wrote: > On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote: >> Notice that the second part of my suggestion covers this --- have >> additional >> switches to initdb so that the user can tell it about estimates on >> how the DB >> will be used: estimated size of the DB, estimated percentage of >> activity that >> will involve writing, estimated percentage of activity that will >> be transactions, >> percentage that will use indexes, percentage of queries that will >> be complex, >> etc. etc. > > If the person knows all that, why wouldn't they know to just change > the config parameters? Because knowing your expected workload is a lot easier for many people than knowing what every GUC does. Personally, I think it would be a tremendous start if we just provided a few sample configs like MySQL does. Or if someone wanted to get fancy they could stick a web page somewhere that would produce a postgresql.conf based simply on how much available RAM you had, since that's one of the biggest performance-hampering issues we run into (ie: shared_buffers left at the default of 32MB). -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
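Something along the lines of those sample configs could be as small as a commented postgresql.conf fragment per memory size. The numbers below are only an illustrative sketch for a dedicated machine with roughly 4 GB of RAM, not vetted recommendations:

    # Rough starting points for a dedicated ~4 GB server; adjust for your
    # workload and OS limits (e.g. SHMMAX) before using any of this.
    shared_buffers = 512MB          # the 32MB default is far too small here
    effective_cache_size = 3GB      # approx. RAM available as OS page cache
    work_mem = 16MB                 # per sort/hash, per backend - be conservative
    maintenance_work_mem = 256MB    # helps VACUUM and index builds
    checkpoint_segments = 16        # the default of 3 forces frequent checkpoints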
Michael Stone wrote: > On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote: >> Notice that the second part of my suggestion covers this --- have >> additional >> switches to initdb <snip> > If the person knows all that, why wouldn't they know to just change the > config parameters? > Exactly.. What I think would be much more productive is to use the great amount of information that PG tracks internally and auto-tune the parameters based on it. For instance: Why does the user need to manually track max_fsm_pages and max_fsm_relations? I bet there are many users who have never taken the time to understand what this means and wondering why performance still stinks after vacuuming their database ( spoken from my own experience ) How about work_mem? shared_buffers? column statistics sizes? random_page_cost? Couldn't some fairly simple regression tests akin to a VACUUM process spot potential problems? "Hey, it looks like you need more fsm_relations.. I bumped that up automatically for you". Or "These indexes look bloated, shall I automatically reindex them for you?" I'm sure there are many more examples, that with some creative thinking, could be auto-adjusted to match the usage patterns of the database. PG does an excellent job of exposing the variables to the users, but mostly avoids telling the user what to do or doing it for them. Instead, it is up to the user to know where to look, what to look for, and how to react to things to improve performance. This is not all bad, but it is assuming that all users are hackers ( which used to be true ), but certainly doesn't help when the average SQLServer admin tries out Postgres and then is surprised at the things they are now responsible for managing. PG is certainly *not* the only database to suffer from this syndrome, I know.. I like to think of my systems as good employees. I don't want to have to micromanage everything they do. I want to tell them "here's what I want done", and assuming I made a good hiring choice, they will do it and take some liberty to adjust parameters where needed to achieve the spirit of the goal, rather than blindly do something inefficiently because I failed to explain to them the absolute most efficient way to accomplish the task. Granted, there are some people who don't like the developers making any assumptions about their workload. But this doesn't have to be an either/or proposition. I don't think any control needs to be abandoned. But self-adjusting defaults seem like an achievable goal ( I know, I know, "show us the patch" ). I just don't know if this feeling has resonated well between new users and long-term developers. I know it must be grating to have to answer the same questions over and over and over "have you analyzed? Did you leave postgresql.conf at the defaults??". Seems like a win-win for both sides, IMHO. In closing, I am not bashing PG! I love it and swear by it. These comments are purely from an advocacy perspective. I'd love to see PG user base continue to grow. My .02 -Dan
Dan, > Exactly.. What I think would be much more productive is to use the > great amount of information that PG tracks internally and auto-tune the > parameters based on it. For instance: *Everyone* wants this. The problem is that it's very hard code to write given the number of variables. I'm working on it but progress is slow, due to my travel schedule. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
In response to Dan Harris <fbsd@drivefaster.net>: > Michael Stone wrote: > > On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote: > >> Notice that the second part of my suggestion covers this --- have > >> additional > >> switches to initdb > <snip> > > If the person knows all that, why wouldn't they know to just change the > > config parameters? > > Exactly.. What I think would be much more productive is to use the great amount > of information that PG tracks internally and auto-tune the parameters based on > it. For instance: > > Why does the user need to manually track max_fsm_pages and max_fsm_relations? I > bet there are many users who have never taken the time to understand what this > means and wondering why performance still stinks after vacuuming their database > ( spoken from my own experience ) But there are two distinct routes that can be taken if there's not enough fsm space: add fsm space or vacuum more frequently. I don't want the system to eat up a bunch of memory for fsm entries if my workload indicates that I can easily vacuum more frequently. > How about work_mem? shared_buffers? column statistics sizes? random_page_cost? The only one that seems practical (to me) is random_page_cost. The others are all configuration options that I (as a DBA) want to be able to decide for myself. For example, I have some dedicated PG servers that I pretty much max those values out at, to let PG know that it can use everything on the system -- but I also have some shared use machines with PG, where I carefully constrain those values so that PG doesn't muscle other daemons out of their share of the RAM (work_mem is probably the best example) It would be nice to have some kind of utility that could tell me what random_page_cost should be, as I've never felt comfortable tweaking it. Like some utility to run that would say "based on the seek tests I just ran, you should set random_page_cost to x". Of course, if such a thing existed, it could just fill in the value for you. But I haven't figured out how to pick a good value for that setting, so I have no idea how to suggest to have it automatically set. > Couldn't some fairly simple regression tests akin to a VACUUM process spot > potential problems? "Hey, it looks like you need more fsm_relations.. I bumped > that up automatically for you". Or "These indexes look bloated, shall I > automatically reindex them for you?" A lot of that stuff does happen. A vacuum verbose will tell you what it thinks you should do, but I don't _want_ it to do it automatically. What if I create huge temporary tables once a week for some sort of analysis that overload the fsm space? And if I'm dropping those tables when the analysis is done, do I want the fsm space constantly adjusting? Plus, some is just impossible. shared_buffers requires a restart. Do you want your DB server spontaneously restarting because it thought more buffers might be nice? > I'm sure there are many more examples, that with some creative thinking, could > be auto-adjusted to match the usage patterns of the database. PG does an > excellent job of exposing the variables to the users, but mostly avoids telling > the user what to do or doing it for them. Instead, it is up to the user to know > where to look, what to look for, and how to react to things to improve > performance. 
This is not all bad, but it is assuming that all users are hackers > ( which used to be true ), but certainly doesn't help when the average SQLServer > admin tries out Postgres and then is surprised at the things they are now > responsible for managing. PG is certainly *not* the only database to suffer > from this syndrome, I know.. I expect the suffering is a result of the fact that databases are non-trivial pieces of software, and there's no universally simple way to set them up and make them run well. > I like to think of my systems as good employees. I don't want to have to > micromanage everything they do. I want to tell them "here's what I want done", > and assuming I made a good hiring choice, they will do it and take some liberty > to adjust parameters where needed to achieve the spirit of the goal, rather than > blindly do something inefficiently because I failed to explain to them the > absolute most efficient way to accomplish the task. That's silly. No software does that. You're asking software to behave like humans. If that were the case, this would be Isaac Asimov's world, not the real one. > Granted, there are some people who don't like the developers making any > assumptions about their workload. But this doesn't have to be an either/or > proposition. I don't think any control needs to be abandoned. But > self-adjusting defaults seem like an achievable goal ( I know, I know, "show us > the patch" ). I just don't know if this feeling has resonated well between new > users and long-term developers. I know it must be grating to have to answer the > same questions over and over and over "have you analyzed? Did you leave > postgresql.conf at the defaults??". Seems like a win-win for both sides, IMHO. Well, it seems like this is happening where it's practical -- autovacuum is a good example. Personally, I wouldn't be opposed to more automagic stuff, just as long as I have the option to disable it. There are some cases where I still disable autovac. > In closing, I am not bashing PG! I love it and swear by it. These comments are > purely from an advocacy perspective. I'd love to see PG user base continue to grow. I expect that part of the problem is "who's going to do it?" -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
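For what it's worth, the VACUUM VERBOSE check Bill mentions already gets you most of the way to a decision; the message wording below is paraphrased from memory and varies between versions:

    -- Run database-wide as a superuser and read the trailing summary lines:
    VACUUM VERBOSE;
    -- INFO:   free space map contains ... pages in ... relations
    -- DETAIL: ... page slots are required to track all free space.
    --         Current limits are: ... page slots, ... relations, ...
    -- If the slots required approach or exceed max_fsm_pages, either raise
    -- max_fsm_pages / max_fsm_relations or vacuum more aggressively;
    -- which of the two is the judgment call a tool can't easily make.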
At 10:36a -0400 on 27 Apr 2007, Tom Lane wrote:
> That's been proposed and rejected before, too; the main problem being
> that initdb is frequently a layer or two down from the user (eg,
> executed by initscripts that can't pass extra arguments through, even
> assuming they're being invoked by hand in the first place).

And following after Dan Harris' response . . . so what's the problem with having some sort of cronjob contrib module that utilizes the actual and current statistics to make recommendations? I don't think it'd be right to simply change the configuration options as it sees fit (especially as it was pointed out that many run multiple postmasters or have other uses for the machines in question), but perhaps it could send a message (email?) along the lines of "Hey, I'm currently doing this many of X transactions, against this much of Y data, and working under these constraints. You might get better performance (in this area ...) if you altered the configuration options like so: ..."

Certainly not for the masters, but perhaps for standard installations, sort of like bringing up the rear . . . just a thought.

Kevin
On Fri, Apr 27, 2007 at 02:40:07PM -0400, Kevin Hunter wrote:
> out that many run multiple postmasters or have other uses for the
> machines in question), but perhaps it could send a message (email?)
> along the lines of "Hey, I'm currently doing this many of X
> transactions, against this much of Y data, and working under these
> constraints. You might get better performance (in this area ...) if
> you altered the configuration options like so: ..."

or storing the values in the db for later trending analysis, witness ora statspack.
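A bare-bones sketch of that statspack-style idea using only the stock statistics views; the table name and column selection are arbitrary:

    -- One-time setup: somewhere to keep periodic snapshots.
    CREATE TABLE stat_history (
        snap_time   timestamptz DEFAULT now(),
        datname     name,
        xact_commit bigint,
        blks_read   bigint,
        blks_hit    bigint
    );

    -- Run this from cron (e.g. via psql -c) every few minutes.
    INSERT INTO stat_history (datname, xact_commit, blks_read, blks_hit)
    SELECT datname, xact_commit, blks_read, blks_hit
      FROM pg_stat_database;
    -- Diffing successive snapshots gives trends (commits/minute, cache hit
    -- rate over time) instead of the lifetime totals the views show directly.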
Bill, > The only one that seems practical (to me) is random_page_cost. The > others are all configuration options that I (as a DBA) want to be able > to decide for myself. Actually, random_page_cost *should* be a constant "4.0" or "3.5", which represents the approximate ratio of seek/scan speed which has been relatively constant across 6 years of HDD technology. The only reason we make it a configuration variable is that there's defects in our cost model which cause users to want to tinker with it. Mind you, that's gotten better in recent versions as well. Lately I mostly tinker with effective_cache_size and the various cpu_* stats rather than modifying random_page_cost. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Bill Moran wrote: > In response to Dan Harris <fbsd@drivefaster.net>: <snip> >> Why does the user need to manually track max_fsm_pages and max_fsm_relations? I >> bet there are many users who have never taken the time to understand what this >> means and wondering why performance still stinks after vacuuming their database >> ( spoken from my own experience ) > > But there are two distinct routes that can be taken if there's not enough > fsm space: add fsm space or vacuum more frequently. I don't want the system > to eat up a bunch of memory for fsm entries if my workload indicates that > I can easily vacuum more frequently. There's no magic bullet here, but heuristics should be able to tell us you can "easily vacuum more frequently" And again, I said these things would be *optional*. Like an item in postgresql.conf "i_have_read_the_manual_and_know_what_this_all_means = false #default false". If you change it to true, you have all the control you're used to and nothing will get in your way. > >> How about work_mem? shared_buffers? column statistics sizes? random_page_cost? > > The only one that seems practical (to me) is random_page_cost. The others are > all configuration options that I (as a DBA) want to be able to decide for > myself. For example, I have some dedicated PG servers that I pretty much > max those values out at, to let PG know that it can use everything on the > system -- but I also have some shared use machines with PG, where I carefully > constrain those values so that PG doesn't muscle other daemons out of their > share of the RAM (work_mem is probably the best example) > Just because you carefully constrain it does not preclude the ability for program logic to maintain statistics to do what I suggested. > It would be nice to have some kind of utility that could tell me what > random_page_cost should be, as I've never felt comfortable tweaking it. > Like some utility to run that would say "based on the seek tests I just > ran, you should set random_page_cost to x". Of course, if such a thing > existed, it could just fill in the value for you. But I haven't figured > out how to pick a good value for that setting, so I have no idea how to > suggest to have it automatically set. Me either, but I thought if there's a reason it's user-settable, there must be some demonstrable method for deciding what is best. > >> Couldn't some fairly simple regression tests akin to a VACUUM process spot >> potential problems? "Hey, it looks like you need more fsm_relations.. I bumped >> that up automatically for you". Or "These indexes look bloated, shall I >> automatically reindex them for you?" > > A lot of that stuff does happen. A vacuum verbose will tell you what it > thinks you should do, but I don't _want_ it to do it automatically. What > if I create huge temporary tables once a week for some sort of analysis that > overload the fsm space? And if I'm dropping those tables when the analysis > is done, do I want the fsm space constantly adjusting? I understand *you* don't want it done automatically. But my suspicion is that there are a lot more newbie pg admins who would rather let the system do something sensible as a default. Again, you sound defensive that somehow my ideas would take power away from you. I'm not sure why that is, but certainly I'm not suggesting that. An auto-pilot mode is not a bad idea just because a few pilots don't want to use it. > > Plus, some is just impossible. shared_buffers requires a restart. 
> Do you want your DB server spontaneously restarting because it thought
> more buffers might be nice?

Well, maybe look at the bigger picture and see if it can be fixed to *not* require a program restart? Or.. take effect on the next pid that gets created? This is a current limitation, but it doesn't need to be one for eternity, does it?

>> I'm sure there are many more examples, that with some creative thinking, could
>> be auto-adjusted to match the usage patterns of the database. PG does an
>> excellent job of exposing the variables to the users, but mostly avoids telling
>> the user what to do or doing it for them. Instead, it is up to the user to know
>> where to look, what to look for, and how to react to things to improve
>> performance. This is not all bad, but it is assuming that all users are hackers
>> ( which used to be true ), but certainly doesn't help when the average SQLServer
>> admin tries out Postgres and then is surprised at the things they are now
>> responsible for managing. PG is certainly *not* the only database to suffer
>> from this syndrome, I know..
>
> I expect the suffering is a result of the fact that databases are non-trivial
> pieces of software, and there's no universally simple way to set them up
> and make them run well.

Speaking as a former SQL Server admin ( from day 1 of the Sybase fork up to version 2000 ), I can say there *is* a way to make them simple. It's certainly not a perfect piece of software, but the learning curve speaks for itself. It can auto-shrink your databases ( without locking problems ). Actually it pretty much runs itself. It auto-allocates RAM for you ( up to the ceiling *you* control ). It automatically re-analyzes itself. I was able to successfully manage several servers with not insignificant amounts of data in them for many years without being a trained DBA. After switching to PG, I found myself having to twiddle with all sorts of settings that it seemed like it should just know about without me having to tell it.

I'm not saying it was simple to make it do that. MS has invested LOTS of money and effort into making it that way. I don't expect PG to have features like that tomorrow or even next release. But I feel it's important to make sure that those who *can* realistically take steps in that direction understand this point of view ( and with Josh's other reply to this, I think many do ).

>> I like to think of my systems as good employees. I don't want to have to
>> micromanage everything they do. I want to tell them "here's what I want done",
>> and assuming I made a good hiring choice, they will do it and take some liberty
>> to adjust parameters where needed to achieve the spirit of the goal, rather than
>> blindly do something inefficiently because I failed to explain to them the
>> absolute most efficient way to accomplish the task.
>
> That's silly. No software does that. You're asking software to behave like
> humans. If that were the case, this would be Isaac Asimov's world, not the
> real one.

It's not silly. There are plenty of systems that do that. Maybe you just haven't used them. Again, SQL Server did a lot of those things for me. I didn't have to fiddle with checkboxes or multi-select tuning options. It learned what its load was and reacted appropriately. I never had to stare at planner outputs and try to figure out why the heck it chose that plan. Although, I certainly could have if I wanted to.
It has a tool called the SQL Profiler which will "watch" your workload on the database, do regression testing and suggest ( and optionally implement with a single click ) indexes on your tables. I've been wanting to do this for years with PG, and had a small start on a project to do just that actually. > >> Granted, there are some people who don't like the developers making any >> assumptions about their workload. But this doesn't have to be an either/or >> proposition. I don't think any control needs to be abandoned. But >> self-adjusting defaults seem like an achievable goal ( I know, I know, "show us >> the patch" ). I just don't know if this feeling has resonated well between new >> users and long-term developers. I know it must be grating to have to answer the >> same questions over and over and over "have you analyzed? Did you leave >> postgresql.conf at the defaults??". Seems like a win-win for both sides, IMHO. > > Well, it seems like this is happening where it's practical -- autovacuum is > a good example. Agreed, this is a huge step forward. And again, I'm not taking an offensive posture on this. Just that I think it's worth giving my .02 since I have had strong feelings about this for awhile. > > Personally, I wouldn't be opposed to more automagic stuff, just as long as > I have the option to disable it. There are some cases where I still > disable autovac. > >> In closing, I am not bashing PG! I love it and swear by it. These comments are >> purely from an advocacy perspective. I'd love to see PG user base continue to grow. > > I expect that part of the problem is "who's going to do it?" > Yes, this is the classic problem. I'm not demanding anyone pick up the ball and jump on this today, tomorrow, etc.. I just think it would be good for those who *could* make a difference to keep those goals in mind when they continue. If you have the right mindset, this problem will fix itself over time. -Dan
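A very crude first pass at that kind of index advice is already possible from the statistics collector, assuming row-level statistics are enabled; the cutoffs below are arbitrary illustrations:

    -- Tables that are sequentially scanned a lot but rarely hit an index:
    -- prime candidates for a closer look at their queries and indexes.
    SELECT relname, seq_scan, seq_tup_read, idx_scan
      FROM pg_stat_user_tables
     WHERE seq_scan > 1000                       -- arbitrary threshold
       AND coalesce(idx_scan, 0) < seq_scan / 10 -- arbitrary ratio
     ORDER BY seq_tup_read DESC
     LIMIT 10;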
Dan, > Yes, this is the classic problem. I'm not demanding anyone pick up the > ball and jump on this today, tomorrow, etc.. I just think it would be > good for those who *could* make a difference to keep those goals in mind > when they continue. If you have the right mindset, this problem will > fix itself over time. Don't I wish. Autotuning is *hard*. It took Oracle 6 years. It took Microsoft 3-4 years, and theirs still has major issues last I checked. And both of those DBs support less OSes than we do. I think it's going to take more than the *right mindset* and my spare time. > I appreciate your efforts in this regard. Do you have a formal project > plan for this? If you can share it with me, I'll take a look and see if > there is anything I can do to help out. Nope, just some noodling around on the configurator: www.pgfoundry.org/projects/configurator > I am on the verge of starting a Java UI that will query a bunch of the > pg_* tables and give the user information about wasted table space, > index usage, table scans, slow-running queries and spoon-feed it in a > nice attractive interface that can be a real-time system monitor tool. > This could be a cooperative project or might have some redundancy with > what you're up to. I'd be *very* interested in collaborating with you on this. Further, we could feed DTrace (& systemtap?) into the interface to get data that PostgreSQL doesn't currently produce. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
On Fri, 27 Apr 2007, Josh Berkus wrote:
> Dan,
>
>> Yes, this is the classic problem. I'm not demanding anyone pick up the
>> ball and jump on this today, tomorrow, etc.. I just think it would be
>> good for those who *could* make a difference to keep those goals in mind
>> when they continue. If you have the right mindset, this problem will
>> fix itself over time.
>
> Don't I wish. Autotuning is *hard*. It took Oracle 6 years. It took
> Microsoft 3-4 years, and theirs still has major issues last I checked. And
> both of those DBs support less OSes than we do. I think it's going to
> take more than the *right mindset* and my spare time.

I think there are a couple of different things here.

1. Full autotuning. As you say, this is very hard and needs a lot of info about your particular database usage.

2. Getting defaults that are closer to right than the current ones. This is much easier. For this, nobody is expecting that the values are right; we're just begging for some tool to get us within a couple of orders of magnitude of what's correct.

The current defaults are appropriate for a single CPU with tens of MB of RAM and a single drive. Nowadays you have people trying to run quick-and-dirty tests on some spare hardware they have lying around (waiting for another project) that's got 4-8 CPUs with tens of GB of RAM and a couple dozen drives.

These people don't know about database tuning. They can learn, but they want to see if Postgres is even in the ballpark. If the results are close to acceptable they will ask questions and research the tuning, but if the results are orders of magnitude lower than they need to be, they'll just say that Postgres is too slow and try another database.

An auto-default script that was written assuming that Postgres has the box to itself would be a wonderful start. I think the next step would be to be able to tell the script 'only plan on using 1/2 of this box', and beyond that would be the steps that you are thinking of, where the usage pattern is considered.

But as it stands, every performance question is answered with "did you change the defaults? they are way too low for modern hardware, raise them by two orders of magnitude and then we'll start investigating".

David Lang
Carlos,
about your feature proposal: as I learned, nearly all performance configuration can be done by editing the .INI file and making the Postmaster re-read it.

So, WHY at all should those parameters be guessed at the installation of the database? Wouldn't it be a safer point in time to have some postgresql-tune utility, which gets run after the installation, and maybe every once in a while? That tool can check vital information like "database size to memory relation" and suggest a new postgresql.ini.

That tool needs NO INTEGRATION whatsoever - it can be developed and deployed totally independently, and only bundled later.

And as this tool can be tested separately, and does not need a new initdb every time ... it can be developed more easily.
Maybe there is even a pointy flashy version possible (perhaps even for money :) which gives nice graphics and "optimized", like those Windows Optimizers. :) I am sure, some DBAs in BIGCOMPs would be thrilled :)
May that be a possible way?
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.
On 4/28/07, Harald Armin Massa <haraldarminmassa@gmail.com> wrote:
> about your feature proposal: as I learned, nearly all performance
> configuration can be done by editing the .INI file and making the
> Postmaster re-read it.

Um, shared_buffers is one of the most important initial parameters to set and it most certainly cannot be set after startup.

> So, WHY at all should those parameters be guessed at the installation of the
> database?

Because a lot of good assumptions can be made on the initial install. Likewise, some of the most important parameters cannot be tuned after startup.

> Maybe there is even a pointy flashy version possible (perhaps even for money
> :) which gives nice graphics and "optimized", like those Windows Optimizers.
> :) I am sure, some DBAs in BIGCOMPs would be thrilled :)

I'd suggest that you not make snide remarks about someone else's design when your own analysis is somewhat flawed.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 3rd Floor | jharris@enterprisedb.com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/
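The distinction Jonah is drawing is visible directly in pg_settings; the three parameter names below are just examples:

    SELECT name, setting, context
      FROM pg_settings
     WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size');
    -- The context column says when a change can take effect:
    --   'postmaster' -> only at server start    (shared_buffers)
    --   'sighup'     -> on a configuration reload
    --   'user'       -> per session with SET    (work_mem, effective_cache_size)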
Jonah,
Not after startup, correct. But after installation. It is possible to change PostgreSQL.conf (not ini, too much Windows on my side, sorry) and restart the postmaster.

Yes. These assumptions can be made - but then they are assumptions. When the database is filled and working, there are measurable facts. And yes, that needs a restart of the postmaster, which does not work on 24/7 systems. But there are many databases which can be restarted for tuning in regular maintenance sessions.

Sorry, Jonah, if my words sounded "snide". I had feedback from some DBAs in BIGCOMPs who said very positive things about the beauty of pgAdmin. I saw some DBAs quite happy about the graphical displays of TOAD. I worked for an MVS hoster who paid BIG SUMS to Candle Software for software called Omegamon, which made it possible to have charts about performance figures.

So I deduced that people would even be willing to pay money for a GUI which presents the optimizing process.

That idea of "tune the PostgreSQL database after installation" also came from the various requests on pgsql-performance. Some ask before they install; but there are also MANY questions of the form "our PostgreSQL database was running fast until xxxx", with xxxx usually being a table grown bigger than n records.

And I really did not want to discredit the idea of properly configuring from the start. I just wanted to open another option to do that tuning.
Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.
Harald Armin Massa wrote:
> Carlos,
>
> about your feature proposal: as I learned, nearly all performance
> configuration can be done by editing the .INI file and making the
> Postmaster re-read it.
>
> So, WHY at all should those parameters be guessed at the installation
> of the database? Wouldn't it be a safer point in time to have some
> postgresql-tune utility, which gets run after the installation, and
> maybe every once in a while? That tool can check vital information like
> "database size to memory relation" and suggest a new postgresql.ini.

I would soooo volunteer to do that and donate it to the PG project!! Problem is, you see, I'm not sure what an appropriate algorithm would be --- for instance, you mention "DB size to memory relation" as if it is an extremely important parameter and/or a parameter with extremely obvious consequences in the configuration file --- I'm not even sure what I would do given that.

I always get the feeling that figuring out stuff from the performance-tuning-related documentation requires technical knowledge about the internals of PG (which, granted, is out there, given that we get 100% of the source code ... but still, that's far from the point, I'm sure we agree).

That's why I formulated my previous post as a Feature Request, rather than a "would this be a good feature / should I get started working on that?" :-)

Carlos
--
On Fri, 27 Apr 2007, Josh Berkus wrote:
> *Everyone* wants this. The problem is that it's very hard code to write
> given the number of variables

There are lots of variables, and there are at least three major ways to work on improving someone's system:

1) Collect up data about their system (memory, disk layout), find out a bit about their apps/workload, and generate a config file based on that.

2) Connect to the database and look around. Study the tables and some of their stats, make some estimates based on what you find, and produce a new config file.

3) Monitor the database while it's doing its thing. See which parts go well and which go badly by viewing database statistics like pg_statio. From that, figure out where the bottlenecks are likely to be and push more resources toward them. What I've been working on lately is exposing more readouts of performance-related database internals to make this more practical.

When first exposed to this problem, most people assume that (1) is good enough--ask some questions, look at the machine, and magically a reasonable starting configuration can be produced. It's already been pointed out that anyone with enough knowledge to do all that can probably spit out a reasonable guess for the config file without help. If you're going to the trouble of building a tool for offering configuration advice, it can be wildly more effective if you look inside the database after it's got data in it, and preferably after it's been running under load for a while, and make your recommendations based on all that information.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
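As one small example of approach (3), a buffer-cache hit percentage can be pulled straight from the pg_statio views, assuming block-level statistics are being collected; what counts as a "bad" number is workload-dependent:

    SELECT round(100.0 * sum(heap_blks_hit)
                 / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 1)
           AS heap_hit_pct
      FROM pg_statio_user_tables;
    -- A persistently low percentage on a busy, read-mostly system is one hint
    -- that shared_buffers (and the OS cache behind it) deserves a second look;
    -- snapshot or reset the statistics first so old numbers don't dominate.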
Greg Smith wrote:
> If you're going to the trouble of building a tool for offering
> configuration advice, it can be wildly more effective if you look inside
> the database after it's got data in it, and preferably after it's been
> running under load for a while, and make your recommendations based on
> all that information.

There are two completely different problems that are getting mixed together in this discussion. Several people have tried to distinguish them, but let's be explicit:

1. Generating a reasonable starting configuration for neophyte users who have installed Postgres for the first time.

2. Generating an optimal configuration for a complex, running system that's loaded with data.

The first problem is easy: any improvement would be welcome and would give most users a better initial experience. The second problem is nearly impossible. Forget the second problem (or put it on the "let's find someone doing a PhD project" list), and focus on the first.

From my limited experience, a simple questionnaire could be used to create a pretty good starting configuration file. Furthermore, many of the answers can be discovered automatically:

1. How much memory do you have?
2. How many disks do you have?
   a. Which disk contains the OS?
   b. Which disk(s) have swap space?
   c. Which disks are "off limits" (not to be used by Postgres)?
3. What is the general nature of your database?
   a. Mostly static (few updates, lots of access)
   b. Mostly archival (lots of writes, few reads)
   c. Very dynamic (data are added, updated, and deleted a lot)
4. Do you have a lot of small, fast transactions or a few big, long transactions?
5. How big do you expect your database to be?
6. How many simultaneous users do you expect?
7. What are the users you want configured initially?
8. Do you want local access only, or network access?

With these few questions (and perhaps a couple more), a decent set of startup files could be created that would give good, though not optimal, performance for most people just getting started.

I agree with an opinion posted a couple of days ago: the startup configuration is one of the weakest features of Postgres. It's not rocket science, but there are several files, and it's not obvious to the newcomer that the files even exist.

Here's just one example: a coworker installed Postgres and couldn't get it to work at all. He struggled for hours. When he contacted me, I tried his installation and it worked fine. He tried it, and he couldn't connect. I asked him, "Are you using localhost?" He said yes, but what he meant was that he was using the local *network*, 192.168.0.5, whereas I was using "localhost". He didn't have network access enabled. So, four hours wasted.

This is the sort of thing that makes experienced users say, "Well, duh!" But there are any number of these little traps and obscure configuration parameters that make the initial Postgres experience a poor one. It wouldn't take much to make a big difference to new users.

Craig
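The network-access trap in that anecdote comes down to two checks, sketched below with an illustrative subnet; on releases before 8.0 the equivalent of listen_addresses was the tcpip_socket setting:

    SHOW listen_addresses;
    -- 'localhost' (the default) means only local TCP connections are accepted;
    -- changing it in postgresql.conf requires a server restart.

    -- Remote clients also need a matching pg_hba.conf entry, for example:
    --   host    all    all    192.168.0.0/24    md5
    -- pg_hba.conf changes take effect on a reload (pg_ctl reload).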
At 12:18p -0400 on 30 Apr 2007, Craig A. James wrote: > 1. Generating a resonable starting configuration for neophyte users > who have installed Postgres for the first time. I recognize that PostgreSQL and MySQL try to address different problem-areas, but is this one reason why a lot of people with whom I talk prefer MySQL? Because PostgreSQL is so "slooow" out of the box?* Thanks, Kevin * Not trolling; I use PostgreSQL almost exclusively.
Greg,

> 1) Collect up data about their system (memory, disk layout), find out a
> bit about their apps/workload, and generate a config file based on that.

We could start with this. Where I bogged down is that collecting system information about several different operating systems ... and in some cases generating scripts for boosting things like shmmax ... is actually quite a large problem from a slog perspective; there is no standard way even within Linux to describe CPUs, for example. Collecting available disk space information is even worse. So I'd like some help on this portion.

I actually have algorithms which are "good enough to start with" for most of the important GUCs worked out, and others could be set through an interactive script ("Does your application require large batch loads involving thousands or millions of updates in the same transaction?" "How large (GB) do you expect your database to be?")

> 2) Connect to the database and look around. Study the tables and some of
> their stats, make some estimates based on what you find, and produce a new
> config file.

I'm not sure that much more for (2) can be done than for (1). Tables-on-disk don't tell us much.

> 3) Monitor the database while it's doing its thing. See which parts go
> well and which go badly by viewing database statistics like pg_statio.
> From that, figure out where the bottlenecks are likely to be and push more
> resources toward them. What I've been working on lately is exposing more
> readouts of performance-related database internals to make this more
> practical.

We really should collaborate on that.

> When first exposed to this problem, most people assume that (1) is good
> enough--ask some questions, look at the machine, and magically a
> reasonable starting configuration can be produced. It's already been
> pointed out that anyone with enough knowledge to do all that can probably
> spit out a reasonable guess for the config file without help.

But that's actually more than most people already do. Further, if you don't start with a "reasonable" configuration, then it's difficult to impossible to analyze where your settings are out of whack; behavior introduced by some way-too-low settings will mask any other tuning that needs to be done. It's also hard/impossible to devise tuning algorithms that work for both gross tuning (increase shared_buffers by 100x) and fine tuning (decrease bgwriter_interval to 45ms).

So whether or not we do (3), we need to do (1) first.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
> large problem from a slog perspective; there is no standard way even within > Linux to describe CPUs, for example. Collecting available disk space > information is even worse. So I'd like some help on this portion. > Quite likely, naiveness follows... But, aren't things like /proc/cpuinfo , /proc/meminfo, /proc/partitions / /proc/diskstats standard, at the very least across Linux distros? I'm not familiar with BSD or other Unix flavours, but I would expect these (or their equivalent) to exist in those, no? Am I just being naive? Carlos --
On Tue, 1 May 2007, Carlos Moreno wrote:
>> large problem from a slog perspective; there is no standard way even
>> within Linux to describe CPUs, for example. Collecting available disk
>> space information is even worse. So I'd like some help on this portion.
>
> Quite likely, naiveness follows... But, aren't things like /proc/cpuinfo,
> /proc/meminfo, /proc/partitions / /proc/diskstats standard, at the very
> least across Linux distros? I'm not familiar with BSD or other Unix
> flavours, but I would expect these (or their equivalent) to exist in
> those, no?
>
> Am I just being naive?

Unfortunately, yes. Across different Linux distros they are fairly standard (although different kernel versions will change them), but different kernels need drastically different tools to get the info from them.

David Lang
On Mon, 30 Apr 2007, Kevin Hunter wrote:
> I recognize that PostgreSQL and MySQL try to address different
> problem-areas, but is this one reason why a lot of people with whom I
> talk prefer MySQL? Because PostgreSQL is so "slooow" out of the box?

It doesn't help, but there are many other differences that are as big or bigger. Here are a few samples off the top of my head:

1) Performance issues due to MVCC (MySQL fans love to point out how fast they can do select count(*) from x)
2) Not knowing you have to run vacuum analyze and therefore never seeing a good query result
3) Unfair comparison of PostgreSQL with robust WAL vs. MySQL+MyISAM on write-heavy workloads

These are real issues, which of course stack on top of things like outdated opinions from older PG releases with performance issues resolved in the last few years.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Tue, 1 May 2007, Josh Berkus wrote:
> there is no standard way even within Linux to describe CPUs, for
> example. Collecting available disk space information is even worse. So
> I'd like some help on this portion.

I'm not fooled--secretly you and your co-workers laugh at how easy this is on Solaris and are perfectly happy with how difficult it is on Linux, right?

I joke because I've been re-solving some variant of this problem every few years for a decade now and it just won't go away. Last time I checked, the right answer was to find someone else who's already done it, packaged that into a library, and appears committed to keeping it up to date; just pull a new rev of that when you need it. For example, for the CPU/memory part, top solves this problem and is always kept current, so on open-source platforms there's the potential to re-use that code. Now that I know that's one thing you're (understandably) fighting with, I'll dig up my references on that (again).

> It's also hard/impossible to devise tuning algorithms that work for both
> gross tuning (increase shared_buffers by 100x) and fine tuning (decrease
> bgwriter_interval to 45ms).

I would advocate focusing on iterative improvements to an existing configuration rather than even bothering with generating a one-off config, for exactly this reason. It *is* hard/impossible to get it right in a single shot, because of how many parameters interact and the way bottlenecks clear, so why not assume from the start that you're going to do it several times--then you've only got one piece of software to write.

The idea I have in my head is a tool that gathers system info, connects to the database, and then spits out recommendations in order of expected effectiveness--with the specific caveat that changing too many things at one time isn't recommended, and some notion of parameter dependencies. The first time you run it, you'd be told that shared_buffers was wildly low, effective_cache_size isn't even in the right ballpark, and your work_mem looks small relative to the size of your tables; fix those before you bother doing anything else, because any data collected with those at very wrong values is bogus. On take two, those parameters pass their sanity tests, but since you're actually running at a reasonable speed now, the fact that your tables are no longer being vacuumed frequently enough might bubble to the top.

It would take a few passes through to nail down everything, but as long as it's put together such that you'd be in a similar position to the single-shot tool after running it once, it would remove that as something separate that needed to be built.

To argue against myself for a second, it may very well be the case that writing the simpler tool is the only way to get a useful prototype for building the more complicated one; very easy to get bogged down in feature creep on a grand design otherwise.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
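As a taste of the kind of first-pass sanity check such a tool might print, here is one comparison it could start from; the query is only a sketch, and interpreting the numbers is exactly the hard part the tool would have to encode:

    -- Compare the configured buffer cache against the on-disk size of all
    -- tables and indexes in the current database.
    SELECT current_setting('shared_buffers')                  AS shared_buffers,
           pg_size_pretty(sum(pg_relation_size(oid))::bigint) AS tables_plus_indexes
      FROM pg_class
     WHERE relkind IN ('r', 'i');
    -- If shared_buffers is a tiny fraction of the working set on a machine
    -- dedicated to PostgreSQL, that's the "wildly low" case to flag first.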
On Tue, 1 May 2007, Greg Smith wrote: > On Tue, 1 May 2007, Josh Berkus wrote: > >> there is no standard way even within Linux to describe CPUs, for example. >> Collecting available disk space information is even worse. So I'd like >> some help on this portion. what type of description of the CPU's are you looking for? >> It's also hard/impossible to devise tuning algorithms that work for both >> gross tuning (increase shared_buffers by 100x) and fine tuning (decrease >> bgwriter_interval to 45ms). > > I would advocate focusing on iterative improvements to an existing > configuration rather than even bothering with generating a one-off config for > exactly this reason. It *is* hard/impossible to get it right in a single > shot, because of how many parameters interact and the way bottlenecks clear, > so why not assume from the start you're going to do it several times--then > you've only got one piece of software to write. nobody is asking for things to be right the first time. > The idea I have in my head is a tool that gathers system info, connects to > the database, and then spits out recommendations in order of expected > effectiveness--with the specific caveat that changing too many things at one > time isn't recommended, and some notion of parameter dependencies. The first > time you run it, you'd be told that shared_buffers was wildly low, > effective_cache_size isn't even in the right ballpark, and your work_mem > looks small relative to the size of your tables; fix those before you bother > doing anything else because any data collected with those at very wrong > values is bogus. why not have a much simpler script that gets these values up into the right ballpark first? then after that the process and analysis that you are suggesting would be useful. the problem is that the defaults are _so_ far off that no sane incremental program is going to be able to converge on the right answer rapidly. David Lang > Take two, those parameters pass their sanity tests, but > since you're actually running at a reasonable speed now the fact that your > tables are no longer being vacuumed frequently enough might bubble to the > top. > > It would take a few passes through to nail down everything, but as long as > it's put together such that you'd be in a similar position to the single-shot > tool after running it once it would remove that as something separate that > needed to be built. > > To argue against myself for a second, it may very well be the case that > writing the simpler tool is the only way to get a useful prototype for > building the more complicated one; very easy to get bogged down in feature > creep on a grand design otherwise.
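For contrast, the "much simpler script" David asks for could be as small as the sketch below (Python again; the ratios are rough rules of thumb I am assuming for illustration, not project recommendations). It never touches the database at all--it only has to land within an order of magnitude so that the iterative pass starts from sane values.

    import os

    def total_ram_mb():
        # Linux and most Unixes expose these sysconf names; Windows needs its own path
        return os.sysconf('SC_PHYS_PAGES') * os.sysconf('SC_PAGE_SIZE') // 2**20

    ram = total_ram_mb()
    print(f"# ballpark starting values for {ram} MB of RAM -- review before use")
    print(f"shared_buffers = {max(32, ram // 4)}MB")
    print(f"effective_cache_size = {ram * 3 // 4}MB")
    print(f"work_mem = {max(1, ram // 512)}MB")
    print(f"maintenance_work_mem = {max(16, ram // 16)}MB")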
The more I think about this thread, the more I'm convinced of 2 things:

1= Suggesting initial config values is a fundamentally different exercise than tuning a running DBMS. This can be handled reasonably well by HW and OS snooping. OTOH, detailed fine tuning of a running DBMS does not appear to be amenable to this approach. So...

2= We need to implement the kind of timer support that Oracle 10g has. Oracle performance tuning was revolutionized by there being microsecond-accurate timers available for all Oracle operations. IMHO, we should learn from that.

Only the combination of the above looks like it will really be successful in addressing the issues brought up in this thread.

Cheers, Ron Peacetree

At 01:59 PM 4/27/2007, Josh Berkus wrote:
>Dan,
>
>> Exactly.. What I think would be much more productive is to use the
>> great amount of information that PG tracks internally and auto-tune the
>> parameters based on it. For instance:
>
>*Everyone* wants this. The problem is that it's very hard code to write
>given the number of variables. I'm working on it but progress is slow,
>due to my travel schedule.
>
>--
>--Josh
>
>Josh Berkus
>PostgreSQL @ Sun
>San Francisco
Greg,

> I'm not fooled--secretly you and your co-workers laugh at how easy this
> is on Solaris and are perfectly happy with how difficult it is on Linux,
> right?

Don't I wish. There are issues with getting CPU info on Solaris, too, once you get off Sun hardware onto generic white boxes. The base issue is that there's no standardization on how manufacturers report the names of their CPUs, 32/64-bit support, or clock speeds. So any attempt to determine "how fast" a CPU is, even on a 1-5 scale, requires matching against a database of regexes which would have to be kept updated.

And let's not even get started on Windows.

> I joke because I've been re-solving some variant of this problem every
> few years for a decade now and it just won't go away. Last time I
> checked, the right answer was to find someone else who's already done it,
> packaged it into a library, and appears committed to keeping it up to
> date; just pull a new rev of that when you need it. For example, for
> the CPU/memory part, top solves this problem and is always kept current,
> so on open-source platforms there's the potential to re-use that code.
> Now that I know that's one thing you're (understandably) fighting with,
> I'll dig up my references on that (again).

Actually, total memory is not an issue; that's fairly straightforward. Nor is the number of CPUs. Memory *used* is a PITA, which is why I'd ignore that part and make some assumptions. It would have to be implemented in a per-OS manner, which is what bogged me down.

> I would advocate focusing on iterative improvements to an existing
> configuration rather than even bothering with generating a one-off
> config, for exactly this reason. It *is* hard/impossible to get it right
> in a single shot, because of how many parameters interact and the way
> bottlenecks clear, so why not assume from the start you're going to do
> it several times--then you've only got one piece of software to write.

Sounds fine to me.

> To argue against myself for a second, it may very well be the case that
> writing the simpler tool is the only way to get a useful prototype for
> building the more complicated one; very easy to get bogged down in
> feature creep on a grand design otherwise.

It's certainly easy for me. ;-)

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
On Thu, 3 May 2007, Josh Berkus wrote: > Greg, > >> I'm not fooled--secretly you and your co-workers laugh at how easy this >> is on Solaris and are perfectly happy with how difficult it is on Linux, >> right? > > Don't I wish. There's issues with getting CPU info on Solaris, too, if you > get off of Sun Hardware to generic white boxes. The base issue is that > there's no standardization on how manufacturers report the names of their > CPUs, 32/64bit, or clock speeds. So any attempt to determine "how fast" > a CPU is, even on a 1-5 scale, requires matching against a database of > regexes which would have to be kept updated. > > And let's not even get started on Windows. I think the only sane way to try and find the cpu speed is to just do a busy loop of some sort (ideally something that somewhat resembles the main code) and see how long it takes. you may have to do this a few times until you get a loop that takes long enough (a few seconds) on a fast processor David Lang
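A rough sketch of that calibration loop, assuming Python; the arithmetic inside the loop is an arbitrary stand-in rather than anything resembling the PostgreSQL executor, which is exactly the weakness being discussed, but it shows the "keep doubling until one run takes a few seconds" part:

    import time

    def calibrate(target_seconds=3.0):
        """Grow the iteration count until one run takes a few seconds,
        then report iterations per second as a crude CPU score."""
        n = 100_000
        while True:
            start = time.perf_counter()
            total = 0
            for i in range(n):
                total += i % 7          # stand-in work; a real probe would mimic the executor
            elapsed = time.perf_counter() - start
            if elapsed >= target_seconds:
                return n / elapsed
            n *= 2                      # loop too short to time reliably; double and retry

    if __name__ == '__main__':
        print(f"{calibrate():,.0f} iterations/second")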
>> CPUs, 32/64bit, or clock speeds. So any attempt to determine "how >> fast" >> a CPU is, even on a 1-5 scale, requires matching against a database of >> regexes which would have to be kept updated. >> >> And let's not even get started on Windows. > > I think the only sane way to try and find the cpu speed is to just do > a busy loop of some sort (ideally something that somewhat resembles > the main code) and see how long it takes. you may have to do this a > few times until you get a loop that takes long enough (a few seconds) > on a fast processor I was going to suggest just that (but then was afraid that again I may have been just being naive) --- I can't remember the exact name, but I remember using (on some Linux flavor) an API call that fills a struct with data on the resource usage for the process, including CPU time; I assume measured with precision (that is, immune to issues of other applications running simultaneously, or other random events causing the measurement to be polluted by random noise). As for 32/64 bit --- doesn't PG already know that information? I mean, ./configure does gather that information --- does it not? Carlos --
On Thu, 3 May 2007, Carlos Moreno wrote: >> > CPUs, 32/64bit, or clock speeds. So any attempt to determine "how >> > fast" >> > a CPU is, even on a 1-5 scale, requires matching against a database of >> > regexes which would have to be kept updated. >> > >> > And let's not even get started on Windows. >> >> I think the only sane way to try and find the cpu speed is to just do a >> busy loop of some sort (ideally something that somewhat resembles the main >> code) and see how long it takes. you may have to do this a few times until >> you get a loop that takes long enough (a few seconds) on a fast processor > > I was going to suggest just that (but then was afraid that again I may have > been just being naive) --- I can't remember the exact name, but I remember > using (on some Linux flavor) an API call that fills a struct with data on the > resource usage for the process, including CPU time; I assume measured > with precision (that is, immune to issues of other applications running > simultaneously, or other random events causing the measurement to be > polluted by random noise). since what we are looking for here is a reasonable first approximation, not perfection I don't think we should worry much about pollution of the value. if the person has other things running while they are running this test that will be running when they run the database it's no longer 'pollution' it's part of the environment. I think a message at runtime that it may produce inaccurate results if you have other heavy processes running for the config that won't be running with the database would be good enough (remember it's not only CPU time that's affected like this, it's disk performance as well) > As for 32/64 bit --- doesn't PG already know that information? I mean, > ./configure does gather that information --- does it not? we're not talking about comiling PG, we're talking about getting sane defaults for a pre-compiled binary. if it's a 32 bit binary assume a 32 bit cpu, if it's a 64 bit binary assume a 64 bit cpu (all hardcoded into the binary at compile time) David Lang
>> been just being naive) --- I can't remember the exact name, but I >> remember >> using (on some Linux flavor) an API call that fills a struct with >> data on the >> resource usage for the process, including CPU time; I assume measured >> with precision (that is, immune to issues of other applications running >> simultaneously, or other random events causing the measurement to be >> polluted by random noise). > > since what we are looking for here is a reasonable first > approximation, not perfection I don't think we should worry much about > pollution of the value. Well, it's not as much worrying as it is choosing the better among two equally difficult options --- what I mean is that obtaining the *real* resource usage as reported by the kernel is, from what I remember, equally hard as it is obtaining the time with milli- or micro-seconds resolution. So, why not choosing this option? (in fact, if we wanted to do it "the scripted way", I guess we could still use "time test_cpuspeed_loop" and read the report by the command time, specifying CPU time and system calls time. >> As for 32/64 bit --- doesn't PG already know that information? I mean, >> ./configure does gather that information --- does it not? > > we're not talking about comiling PG, we're talking about getting sane > defaults for a pre-compiled binary. if it's a 32 bit binary assume a > 32 bit cpu, if it's a 64 bit binary assume a 64 bit cpu (all hardcoded > into the binary at compile time) Right --- I was thinking that configure, which as I understand, generates the Makefiles to compile applications including initdb, could plug those values as compile-time constants, so that initdb (or a hypothetical additional utility that would do what we're discussing in this thread) already has them. Anyway, yes, that would go for the binaries as well --- we're pretty much saying the same thing :-) Carlos --
On Thu, 3 May 2007, Carlos Moreno wrote: >> > been just being naive) --- I can't remember the exact name, but I >> > remember >> > using (on some Linux flavor) an API call that fills a struct with data >> > on the >> > resource usage for the process, including CPU time; I assume measured >> > with precision (that is, immune to issues of other applications running >> > simultaneously, or other random events causing the measurement to be >> > polluted by random noise). >> >> since what we are looking for here is a reasonable first approximation, >> not perfection I don't think we should worry much about pollution of the >> value. > > Well, it's not as much worrying as it is choosing the better among two > equally > difficult options --- what I mean is that obtaining the *real* resource usage > as > reported by the kernel is, from what I remember, equally hard as it is > obtaining > the time with milli- or micro-seconds resolution. > > So, why not choosing this option? (in fact, if we wanted to do it "the > scripted > way", I guess we could still use "time test_cpuspeed_loop" and read the > report > by the command time, specifying CPU time and system calls time. I don't think it's that hard to get system time to a reasonable level (if this config tuner needs to run for a min or two to generate numbers that's acceptable, it's only run once) but I don't think that the results are really that critical. do we really care if the loop runs 1,000,000 times per second or 1,001,000 times per second? I'd argue that we don't even care about 1,000,000 times per second vs 1,100,000 times per second, what we care about is 1,000,000 times per second vs 100,000 times per second, if you do a 10 second test and run it for 11 seconds you are still in the right ballpark (i.e. close enough that you really need to move to the stage2 tuneing to figure the exact values) >> > As for 32/64 bit --- doesn't PG already know that information? I mean, >> > ./configure does gather that information --- does it not? >> >> we're not talking about comiling PG, we're talking about getting sane >> defaults for a pre-compiled binary. if it's a 32 bit binary assume a 32 >> bit cpu, if it's a 64 bit binary assume a 64 bit cpu (all hardcoded into >> the binary at compile time) > > Right --- I was thinking that configure, which as I understand, generates the > Makefiles to compile applications including initdb, could plug those values > as compile-time constants, so that initdb (or a hypothetical additional > utility > that would do what we're discussing in this thread) already has them. > Anyway, > yes, that would go for the binaries as well --- we're pretty much saying the > same thing :-) I'm thinking along the lines of a script or pre-compiled binary (_not_ initdb) that you could run and have it generate a new config file that has values that are at within about an order of magnatude of being correct. David Lang
> I don't think it's that hard to get system time to a reasonable level > (if this config tuner needs to run for a min or two to generate > numbers that's acceptable, it's only run once) > > but I don't think that the results are really that critical. Still --- this does not provide a valid argument against my claim. Ok, we don't need precision --- but do we *need* to have less precision?? I mean, you seem to be proposing that we deliberately go out of our way to discard a solution with higher precision and choose the one with lower precision --- just because we do not have a critical requirement for the extra precision. That would be a valid argument if the extra precision came at a considerable cost (well, or at whatever cost, considerable or not). But my point is still that obtaining the time in the right ballpark and obtaining the time with good precision are two things that have, from any conceivable point of view (programming effort, resources consumption when executing it, etc. etc.), the exact same cost --- why not pick the one that gives us the better results? Mostly when you consider that: > I'd argue that we don't even care about 1,000,000 times per second vs > 1,100,000 times per second, what we care about is 1,000,000 times per > second vs 100,000 times per second Part of my claim is that measuring real-time you could get an error like this or even a hundred times this!! Most of the time you wouldn't, and definitely if the user is careful it would not happen --- but it *could* happen!!! (and when I say could, I really mean: trust me, I have actually seen it happen) Why not just use an *extremely simple* solution that is getting information from the kernel reporting the actual CPU time that has been used??? Of course, this goes under the premise that in all platforms there is such a simple solution like there is on Linux (the exact name of the API function still eludes me, but I have used it in the past, and I recall that it was just three or five lines of code). Carlos --
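The call Carlos seems to be remembering is most likely getrusage(2) -- that identification is my guess, not something stated in the thread -- and the "three or five lines" estimate does hold up on POSIX systems, for example via Python's resource module:

    import resource   # POSIX only; this is the portability catch raised below

    def cpu_seconds():
        # getrusage(2) reports CPU time actually charged to this process,
        # so other load on the box does not inflate the measurement
        usage = resource.getrusage(resource.RUSAGE_SELF)
        return usage.ru_utime + usage.ru_stime

    before = cpu_seconds()
    total = sum(i % 7 for i in range(1_000_000))   # stand-in work to measure
    print(f"CPU time used: {cpu_seconds() - before:.3f} s")

The catch David raises is real, though: the resource module, like getrusage itself, is Unix-only, so Windows would need a different call.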
On Thu, 3 May 2007, Carlos Moreno wrote: >> I don't think it's that hard to get system time to a reasonable level (if >> this config tuner needs to run for a min or two to generate numbers that's >> acceptable, it's only run once) >> >> but I don't think that the results are really that critical. > > Still --- this does not provide a valid argument against my claim. > > Ok, we don't need precision --- but do we *need* to have less > precision?? I mean, you seem to be proposing that we deliberately > go out of our way to discard a solution with higher precision and > choose the one with lower precision --- just because we do not > have a critical requirement for the extra precision. > > That would be a valid argument if the extra precision came at a > considerable cost (well, or at whatever cost, considerable or not). the cost I am seeing is the cost of portability (getting similarly accruate info from all the different operating systems) > But my point is still that obtaining the time in the right ballpark > and obtaining the time with good precision are two things that > have, from any conceivable point of view (programming effort, > resources consumption when executing it, etc. etc.), the exact > same cost --- why not pick the one that gives us the better results? > > Mostly when you consider that: > >> I'd argue that we don't even care about 1,000,000 times per second vs >> 1,100,000 times per second, what we care about is 1,000,000 times per >> second vs 100,000 times per second > > Part of my claim is that measuring real-time you could get an > error like this or even a hundred times this!! Most of the time > you wouldn't, and definitely if the user is careful it would not > happen --- but it *could* happen!!! (and when I say could, I > really mean: trust me, I have actually seen it happen) if you have errors of several orders of magnatude in the number of loops it can run in a given time period then you don't have something that you can measure to any accuracy (and it wouldn't matter anyway, if your loops are that variable, your code execution would be as well) > Why not just use an *extremely simple* solution that is getting > information from the kernel reporting the actual CPU time that > has been used??? > > Of course, this goes under the premise that in all platforms there > is such a simple solution like there is on Linux (the exact name > of the API function still eludes me, but I have used it in the past, > and I recall that it was just three or five lines of code). I think the problem is that it's a _different_ 3-5 lines of code for each OS. if I'm wrong and it's the same for the different operating systems then I agree that we should use the most accurate clock we can get. I just don't think we have that. David Lang
>> That would be a valid argument if the extra precision came at a >> considerable cost (well, or at whatever cost, considerable or not). > > the cost I am seeing is the cost of portability (getting similarly > accruate info from all the different operating systems) Fair enough --- as I mentioned, I was arguing under the premise that there would be a quite similar solution for all the Unix-flavours (and hopefully an equivalent --- and equivalently simple --- one for Windows) ... Whether or not that premise holds, I wouldn't bet either way. >> error like this or even a hundred times this!! Most of the time >> you wouldn't, and definitely if the user is careful it would not >> happen --- but it *could* happen!!! (and when I say could, I >> really mean: trust me, I have actually seen it happen) > Part of my claim is that measuring real-time you could get an > > if you have errors of several orders of magnatude in the number of > loops it can run in a given time period then you don't have something > that you can measure to any accuracy (and it wouldn't matter anyway, > if your loops are that variable, your code execution would be as well) Not necessarily --- operating conditions may change drastically from one second to the next; that does not mean that your system is useless; simply that the measuring mechanism is way too vulnerable to the particular operating conditions at the exact moment it was executed. I'm not sure if that was intentional, but you bring up an interesting issue --- or in any case, your comment made me drastically re-think my whole argument: do we *want* to measure the exact speed, or rather the effective speed under normal operating conditions on the target machine? I know the latter is almost impossible --- we're talking about an estimate of a random process' parameter (and we need to do it in a short period of time) ... But the argument goes more or less like this: if you have a machine that runs at 1000 MIPS, but it's usually busy running things that in average consume 500 of those 1000 MIPS, would we want PG's configuration file to be obtained based on 1000 or based on 500 MIPS??? After all, the CPU is, as far as PostgreSQL will be able see, 500 MIPS fast, *not* 1000. I think I better stop, if we want to have any hope that the PG team will ever actually implement this feature (or similar) ... We're probably just scaring them!! :-) Carlos --
On Thu, 3 May 2007, Carlos Moreno wrote: > >> > error like this or even a hundred times this!! Most of the time >> > you wouldn't, and definitely if the user is careful it would not >> > happen --- but it *could* happen!!! (and when I say could, I >> > really mean: trust me, I have actually seen it happen) >> Part of my claim is that measuring real-time you could get an >> >> if you have errors of several orders of magnatude in the number of loops >> it can run in a given time period then you don't have something that you >> can measure to any accuracy (and it wouldn't matter anyway, if your loops >> are that variable, your code execution would be as well) > > Not necessarily --- operating conditions may change drastically from > one second to the next; that does not mean that your system is useless; > simply that the measuring mechanism is way too vulnerable to the > particular operating conditions at the exact moment it was executed. > > I'm not sure if that was intentional, but you bring up an interesting > issue --- or in any case, your comment made me drastically re-think > my whole argument: do we *want* to measure the exact speed, or > rather the effective speed under normal operating conditions on the > target machine? > > I know the latter is almost impossible --- we're talking about an estimate > of a random process' parameter (and we need to do it in a short period > of time) ... But the argument goes more or less like this: if you have a > machine that runs at 1000 MIPS, but it's usually busy running things > that in average consume 500 of those 1000 MIPS, would we want PG's > configuration file to be obtained based on 1000 or based on 500 MIPS??? > After all, the CPU is, as far as PostgreSQL will be able see, 500 MIPS > fast, *not* 1000. > > I think I better stop, if we want to have any hope that the PG team will > ever actually implement this feature (or similar) ... We're probably just > scaring them!! :-) simpler is better (or perfect is the enemy of good enough) if you do your sample over a few seconds (or few tens of seconds) things will average out quite a bit. the key is to be going for a reasonable starting point. after that then the full analysis folks can start in with all their monitoring and tuneing, but the 80/20 rule really applies here. 80% of the gain is from getting 'fairly close' to the right values, and that should only be 20% of the full 'tuneing project' David Lang
On Thu, 3 May 2007, Josh Berkus wrote:
> So any attempt to determine "how fast" a CPU is, even on a 1-5 scale,
> requires matching against a database of regexes which would have to be
> kept updated.

This comment, along with the subsequent commentary today going far astray into CPU measurement land, serves as a perfect example to demonstrate why I advocate attacking this from the perspective that assumes there is already a database around we can query. We don't have to care how fast the CPU is in any real terms; all we need to know is how many of them there are (which, as you point out, is relatively easy to find) and approximately how fast each one of them can run PostgreSQL. Here's the first solution to this problem I came up with in one minute of R&D:

-bash-3.00$ psql
postgres=# \timing
Timing is on.
postgres=# select count(*) from generate_series(1,100000,1);
 count
--------
 100000
(1 row)

Time: 106.535 ms

There you go, a completely cross-platform answer. You should run the statement twice and only use the second result for better consistency. I ran this on all the systems I was around today and got these results:

P4 2.4GHz       107ms
Xeon 3GHz       100ms
Opteron 275      65ms
Athlon X2 4600   61ms

For comparison's sake, these numbers are more useful at predicting actual application performance than Linux's bogomips number, which completely reverses the relative performance of the Intel vs. AMD chips in this set from the reality of how well they run Postgres.

My philosophy in this area is that if you can measure something performance-related with reasonable accuracy, don't even try to estimate it instead. All you have to do is follow some of the downright bizarre dd/bonnie++ results people post here to realize that there can be a vast difference between the performance you'd expect given a particular hardware class and what you actually get.

While I'm ranting here, I should mention that I also sigh every time I see people suggest we should ask the user how big their database is. The kind of newbie user people keep talking about helping has *no idea whatsoever* how big the data actually is after it gets into the database and all the indexes are built. But if you tell someone "right now this database has 1 million rows and takes up 800MB; what multiple of its current size do you expect it to grow to?", now that's something people can work with.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
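If you'd rather script that probe than paste it into psql, a sketch along these lines works; psycopg2 is my assumption (any PostgreSQL driver is fine), and per Greg's advice only the repeat run is reported, since the first one also pays for connection setup and cache warm-up:

    import time
    import psycopg2   # assumption: any PostgreSQL driver would do

    def probe(dsn="dbname=postgres", runs=2):
        conn = psycopg2.connect(dsn)
        cur = conn.cursor()
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            cur.execute("select count(*) from generate_series(1,100000,1)")
            cur.fetchone()
            timings.append((time.perf_counter() - start) * 1000.0)
        conn.close()
        return timings[-1]    # second (last) run, per Greg's advice

    if __name__ == '__main__':
        print(f"{probe():.1f} ms")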
On Fri, May 04, 2007 at 12:33:29AM -0400, Greg Smith wrote: >-bash-3.00$ psql >postgres=# \timing >Timing is on. >postgres=# select count(*) from generate_series(1,100000,1); > count >-------- > 100000 >(1 row) > >Time: 106.535 ms > >There you go, a completely cross-platform answer. You should run the >statement twice and only use the second result for better consistancy. I >ran this on all the sytems I was around today and got these results: > >P4 2.4GHz 107ms >Xeon 3GHz 100ms >Opteron 275 65ms >Athlon X2 4600 61ms PIII 1GHz 265ms Opteron 250 39ms something seems inconsistent here. >For comparison sake, these numbers are more useful at predicting actual >application performance than Linux's bogomips number, which completely >reverses the relative performance of the Intel vs. AMD chips in this set >from the reality of how well they run Postgres. You misunderstand the purpose of bogomips; they have no absolute meaning, and a comparison between different type of cpus is not possible. >While I'm ranting here, I should mention that I also sigh every time I see >people suggest we should ask the user how big their database is. The kind >of newbie user people keep talking about helping has *no idea whatsoever* >how big the data actually is after it gets into the database and all the >indexes are built. 100% agreed. Mike Stone
david@lang.hm wrote:
> On Tue, 1 May 2007, Carlos Moreno wrote:
>
>>> large problem from a slog perspective; there is no standard way even
>>> within Linux to describe CPUs, for example. Collecting available disk
>>> space information is even worse. So I'd like some help on this
>>> portion.
>>
>> Quite likely, naiveness follows... But, aren't things like /proc/cpuinfo,
>> /proc/meminfo, /proc/partitions and /proc/diskstats standard, at the very
>> least across Linux distros? I'm not familiar with BSD or other Unix
>> flavours, but I would expect these (or their equivalent) to exist in
>> those, no?
>>
>> Am I just being naive?
>
> unfortunately yes.
>
> across different linux distros they are fairly standard (however
> different kernel versions will change them)
>
> however different kernels need drastically different tools to get the
> info from them.
>
> David Lang

Before inventing a hyper-tool, we might consider providing 3-5 example scenarios for common hardware configurations. This consumes less time and could be discussed and defined in a couple of days. This is of course not the correct option for a brand-new 20-spindle 10,000 RPM SATA RAID 10 system, but such systems are probably not the target for default configurations. If we carefully document these scenarios, they would be a great help for people whose hardware falls "between" the scenarios.

Sebastian Hennebrueder
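To make Carlos's /proc question concrete: on Linux the parsing really is trivial, as in the sketch below. David's objection stands, though -- this is Linux-only, and the field names used here (MemTotal, processor) are an assumption about current kernels rather than a guaranteed interface.

    def linux_hw_summary():
        """Linux-only sketch: pull CPU count and total RAM out of /proc."""
        info = {}
        with open('/proc/meminfo') as f:
            for line in f:
                if line.startswith('MemTotal:'):
                    info['ram_mb'] = int(line.split()[1]) // 1024   # value is in kB
        with open('/proc/cpuinfo') as f:
            info['cpus'] = sum(1 for line in f if line.startswith('processor'))
        return info

    if __name__ == '__main__':
        print(linux_hw_summary())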
Sebastian,

> Before inventing a hyper-tool, we might consider providing 3-5 example
> scenarios for common hardware configurations. This consumes less time
> and could be discussed and defined in a couple of days. This is of course not
> the correct option for a brand-new 20-spindle 10,000 RPM SATA RAID 10 system,
> but such systems are probably not the target for default configurations.

That's been suggested a number of times, but some GUCs are really tied to the *exact* amount of RAM you have available. So I've never seen how "example configurations" could help.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Josh Berkus wrote:
> Sebastian,
>
>> Before inventing a hyper-tool, we might consider providing 3-5 example
>> scenarios for common hardware configurations. This consumes less time
>> and could be discussed and defined in a couple of days. This is of course not
>> the correct option for a brand-new 20-spindle 10,000 RPM SATA RAID 10 system,
>> but such systems are probably not the target for default configurations.
>
> That's been suggested a number of times, but some GUCs are really tied to the
> *exact* amount of RAM you have available. So I've never seen how "example
> configurations" could help.

I would define the scenario as "256 MB freely available for PostgreSQL" => setting x can be of size ...
On Fri, 4 May 2007, Michael Stone wrote: >> P4 2.4GHz 107ms >> Xeon 3GHz 100ms >> Opteron 275 65ms >> Athlon X2 4600 61ms > PIII 1GHz 265ms > Opteron 250 39ms > something seems inconsistent here. I don't see what you mean. The PIII results are exactly what I'd expect, and I wouldn't be surprised that your Opteron 250 has significantly faster memory than my two AMD samples (which are slower than average in this regard) such that it runs this particular task better. Regardless, the requirement here, as Josh put it, was to get a way to grade the CPUs on approximately a 1-5 scale. In that context, there isn't a need for an exact value. The above has 3 major generations of processors involved, and they sort out appropriately into groups; that's all that needs to happen here. > You misunderstand the purpose of bogomips; they have no absolute meaning, and > a comparison between different type of cpus is not possible. As if I don't know what the bogo stands for, ha! I brought that up because someone suggested testing CPU speed using some sort of idle loop. That's exactly what bogomips does. My point was that something that simple can give dramatically less useful results for predicting PostgreSQL performance than what you can find out running a real query. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Josh Berkus wrote:
> Sebastian,
>
>> Before inventing a hyper-tool, we might consider providing 3-5 example
>> scenarios for common hardware configurations. This consumes less time
>> and could be discussed and defined in a couple of days. This is of course not
>> the correct option for a brand-new 20-spindle 10,000 RPM SATA RAID 10 system,
>> but such systems are probably not the target for default configurations.
>
> That's been suggested a number of times, but some GUCs are really tied to the
> *exact* amount of RAM you have available. So I've never seen how "example
> configurations" could help.

I'm not convinced by this objection - having samples gives a bit of a heads-up on *what* knobs you should at least look at changing.

Also it might be helpful on the -general or -perf lists to be able to say "try config 3 (or whatever we call 'em) and see what changes..."

I've certainly found the sample config files supplied with that database whose name begins with 'M' a useful *start* when I want something better than the default...

Cheers

Mark
On Fri, May 04, 2007 at 09:07:53PM -0400, Greg Smith wrote: > As if I don't know what the bogo stands for, ha! I brought that up > because someone suggested testing CPU speed using some sort of idle loop. > That's exactly what bogomips does. Just for reference (I'm sure you know, but others might not): BogoMIPS are measured using a busy loop because that's precisely the number the kernel is interested in -- it's used to know for how long to loop when doing small delays in the kernel (say, "sleep 20 microseconds"), which is done using a busy loop. /* Steinar */ -- Homepage: http://www.sesse.net/
Mark Kirkwood wrote:
> Josh Berkus wrote:
>> Sebastian,
>>
>>> Before inventing a hyper-tool, we might consider providing 3-5 example
>>> scenarios for common hardware configurations. This consumes less time
>>> and could be discussed and defined in a couple of days. This is of course not
>>> the correct option for a brand-new 20-spindle 10,000 RPM SATA RAID 10 system,
>>> but such systems are probably not the target for default configurations.
>>
>> That's been suggested a number of times, but some GUCs are really
>> tied to the *exact* amount of RAM you have available. So I've never
>> seen how "example configurations" could help.
>
> I'm not convinced by this objection - having samples gives a bit of
> a heads-up on *what* knobs you should at least look at changing.
>
> Also it might be helpful on the -general or -perf lists to be able to
> say "try config 3 (or whatever we call 'em) and see what changes..."
>
> I've certainly found the sample config files supplied with that
> database whose name begins with 'M' a useful *start* when I want
> something better than the default...
>
> Cheers
>
> Mark

Some ideas about scenarios and settings. This is meant as a discussion proposal; I am by far not a database guru! The settings do not provide a perfect setup, just a more efficient one compared to the default setup.

Criteria:
free memory
CPU? what is the consequence?
separate spindles
total connections
Windows/Linux/Solaris?

Adapted settings:
max_connections
shared_buffers
effective_cache_size
work_mem
maintenance_work_mem
checkpoint_segments?
checkpoint_timeout?
checkpoint_warning?

Scenario a) 256 MB free memory, one disk or one RAID containing all disks:
max_connections = 40
shared_buffers = 64MB
effective_cache_size = 180MB
work_mem = 1MB
maintenance_work_mem = 4MB

Scenario b) 1024 MB free memory, one disk or one RAID containing all disks:
max_connections = 80
shared_buffers = 128MB
effective_cache_size = 600MB
work_mem = 1.5MB
maintenance_work_mem = 16MB

Scenario c) 2048 MB free memory, one disk or one RAID containing all disks:
max_connections = 160
shared_buffers = 256MB
effective_cache_size = 1200MB
work_mem = 2MB
maintenance_work_mem = 32MB

Scenario d) 2048 MB free memory, a RAID of multiple disks plus a second RAID or disk:
max_connections = 160
shared_buffers = 256MB
effective_cache_size = 1200MB
work_mem = 2MB
maintenance_work_mem = 32MB
WAL on the second spindle
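One way to read Sebastian's proposal is as a small lookup table keyed on free memory, which is easy to encode. In the sketch below (Python; the table copies his scenario values, while the selection rule and the 1500 MB example are my own assumptions), the script simply picks the largest scenario that still fits the available memory:

    # Sebastian's discussion scenarios, keyed by MB of memory free for PostgreSQL
    SCENARIOS = {
        256:  dict(max_connections=40,  shared_buffers='64MB',
                   effective_cache_size='180MB', work_mem='1MB',
                   maintenance_work_mem='4MB'),
        1024: dict(max_connections=80,  shared_buffers='128MB',
                   effective_cache_size='600MB', work_mem='1.5MB',
                   maintenance_work_mem='16MB'),
        2048: dict(max_connections=160, shared_buffers='256MB',
                   effective_cache_size='1200MB', work_mem='2MB',
                   maintenance_work_mem='32MB'),
    }

    def pick_scenario(free_mb):
        # choose the largest scenario that still fits the available memory;
        # below 256 MB the stock defaults are probably the honest answer
        fitting = [mb for mb in SCENARIOS if mb <= free_mb]
        return SCENARIOS[max(fitting)] if fitting else None

    if __name__ == '__main__':
        for name, value in (pick_scenario(1500) or {}).items():
            print(f"{name} = {value}")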
On May 4, 2007, at 12:11 PM, Josh Berkus wrote:
> Sebastian,
>> Before inventing a hyper-tool, we might consider providing 3-5 example
>> scenarios for common hardware configurations. This consumes less time
>> and could be discussed and defined in a couple of days. This is of course not
>> the correct option for a brand-new 20-spindle 10,000 RPM SATA RAID 10 system,
>> but such systems are probably not the target for default configurations.
>
> That's been suggested a number of times, but some GUCs are really tied to the
> *exact* amount of RAM you have available. So I've never seen how "example
> configurations" could help.

Uh... what GUCs are that exacting on the amount of memory? For a decent, base-line configuration, that is.

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Just some problems that come to mind with this:

a) Hardware is sometimes changed behind the customer's back, even for server-level hardware. (Been there.)

b) Hardware recommendations would get stale quickly. What use is a hardware spec that names specific versions of Xeons when the supply dries up? (The example is not contrived: certain versions of PG on certain Xeons with certain usage patterns don't work that well; google for "context switch storms".)

c) All of that depends upon the PG version too, so with every new version somebody would have to re-verify that the recommendations are still valid. (Big example: partitioned tables got much better support in recent versions, so a setup that anticipated seqscans over big tables might suddenly perform way better. OTOH, there are sometimes performance regressions as well.)

d) And to add insult to injury, all that tuning (on both the hardware and software side) is sensitive to your workload. Before you start yelling: have you ever rolled back an application version because you noticed what stupidities the developers had added? (And yes, you can try to avoid this by adding better staging to your processes, but it's really, really hard to set up a staging environment that has the same performance characteristics as production.)

So, while it's a nice idea to have a set of recommended hardware setups, I don't see much of a point. What makes a sensible database server is not exactly a secret. Sizing is slightly harder. And after that one enters the realm of fine-tuning the complete system, which does not end at the socket on port 5432.

Andreas

Jim Nasby wrote:
> On May 4, 2007, at 12:11 PM, Josh Berkus wrote:
>> Sebastian,
>>> Before inventing a hyper-tool, we might consider providing 3-5 example
>>> scenarios for common hardware configurations. This consumes less time
>>> and could be discussed and defined in a couple of days. This is of course not
>>> the correct option for a brand-new 20-spindle 10,000 RPM SATA RAID 10 system,
>>> but such systems are probably not the target for default configurations.
>>
>> That's been suggested a number of times, but some GUCs are really tied
>> to the *exact* amount of RAM you have available. So I've never seen how
>> "example configurations" could help.
>
> Uh... what GUCs are that exacting on the amount of memory? For a decent,
> base-line configuration, that is.
> --
> Jim Nasby jim@nasby.net
> EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)