Thread: PostgreSQL Configuration Tool for Dummies
Below is a link to the HTML JavaScript configuration page I am creating:
http://www.webservices.uiuc.edu/postgresql/
I had many suggestions. Based on the feedback I received, I put together the initial list of questions. This list of questions can be change.
Memory
There are many different ways to ask about memory. Rather than ask a series of questions I went with a single question, #2. If it is better to ask about the memory in a series of questions then please give me the questions you would ask and why you would ask each of them. From my understanding the primary memory issue as it relates to PostgreSQL is “how much memory is available to PostgreSQL”. Remember that this needs to be as simple as possible.
My next step is to list the PostgreSQL parameters found in the postgresql.conf file and how I will generate their values based on the questions I have so far. I will primarily focus on PostgreSQL 8.2.x. Once I have a consensus from everyone then I will put functionality behind the “Generate Suggested Settings” button.
Thanks for all of the feedback,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
Below is a link to the HTML JavaScript configuration page I am creating:
http://www.webservices.uiuc.edu/postgresql/
I had many suggestions. Based on the feedback I received, I put together the initial list of questions. This list of questions can be change.
Memory
There are many different ways to ask about memory. Rather than ask a series of questions I went with a single question, #2. If it is better to ask about the memory in a series of questions then please give me the questions you would ask and why you would ask each of them. From my understanding the primary memory issue as it relates to PostgreSQL is "how much memory is available to PostgreSQL". Remember that this needs to be as simple as possible.
My next step is to list the PostgreSQL parameters found in the postgresql.conf file and how I will generate their values based on the questions I have so far. I will primarily focus on PostgreSQL 8.2.x. Once I have a consensus from everyone then I will put functionality behind the "Generate Suggested Settings" button.
Thanks for all of the feedback,
Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
Simply awesome!
--
Yudhvir Singh Sidhu
408 375 3134 cell
Campbell, Lance writes: For the "6) Are your searches:" How about having "many simple"
On Tue, 19 Jun 2007, Campbell, Lance wrote: > Memory > > There are many different ways to ask about memory. Rather than ask a > series of questions I went with a single question, #2. If it is better > to ask about the memory in a series of questions then please give me the > questions you would ask and why you would ask each of them. From my > understanding the primary memory issue as it relates to PostgreSQL is > "how much memory is available to PostgreSQL". Remember that this needs > to be as simple as possible. there are three catagories of memory useage 1. needed by other software 2. available for postgres 3. needed by the OS it's not clear if what you are asking is #2 or a combination of #2 and #3 IMHO you should ask for #2 and #3, possibly along the lines of "how much memory is in the machine that isn't already used by other applications" David Lang
> there are three catagories of memory useage > > 1. needed by other software > 2. available for postgres > 3. needed by the OS There's actually only two required memory questions: M1) How much RAM do you have on this machine? M2) Is this: () Dedicated PostgreSQL Server? () Server shared with a few other applications? () Desktop? I don't think the "mostly reads / mostly writes" question covers anything, nor is it likely to produce accurate answers. Instead, we need to ask the users to characterize what type of application they are running: T1) Please characterize the general type of workload you will be running on this database. Choose one of the following four: () WEB: any scripting-language application which mainly needs to support 90% or more data reads, and many rapid-fire small queries over a large number of connections. Examples: forums, content management systems, directories. () OLTP: this application involves a large number of INSERTs, UPDATEs and DELETEs because most users are modifying data instead of just reading it. Examples: accounting, ERP, logging tools, messaging engines. () Data Warehousing: also called "decision support" and "BI", these database support a fairly small number of large, complicated reporting queries, very large tables, and large batch data loads. () Mixed/Other: if your application doesn't fit any of the above, our script will try to pick "safe, middle-of-the-road" values. Hmmm, drop question (6) too. (2) should read: "What is the maximum number of database connections which you'll need to support? If you don't know, we'll pick a default." Other questions we need: How many/how fast processors do you have? Pick the option which seems closest to what you have: () A single laptop processor () Single or dual older processors (1ghz) () Dual or quad current consumer processors (2ghz+) () Large, recent multi-core server system "What OS Are You Using", of course, needs to have Linux, Solaris, BSD, OSX and Windows. At some point, this tool will also need to generate for the user any shmem settings that they need to make on the OS. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
On Tue, 19 Jun 2007, Josh Berkus wrote: > "What OS Are You Using", of course, needs to have Linux, Solaris, BSD, OSX > and Windows. At some point, this tool will also need to generate for the > user any shmem settings that they need to make on the OS. I also noticed that on FreeBSD (6.2) at least the stock config simply won't run without building a new kernel that bumps up all the SHM stuff or dropping down resource usage in the postgres config... Overall, I like the idea. I've been slowly working on weaning myself off of mysql and I think removing any roadblocks that new users might stumble upon seems like an excellent way to get more exposure. Charles > -- > --Josh > > Josh Berkus > PostgreSQL @ Sun > San Francisco > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate >
On Tue, 19 Jun 2007, Josh Berkus wrote: > I don't think the "mostly reads / mostly writes" question covers anything, > nor is it likely to produce accurate answers. Instead, we need to ask the > users to characterize what type of application they are running: > T1) Please characterize the general type of workload you will be running on > this database. Choose one of the following four... We've hashed through this area before, but for Lance's benefit I'll reiterate my dissenting position on this subject. If you're building a "tool for dummies", my opinion is that you shouldn't ask any of this information. I think there's an enormous benefit to providing something that takes basic sizing information and gives conservative guidelines based on that--as you say, "safe, middle-of-the-road values"--that are still way, way more useful than the default values. The risk in trying to make a complicated tool that satisfies all the users Josh is aiming his more sophisticated effort at is that you'll lose the newbies. Scan the archives of this mailing list for a bit. If you look at what people discover they've being nailed by, it's rarely because they need to optimize something like random_page_cost. It's usually because they have a brutally wrong value for one of the memory or vacuum parameters that are very easy to provide reasonable suggestions for without needing a lot of information about the server. I wouldn't even bother asking how many CPUs somebody has for what Lance is building. The kind of optimizations you'd do based on that are just too complicated to expect a tool to get them right and still be accessible to a novice. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith wrote: > On Tue, 19 Jun 2007, Josh Berkus wrote: > >> I don't think the "mostly reads / mostly writes" question covers >> anything, >> nor is it likely to produce accurate answers. Instead, we need to >> ask the >> users to characterize what type of application they are running: >> T1) Please characterize the general type of workload you will be >> running on >> this database. Choose one of the following four... > > We've hashed through this area before, but for Lance's benefit I'll > reiterate my dissenting position on this subject. If you're building > a "tool for dummies", my opinion is that you shouldn't ask any of this > information. I think there's an enormous benefit to providing > something that takes basic sizing information and gives conservative > guidelines based on that--as you say, "safe, middle-of-the-road > values"--that are still way, way more useful than the default values. > The risk in trying to make a complicated tool that satisfies all the > users Josh is aiming his more sophisticated effort at is that you'll > lose the newbies. Generally I agree, however, how about a first switch, for beginner / intermediate / advanced. The choice you make determines how much detail we ask you about your setup. Beginners get two or three simple questions, intermediate a handful, and advanced gets grilled on everything. Then, just write the beginner and maybe intermediate to begin with and ghost out the advanced until it's ready.
On Thu, 21 Jun 2007, Scott Marlowe wrote: > Generally I agree, however, how about a first switch, for beginner / > intermediate / advanced. You're describing a perfectly reasonable approach for a second generation tool in this area. I think it would be very helpful for the user community to get a first generation one that works fairly well before getting distracted at all by things like this. The people capable of filling out the intermediate/advanced settings can probably just do a bit of reading and figure out most of what they should be doing themselves. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Fri, 2007-06-22 at 02:32 -0400, Greg Smith wrote: > On Thu, 21 Jun 2007, Scott Marlowe wrote: > > > Generally I agree, however, how about a first switch, for beginner / > > intermediate / advanced. > > You're describing a perfectly reasonable approach for a second generation > tool in this area. I think it would be very helpful for the user > community to get a first generation one that works fairly well before > getting distracted at all by things like this. The people capable of > filling out the intermediate/advanced settings can probably just do a bit > of reading and figure out most of what they should be doing themselves. Just as an aside; how come the installation/setup "Tutorial" section - http://www.postgresql.org/docs/8.2/interactive/tutorial-start.html - doesn't mention setting some rough reasonable defaults in postgresql.conf or even a reference to the parameter documentation section. It seems like such a reference should exist between - http://www.postgresql.org/docs/8.2/interactive/tutorial-arch.html - and - http://www.postgresql.org/docs/8.2/interactive/tutorial-accessdb.html At least something along those lines should be said at http://www.postgresql.org/docs/8.2/interactive/install-post.html Personally, as DBA for more than a decade, I've got 0 sympathy for people who setup a database but can't be bothered to read the documentation. But in the case of PostgreSQL the documentation could do a better job of driving users to even the existence [and importance of] postgresql.conf and routine maintenance techniques. http://www.postgresql.org/docs/8.2/interactive/runtime-config.html http://www.postgresql.org/docs/8.2/interactive/maintenance.html Seems to me that even a remake of something like - http://www.iiug.org/~waiug/old/forum2000/SQLTunning/sld001.htm - focused on PostgreSQL would be novel and very interesting. Just my two cents. PostgreSQL is awesome, BTW.
Re: PostgreSQL Configuration Tool for Dummies - feedback adjustable control
""Campbell, Lance"" <lance@uiuc.edu> wrote in message news:A3AC4FA47DC0B1458C3E5396E685E63302395E63@SAB-DC1.sab.uiuc.edu...Below is a link to the HTML JavaScript configuration page I am creating:
http://www.webservices.uiuc.edu/postgresql/
I had many suggestions. Based on the feedback I received, I put together the initial list of questions. This list of questions can be change.
Re: PostgreSQL Configuration Tool for Dummies - feedback adjustable control
On Fri, 22 Jun 2007, Sabin Coanda wrote: > Instead of (or in addition to) configure dozens of settings, what do you > say about a feedback adjustable control based on the existing system > statistics and parsing logs (e.g > http://pgfouine.projects.postgresql.org/index.html ) ? something like this would be useful for advanced tuneing, but the biggest problem is that it's so difficult to fingoure out a starting point. bad choices at the starting point can cause several orders of magnatude difference in the database performsnce. In addition we know that the current defaults are bad for just about everyone (we just can't decide what better defaults would be) this horrible starting point gives people a bad first impression that a simple tool like what's being discussed can go a long way towards solving. David Lang
On Fri, 22 Jun 2007, Adam Tauno Williams wrote: > Just as an aside; how come the installation/setup "Tutorial" section - > http://www.postgresql.org/docs/8.2/interactive/tutorial-start.html - > doesn't mention setting some rough reasonable defaults in > postgresql.conf or even a reference to the parameter documentation > section. I think that anyone who has been working with the software long to know what should go into such a section has kind of forgotten about this part of the documentation by the time they get there. It is an oversight and yours is an excellent suggestion. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Fri, 22 Jun 2007, Sabin Coanda wrote: > Instead of (or in addition to) configure dozens of settings, what do you > say about a feedback adjustable control based on the existing system > statistics and parsing logs Take a look at the archive of this list for the end of April/Early May. There's a thread there named "Feature Request --- was: PostgreSQL Performance Tuning" that addressed this subject in length I think you'll find interesting reading. I personally feel there's much more long-term potential for a tool that inspects the database, but the needs of something looking for getting good starting configuration file (before there necessarily is even a populated database) is different enough that it may justify building two different tools. I would suggest you or anything else building the starter configuration tool not stray from the path of getting the most important variables set to reasonable values. Trying to satisfy every possible user is the path that leads to a design so complicated that it's unlikely you'll ever get a finished build done at all. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: PostgreSQL Configuration Tool for Dummies - feedback adjustable control
<david@lang.hm> wrote in message news:Pine.LNX.4.64.0706221144290.14942@asgard.lang.hm... > On Fri, 22 Jun 2007, Sabin Coanda wrote: > >> Instead of (or in addition to) configure dozens of settings, what do you >> say about a feedback adjustable control based on the existing system >> statistics and parsing logs (e.g >> http://pgfouine.projects.postgresql.org/index.html ) ? > > something like this would be useful for advanced tuneing, but the biggest > problem is that it's so difficult to fingoure out a starting point. bad > choices at the starting point can cause several orders of magnatude > difference in the database performsnce. In addition we know that the > current defaults are bad for just about everyone (we just can't decide > what better defaults would be) > You are right. But an automatic tool beeing able to take decisions by different inputs, would be able to set a startup configuration too, based on the hw/sw environment, and interactive user requirements. > this horrible starting point gives people a bad first impression that a > simple tool like what's being discussed can go a long way towards solving. > Well, I think to an automatic tool, not an utopian application good for everything. For instance the existing automatic daemon have some abilities, bat not all of the VACUUM command. I'm realistic that good things may be done in steps, not once. I would be super happy if an available automatic configuration tool would be able to set for the beginning just the shared_buffers or max_fsm_pages based on the available memory. Adjustments can be done later. Regards, Sabin
Greg, > We've hashed through this area before, but for Lance's benefit I'll > reiterate my dissenting position on this subject. If you're building a > "tool for dummies", my opinion is that you shouldn't ask any of this > information. I think there's an enormous benefit to providing something > that takes basic sizing information and gives conservative guidelines > based on that--as you say, "safe, middle-of-the-road values"--that are > still way, way more useful than the default values. The risk in trying to > make a complicated tool that satisfies all the users Josh is aiming his > more sophisticated effort at is that you'll lose the newbies. The problem is that there are no "safe, middle-of-the-road" values for some things, particularly max_connections and work_mem. Particularly, there are very different conf profiles between reporting applications and OLTP/Web. We're talking about order-of-magnitude differences here, not just a few points. e.g.: Web app, typical machine: max_connections = 200 work_mem = 256kb default_statistics_target=100 autovacuum=on Reporting app, same machine: max_connections = 20 work_mem = 32mb default_statistics_target=500 autovacuum=off Possibly we could make the language of the "application type" selection less technical, but I don't see it as dispensible even for a basic tool. > I wouldn't even bother asking how many CPUs somebody has for what Lance is > building. The kind of optimizations you'd do based on that are just too > complicated to expect a tool to get them right and still be accessible to > a novice. CPUs affects the various cpu_cost parameters, but I can but the idea that this should only be part of the "advanced" tool. -- Josh Berkus PostgreSQL @ Sun San Francisco
On Tue, 26 Jun 2007, Josh Berkus wrote: > The problem is that there are no "safe, middle-of-the-road" values for some > things, particularly max_connections and work_mem. Your max_connections concern is one fact that haunts the idea of just giving out some sample configs for people. Lance's tool asks outright the expectation for max_connections which I think is the right thing to do. > Web app, typical machine: > work_mem = 256kb > default_statistics_target=100 > autovacuum=on > Reporting app, same machine: > work_mem = 32mb > default_statistics_target=500 > autovacuum=off I think people are stuck with actually learning a bit about work_mem whether they like it or not, because it's important to make it larger but we know going too high will be a problem with lots of connections doing sorts. As for turning autovacuum on/off and the stats target, I'd expect useful defaults for those would come out of how the current sample is asking about read vs. write workloads and expected database size. Those simple to understand questions might capture enough of the difference between your two types here. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg, > Your max_connections concern is one fact that haunts the idea of just > giving out some sample configs for people. Lance's tool asks outright the > expectation for max_connections which I think is the right thing to do. ... > I think people are stuck with actually learning a bit about work_mem > whether they like it or not, because it's important to make it larger but > we know going too high will be a problem with lots of connections doing > sorts. I find it extremely inconsistent that you want to select "middle-of-the-road" defaults for some values and ask users detailed questions for other values. Which are we trying to do, here? Given an "application type" selection, which is a question which can be written in easy-to-understand terms, these values can be set at reasonable defaults. In fact, for most performance tuning clients I had, we never actually looped back and tested the defaults by monitoring pg_temp, memstat and the log; performance was acceptable with the approximate values. > As for turning autovacuum on/off and the stats target, I'd expect useful > defaults for those would come out of how the current sample is asking > about read vs. write workloads and expected database size. Those simple > to understand questions might capture enough of the difference between > your two types here. Both of the questions you cite above are unlikely to result in accurate answers from users, and the read vs. write answer is actually quite useless except for the extreme cases (e.g. read-only or mostly-write). The deciding answer in turning autovacuum off is whether or not the user does large bulk loads / ETL operations, which autovac would interfere with. The fact that we can't expect an accurate answer on database size (except from the minority of users who already have a full production DB) will be a chronic stumbling block for any conf tool we build. Quite a number of settings want to know this figure: max_fsm_pages, maintenance_work_mem, max_freeze_age, etc. Question is, would order-of-magnitude answers be likely to have better results? i.e.: How large is your database expected to grow? [] Less than 100MB / thousands of rows [] 100mb to 1gb / tens to hundreds of thousands of rows [] 1 gb to 10 gb / millions of rows [] 10 gb to 100 gb / tens to hundreds of millions of rows [] 100 gb to 1 TB / billions of rows [] more than 1 TB / many billions of rows ... users might have better guesses within those rough ranges, and it would give us enough data to set rough values. -- Josh Berkus PostgreSQL @ Sun San Francisco
On Tue, 26 Jun 2007, Josh Berkus wrote: > I find it extremely inconsistent that you want to select "middle-of-the-road" > defaults for some values and ask users detailed questions for other values. > Which are we trying to do, here? I'd like to see people have a really simple set of questions to get them past the completely undersized initial configuration phase, then ship them toward resources to help educate about the parts that could be problematic for them based on what they do or don't know. I don't see an inconsistancy that I'd expect people to have a reasonable guess for max_connections, while also telling them that setting sort_mem is important, a middle value has been assigned, but a really correct setting isn't something they can expect the simple config tool to figure out for them; here's a pointer to the appropriate documentation to learn more. > The fact that we can't expect an accurate answer on database size > (except from the minority of users who already have a full production > DB) will be a chronic stumbling block for any conf tool we build. I'm still of the opinion that recommendations for settings like max_fsm_pages and maintenance_work_mem should come out of a different type of tool that connects to the database. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg, > I'd like to see people have a really simple set of questions to get them > past the completely undersized initial configuration phase, then ship them > toward resources to help educate about the parts that could be problematic > for them based on what they do or don't know. I don't see an > inconsistancy that I'd expect people to have a reasonable guess for > max_connections, while also telling them that setting sort_mem is > important, a middle value has been assigned, but a really correct setting > isn't something they can expect the simple config tool to figure out for > them; here's a pointer to the appropriate documentation to learn more. I disagree that this is acceptable, especially when we could set a better value using an easy-to-understand question. It's also been my experience (in 3 years of professional performance tuning) that most users *don't* have an accurate guess for max_connections. I'm really not clear on why you think "what flavor of application do you have?" is a difficult question. It's certainly one that my clients were able to answer easily. Overall, it seems like you're shooting for a conf tool which only really works for web apps, which isn't my personal goal or I think a good use of our time. > I'm still of the opinion that recommendations for settings like > max_fsm_pages and maintenance_work_mem should come out of a different type > of tool that connects to the database. Well, there's several steps to this: 1) Run conf tool when installing PG; 2) Run conf tool++ after application is first up and running; 3) Run conf tool++ after application has been in production The (1) tool should at least provide a configuration which isn't going to lead to long term issues. For example, dramatically underallocating fsm_pages can result in having to run VACUUM FULL and the associated downtime, so it's something we want to avoid at the outset. -- Josh Berkus PostgreSQL @ Sun San Francisco