Thread: Overhauling GUCS
Magnus and I had the fortune to share a plane flight when leaving pgCon. This gave us a chance to discuss my ideas around reforming PostgreSQL configuration at some length. Here's what we came up with (Magnus will presumably correct the parts I got wrong):

Currently, postgresql.conf and our set of GUC configurations suffer from 4 large problems:

1. Most people have no idea how to set these.
2. The current postgresql.conf file is a huge mess of 194 options, the vast majority of which most users will never touch.
3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf, and the settings.sgml), which are only synched with each other manually.
4. We don't seem to be getting any closer to autotuning.

We think all of these are solvable for 8.4. We also think it's vitally important that any overhaul of the GUCS be completed in one version, to minimize the pain involved.

Here's a list of the things we want to change. It's all a package and should make sense if you take all the changes as a whole.

1) Add several pieces of extra information to guc.c in the form of extra "gettext" commands: default value, subcategory, long description, recommendations, enum lists.
2) Incorporate this data into pg_settings.
3) Delete postgresql.conf.sample.
4) Add a script called pg_generate_conf to generate a postgresql.conf based on guc.c and command-line switches (rather than postgresql.conf.sample), which would be called automatically by initdb.

For (4), pg_generate_conf would take the following switches:

-b, --basic    = short conf file, listing only the 15-18 most commonly changed options
-a, --advanced = conf file listing all 196+ options
-t, --terse    = conf file lists only category headings and actual settings, no comments
-n, --normal   = conf file has category and subcategory settings, with short desc comments
-v, --verbose  = conf file lists full descriptions and recommendations in comments with each option
-c "option = value" = set specific option to specific value in the file
-f "filename"  = take options and values from file "filename"

The default would be "-b, -n" with specific settings for shared_buffers and wal_sync_method. The current postgresql.conf is a lot more like an "-a, -v" file.

This would help us in the following ways:

A. There would now only be 2 places to maintain GUCS lists, the docs and guc.c.
B. By having a generated postgresql.conf and an easy way to generate it, writing autoconfiguration scripts (as well as shortcuts like SET PERSISTENT) becomes vastly easier.
C. Most users would deal only with a limited set of 15-20 configuration variables.

There's obviously some refinements needed, but what do people think of the above general idea?
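[To make the proposed interface concrete, a hypothetical invocation of pg_generate_conf might look like the following. The tool does not exist yet; the switches are the ones listed above, the two override values are only illustrative, and whether the tool writes to stdout or directly into the data directory is one of the open details.]

    # write a short, lightly commented config with two explicit overrides
    pg_generate_conf --basic --normal \
        -c "shared_buffers = 512MB" \
        -c "wal_sync_method = fdatasync" > $PGDATA/postgresql.conf

initdb would presumably run something equivalent, substituting its own calculated values for shared_buffers and wal_sync_method.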
On Fri, 30 May 2008, Josh Berkus wrote:

> 1) Add several pieces of extra information to guc.c in the form of extra
> "gettext" commands: default value, subcategory, long description,
> recommendations, enum lists.
> 2) Incorporate this data into pg_settings

When you last brought this up in February (I started on a long reply to http://archives.postgresql.org/pgsql-hackers/2008-02/msg00759.php that I never quite finished) the thing I got stuck on was how to deal with the way people tend to comment in these files as they change things.

One problem I didn't really see addressed by the improvements you're suggesting is how to handle migrating customized settings to a new version (I'm talking about 8.4->9.0 after this is in place; 8.3->8.4 is a whole different problem). It would be nice to preserve a "history" of what people did, like in your examples (which look exactly like what I find myself running into in the field). Now, that will get a lot easier just by virtue of having a smaller config file, but I think that adding something into pg_settings that allows saving user-added commentary would be a nice step toward some useful standardization on that side of things. It would make future automated tools aimed at parsing and generating new files, as part of things like version upgrades, a lot easier if there was a standard way such comments were handled in addition to the raw data itself.

The other thing I'd like to see make its way into pg_settings, so that tools can operate on it just by querying the database, is noting what file the setting came from, so that you can track things like include file usage. I think with those two additions (comments and source file tracking) it would even be conceivable to clone a working facsimile of even a complicated postgresql.conf file set remotely just by reading pg_settings.

While a bit outside of the part you're specifically aiming to improve here, if you could slip these two additions in I think it would be a boon to future writers of multi-server management tools as well.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
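[As a sketch of what Greg is describing, a management tool could then rebuild a commented config file purely over a database connection with a query along these lines. The usercomment and sourcefile columns are the proposed additions and do not exist in pg_settings as of this thread; name, setting, and source already do.]

    -- clone a facsimile of the active configuration remotely
    SELECT name, setting, usercomment, sourcefile
      FROM pg_settings
     WHERE source = 'configuration file'
     ORDER BY sourcefile, name;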
On Fri, 2008-05-30 at 17:34 -0700, Josh Berkus wrote:

> There's obviously some refinements needed, but what do people think of
> the above general idea?

My initial reaction is that this is too much change, though I agree with many of the points and understand it is well intended.

We have many supporters, though 90% of them seldom touch the database from one release to the next. Many are dismayed that every time they do, we've fiddled with the knobs so some don't work anymore, some are missing or renamed, and there's a few new ones. So if they don't know what they're doing, it is frequently because we keep moving the goalposts.

We should also consider that most people with multiple databases are running multiple versions of PostgreSQL. The main reason for that is that we keep changing the behaviour of SQL between releases, so even if you had a magic superfast upgrade utility, in perhaps 50% of cases they won't use it because they have to do a full application re-test, which costs time and money.

Trying to be a Postgres DBA is hard when each new release is configured differently to the last one and we have a major release about 3-5 times more frequently than Oracle/SQLServer. That is probably the largest source of questions and annoyance from the students on the courses I run.

So my viewpoint is that we should be aggressively adding new features, yet be conservative in changing existing behaviour: provide options for behaves-like-previous-release and keep the administrative interfaces as similar as possible between releases.

If you wish to make changes, I would suggest that you simply reorder some of the parameters in the .conf file, so that the ones you think are important are grouped at the top.

Another suggestion would be to allow a #include style interface within postgresql.conf. We can then do this:

#include standard_postgresql.conf
# now we put Josh's favourite GUCs as overrides on the above
...

That keeps things simple because the standard_postgresql.conf looks almost identical to what people are used to. It also provides a new feature, allowing people to include site-wide defaults for various settings to allow easy implementation of local policy. It also demonstrates a best practice approach to managing GUCs.

Some other problems I see with GUCs:

* It's not possible to set one parameter depending upon the setting of another.

* It's always unclear which GUCs can be changed, and when. That is understood far less often than the meaning of them.

* We should rename effective_cache_size to something that doesn't sound like it does what shared_buffers does.

* There is no config verification utility, so if you make a change and then try to restart and it won't, you are in trouble.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Fri, 2008-05-30 at 17:34 -0700, Josh Berkus wrote:

> We have many supporters, though 90% of them seldom touch the database
> from one release to the next. Many are dismayed that every time they do,
> we've fiddled with the knobs so some don't work anymore, some are
> missing or renamed, and there's a few new ones.

I certainly agree we don't want to make it more difficult.

> So if they don't know what they're doing, it is frequently because we
> keep moving the goalposts.

No, it's because they don't read the docs. We are not talking about dumping SQL here :) I don't feel it is correct to not perform such a needed cleanup for fear that some user can't be bothered to read the documentation.

> We should also consider that most people with multiple databases are
> running multiple versions of PostgreSQL. The main reason for that is
> that we keep changing the behaviour of SQL between releases, so even if
> you had a magic superfast upgrade utility, in perhaps 50% of cases they
> won't use it because they have to do a full application re-test, which
> costs time and money.

This could certainly be a problem.

> Trying to be a Postgres DBA is hard when each new release is configured
> differently to the last one and we have a major release about 3-5 times
> more frequently than Oracle/SQLServer. That is probably the largest source
> of questions and annoyance from the students on the courses I run.

That is the nature of the beast though. We are still learning, improving, engineering. It's part of the progress of the database. I would suggest that your students (which is what I tell mine) not upgrade every release but instead try hanging out for 3 years per release.

> So my viewpoint is that we should be aggressively adding new features,
> yet be conservative in changing existing behaviour: provide options for
> behaves-like-previous-release and keep the administrative interfaces as
> similar as possible between releases.

I agree with this in principle but not on this argument.

> If you wish to make changes, I would suggest that you simply reorder
> some of the parameters in the .conf file, so that the ones you think are
> important are grouped at the top.

IMO, the only settings that should be in the postgresql.conf are the ones that require a restart. The rest should be gathered from pg_settings.

> Another suggestion would be to allow a #include style interface within
> postgresql.conf. We can then do this:
>
> #include standard_postgresql.conf
>
> # now we put Josh's favourite GUCs as overrides on the above
> ...

I think that could get very messy, but Apache allows this. It isn't exactly a new idea and I wouldn't fight against it.

> Some other problems I see with GUCs
>
> * It's not possible to set one parameter depending upon the setting of
> another.

That is getting a bit more complicated than I think we need. Unless I am misunderstanding what you are saying.

> * It's always unclear which GUCs can be changed, and when. That is
> understood far less often than the meaning of them.

This is a definite problem. Even the docs are a bit difficult on this one. It should be a nice simple grid :) Or, like I said above, a setting only goes in the conf if it requires a restart. Hmm:

SET effective_cache_size = '5000MB';
WARNING: You must issue a reload to the database for this to take effect.

> * We should rename effective_cache_size to something that doesn't sound
> like it does what shared_buffers does

Hmmm, I wonder if it is not the other way around.
Although I know that one thing I run into is that a lot of people think effective_cache_size is an allocation. shared_buffer_alloc? shared_mem?

> * There is no config verification utility, so if you make a change and
> then try to restart and it won't, you are in trouble.

+1 +1

Sincerely,
Joshua D. Drake
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > Many are dismayed that every time they do we've fiddled with the > knobs so some don't work anymore, some are missing or renamed and > there's a few new ones. ... > * We should rename effective_cache_size *raises eyebrow* - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200805311114 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAkhBa3UACgkQvJuQZxSWSsg6pACgu2EJOyzj+ik79QwzdTTfi8Qj aW0AoNayz4sniFVFxnQFgnxh7dPB6QOD =Uyap -----END PGP SIGNATURE-----
Josh Berkus <josh@agliodbs.com> writes:

> Currently, postgresql.conf and our set of GUC configurations suffer from
> 4 large problems:
> 1. Most people have no idea how to set these.
> 2. The current postgresql.conf file is a huge mess of 194 options, the
> vast majority of which most users will never touch.
> 3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf,
> and the settings.sgml), which are only synched with each other manually.
> 4. We don't seem to be getting any closer to autotuning.

The proposal doesn't actually solve any of those problems.

> Here's a list of the things we want to change. It's all a package and
> should make sense if you take all the changes as a whole.
> 1) Add several pieces of extra information to guc.c in the form of extra
> "gettext" commands: default value, subcategory, long description,
> recommendations, enum lists.
> 2) Incorporate this data into pg_settings
> 3) Delete postgresql.conf.sample
> 4) Add a script called pg_generate_conf to generate a postgresql.conf
> based on guc.c and command-line switches (rather than
> postgresql.conf.sample), which would be called automatically by initdb.

I disagree with doing any of this. It doesn't result in any useful reduction in maintenance effort, and what it does do is make it impossible to keep control over the detailed layout, formatting, commenting etc in a sample postgresql.conf. Nor do I think that "generate a whole config file from scratch" is going to be a useful behavior for tuning problems --- how will you merge it with what you had before?

regards, tom lane
On May 31, 2008, at 09:23, Tom Lane wrote:

>> 1. Most people have no idea how to set these.
>> 2. The current postgresql.conf file is a huge mess of 194 options, the
>> vast majority of which most users will never touch.
>> 3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf,
>> and the settings.sgml), which are only synched with each other manually.
>> 4. We don't seem to be getting any closer to autotuning.
>
> The proposal doesn't actually solve any of those problems.

It solves #2 at least.

> I disagree with doing any of this. It doesn't result in any useful
> reduction in maintenance effort, and what it does do is make it
> impossible to keep control over the detailed layout, formatting,
> commenting etc in a sample postgresql.conf. Nor do I think that
> "generate a whole config file from scratch" is going to be a useful
> behavior for tuning problems --- how will you merge it with what
> you had before?

I'd love to see these issues resolved. The current postgresql.conf is way over the top. Might you have a better idea?

Thanks,
David
Simon, Tom, Greg,

> Now, that will get a lot easier just by
> virtue of having a smaller config file, but I think that adding something
> into pg_settings that allows saving user-added commentary would be a nice
> step toward some useful standardization on that side of things. It would
> make future automated tools aimed at parsing and generating new files, as
> part of things like version upgrades, a lot easier if there was a standard
> way such comments were handled in addition to the raw data itself.

Hmmm. What about a COMMENT ON SETTING? That seems like it would serve the purpose ... and make preserving user comments easier than the current file-conversion approach.

> The other thing I'd like to see make its way into pg_settings, so that
> tools can operate on it just by querying the database, is noting what file
> the setting came from, so that you can track things like include file
> usage. I think with those two additions (comments and source file
> tracking) it would even be conceivable to clone a working facsimile of
> even a complicated postgresql.conf file set remotely just by reading
> pg_settings.

Hmmm. Any idea how to do this? It sounds like a good idea to me.

> So my viewpoint is that we should be aggressively adding new features,
> yet be conservative in changing existing behaviour: provide options for
> behaves-like-previous-release and keep the administrative interfaces as
> similar as possible between releases.

It's my viewpoint, based on a lot of user feedback, that the current postgresql.conf is fundamentally broken and a major roadblock to PostgreSQL adoption. This was a point with which there was pretty much universal agreement when I talked with people at pgCon. That is, I believe that you're arguing for keeping .conf stable for the 5% of users who understand it and ignoring the 95% of users who are baffled by it.

At this point, I think we are the only major SQL database without some form of basic autotuning for the most significant settings (certainly MySQL, Oracle, and SQL Server have it); we've been able to coast through that because autotuning features are new in other DBs, but it's going to start to hurt us pretty soon.

Now, I do believe that we should plan any GUCS overhaul to happen in one PostgreSQL version rather than phasing it in over multiple versions, so that administrators only need to get used to a new way once.

> > Currently, postgresql.conf and our set of GUC configurations suffer from
> > 4 large problems:
> >
> > 1. Most people have no idea how to set these.
> > 2. The current postgresql.conf file is a huge mess of 194 options, the
> > vast majority of which most users will never touch.
> > 3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf,
> > and the settings.sgml), which are only synched with each other manually.
> > 4. We don't seem to be getting any closer to autotuning.
>
> The proposal doesn't actually solve any of those problems.

OK, let me draw some lines:

1 & 2) By not having the settings be defined in a 500-line file, new users would no longer be baffled by scores of settings which probably don't concern them, trying to find the handful of settings which do.

3) We'd consolidate the GUC lists down from 3 places to 2, which is one less area to synchronize. Magnus and I looked to see if it might be possible to generate the docs from the same list, but it's not practical.
4) By shifting from a model where postgresql.conf is document-formatted and hand-edited to one where it's machine-generated, it becomes vastly easier to write simple utilities to manage these settings. Right now, the big "obstacle" to things like SET PERSISTENT is "how do we preserve the hand-edited comments in the file" -- and the answer is we *don't.*

> I disagree with doing any of this. It doesn't result in any useful
> reduction in maintenance effort, and what it does do is make it
> impossible to keep control over the detailed layout, formatting,
> commenting etc in a sample postgresql.conf.

Have you *looked* at postgresql.conf.sample lately, Tom? It's a disaster. Maintenance is already difficult, and becoming more so as we add settings.

Further, you and Simon seem to think that the current "narrative docs inside the conf file" format has some kind of value which makes things easier for DBAs. I don't believe it does, and I have yet to meet a *single* new PostgreSQL user who wasn't confused and intimidated by the file.

> Nor do I think that
> "generate a whole config file from scratch" is going to be a useful
> behavior for tuning problems --- how will you merge it with what
> you had before?

Who's merging? I don't think you get the proposal. The whole "hand-edited" approach to postgresql.conf should go away. It's not useful, it's not educational, and it doesn't make PostgreSQL easy to manage. Further, the lack of an easy way to manage settings via port access to PostgreSQL is a significant inhibitor to adopting PostgreSQL in environments with large numbers of servers. See prior arguments by the CPANEL folks about why they don't support PostgreSQL, which is in turn a major reason why PostgreSQL web hosting is hard to find.

I agree that editing the data about settings in the guc.c file is not ideal; Magnus and I discussed that mainly because we wanted to preserve the translation framework with gettext strings. If someone can think of a better way to do this part, I'm all ears.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
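[For concreteness, the two interfaces mentioned in this exchange might look like the following. Neither statement exists at this point; the syntax is purely illustrative of the SET PERSISTENT and COMMENT ON SETTING ideas raised above.]

    -- persist a setting change into the generated config file
    SET PERSISTENT work_mem = '32MB';

    -- attach a user comment to a setting, per the COMMENT ON SETTING idea
    COMMENT ON SETTING work_mem IS 'raised for the nightly reporting jobs';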
"David E. Wheeler" <david@kineticode.com> writes: > I'd love to see these issues resolved. The current postgresql.conf is way over > the top. Might you have a better idea? I don't think fiddling with surface issues like the formatting of the postgresql.conf is productive. Hiding parameters because you don't think beginners need them is only going to frustrate those people who do need to adjust them. What might be productive is picking up a group of parameters and thinking hard about what they mean in terms of user-visible real-world effects. If they can be recast in terms of behaviour the user wants instead of internal implementation details then that would translate into a massive simplification as well as being easier to explain to users. I think we do a pretty good job of this already. Witness things like effective_cache_size -- imagine if this were "nested_loop_cache_hit_rate" for example, good luck figuring out what to set it to. The vacuum cost delay factors are probably ripe for such a recast though. I think we need just one parameter "vacuum_io_bandwidth" or something like that. The bgwriter parameters might also be a candidate but I'm less certain. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
"Josh Berkus" <josh@agliodbs.com> writes: > It's my viewpoint based on a lot of user feedback that the current > postgresql.conf is fundamentally broken and a major roadblock to PostgreSQL > adoption. This was a point with which there was pretty much universal > agreement when I talked with people at pgCon. Actually as a new DBA when I was first starting out with Postgres I found it very convenient to have all the common parameters in one place where I could just uncomment and adjust them. Instead of having to search through documentation and find the default value from which > 1 & 2) by not having the settings be defined in a 500-line file, new users > would no longer be baffled by scores of settings which probably don't concern > them, trying to find the handful of settings which do. I'm not sure how an empty file is any less "baffling" than one listing the default value for parameters they don't need yet. > 3) We'd consolidate the GUC lists down from 3 places to 2, which is one less > area to synchronize. Magnus and I looked to see if it might be possible to > generate the docs from the same list, but it's not practical. This seems like a trivial gain and one which is unlikely to outweigh the pain of having to massage the info into C data structures. > 4) By shifting from a model where postgresql.conf is document-formatted and > hand-edited to one where it's machine generated, it becomes vastly easier to > write simple utilities to manage these settings. Right now, the big > "obstacle" to things like SET PERSISTENT is "how to we preseve the > hand-edited comments in the file" -- and the answer is we *don't.* What this sounds like is a sly way to try to get rid of postgresql.conf entirely and replace it with parameters stored in the database so admins would adjust the parameters using an SQL syntax rather than a text file. There are pros and cons of such a system but I think for newbie admins that would be a thousand times *more* baffling. You would have to learn new commands and have no holistic view of what parameters had been set, what related parameters might exist. You also have no way to keep the file in a version control system or sync across servers etc. > Have you *looked* at postgresql.conf.sample lately, Tom? It's a disaster. > Maintenance is already difficult, and becoming more so as we add settings. What difficulties? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
On May 31, 2008, at 12:36, Gregory Stark wrote:

> What this sounds like is a sly way to try to get rid of postgresql.conf
> entirely and replace it with parameters stored in the database, so admins
> would adjust the parameters using an SQL syntax rather than a text file.
>
> There are pros and cons of such a system, but I think for newbie admins
> that would be a thousand times *more* baffling. You would have to learn new
> commands and have no holistic view of what parameters had been set, what
> related parameters might exist. You also have no way to keep the file in a
> version control system or sync across servers etc.

FWIW, this has not been a barrier to MySQL adoption.

Best,
David
Josh Berkus wrote:

> Currently, postgresql.conf and our set of GUC configurations suffer from
> 4 large problems:

As we have talked about before, you know that I agree that the GUC system could use some improvements. But I'm a bit surprised about some of your assessments.

> 1. Most people have no idea how to set these.

Could you clarify this? I can't really believe that people are incapable of editing a configuration file.

> 2. The current postgresql.conf file is a huge mess of 194 options, the
> vast majority of which most users will never touch.

My opinion has always been that we should provide a default file with only the essential options instead of all of them. I see this as the major problem, because people are overwhelmed and consequently don't set anything.

> 3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf,
> and the settings.sgml), which are only synched with each other manually.

While this is not ideal, I can't really see how this is a major problem, at least from the perspective of the user.

> 4. We don't seem to be getting any closer to autotuning.

True. But how does your proposal address this?
On May 31, 2008, at 15:32, Peter Eisentraut wrote:

>> 1. Most people have no idea how to set these.
>
> Could you clarify this? I can't really believe that people are incapable
> of editing a configuration file.

I've been using PostgreSQL on and off, mostly on, for almost 10 years. I still have no idea what 75% of those settings in postgresql.conf mean or are for. There are an overwhelming number of them. I know that 5-8 of them I always touch, thanks largely to assistance now and then from Josh Berkus, but the rest are just complexity to me. I don't doubt that the vast majority of them are useful in one situation or another, but unless I'm in one of those situations, I really don't need to see them there and be confused by them.

Does that help?

>> 2. The current postgresql.conf file is a huge mess of 194 options, the
>> vast majority of which most users will never touch.
>
> My opinion has always been that we should provide a default file with only
> the essential options instead of all of them. I see this as the major
> problem, because people are overwhelmed and consequently don't set anything.

+1

Best,
David
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

In the interest of constructive criticism, here are some ways I think postgresql.conf could be improved.

* A weak +1 to the #include idea. I'm much more inclined to simply add a new section to the bottom of the file and use version control, but I could see ways in which include would be useful.

* Rearrange the settings to put the more "common" ones at the top, even if it means messing up the categories a bit. Having 'bonjour_name' at the top, and 'default_statistics_target' buried way down below is crazy. The sections should be looked at from a clean perspective: do we need them at all? Are there better ways to arrange things? What order should they be in?

* Much more verbose comments. The abovementioned default_statistics_target is a very important setting, but there is zero explanation in the file of what it is. The only thing we're told is that it ranges from 10 - 1000. We can do better than that. Users would absolutely love it if each item had a clear explanation, and it would be well worth a slightly increased file size. See the postfix main.cf file for a good example of such.

* Remove the confusing "commented out is default" bit entirely. Simply set each value explicitly. Why should new users have to confront this?:

# The commented-out settings shown in this file represent the default values.
# Re-commenting a setting is NOT sufficient to revert it to the default value;
# you need to reload the server.

* Lose the tabbed indenting, which looks bad, especially for multi-line comments. Just use spaces.

* Create a tool, or at least best practices, for controlling and tracking changes to the file.

* Put some doc URLs in the file, at the minimum per major section. At the very bare minimum, a real URL at the top.

* Indicate which values can be changed per-session or per-role.

* Fix the disparity between the syntax in the file and the SET interface. For example, work_mem = 16MB works in the conf file, but you have to write SET work_mem = '16MB'. Easiest is probably just to quote everything in the conf.

* Lose the post-value, end-of-line comments; they just get in the way when making changes and make the file harder to read by contributing to the wrap problem.

* I'm tempted by the argument of creating a separate file for the obscure settings, but I think it would be too much pain, and nobody would ever agree on which settings are 'important' and which are 'obscure'.

* It might be nice to mention other ways to reload the file, such as 'service postgresql reload', or whatever Windows uses.

* The word 'paramters' is still misspelled. :)

* That whole sentence about changing the parameters as command-line options needs to go away.

* Since the executable is now named "postgres" (thank goodness we got rid of "postmaster"), the file should be named 'postgres.conf'. This would also be a way to quickly distinguish 'old' vs 'new' style conf files if we end up making major changes to it.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200805311911
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkhB39sACgkQvJuQZxSWSshahQCg4V5QsO34HOhUDoPzT7STcR45
V5UAoPQxkmuk/oCYirTKxMAhV+Kh8Ytz
=7Lgk
-----END PGP SIGNATURE-----
Greg Sabino Mullane wrote: > > * A weak +1 to the #include idea. I'm much more inclined to simply > add a new section to the bottom of the file and use version > control, but I could see ways in which include would be useful. > > > > We already have include directives, and have had since 8.2. Thus spake the docs: "In addition to parameter settings, the postgresql.conf file can contain /include directives/, which specify another file to read and process as if it were inserted into the configuration file at this point. Include directives simply look like: include 'filename' If the file name is not an absolute path, it is taken as relative to the directory containing the referencing configuration file. Inclusions can be nested." cheers andrew
Greg Sabino Mullane wrote:

> * Much more verbose comments. The abovementioned default_statistics_target
> is a very important setting, but there is zero explanation in the file
> of what it is. The only thing we're told is that it ranges from 10 - 1000.
> We can do better than that. Users would absolutely love it if each item
> had a clear explanation, and it would be well worth a slightly increased
> file size. See the postfix main.cf file for a good example of such.

I kind of agree with this, but actually think we should have the bare minimum of comments in the file. Why? Because our documentation links are static. Each setting should have the URL to its full documentation.

> * Create a tool, or at least best practices, for controlling and tracking
> changes to the file.

This I disagree with. There are plenty of tools to handle this should someone really want to. SVN, CVS, parrot, etc... Let systems management be the domain of systems management.

> * Put some doc URLs in the file, at the minimum per major section. At the
> very bare minimum, a real URL at the top.

Hah! See above :)

> * Indicate which values can be changed per-session or per-role.

Agreed. Along with this, although offtopic for this post, is a grid in the docs that is explicit about this.

> * Fix the disparity between the syntax in the file and the SET interface.
> For example, work_mem = 16MB works in the conf file, but you have to write
> SET work_mem = '16MB'. Easiest is probably just to quote everything in the conf.

Agreed.

> * I'm tempted by the argument of creating a separate file for the obscure
> settings, but I think it would be too much pain, and nobody would ever agree on
> which settings are 'important' and which are 'obscure'.

Actually I could buy into this. There really are only about a dozen must-change settings (if that). I could see something like:

# memory settings
include 'etc/memory.conf'
# network settings
include 'etc/network.conf'
# logging settings
include 'etc/logging.conf'
etc...

> * It might be nice to mention other ways to reload the file, such as
> 'service postgresql reload', or whatever Windows uses.

I think a URL to the docs is a better idea here.

> * The word 'paramters' is still misspelled. :)

Heh.

> * Since the executable is now named "postgres" (thank goodness we got
> rid of "postmaster"), the file should be named 'postgres.conf'. This would
> also be a way to quickly distinguish 'old' vs 'new' style conf files if
> we end up making major changes to it.

It was never postmaster.conf (that I can recall). I don't see the issue here. Consider apache... It isn't apache.conf. I think postgresql.conf (as that is the name of the software) makes sense.

Sincerely,
Joshua D. Drake
Can I express one user's view:

Greg Sabino Mullane writes:

> * Much more verbose comments. The abovementioned default_statistics_target
> is a very important setting, but there is zero explanation in the file
> of what it is. The only thing we're told is that it ranges from 10 - 1000.
> We can do better than that. Users would absolutely love it if each item
> had a clear explanation, and it would be well worth a slightly increased
> file size. See the postfix main.cf file for a good example of such.

Absolutely agreed :-). A complete list of GUCs with a full explanation of each of them is sufficient. For an "easy start", there might be several postgresql.conf variants shipped with the distribution, like MySQL does, or postgresql.conf could be generated by a wizard dialog before initdb starts, for example. But I think that all GUCs must be present in the configuration file.

Thanks.
Gregory Stark wrote:

> I think we do a pretty good job of this already. Witness things like
> effective_cache_size -- imagine if this were "nested_loop_cache_hit_rate" for
> example, good luck figuring out what to set it to.

I think either of these is fine if we describe how to measure them. Ideally, if we had a GUC that said "log_nested_loop_cache_hit_rate" that enabled some timing code (understandably with lots of overhead) that made an attempt to measure the hit rate, it'd be easier to figure out than the effective cache size, no?

> The vacuum cost delay factors are probably ripe for such a recast though. I
> think we need just one parameter "vacuum_io_bandwidth" or something like that.

+1; though perhaps the inverse of that is more useful. When my machines are idle I'd be happy if they vacuum more. Wouldn't we be better served specifying the I/O bandwidth of each device/tablespace and letting vacuum use whatever portion would be otherwise idle?

> The bgwriter parameters might also be a candidate but I'm less certain.
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 Andrew Dunstan wrote: > We already have include directives, and have had since 8.2. Heh, thanks - which proves how useless they are to me. :) Joshua Drake wrote: > I kind of agree with this but actually think we should have the bare > minimum comments in the file. Why? Because our documentation links are > static. Each setting should have the URL to the full documentation on a > particular setting. Ugh, why so much context switching? Put the docs next to the setting. URLs are nice but not necessary. If you are arguing for minimum comments in conf files, please make a patch for pg_hba.conf ;) >> * Create a tool, or at least a best practices, for controlling and tracking >> changes to the file. > This I disagree with. There are plenty of tools to handle this should > someone really want to. SVN, CVS, parrot, etc... Let systems management > be the domain of systems management. Well, perhaps just a note in the docs at least that one might want to put postgresql.conf in version control. I've seen people not doing so more often than you would think. Perhaps because they are DBAs and not sysadmins? I also meant a tool to do things like verify that the changes are valid, as someone else mentioned elsewhere in this thread. >> * It might be nice to mention other ways to reload the file, such as >> 'service postgresql reload', or whatever Windows uses. > I think a url to the docs is a better idea here. Good point. Maybe a sort of "DBA basics" page in the docs is warranted for things like this. >> * Since the executable is now named "postgres" (thank goodness we got >> rid of "postmaster"), the file should be named 'postgres.conf'. This would >> also be a way to quickly distinguish 'old' vs 'new' style conf files if >> we end up making major changes to it. > It was never postmaster.conf (that I can recall). I don't see the issue > here. Consider apache... It isn't apache.conf. Not saying it ever was postmaster.conf: just that I'm glad we finally changed the name. As for the Apache project, the httpd executable reads the httpd.conf file. Hence, one might expect the postgres executable to read a postgres.conf file. - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation PGP Key: 0x14964AC8 200806011656 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAkhDEJMACgkQvJuQZxSWSsgeogCfT0g69NDoxyWGiWmDcB3PxH8h wJ8AnjzssA7aIk0rBdJzL+bB5vSQSeBV =lgZG -----END PGP SIGNATURE-----
Greg Sabino Mullane wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
> Joshua Drake wrote:
>
>> I kind of agree with this, but actually think we should have the bare
>> minimum of comments in the file. Why? Because our documentation links are
>> static. Each setting should have the URL to its full documentation.
>
> Ugh, why so much context switching? Put the docs next to the setting. URLs
> are nice but not necessary. If you are arguing for minimum comments in
> conf files, please make a patch for pg_hba.conf ;)

Hah! Well, I don't know that a minimum of comments is what I am arguing for as much as not too many comments. The comments in general in the postgresql.conf are useless unless you have previous knowledge. I really think that if we take advantage of the fact that we have static URLs, life would be easier overall.

>>> * Create a tool, or at least best practices, for controlling and tracking
>>> changes to the file.
>
>> This I disagree with. There are plenty of tools to handle this should
>> someone really want to. SVN, CVS, parrot, etc... Let systems management
>> be the domain of systems management.
>
> Well, perhaps just a note in the docs at least that one might want to put
> postgresql.conf in version control.

I could certainly buy into this. No reason we can't help people be better administrators. I would suggest a link to a static wiki page (on wiki.pg) that would link to each option?

> I've seen people not doing so more often
> than you would think. Perhaps because they are DBAs and not sysadmins? I also
> meant a tool to do things like verify that the changes are valid, as someone
> else mentioned elsewhere in this thread.

pg_ctl -D data check?

I would +1 that. Including (in later releases):

WARNING: You specify 66536 for shared_buffers but you only have 131072 of memory. Consider decreasing the parameter.

Obviously we would need friendlier, less math-heavy wording.

>>> * It might be nice to mention other ways to reload the file, such as
>>> 'service postgresql reload', or whatever Windows uses.
>
>> I think a URL to the docs is a better idea here.
>
> Good point. Maybe a sort of "DBA basics" page in the docs is warranted for
> things like this.

Yeah, I could buy into this.

>>> * Since the executable is now named "postgres" (thank goodness we got
>>> rid of "postmaster"), the file should be named 'postgres.conf'. This would
>>> also be a way to quickly distinguish 'old' vs 'new' style conf files if
>>> we end up making major changes to it.
>
>> It was never postmaster.conf (that I can recall). I don't see the issue
>> here. Consider apache... It isn't apache.conf.
>
> Not saying it ever was postmaster.conf: just that I'm glad we finally
> changed the name. As for the Apache project, the httpd executable reads the
> httpd.conf file. Hence, one might expect the postgres executable to read a
> postgres.conf file.

Maybe, but I think I would need more convincing.

Sincerely,
Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes: > Greg Sabino Mullane wrote: >> Ugh, why so much context switching? Put the docs next to the setting. URLs >> are nice but not necessary. If you are arguing for minimum comments in >> conf files, please make a patch for pg_hba.conf ;) > Hah! Well I don't know that a minimum of comments is what I am arguing > as much as not too much comments. The comments in general in the > postgresql.conf are useless unless you have previous knowledge. I really > think that if we take advantage of the fact that we have static URLs > that life would be easier overall. Yeah ... this thread started out with the idea of reducing duplication, so how did we arrive at wanting to duplicate the SGML documentation into postgresql.conf.sample? I think we need to integrate, not duplicate. (pg_hba.conf.sample is a pretty good example of what not to do, I think ... and it used to be even worse ...) regards, tom lane
On Mon, Jun 2, 2008 at 5:46 AM, Joshua D. Drake <jd@commandprompt.com> wrote: >> I've seen people not doing so more often >> than you would think. Perhaps because they are DBAs and not sysadmins? I >> also >> meant a tool to do things like verify that the changes are valid, as >> someone >> else mentioned elsewhere in this thread. > > pg_ctl -D data check? > > I would +1 that. I would also really like to see that - though I'd also like to see an SQL interface so we can check a config before saving when editing via pgAdmin or similar. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
On Sun, 1 Jun 2008, Joshua D. Drake wrote:

> Well I don't know that a minimum of comments is what I am arguing as
> much as not too much comments.

Josh's proposal included making three levels of documentation-level comments available: terse, normal, and verbose. The verbose comment level probably should include a web link to the full documentation. The way the comments litter the existing file, the status quo that's called normal mode in this proposal, is IMHO a complete mess. Most use cases I can think of want either no comments or really verbose ones; the limited commentary in the current sample postgresql.conf seems a middle ground that's not right for anybody.

The key thing here in my mind is that it should be possible to transform between those three different verbosity levels without losing any settings or user-added comments. They're really just different views on the same data, and which view you're seeing should be easy to change without touching the data.

I just extracted the original design proposal and some of the relevant follow-up in this thread, made some additional suggestions, and put the result at http://wiki.postgresql.org/wiki/GUCS_Overhaul

I think reading that version makes it a bit clearer what the proposed overhaul is aiming for.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Sun, 1 Jun 2008, Peter Eisentraut wrote:

> Josh Berkus wrote:
>> 1. Most people have no idea how to set these.
> Could you clarify this? I can't really believe that people are incapable of
> editing a configuration file.

The big problem isn't the editing, it's knowing what to set the configuration values to. This is not to say that editing a configuration file should be considered reasonable. Any GUCS overhaul should include a design goal of being able to completely manage the configuration system using, say, pgadmin (the "manage settings via port access" part that Josh already mentioned). This is why I was suggesting additions aimed at assimilating all the things that are in the postgresql.conf file.

Joshua has been banging a drum for a while now that all this data needs to get pushed into the database itself. The GUCS data is clearly structured like a database table. Josh's suggested changes are basically adding all the columns needed to it in order to handle everything you'd want to do to the table. If you think of it in those terms and make it possible to manipulate that data using the tools already available for updating tables, you'll open up the potential to add a whole new class of user-friendly applications for making configuration easier to manage.

However, I don't fully agree with taking that idea as far as Joshua has suggested (only having the config data in the database), because having everything in a simple text file that can be managed with SCM etc. has significant value. It's nice to allow admins to be able to make simple changes with just a file edit. It's nice that you can look at all the parameters in one place and browse them. However, I do think that the internal database representation must be capable of holding everything in the original postgresql.conf file and producing an updated version of the file, either locally or remotely, as needed.

>> 4. We don't seem to be getting any closer to autotuning.
> True. But how does your proposal address this?

The idea that Josh's suggestions are working toward is simplifying the construction of tools that operate on the server configuration file, so that it's easier to write an autotuning tool. Right now, writing such a tool in a generic way gets so bogged down just in parsing/manipulating the postgresql.conf file that it's hard to focus on actually doing the tuning part. If we go back to his original suggestion: http://wiki.postgresql.org/wiki/GUCS_Overhaul

>> Add a script called pg_generate_conf to generate a postgresql.conf
>> based on guc.c and command-line switches (rather than
>> postgresql.conf.sample)

It's an easy jump from there to imagine a pg_generate_conf that provides a "wizard" interface to update a configuration file. I foresee a little GUI or web app that connects to a server on port 5432, finds out some basic information about the server, and gives something like this:

Parameter             Current   Recommended   Change?
shared_buffers        32MB      1024MB        [X]
effective_cache_size  128MB     3000MB        [ ]
work_mem              1MB       16MB          [ ]

Josh has the actual brains behind such an app all planned out if you look at his presentations, but without the larger overhaul it's just not possible to make the implementation elegant.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
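[A sketch of the query side of such a wizard: the current value, units, and allowed range for the settings shown above can already be read over an ordinary connection from pg_settings; the recommendation logic is the part that would still have to be written.]

    -- fetch what a tuning wizard needs in order to propose new values
    SELECT name, setting, unit, min_val, max_val, context
      FROM pg_settings
     WHERE name IN ('shared_buffers', 'effective_cache_size', 'work_mem');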
Greg Smith wrote:

> Joshua has been banging a drum for a while now that all this data needs to
> get pushed into the database itself. The GUCS data is clearly structured
> like a database table. [...] If you think of it in those terms and make it
> possible to manipulate that data using the tools already available for
> updating tables, you'll open up the potential to add a whole new class of
> user-friendly applications for making configuration easier to manage.
> However, I don't fully agree with taking that idea as far as Joshua has
> suggested (only having the config data in the database), because having
> everything in a simple text file that can be managed with SCM etc. has
> significant value. It's nice to allow admins to be able to make simple
> changes with just a file edit. It's nice that you can look at all the
> parameters in one place and browse them. However, I do think that the
> internal database representation must be capable of holding everything in
> the original postgresql.conf file and producing an updated version of the
> file, either locally or remotely, as needed.

Depending on the complexity you want to have inside the generator, one could imagine a middle-ground solution like:

include "database-generated.conf"
include "local-overrides.conf"

where the "database-generated.conf" does not necessarily need a lot of comments.

> Josh has the actual brains behind such an app all planned out if you look
> at his presentations, but without the larger overhaul it's just not
> possible to make the implementation elegant.

IMHO Greg's response is the most comprehensive and well-thought-through contribution in the whole GUC thread.

--
Sincerely,
Stephen R. van den Berg.
Greg Smith <gsmith@gregsmith.com> writes:
> Joshua has been banging a drum for a while now that all this data needs to
> get pushed into the database itself.

This is, very simply, not going to happen. Shall we go over the reasons why not, one more time?

1. Quite a few of the GUC parameters are needed *before* one can ever read the database; in particular the ones about file locations and shared memory sizing.

2. Suppose you change a parameter in a way that breaks the DB (eg, set shared_buffers to a value larger than your kernel allows) and try to restart. Database doesn't start. If the parameter can only be changed back within an operating database, you're hosed.

I have no objection to providing alternative ways to edit the configuration data, but the primary source of the settings is going to continue to be an editable text file. Any proposals for alternatives-to-a-text-editor have to work within that reality.

regards, tom lane
Tom Lane wrote:
>Greg Smith <gsmith@gregsmith.com> writes:
>> Joshua has been banging a drum for a while now that all this data needs to
>> get pushed into the database itself.

>This is, very simply, not going to happen. Shall we go over the reasons
>why not, one more time?

>1. Quite a few of the GUC parameters are needed *before* one can ever
>read the database; in particular the ones about file locations and
>shared memory sizing.

Obviously.

>2. Suppose you change a parameter in a way that breaks the DB (eg,
>set shared_buffers to a value larger than your kernel allows) and
>try to restart. Database doesn't start. If the parameter can only
>be changed back within an operating database, you're hosed.

That's why those parameters will always have to be read from a text file. Which doesn't prohibit that text file from being generated from within the database (once up and running). And yes, if someone changes the parameter through the DB, then makes the DB write out the config file, then restarts the DB remotely, and the change didn't work out (i.e. DB doesn't start, or crashes/hangs), he is hosed and needs to log in and change the text file with a text editor, no way around it.

>I have no objection to providing alternative ways to edit the
>configuration data, but the primary source of the settings is
>going to continue to be an editable text file. Any proposals for
>alternatives-to-a-text-editor have to work within that reality.

I think everyone is aware of that.

--
Sincerely,
Stephen R. van den Berg.
Simon Riggs wrote:

> Some other problems I see with GUCs
>
> * It's not possible to set one parameter depending upon the setting of
> another.

To me this is the more critical one. Most people I have seen will increase one or a few, but not all, parameters related to memory, which can result in lost performance and lost productivity spent figuring out why.

What happened to the idea of an AvailRAM setting, with all memory GUCs based on it? Ideally PostgreSQL should only create one big memory pool and allow all other variables to change at runtime, via the DBA, some tuner process, or a customized application, as long as the total is less than the allocated shared_memory and local_memory settings. (This would also reduce the need to restart Postgres when a value needs to be changed.)

-Jignesh

> * It's always unclear which GUCs can be changed, and when. That is
> understood far less often than the meaning of them.
>
> * We should rename effective_cache_size to something that doesn't sound
> like it does what shared_buffers does
>
> * There is no config verification utility, so if you make a change and
> then try to restart and it won't, you are in trouble.
On Mon, 2 Jun 2008, Tom Lane wrote: > Greg Smith <gsmith@gregsmith.com> writes: >> Joshua has been banging a drum for a while now that all this data needs to >> get pushing into the database itself. > > This is, very simply, not going to happen. Right, there are also technical challenges in the way of that ideal. I was only mentioning the reasons why it might not be the best idea even if it were feasible. However, I do not see why the limitations you bring up must get in the way of thinking about how to interact and manage the configuration data in a database context, even though it ultimately must be imported and exported to a flat file. The concerns you bring up again about leaving the database in an unstartable state are a particularly real danger in the "only has access to 5432" hosted provider case that this redesign is trying to satisfy. I added a "Gotchas" section to the wiki page so that this issue doesn't get forgotten about. The standard way to handle this situation is to have a known good backup configuration floating around. Adding something in that area may end up being a hard requirement before remote editing makes sense. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Mon, 2008-06-02 at 11:59 -0400, Jignesh K. Shah wrote:

> Simon Riggs wrote:
> >
> > Some other problems I see with GUCs
> >
> > * It's not possible to set one parameter depending upon the setting of
> > another.
>
> To me this is the more critical one. Most people I have seen will increase
> one or a few, but not all, parameters related to memory, which can result
> in lost performance and lost productivity spent figuring out why.
>
> What happened to the idea of an AvailRAM setting, with all memory GUCs based
> on it? Ideally PostgreSQL should only create one big memory pool and allow
> all other variables to change at runtime, via the DBA, some tuner process,
> or a customized application, as long as the total is less than the allocated
> shared_memory and local_memory settings. (This would also reduce the need
> to restart Postgres when a value needs to be changed.)

Agreed. Right now, we can't even do that in code, let alone in the config file. If we had a smart_memory_config = on, then we'd be able to say in the backend:

	if (smart_memory_config)
	{
		other_thing = 0.1 * Nbuffers;
	}

but the GUCs are evaluated in alphabetical order, without any way of putting dependencies between them. So they are notionally orthogonal.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Mon, 2 Jun 2008, Jignesh K. Shah wrote: > Most people I have seen will increase one or few but not all parameters > related to memory which can result in loss of performance and > productivity in figuring out. If it becomes easier to build a simple tool available to help people tune their configurations, that should help here without having to do anything more complicated than that. > What happened to AvailRAM setting and base all memory gucs on that. Like some of the other GUC simplification ideas that show up sometimes (unifying all I/O and limiting background processes based on that total is another), this is hard to do internally. Josh's proposal has a fair amount of work involved, but the code itself doesn't need to be clever or too intrusive. Unifying all the memory settings would require being both clever and intrusive, and I doubt you'll find anybody who could pull it off who isn't already overtasked with more important improvements for the 8.4 timeframe. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg, > Like some of the other GUC simplification ideas that show up sometimes > (unifying all I/O and limiting background processes based on that total > is another), this is hard to do internally. Josh's proposal has a fair > amount of work involved, but the code itself doesn't need to be clever > or too intrusive. Unifying all the memory settings would require being > both clever and intrusive, and I doubt you'll find anybody who could > pull it off who isn't already overtasked with more important > improvements for the 8.4 timeframe. Plus, I'm a big fan of "enable an API" rather than "write a feature". I think that there are people & companies out there who can write better autotuning tools than I can, and I'd rather give them a place to plug those tools in than trying to write autotuning into the postmaster. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
To add some complexity to this topic :-)

Please note, I admit upfront I am not familiar with every parameter out there, but during my quest to find bottlenecks while stressing the back-end I find many GUC parameters with names suggesting they should be interesting. I read the comments, the docs (that I know of), and go into the source to learn the parameters and their units. And I wonder whether altering a setting would do miracles in my case. Indeed, how can I measure the effect of a new setting? Enhanced throughput?

I learned in the past that fiddling with parameter A in a database that was not properly set up or used --and there were many of these-- had side effects in other parts of the engine. Yes, increased throughput was observed. A new hype was created. In the end it turned out that parameter A was not set correctly at all, that parameter B, once set to a sane value, cured the wrong behavior, and that parameter A was not optimal at all after the cure. We were just sidetracked because we did not know. Incorrect "knowledge" was born (the parameter A setting). Throughout the years this database product has matured, many more parameters have been added and much, much more instrumentation has been implemented. It still is quite a challenge to understand what is happening. But proper analysis is possible indeed. The black box is much more open now.

One current example: wal_writer_delay. In my team the advice is to set this parameter to 100. However, after implementing a counter (home-grown instrumentation) I now know that the background log flush routine is never called when stressing the database. Therefore I now think the best setting is 10000 (its maximum), since it does not do useful work (in my context) and therefore should wake up as few times as possible. Without this instrumentation I can only guess about the usability of this parameter and spend many tests in order to get an impression of its validity to me.

So overhauling the GUC parameters is one step, but adding proper instrumentation in order to really measure the impact of a new setting is necessary too. Especially when looking in the direction of auto tuning. Proper measurement is crucial to enable correct analysis. Of course I am in favor of doing this with DTrace, however not all platforms can benefit in that case :-)

--Paul

On 2 jun 2008, at 20:06, Josh Berkus wrote:

> Greg,
>
>> Like some of the other GUC simplification ideas that show up sometimes
>> (unifying all I/O and limiting background processes based on that total
>> is another), this is hard to do internally. Josh's proposal has a fair
>> amount of work involved, but the code itself doesn't need to be clever
>> or too intrusive. Unifying all the memory settings would require being
>> both clever and intrusive, and I doubt you'll find anybody who could
>> pull it off who isn't already overtasked with more important
>> improvements for the 8.4 timeframe.
>
> Plus, I'm a big fan of "enable an API" rather than "write a feature". I
> think that there are people & companies out there who can write better
> autotuning tools than I can, and I'd rather give them a place to plug
> those tools in than trying to write autotuning into the postmaster.
> > -- > --Josh > > Josh Berkus > PostgreSQL @ Sun > San Francisco Regards, Paul van den Bogaard --------------------------------------------------------------------------------------------- Paul van den Bogaard Paul.vandenBogaard@sun.com ISV-E -- ISV Engineering, Opensource Engineering group Sun Microsystems, Inc phone: +31 334 515 918 Saturnus 1 extension: x (70)15918 3824 ME Amersfoort mobile: +31 651 913 354 The Netherlands fax: +31 334 515 001
On Tue, 3 Jun 2008, Paul van den Bogaard wrote: > So overhauling the GUC parameters is one step, but adding proper > instrumentation in order to really measure the impact of the new setting > is necessary too. Correct, but completely off-topic regardless. One problem to be solved here is to take PostgreSQL tuning from zero to, say, 50% automatic. Wander the user lists for a few months; the number of completely misconfigured systems out there is considerable, partly because the default values for many parameters are completely unreasonable for modern hardware and there's no easy way to improve on that without someone educating themselves. Getting distracted by the requirements of the high-end systems will give you a problem you have no hope of executing in a reasonable time period. By all means bring that up as a separate (and much, much larger) project: "Database Benchmarking and Sensitivity Analysis of Performance Tuning Parameters" would make a nice PhD project for somebody, and there's probably a good patent in there somewhere. Even if you had such a tool, it wouldn't be usable by non-experts unless the mundane GUC generation issues are dealt with first, and that's where this is at right now. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Sat, May 31, 2008 at 08:36:54PM +0100, Gregory Stark wrote: > > 4) By shifting from a model where postgresql.conf is document-formatted and > > hand-edited to one where it's machine generated, it becomes vastly easier to > > write simple utilities to manage these settings. Right now, the big > > "obstacle" to things like SET PERSISTENT is "how to we preseve the > > hand-edited comments in the file" -- and the answer is we *don't.* > > What this sounds like is a sly way to try to get rid of postgresql.conf > entirely and replace it with parameters stored in the database so admins would > adjust the parameters using an SQL syntax rather than a text file. > > There are pros and cons of such a system but I think for newbie admins that > would be a thousand times *more* baffling. You would have to learn new > commands and have no holistic view of what parameters had been set, what > related parameters might exist. You also have no way to keep the file in a > version control system or sync across servers etc. It doesn't have to be in the database. I kinda like Oracle's method here; they default to storing config options in a binary file that (iirc) you can change via commands, but they also provide a way to turn that binary file into text (and back). That would allow for a lot of options when it comes to configuring stuff. -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
On Mon, Jun 02, 2008 at 10:12:06AM -0400, Tom Lane wrote: > Greg Smith <gsmith@gregsmith.com> writes: > > Joshua has been banging a drum for a while now that all this data needs to > > get pushing into the database itself. > > This is, very simply, not going to happen. Shall we go over the reasons > why not, one more time? > <snip> > > I have no objection to providing alternative ways to edit the > configuration data, but the primary source of the settings is > going to continue to be an editable text file. Any proposals for > alternatives-to-a-text-editor have to work within that reality. There's no reason that the server has to deal with a text file. I completely agree that there must be a method to change settings even if the database isn't running, but that method does not necessarily need to be a text file. If we can come up with a standard API for reading and writing config changes, we (or anyone else) can write any number of tools to deal with the settings. And once we have an API, we can provide a SQL interface on top of it. Instead of focusing on exactly what the 'new postgresql.conf' is going to look like we should focus on creating a clean configuration API. Once we have that we can figure out what (if anything) we're doing with the existing .conf, and what a new one (if it exists) might look like. -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Decibel! wrote: >> >> There's no reason that the server has to deal with a text file. I >> completely agree that there must be a method to change settings even if >> the database isn't running, but that method does not necessarily need to >> be a text file. If we can come up with a standard API for reading and >> writing config changes, we (or anyone else) can write any number of >> tools to deal with the settings. And once we have an API, we can provide >> a SQL interface on top of it. Once in a lifetime, a man should plant a tree, father a child, and write an editor... :-) Hiding the storage of config parameters opaquely behind an API is something I've been hating for a long time on win32. When reading this thread, I'm wondering if anybody ever saw a config file for a complex software product that was easily editable and understandable. I don't know of one. If there is one, it'd be nice to know about it so we can learn from it. IMHO the best compromise in machine and human readability is an XML format. It's easily decoratable with comments, easily interpreted, and a pg_settings view could enhance it with even more comments, so an editor using pgsql functions (to read and write postgresql.conf.xml) could be enabled to supply comprehensive help. Regards, Andreas
* Andreas Pflug <pgadmin@pse-consulting.de> [080604 10:20]: > Hiding the storage of config parameters opaquely behind an API is > something I've been hating for a long time on win32. ;-) > When reading this thread, I'm wondering if anybody ever saw a config > file for a complex software product that was easily editable and > understandable. I don't know of one. If there is one, it'd be nice to know > about it so we can learn from it. PostgreSQL, Apache, X.org They are all easily editable, and "understandable", in the sense that I understand that I'm supposed to edit the line, changing the value (following the comments' list of accepted values). They are "less understandable" if you mean that I know the implications of any change I make. But guess what, having those values input through some other mechanism (like a GUI config file editor, a SQL statement, or a nice pgadmin-SQL-hiding interface) isn't going to change that part of "understandable". That part of understandable only comes through good documentation and reference material, which is universally applicable to any config method. > IMHO the best compromise in machine and human readability is an XML > format. It's easily decoratable with comments, easily interpreted and a > pg_settings view could enhance it with even more comments, so an editor > using pgsql functions (to read and write postgresql.conf.xml) could be > enabled to supply comprehensive help. Well, in my past, I've generally not got around to installing and using software that required me to edit some jumble of XML. Ya, maybe I'm lucky. And since I've got a lot invested in PG, I'd be forced to if PG moved to an XML config, but I'd go kicking and screaming... I just *know* that I'd reload/restart the postmaster some time, and the config file wouldn't be quite correct, and I'd search for 10 minutes trying to find the extra (or missing) ", or the missing closing /... But maybe most people are better at parsing XML than me. And that also may be because I've actively avoided it for so long ;-) I don't know *how* I'd decorate my XML config file with comments and history the way I do my text-based #-commented config files. a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Aidan Van Dyk wrote: > * Andreas Pflug <pgadmin@pse-consulting.de> [080604 10:20]: > > >> Hiding the storage of config parameters opaquely behind an API is >> something I've been hating for a long time on win32. >> > > ;-) > > >> When reading this thread, I'm wondering if anybody ever saw a config >> file for a complex software product that was easily editable and >> understandable. I don't know of one. If there is one, it'd be nice to know >> about it so we can learn from it. >> > > PostgreSQL, Apache, X.org > > They are all easily editable, and "understandable", in the sense that I > understand that I'm supposed to edit the line, changing the value > (following the comments' list of accepted values). > > They are "less understandable" if you mean that I know the implications > of any change I make. But guess what, having those values input > through some other mechanism (like a GUI config file editor, a SQL statement, > or a nice pgadmin-SQL-hiding interface) isn't going to change that part > of "understandable". That part of understandable only comes through > good documentation and reference material, which is universally > applicable to any config method. > Right. On the editing side, a column "link" in pg_settings that can be used to construct a URL to postgresql.org/docs/xxx#yyy could help in creating editors that support the user. Whatever a text config file will look like, you need to know exactly which parameter to use and where to locate it; even structuring parameters won't help too much for the typical starter task "I installed pgsql, what to do next". > >> IMHO the best compromise in machine and human readability is an XML >> format. It's easily decoratable with comments, easily interpreted and a >> pg_settings view could enhance it with even more comments, so an editor >> using pgsql functions (to read and write postgresql.conf.xml) could be >> enabled to supply comprehensive help. >> > > Well, in my past, I've generally not got around to installing and using > software that required me to edit some jumble of XML. Ya, maybe I'm > lucky. And since I've got a lot invested in PG, I'd be forced to if PG > moved to an XML config, but I'd go kicking and screaming... > > I just *know* that I'd reload/restart the postmaster some time, and the > config file wouldn't be quite correct, and I'd search for 10 minutes > trying to find the extra (or missing) ", or the missing closing /... But maybe > most people are better at parsing XML than me. And that also may be > because I've actively avoided it for so long ;-) > Well, I'm not an XML evangelist either. But the usual "commenting out a parameter will reset it to default on reload, no?" caveat isn't funny either, nor are duplicate parameter settings scattered throughout your file. This may be avoided by *preferably* editing the parameters through pgsql itself; the current postgresql.conf file format isn't too machine-write friendly (as I know since I wrote the pgadmin config file editor). But having a config file that can't be used with simple editors at all is a nightmare. Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > Well I'm an XML evangelist either. But the usual "commenting out a > parameter will reset it to default on reload, no?" caveat isn't funny > either, or duplicate parameter settings scattered throughout your file. Surely everyone who is opining on this thread knows that we fixed that in 8.3. regards, tom lane
On Jun 3, 2008, at 20:48, Greg Smith wrote: > Correct, but completely off-topic regardless. One problem to be > solved here is to take PostgreSQL tuning from zero to, say, 50% > automatic. Wander the user lists for a few months; the number of > completely misconfigured systems out there is considerable, partly > because the default values for many parameters are completely > unreasonable for modern hardware and there's no easy way to improve > on that without someone educating themselves. Getting distracted by > the requirements of the high-end systems will give you a problem you > have no hope of executing in a reasonable time period. Exactly. The issue is that application developers, who are not DBAs, have no idea how to tune PostgreSQL, and postgresql.conf is daunting and confusing. So they use a different database that's "faster". I think that right now postgresql.conf is designed for full-time DBAs, rather than folks who might want to target PostgreSQL for an application they're developing. We want to attract the latter (without, of course, any expense to the former). Changing how configuration works so that it's easier to understand and, if possible, at least partly automatically tunable would go a long way towards making PostgreSQL friendlier for developers, IMHO. Best, David
On Jun 4, 2008, at 07:19, Andreas Pflug wrote: > IMHO the best compromise in machine and human readability is an XML > format. It's easily decorateable with comments, easily interpreted > and a pg_settings view could enhance it with even more comments, so > an editor using pgsql functions (to read and write > postresql.conf.xml) could be enabled to supply comprehensive help. I hope that was a joke… Best, David
> Exactly. The issue is that application developers, who are not DBAs, > have no idea how to tune PostgreSQL, and postgresql.conf is daunting > and confusing. So they use a different database that's "faster". Changing some of our defaults would go a long way as well.
"David E. Wheeler" <david@kineticode.com> writes: > Exactly. The issue is that application developers, who are not DBAs, > have no idea how to tune PostgreSQL, and postgresql.conf is daunting > and confusing. So they use a different database that's "faster". I take it you haven't looked at mysql's configuration file lately. They aren't actually in any better shape than we are, except that they supply several "preconfigured" sample files for people to choose from. regards, tom lane
On Jun 4, 2008, at 11:22, Tom Lane wrote: > "David E. Wheeler" <david@kineticode.com> writes: >> Exactly. The issue is that application developers, who are not DBAs, >> have no idea how to tune PostgreSQL, and postgresql.conf is daunting >> and confusing. So they use a different database that's "faster". > > I take it you haven't looked at mysql's configuration file lately. I'm not much into MySQL, but in the work I've done with it, I've had to create /etc/my.cnf myself. There *is* no configuration file configuring MySQL until that file is created, is there? So there is no configuration to learn at first. I'm not saying that this is necessarily admirable -- it's kind of the opposite end of the spectrum (PostgreSQL: "Here is every configuration tweak you could ever possibly want, have fun!" vs MySQL: "There is no configuration until you need one, then you have to find the docs for it."). > They aren't actually in any better shape than we are, except that > they supply several "preconfigured" sample files for people to choose > from. Which would be a good start, if nothing else… Best, David
David E. Wheeler wrote: > On Jun 4, 2008, at 11:22, Tom Lane wrote: > >> "David E. Wheeler" <david@kineticode.com> writes: >>> Exactly. The issue is that application developers, who are not DBAs, >>> have no idea how to tune PostgreSQL, and postgresql.conf is daunting >>> and confusing. So they use a different database that's "faster". >> >> I take it you haven't looked at mysql's configuration file lately. > > I'm not much into MySQL, but in the work I've done with it, I've had > to create /etc/my.cnf myself. There *is* no configuration file > configuring MySQL until that file is created, is there? So there is no > configuration to learn at first. I'm not saying that this is > necessarily admirable -- it's kind of the opposite end of the spectrum > (PostgreSQL: "Here is every configuration tweak you could ever > possibly want, have fun!" vs MySQL: "There is no configuration until > you need one, then you have to find the docs for it." Tell me how that's better. If that's what you want, simply remove all the comment lines from your config file. Problem solved. > >> They aren't actually in any better shape than we are, except that >> they supply several "preconfigured" sample files for people to choose >> from. > > Which would be a good start, if nothing else… It's been suggested in the past. It is highly debatable that it would actually be an advance. cheers andrew
On Jun 4, 2008, at 12:48, Andrew Dunstan wrote: >> I'm not much into MySQL, but in the work I've done with it, I've >> had to create /etc/my.cnf myself. There *is* no configuration file >> configuring MySQL until that file is created, is there? So there is >> no configuration to learn at first. I'm not saying that this is >> necessarily admirable -- it's kind of the opposite end of the >> spectrum (PostgreSQL: "Here is every configuration tweak you could >> ever possibly want, have fun!" vs MySQL: "There is no configuration >> until you need one, then you have to find the docs for it." > > Tell me how that's better. > > If that's what you want, simply remove all the comment lines from > your config file. Problem solved. I didn't say it was better. The point is that it seems to be less confusing to non-DBAs. >> Which would be a good start, if nothing else… > > It's been suggested in the past. It is highly debatable that it > would actually be an advance. Agreed; we've bandied around some better ideas here. Well, I haven't, I've just kibbitzed. But we can surely do better. Best, David
"David E. Wheeler" <david@kineticode.com> writes: > I'm not much into MySQL, but in the work I've done with it, I've had > to create /etc/my.cnf myself. There *is* no configuration file > configuring MySQL until that file is created, is there? So there is no > configuration to learn at first. Postgres will start happily with an empty configuration file, too. What's your point? regards, tom lane
2008/6/4 David E. Wheeler <david@kineticode.com>: > On Jun 3, 2008, at 20:48, Greg Smith wrote: > >> Correct, but completely off-topic regardless. One problem to be solved >> here is to take PostgreSQL tuning from zero to, say, 50% automatic. Wander >> the user lists for a few months; the number of completely misconfigured >> systems out there is considerable, partly because the default values for >> many parameters are completely unreasonable for modern hardware and there's >> no easy way to improve on that without someone educating themselves. >> Getting distracted by the requirements of the high-end systems will give >> you a problem you have no hope of executing in a reasonable time period. > > Exactly. The issue is that application developers, who are not DBAs, have no > idea how to tune PostgreSQL, and postgresql.conf is daunting and confusing. > So they use a different database that's "faster". > Do you think any better config can help? It's not possible. And you can't tune a database without good knowledge of the applications that use it. Any automatic tool is a toy for children. But some default PostgreSQL parameters are not optimal. > I think that right now postgresql.conf is designed for full-time DBAs, > rather than folks who might want to target PostgreSQL for an application > they're developing. We want to attract the latter (without, of course, any > expense to the former). Changing how configuration works so that it's > easier to understand and, if possible, at least partly automatically tunable > would go a long way towards making PostgreSQL friendlier for developers, > IMHO. > > Best, > > David > >
On Wed, 4 Jun 2008, Andreas Pflug wrote: > When reading this thread, I'm wondering if anybody ever saw a config file for > a complex software product that was easily editable and understandable. I would recommend Apache's httpd.conf as an example of something that's easy to edit and follow. Like any complex product, the comments in the configuration file itself can't possibly be sufficient by themselves. But in general I've found Apache's config file to have enough comments to jog my memory when I'm editing it while not being overwhelming. They provide enough context that when I run into a setting I don't understand, it's easy to search for more information. Poking around with Google for a bit, here's a reasonable sample: http://webdav.org/goliath/dav_on_x/apache.conf > IMHO the best compromise in machine and human readability is an XML format. If the primary PostgreSQL configuration file becomes XML I will quit working with the project. I'm not kidding. If you think XML is easy to generate, edit by hand, and use revision control on, we are at such a fundamental disagreement that I wouldn't even try and directly argue with you. Instead I'll quote Eric Raymond: "The most serious problem with XML is that it doesn't play well with traditional Unix tools. Software that wants to read an XML format needs an XML parser; this means bulky, complicated programs." http://www.catb.org/esr/writings/taoup/html/ch05s02.html#id2907018 Let me suggest the following requirement instead, which naturally rules it out: it should be possible for a DBA-level coder to write a simple shell script that does something useful with the configuration file, in order for a text-based configuration to be useful in this context. To give a simple example, I can write a single-line [sed|awk|perl] command that will let me update the value for one parameter in the current postgresql.conf file. When you can give me a one-liner that does that on an XML file in any shell language in that class, then we might have something to talk about. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
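For illustration, the kind of one-liner being described could look something like this (a sketch only; it assumes GNU sed, a stock single-file postgresql.conf, and an illustrative value):

    # update one parameter in place, whether or not it was commented out
    sed -i 's/^#*[[:space:]]*shared_buffers[[:space:]]*=.*/shared_buffers = 128MB/' postgresql.conf

Doing the equivalent on an XML file requires a real parser, which is the point of the requirement above.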
On Jun 4, 2008, at 13:31, Pavel Stehule wrote: > do you thing, so any better config can help? It's not possible. And > you can't tune database without well knowledge of applications that > use database. Any automatic tools are joy for child. But some default > PostgreSQL parameters are not optimal. Agreed. But, speaking as an app developer, I'm child-like in my DBA abilities. I could use a toy to help me with it. :-) Best, David
On Jun 4, 2008, at 13:12, Tom Lane wrote: > "David E. Wheeler" <david@kineticode.com> writes: >> I'm not much into MySQL, but in the work I've done with it, I've had >> to create /etc/my.cnf myself. There *is* no configuration file >> configuring MySQL until that file is created, is there? So there is >> no >> configuration to learn at first. > > Postgres will start happily with an empty configuration file, too. > What's your point? That it's less daunting for inexperienced users to start with that. I'm not talking about how things work, I'm talking about what configurations are present to start with. That's all. Best, David
Greg Smith <gsmith@gregsmith.com> writes: > On Wed, 4 Jun 2008, Andreas Pflug wrote: >> IMHO the best compromise in machine and human readability is an XML format. > If the primary PostgreSQL configuration file becomes XML I will quit > working with the project. I'm not kidding. I have no particular use for XML in this scenario either, but really this thread seems to be arguing about mostly-irrelevant details. There is not anything fundamentally broken about keeping configuration in a text file, as is proven by the fact that all those other packages do it. The real problem we need to solve is how to allow newbies to have the system auto-configured to something that more or less solves their problems. Putting the config settings in XML does not accomplish that, and neither does putting them inside the database. It might knock a day or two off the time needed to develop a tool that actually does solve the newbie's problem ... but it's unlikely that the effort of changing Postgres to use some other configuration representation would get repaid through easier tool development. So I think we should stop worrying about the file format and think about these two problems: * Can we present the config options in a more helpful way (this is 99% a documentation problem, not a code problem)? * Can we build a "configuration wizard" to tell newbies what settings they need to tweak? regards, tom lane
On Jun 4, 2008, at 1:57 PM, Tom Lane wrote: > * Can we build a "configuration wizard" to tell newbies what settings > they need to tweak? Probably. Given that a lot of the newbies are on Windows, this likely needs to be a pointy-clicky sort of thing if it's going to be widely useful. Doing it in a smart way is likely hard, but even a very simple, crude one would likely be helpful. Doing it as a standalone GUI editor / wizard wouldn't be too hard, but it sounds more like something that should be integrated in pgadmin, perhaps? Cheers, Steve
If I can add my 2 cents as the end user, IMHO having the configuration file in the xml format is unnecessary and only makes it less user-friendly. Thanks, Michael.
On Jun 4, 2008, at 13:57, Tom Lane wrote: > So I think we should stop worrying about the file format and think > about > these two problems: > > * Can we present the config options in a more helpful way (this is 99% > a documentation problem, not a code problem)? > > * Can we build a "configuration wizard" to tell newbies what settings > they need to tweak? Amen, Tom. Nicely put. Thanks, David
"Pavel Stehule" wrote: >2008/6/4 David E. Wheeler <david@kineticode.com>: >> >> Exactly. The issue is that application developers, who are not DBAs, have no >> idea how to tune PostgreSQL, and postgresql.conf is daunting and confusing. >> So they use a different database that's "faster". > >do you thing, so any better config can help? It's not possible. And >you can't tune database without well knowledge of applications that >use database. Any automatic tools are joy for child. But some default >PostgreSQL parameters are not optimal. I think it would be an enourmous help for beginners if they had a "simple" tuning tool which would tell them which values where altered (and possibly why) from Postgres' default settings based on some basic information. Like: - machine hardware (disk layout, OS, installed memory, etc.) - application usage (no. of clients, read/write activity, etc) I don't think that such a tool could tune the database perfectly (or even very good), but at least people new to Postgres would know where to start looking for tuning it to their needs. And I am not speaking about end users running an application that uses Postgres. I talk about application developers like me that port their application to use Postgres. Rainer
Tom Lane wrote: > > * Can we present the config options in a more helpful way (this is 99% > a documentation problem, not a code problem)? > > * Can we build a "configuration wizard" to tell newbies what settings > they need to tweak? It's certainly one thing to create an initial postgresql.conf from scratch after some inquiry, but a different level of problems to deal with when offering to change the settings. IMHO initial creation isn't enough, users will feel even more left alone if there are no tools helping them further. I guess most users will start tweaking after the server is already running for a while, with some config already in place. That's when file format and/or APIs come into play. Preserving comments and/or using them in a wizard isn't too easy with the current format. Regards, Andreas
Tom Lane wrote: > > * Can we build a "configuration wizard" to tell newbies what settings > they need to tweak? > > > That would trump all the other suggestions conclusively. Anyone good at expert systems? cheers andrew
Andreas Pflug <pgadmin@pse-consulting.de> writes: > Tom Lane wrote: >> * Can we build a "configuration wizard" to tell newbies what settings >> they need to tweak? > It's certainly one thing to create an initial postgresql.conf from > scratch after some inquiry, but a different level of problems to deal > with when offering to change the settings. IMHO initial creation isn't > enough, users will feel even more left alone if there are no tools > helping them further. I guess most users will start tweaking after the > server is already running for a while, with some config already in place. Indeed, the wizard should be designed to assist with tweaking an existing installation. I see no value at all to trying to run it before/during initdb. > That's when file format and/or APIs come into play. Preserving comments > and/or using them in a wizard isn't too easy with the current format. [ shrug... ] We have already wasted more effort arguing about this issue than it would take to do something good enough for a wizard tool. The great thing about simple text formats is that they're pretty robust. I think all we need to do is comment out any existing setting(s) of the target variable and add the new setting before the first such, perhaps with a comment indicating what added it and when. If the user finds this too ugly, he can tweak it with a text editor. This is even assuming that the tool needs to edit the file itself, rather than just give advice. The advice is the hard part, folks; could we stop obsessing about trivia? regards, tom lane
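The mechanical part of that really is small. A rough sketch of the comment-out-and-prepend approach, in plain awk (the parameter name and value here are only placeholders):

    awk -v var=work_mem -v val=16MB '
      $0 ~ "^[ \t]*" var "[ \t]*=" {
          if (!added) { print var " = " val "   # changed by config wizard"; added = 1 }
          print "#" $0
          next
      }
      { print }
      END { if (!added) print var " = " val "   # changed by config wizard" }
    ' postgresql.conf > postgresql.conf.new

As Tom says, the file surgery is trivia; deciding which value to write is the hard part.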
On Wed, 4 Jun 2008, Tom Lane wrote: > The real problem we need to solve is how to allow newbies to have the > system auto-configured to something that more or less solves their > problems. Putting the config settings in XML does not accomplish that, > and neither does putting them inside the database. The subtle issue here is that what makes sense for the database configuration changes over time; there's not just one initial generation and you're done. postgresql.conf files can end up moving from one machine to another, for example. I think something that doesn't recognize that reality and move toward a "tune-up" capability as well as initial generation wouldn't be as useful, and that's where putting the settings inside the database helps so much. Also, there's a certain elegance to having an optimization tool that works against either a new installation or an existing one. I personally have zero interest in a one-shot config generator. It just doesn't solve the problems I see in the field. Performance starts out just fine even with the default settings when people first start, and then goes to hell after the system has been running for a while (and possibly moved to another machine). By that point nobody wants to mess with their configuration file unless it's one simple change at a time. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> * Can we build a "configuration wizard" to tell newbies what settings >> they need to tweak? > That would trump all the other suggestions conclusively. Anyone good at > expert systems? How far could we get with the answers to just three questions: * How many concurrent queries do you expect to have? * How much RAM space are you willing to let Postgres use? * How much "overhead" disk space are you willing to let Postgres use? concurrent queries drives max_connections, obviously, and RAM space would drive shared_buffers and effective_cache_size, and both of them would be needed to size work_mem. The third one is a bit weird but I don't see any other good way to set the checkpoint parameters. If those aren't enough questions, what else must we ask? Or maybe they aren't the right questions at all --- maybe we should ask "is this a dedicated machine or not" and try to extrapolate everything else from what we (hopefully) can find out about the hardware. regards, tom lane
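Just to make the arithmetic concrete, here is a back-of-the-envelope sketch of how answers to those three questions might be turned into settings; the ratios are illustrative assumptions only, not recommendations:

    #!/bin/sh
    RAM_MB=4096             # RAM the user will let Postgres use
    MAX_CONN=100            # expected concurrent queries
    WAL_DISK_MB=2048        # "overhead" disk space allowed for WAL

    SHARED_BUFFERS_MB=$(( RAM_MB / 4 ))
    EFFECTIVE_CACHE_MB=$(( RAM_MB * 3 / 4 ))
    WORK_MEM_MB=$(( (RAM_MB - SHARED_BUFFERS_MB) / (MAX_CONN * 2) ))
    CHECKPOINT_SEGMENTS=$(( WAL_DISK_MB / 48 ))    # assumes roughly 3 * 16MB of WAL on disk per segment

    echo "max_connections = $MAX_CONN"
    echo "shared_buffers = ${SHARED_BUFFERS_MB}MB"
    echo "effective_cache_size = ${EFFECTIVE_CACHE_MB}MB"
    echo "work_mem = ${WORK_MEM_MB}MB"
    echo "checkpoint_segments = $CHECKPOINT_SEGMENTS"

Whether the dedicated-machine question could replace the RAM question is exactly the kind of decision a wizard author would have to make.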
Greg Smith <gsmith@gregsmith.com> writes: > On Wed, 4 Jun 2008, Tom Lane wrote: >> The real problem we need to solve is how to allow newbies to have the >> system auto-configured to something that more or less solves their >> problems. Putting the config settings in XML does not accomplish that, >> and neither does putting them inside the database. > The subtle issue here is that what makes sense for the database > configuration changes over time; there's not just one initial generation > and you're done. postgresql.conf files can end up moving from one machine > to another for example. I think something that doesn't recognize that > reality and move toward a "tune-up" capability as well as initial > generation wouldn't be as useful, As I just mentioned to someone else, I don't see any point in framing it as an "initial generation" problem at all. initdb will already give you settings that work, for some value of "work". The config wizard really only needs to deal with the case of tuning an existing installation. > and that's where putting the settings > inside the database helps so much. How does it help, pray tell? If you mean being able to see what the existing settings are, pg_settings already does that. regards, tom lane
* Tom Lane <tgl@sss.pgh.pa.us> [080604 20:46]: > If those aren't enough questions, what else must we ask? Or maybe they > aren't the right questions at all --- maybe we should ask "is this a > dedicated machine or not" and try to extrapolate everything else from > what we (hopefully) can find out about the hardware. For these three things: 1) max connections/workmem 2) shared buffers/effective cache 3) bgwriter/checkpoint parameters What are the stats or measures that PostgreSQL produces (or that it currently doesn't produce, but you would like it to) that the "masters" (i.e. people who tune PostgreSQL effectively, like you, Greg, Simon, Robert, Peter, Josh, Jim, etc - sorry if I missed others) actually use to decide whether to increase or decrease a value? I tune my postgresql.conf mainly on folklore, and "assimilated understanding" from reading the lists and blogs... But I haven't come across (or rather, haven't come across and remembered/bookmarked) anything that helps someone sample/read any stats or counts to find bottleneck points from which to start tuning. Stuff like (remembering that I've never had to really work at tuning because PG has always been "fast enough" for my needs, so take this with a grain of salt): * Are backends always writing out dirty buffers because there are no free ones? This might mean tweaking settings affecting the bgwriter. * Are the evicted buffers ones with really high usage counts? This might mean an increase in shared_buffers would help? * Are we always spilling small amounts of data to disk for sorting? A small work_mem increase might help... * Are all our reads from disk really quick? This probably means the OS page cache has our whole DB, and means random_page_cost could be tweaked? If we could get a definitive list of things like this, or maybe just comprehensive, or even at least agreed-to-not-be-wrong things to look at, that would go a long way to documenting *how* to tune PG effectively, and could lead to any projects that want to tackle examining a running cluster and suggesting some config changes... a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Jun 4, 2008, at 5:23 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> Tom Lane wrote: >>> * Can we build a "configuration wizard" to tell newbies what >>> settings >>> they need to tweak? > >> That would trump all the other suggestions conclusively. Anyone >> good at >> expert systems? > > How far could we get with the answers to just three questions: > > * How many concurrent queries do you expect to have? > > * How much RAM space are you willing to let Postgres use? > > * How much "overhead" disk space are you willing to let Postgres use? > > concurrent queries drives max_connections, obviously, and RAM space > would drive shared_buffers and effective_cache_size, and both of them > would be needed to size work_mem. The third one is a bit weird but > I don't see any other good way to set the checkpoint parameters. > > If those aren't enough questions, what else must we ask? Or maybe > they > aren't the right questions at all --- maybe we should ask "is this a > dedicated machine or not" and try to extrapolate everything else from > what we (hopefully) can find out about the hardware. I'd be interested in putting together a framework+GUI client to do this cleanly in a cross-platform (Windows, Linux, Solaris, OS X as a bare minimum) sort of way, if no-one else already has such a thing. A framework doesn't get you all the way there, but it makes it a whole lot easier to work on what base data and information you need, and how easy it is to map pgsql-performance and #postgresql gut feel onto something more algorithmic. Cheers, Steve
On Wed, 4 Jun 2008, Andrew Dunstan wrote: > Tom Lane wrote: >> * Can we build a "configuration wizard" to tell newbies what settings >> they need to tweak? > > That would trump all the other suggestions conclusively. Anyone good at > expert systems? Sigh. I guess we need to start over again. Last year around this time, there was one of the recurring retreads of this topic named "PostgreSQL Configuration Tool for Dummies": http://archives.postgresql.org/pgsql-performance/2007-06/msg00386.php Josh Berkus pointed out that he already had the "expert system" part of this problem solved pretty well with a spreadsheet: http://pgfoundry.org/docman/view.php/1000106/84/calcfactors.sxc (that's in the OpenOffice Calc format if you don't know the extension) That particular spreadsheet has more useful tuning suggestions in this area than 99.9% of PostgreSQL users have or will ever know. You can nitpick the exact recommendations, but the actual logic and thinking involved is pretty well solved. It could use a touch of tweaking and modernization but it's not too far off from being as good as you're likely to get at making guesses without asking the user too many questions. There is one ugly technical issue: you can't increase shared_buffers usefully in many situations because of SHMMAX restrictions, and that issue will haunt any attempt to be completely automatic. Where Josh got hung up, where I got hung up, where Lance Campbell stopped with his Dummies tool, and what some unknown number of other people have been thwarted by, is that taking that knowledge and turning it into a tool useful to users is surprisingly difficult. The reason for that is that the current postgresql.conf file and how it maps internally to GUC information isn't particularly well suited to automated generation, analysis, or updates. I think Josh got lost somewhere in the file-parsing stage. The parts I personally got stuck on were distinguishing user-added comments from ones the system put in, plus being completely dissatisfied with how lossy the internal GUC process was (I would like a lot more information out of pg_settings than is currently there). Lance's helper tool was hobbled by the limitations of being a simple web application. That's the background to Josh's proposal. It has about an 80% overlap with what I was working on suggesting, which is why I jumped on his bandwagon so fast. The outline at http://wiki.postgresql.org/wiki/GUCS_Overhaul includes the superset of our respective thinking on the first step here toward straightening out this mess, further expanded with observations made in this thread. I would respectfully point out that comments about the actual tuning itself have no bearing whatsoever on this proposal. This is trying to nail down all the features needed to support both doing an initial generation and subsequent incremental improvements to the postgresql.conf file, while also reducing some redundancy in the code itself. Reducing the scope to only handling initial generation would make this a smaller task. But it falls out that the work required to cut down on the redundancy and the work required to better support incremental updates are almost the same. Josh's stated agenda is to get this right in one swoop, with only one version's worth of disruption to the format, and that goal is served better IMHO as well by addressing all these changes as one batch.
I will attempt to resist further outbursts about non-productive comments here, and each time I am tempted instead work on prototyping the necessary code I think this really needs instead. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Jun 4, 2008, at 6:28 PM, Greg Smith wrote: > > > Josh Berkus pointed out that he already had the "expert system" part > of this problem solved pretty well with a spreadsheet: > > http://pgfoundry.org/docman/view.php/1000106/84/calcfactors.sxc > (that's in the OpenOffice Calc format if you don't know the extension) On Jun 4, 2008, at 6:20 PM, Steve Atkins wrote: > I'd be interested in putting together a framework+GUI client to do > this > cleanly in a cross-platform (Windows, Linux, Solaris, OS X as a bare > minimum) sort of way, if no-one else already has such a thing. /me makes go together motions, if nobody has any objection Cheers, Steve
On Wed, 4 Jun 2008, Aidan Van Dyk wrote: > * Are backends always writing out dirty buffers because there are no free > ones? This might mean tweaking settings affecting bgwriter. What you mean on the first one is "are backends always writing out dirty buffers because there are no *clean* ones"; the server operates with no *free* buffers as standard operation. Figuring that out is now easy in 8.3 with the pg_stat_bgwriter view. > * Are the evicted buffers ones with really high usage counts? This > might mean an increase shared buffers would help? Evicted buffers must have a 0 usage count. The correct question to ask is "are buffers never getting high usage counts because they keep getting evicted too fast?". You can look at that in 8.3 using pg_buffercache; I've got suggested queries as part of my buffer cache presentation at http://www.westnet.com/~gsmith/content/postgresql/ > * Are we always spilling small amounts of data to disk for sorting? A > a small work_mem increase might help... I was just talking to someone today about building a monitoring tool for this. Not having a clear way to recommend people monitor use of work_mem and its brother, spilled-to-disk sorts, is an issue right now; I'll whack that one myself if someone doesn't beat me to it before I get time. > * Are all our reads from disk really quick? This probably means OS > pagecache has our whole DB, and means random_page_cost could be > tweaked? This is hard to do with low overhead in an OS-independent way. The best solution available now would use dtrace to try and nail it down. There's movement in this area (systemtap for Linux, recent discussion at the PGCon Developer Meeting of possibly needing more platform-specific code) but it's not quite there yet. So everything you mentioned is either recently added/documented or being actively worked on somewhere, and the first two were things I worked on myself after noticing they were missing. Believe me, I feel the items that still aren't there, but they're moving along at their own pace. There's already more tuning knowledge available than tools to help apply that knowledge to other people's systems, which is why I think a diversion to focus just on that part is so necessary. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
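For the first two items, the 8.3 views mentioned here can be read with nothing fancier than psql; the queries below are only a sketch of the idea:

    # are backends doing their own writes instead of leaving them to the bgwriter?
    psql -c "SELECT buffers_backend, buffers_clean, buffers_checkpoint, buffers_alloc
             FROM pg_stat_bgwriter;"

    # how are usage counts distributed across shared_buffers? (needs contrib/pg_buffercache)
    psql -c "SELECT usagecount, count(*) AS buffers
             FROM pg_buffercache GROUP BY usagecount ORDER BY usagecount;"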
* Greg Smith <gsmith@gregsmith.com> [080604 22:14]: > So everything you mentioned is either recently added/documented or being > actively worked on somewhere, and the first two were things I worked on > myself after noticing they were missing. Believe me, I feel the items > that still aren't there, but they're moving along at their own pace. > There's already more tuning knowledge available than tools to help apply > that knowledge to other people's systems, which is why I think a diversion > to focus just on that part is so necessary. But as an administrator/developer, I don't understand the focus on a new API for "writing my config" for me... I'd love a tool that helped me "analyze" my current running PG database (and yes, that includes getting *current* settings), and "suggest" config changes, ideally in the order the tool thinks will make a difference... I can make that change, and distribute it. That's the easy part. If I really trust the tool then I'll just blindly run it (depending on its output format): $TOOL > $PGDATA/postgresql.conf in which case, I don't care if it grokked any of my previous comments and cruft. Otherwise, I'll look at it, and integrate some (or all) of the changes into postgresql.conf using my preferred method of commenting/SCM/quirks. a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Wed, 4 Jun 2008, Aidan Van Dyk wrote: > I'd love a tool that helped me "analyze" my current running PG database > (and yes, that includes getting *current* settings), and "suggest" > config changes Fine. To demonstrate why the overhaul is needed, let's start designing a simple tool whose sole purpose in life is to suggest new settings for shared_buffers and work_mem. Say we want to show people their current setting and what we'd recommend instead. I just created a new cluster on my laptop. It has the following in the postgresql.conf: shared_buffers = 32MB # min 128kB or max_connections*16kB # (change requires restart) #work_mem = 1MB # min 64kB Say I first start designing such a tool by trying to read the postgresql.conf file to figure out what these values are set to. In order to accomplish that, I need to parse the whole file correctly, doing things like turning "32MB" into the actual numeric value so my program can make decisions based on its value[1]. This basically requires that someone writing a tuning tool replicate the GUC parsing code, which is crazy; at this point you've already lost most potential tool authors. But you're smarter than that; instead you use pg_settings: psql=# select name,setting,unit from pg_settings where name='shared_buffers' or name='work_mem'; name | setting | unit ----------------+---------+------ shared_buffers | 4096 | 8kB work_mem | 1024 | kB Now: what do you tell the user their current value is? The way the postgresql.conf is parsed into memory is lossy; at this point you don't know anymore what units were specified in the original file. If someone sees their current setting shown as "4096" but they know they set it to "32MB", they'll end up confused unless they understand the whole page size concept--and if they understood that, they'd automatically be disqualified from being the type of user our theoretical tool is targeted toward. If you try and make things more user-friendly by always showing the most human-readable version, what about the person who ended up setting this parameter because they copied an old config file that recommended setting it to 4096? You show it to them as "32MB"; they'll also be confused and blame the tool for being bad. And even if you work through all that, to give good advice here you have to know things like that shared_buffers is a server parameter requiring restart, while work_mem is a per-session parameter. Right now, the only way to know all that is for tool authors to assemble their own database and keep it up to date with each release. And you just lost another set of potential authors with that realization. Next up, we manage to work through all those issues, and someone happily follows our advice and gets their file updated with a much larger value for work_mem. Yeah, we are heroes! Or not. You see, in the config file we just helpfully updated for them was this comment just above that setting: # OMG don't set this too high or the Linux OOM killer will # take down the server! (This is not a theoretical example; I have found variations on that text in two postgresql.conf files and I vaguely recall Josh mentioning running into it as well). And now you just *crashed their server* by giving bad advice that was clearly against the invaluable comment history in the file already. Oh, but where are those comments located? Before the setting? After the setting? In a group at the top? Who can say? Since there's no strong standard, people put them all over the place.
I don't know about you, but I'm too busy to spend a bunch of time writing a tool to work around all these issues knowing it is always going to be fragile, incomplete, and in need of updating with every GUC change no matter what. Instead, reconsider http://wiki.postgresql.org/wiki/GUCS_Overhaul , having gotten a taste of the motivation behind those changes, and consider how this would play out after those improvements. The approach where you connect to the database and query is the easy path. There is never a reason to consider parsing postgresql.conf. Anybody who knows how to write a simple script that connects to a database and reads a table (basically anyone who's written the database client equivalent of "hello, world") can feel like a potential tool author. All the information about the postgresql.conf side of every setting is retained in case you want to show where the settings came from, or to generate a new file that's as similar as possible to the original. Any recommendations you suggest can be trivially annotated with whether you need to consider max_connections because it's per-session, and whether people need to restart the server or can just send it a signal, and that will continue to be the case in the future with minimal work on the tool author's part. The defaults are now available, so that it's easy to figure out what people changed. That is sometimes handy to include as part of this sort of analysis, and it's necessary to provide improvements like a "strip the unnecessary junk out of this file" feature that many people would like from this sort of tool. When you recommend increasing a value to something larger, any comments about that setting will be shown and people will know not to follow the tool's advice if there's a history there. This seems like such a better place to be that I'd rather drive toward the server-side changes necessary to support it rather than fight the difficult tool creation problems. That's why the "focus on a new API for 'writing my config' for me"; that particular goal is just one part of a set of revisions that streamline the tool creation process in a not necessarily obvious way. Unless, of course, you've tried to write a full-circle config tuning tool, in which case most of the proposed changes in this overhaul jump right out at you. [1] In the shared_buffers case, it may be possible to just recommend a value without caring one bit what the current one is. But for work_mem, you really need to actually understand the value if you want any real intelligence that combines that information with the maximum connections, so that you can compute how much memory is left over for things like effective_cache_size. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Tom Lane wrote: > * How much "overhead" disk space are you willing to let Postgres use? > > ... The third one is a bit weird but > I don't see any other good way to set the checkpoint parameters. The way I think about the checkpoint settings is: 1. Set checkpoint_timeout to the max. time you're willing to spend in recovery in case of crash or power loss. 2. Set checkpoint_segments to a high value. "High" meaning high enough that you'll never reach it in practice. The purpose is just to keep you from running out of disk space if something weird happens. The amount of downtime one is willing to accept in case of power loss is a good question to ask because it doesn't require any knowledge of how PostgreSQL works; it can be answered directly from the application requirements. And if the DBA/developer doesn't know the answer, he needs to figure it out, because it's a very important question not only for the database but in general. I believe checkpoint_timeout correlates quite well with the max. time required in recovery. If it took 10 minutes to generate X amount of WAL, replaying that WAL will need to do at most the same amount of I/O, which should take roughly the same amount of time, regardless of whether the I/O was sequential or random. If the system wasn't busy doing updates between the checkpoints, it will of course take less time. As with all settings, the tool will need to explain the tradeoff. A smaller checkpoint_timeout means more checkpointing, which means more I/O, which means lower average TPS and more WAL generated (because of full_page_writes). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
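Put in postgresql.conf terms, that rule of thumb boils down to something like the following; the numbers are purely illustrative:

    checkpoint_timeout = 5min      # roughly the longest crash recovery you will tolerate
    checkpoint_segments = 100      # high enough that the timeout, not the segment count,
                                   # triggers checkpoints; also caps WAL disk use at about
                                   # 3 * 16MB per segment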
Guys, A configuration wizard would be nice, but it would be a good start to add a section to the manual on how to do the basic tuning. AFAICS we don't have one. Clear instructions on how to set the few most important settings like shared_buffers and checkpoint_timeout/segments would probably be enough, with a link to the main configuration section that explains the rest of the settings. If people don't read the manual, we can add a link to it from postgresql.conf.sample, add a screen to the Windows installer suggesting to read it, or even open postgresql.conf in Notepad. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 5 Jun 2008, Heikki Linnakangas wrote: > A configuration wizard would be nice, but it would be a good start to add a > section to the manual on how to do the basic tuning. AFAICS we don't have > one. Clear instructions on how to set the few most important settings like > shared_buffers and checkpoint_timeout/segments would probably be enough, with > a link to the main configuration section that explains the rest of the > settings. It hasn't gelled yet but I'm working on that. Most of the text needed is now linked to at http://wiki.postgresql.org/wiki/Performance_Optimization I already talked with Chris Browne about merging his document I put first in that list with useful pieces from some of mine into one more comprehensive document on the Wiki, covering everything you mention here. If we took a snapshot of that when it's done and dumped that into the manual, I don't think that would be a problem to wrap up before 8.4 is done. I'd like to include a link to the above performance page in that section of the manual as well, both so that people are more likely to find fresh content as well as to give them pointers toward more resources than the manual can possibly cover. > If people don't read the manual, we can add a link to it from > postgresql.conf.sample, add a screen to the Windows installer suggesting > to read it, or even open postgresql.conf in Notepad. They don't. Putting pointers toward a relatively simple performance tuning document a bit more in people's faces might help lower some of the criticism the project takes over providing low defaults for so many things. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Thu, Jun 05, 2008 at 01:23:53AM +0200, Rainer Bauer wrote: > I think it would be an enormous help for beginners if they had a "simple" > tuning tool which would tell them which values were altered (and possibly > why) from Postgres' default settings based on some basic information. > Like: > - machine hardware (disk layout, OS, installed memory, etc.) > - application usage (no. of clients, read/write activity, etc) It would be possible to make a program that worked like: # pg_autotune Detected 4GB memory, 2.3GB free Measured random_page_cost=2.3 Select expected usage: (d)edicated, (n)ormal, (t)iny > t Shared_buffers 64MB Configuration stored to /etc/postgresql/8.3/postgresql.conf # This would probably solve most issues. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
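A minimal sketch of the kind of interactive helper Martijn is describing; the memory probe is Linux/Unix-specific and the sizing fractions are placeholders invented for the example, not recommendations from this thread:

# Hypothetical pg_autotune-style helper (sketch only).
import os

def total_ram_mb():
    # sysconf-based probe; works on Linux and most Unixes, not on Windows
    return os.sysconf('SC_PHYS_PAGES') * os.sysconf('SC_PAGE_SIZE') // (1024 * 1024)

ram_mb = total_ram_mb()
print("Detected %d MB of RAM" % ram_mb)

usage = input("Select expected usage: (d)edicated, (n)ormal, (t)iny > ").strip().lower()
fraction = {'d': 0.25, 'n': 0.10, 't': 0.02}.get(usage, 0.10)  # placeholder rules

print("shared_buffers = %dMB" % int(ram_mb * fraction))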
"Heikki Linnakangas" <heikki@enterprisedb.com> writes: > A configuration wizard would be nice, but it would be a good start to > add a section to the manual on how to do the basic tuning. +1. If we can't write an explanation of what to do, we certainly aren't going to be able to implement it in a wizard. Agreeing on what to put in the manual would also go a long way towards providing an implementation spec for the wizard ... regards, tom lane
Greg Smith wrote: > Where Josh got hung up, where I got hung up, where Lance Campbell stopped > at with his Dummies tool, and what some unknown number of other people > have been thwarted by, is that taking that knowledge and turning it into a > tool useful to users is surprisingly difficult. The reason for that is > the current postgresql.conf file and how it maps internally to GUC > information isn't particularly well suited to automated generation, > analysis, or updates. What I think this says is that we should be pushing Magnus more to continue work on the configuration API thing he was designing. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote: > Greg Smith wrote: > >> Where Josh got hung up, where I got hung up, where Lance Campbell stopped >> at with his Dummies tool, and what some unknown number of other people >> have been thwarted by, is that taking that knowledge and turning it into a >> tool useful to users is surprisingly difficult. The reason for that is >> the current postgresql.conf file and how it maps internally to GUC >> information isn't particularly well suited to automated generation, >> analysis, or updates. > > What I think this says is that we should be pushing Magnus more to > continue work on the configuration API thing he was designing. There's some magic in pgAdmin to parse and write the file. You could take a look at that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Alvaro Herrera wrote: >> What I think this says is that we should be pushing Magnus more to >> continue work on the configuration API thing he was designing. > > There's some magic in pgAdmin to parse and write the file. You could > take a look at that. That's what they want to get rid of. It's a problem that every GUI writer needs to reimplement that functionality; ISTM this is something worth having in the backend. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Heikki Linnakangas wrote: > Alvaro Herrera wrote: >> Greg Smith wrote: >> >>> Where Josh got hung up, where I got hung up, where Lance Campbell >>> stopped at with his Dummies tool, and what some unknown number of >>> other people have been thwarted by, is that taking that knowledge and >>> turning it into a tool useful to users is surprisingly difficult. >>> The reason for that is the current postgresql.conf file and how it >>> maps internally to GUC information isn't particularly well suited to >>> automated generation, analysis, or updates. >> >> What I think this says is that we should be pushing Magnus more to >> continue work on the configuration API thing he was designing. > > There's some magic in pgAdmin to parse and write the file. You could > take a look at that. Not really. You don't want to go there. We really need a "proper API" for it, and the stuff in pgAdmin isn't even enough to base one on. //Magnus
On Thu, 5 Jun 2008, Magnus Hagander wrote: > We really need a "proper API" for it, and the stuff in pgAdmin isn't > even enough to base one on. I would be curious to hear your opinion on whether the GUC overhaul discussed in this thread is a useful precursor to building such a proper API. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith wrote: > On Thu, 5 Jun 2008, Magnus Hagander wrote: > >> We really need a "proper API" for it, and the stuff in pgAdmin isn't >> even enough to base one on. > > I would be curious to hear your opinion on whether the GUC overhaul > discussed in this thread is a useful precursor to building such a > proper API. Since I'm the guy who initially wrote that config file editing stuff, I feel somehow addressed. The answer is a clear ABSOLUTELY. - The current implementation is able to edit the file directly or through pgsql functions; any format change will affect that function immediately. - If documentation is enhanced by adding more comments in the postgresql.conf file, this won't help the editor because it can't rely on it to present help and hints to the user. It needs the comments/help in pg_settings or alike. Regards, Andreas
Greg Smith wrote: > On Thu, 5 Jun 2008, Magnus Hagander wrote: > >> We really need a "proper API" for it, and the stuff in pgAdmin isn't >> even enough to base one on. > > I would be curious to hear your opinion on whether the GUC overhaul > discussed in this thread is a useful precursor to building such a proper > API. I must say that I am confused by this thread. What's the discussed GUC overhaul? Things that I vaguely recall being proposed are (in the order they came to mind): (1) Add a lot more comments to each setting (2) Add documentation links to each setting (3) Move more frequently used settings to the top of the file (4) Ship different sample config files (5) Create an expert system to suggest tuning (6) Other random ideas (XML, settings in database, others?) To me, there are two ideas that are doable right now, which are (2) and (4). (1) seems to be a step backwards in pg_hba.conf experience, and we would have to maintain duplicate documentation. (3) seems messy. (5) is a lot of work; do we have volunteers? As for (6), the two examples I give can be easily dismissed. (2) and (4) do not seem necessary to get the config API built. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Steve Atkins wrote: > > I'd be interested in putting together a framework+GUI client to do this > cleanly in a cross-platform (Windows, Linux, Solaris, OS X as a bare > minimum) sort of way, if no-one else already has such a thing. > This is a great idea, and I was thinking along the same line. The framework can provide generic interfaces for the GUI/Web client, and leave it up to the OS to provide the needed data. -Robert
Tom Lane wrote: > This is even assuming that the tool needs to edit the file itself, > rather than just give advice. The advice is the hard part, folks; > could we stop obsessing about trivia? +1. IMHO, the tool doesn't need to worry about generating a prettied version of postgresql.conf. It should just inform the user about the appropriate settings or create a postgresql.conf.recommend and have the user update postgresql.conf himself. -Robert
On Thu, 5 Jun 2008, Alvaro Herrera wrote: > I must say that I am confused by this thread. What's the discussed GUC > overhaul? http://wiki.postgresql.org/wiki/GUCS_Overhaul I drop that URL in every other message in hopes that people might start commenting on it directly if they see it enough; the fact that you're confused says I may need to keep that up :( > (1) Add a lot more comments to each setting > (2) Add documentation links to each setting > (3) Move more frequently used settings to the top of the file > (4) Ship different sample config files > (5) Create an expert system to suggest tuning > (6) Other random ideas (XML, settings in database, others?) > > To me, there are two ideas that are doable right now, which are (2) and > (4). (1) seems to be a step backwards in pg_hba.conf experience, and we > would have to maintain duplicate documentation. (3) seems messy. (5) > is a lot of work; do we have volunteers? As for (6), the two examples I > give can be easily dismissed. > (2) and (4) do not seem necessary to get the config API built. (1) is in that proposal but is strictly optional as something to put in the configuration file itself. The idea behind (2) is to enable tool authors to have an easier way to suggest where to head for more information. I'd like for it to be trivial for a tool to say "Suggested value for <x> is <y>; see http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html for more information". I know what most of the settings I tinker with do, but even I'd like it to be easier to find the right spot in the manual; for newbies it's vital. You are correct that (2) isn't strictly necessary here, but it's valuable and will be easier to wrap into this than to bolt on later. (3) (4) (5) and (6) were off-topic diversions. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Tom Lane wrote: > If those aren't enough questions, what else must we ask? Or maybe they > aren't the right questions at all --- maybe we should ask "is this a > dedicated machine or not" and try to extrapolate everything else from > what we (hopefully) can find out about the hardware. > I think probably a combination of high- and low-level questions would work, making the low-level (more specific) questions optional since some users may not be able to provide low-level info. Here's a rough idea of how I envision this tool should work. $ pg_config_wizard Is this machine dedicated to Postgres? (y/n) n (now the tool auto-discovers available HW resources) Your system has 32GB memory. What percentage do you want to allocate to Postgres? (1=50%, 2=33%, 3=25%, etc) 1 What type of workload? (OLTP, DSS, etc) ... At the end, the tool runs for a while checking to see if certain thresholds are reached to determine which parameters need to be increased. The tool would change the parameters causing the bottlenecks, rerun Postgres/workload, and iterate a few times until the results are satisfactory. Write the recommended settings to postgresql.conf.recommend and let the user update postgresql.conf himself or whatever. I update my postgresql.conf, rerun the app and see a 100% increase in throughput :-) -Robert
Tom Lane wrote: > How far could we get with the answers to just three questions: > > * How many concurrent queries do you expect to have? > * How much RAM space are you willing to let Postgres use? > * How much "overhead" disk space are you willing to let Postgres use? +1 to this approach - these are the kinds of questions that make sense to me when first setting up a new installation. They sound useful for both large servers and tiny (salesguy laptop for demos) installations. > If those aren't enough questions, what else must we ask? * Perhaps something to guess FSM settings? I think FSM is the tunable I most often get wrong, with more painful consequences (bloat) than other tunables. My approach is to have cron run database-wide vacuums even on systems with autovacuum just to see the log messages about FSM. * Something to tune vacuum delay? Perhaps: How much I/O bandwidth can be dedicated to Postgres background activities?
Steve Atkins wrote: > ... cross-platform (Windows, Linux, Solaris, OS X as a bare > minimum) I wonder how cross-platform the tuning algorithm itself is. I could also imagine that decisions like "do I let the OS page cache, or postgres's buffer cache get most of the memory" are extremely OS dependent. Also, the configuration tool would be even more useful if it could offer extra platform-specific advice like "hey, I see you're using this filesystem. You should be using this journaling option for WAL and this other one for data", or "on your system you should be using this fsync method", or "use relatime or noatime when mounting your disk".
* Greg Smith <gsmith@gregsmith.com> [080605 15:17]: > On Thu, 5 Jun 2008, Alvaro Herrera wrote: > > >I must say that I am confused by this thread. What's the discussed GUC > >overhaul? > > http://wiki.postgresql.org/wiki/GUCS_Overhaul > > I drop that URL in every other message in hopes that people might start > commenting on it directly if they see it enough; the fact that you're > confused says I may need to keep that up :( I've read it. A couple times now. And I like lots of it. > >(1) Add a lot more comments to each setting > >(2) Add documentation links to each setting > >(3) Move more frequently used settings to the top of the file > >(4) Ship different sample config files > >(5) Create an expert system to suggest tuning > >(6) Other random ideas (XML, settings in database, others?) > (3) (4) (5) and (6) were off-topic diversions. But, right from the above mentioned page: *Goals* By shifting from a model where postgresql.conf is document-formatted and hand-edited to one where it's machine generated, it becomes vastly easier to write simple utilities to manage these settings. Right now, the big "obstacle" to things like SET PERSISTENT is "how do we preserve the hand-edited comments in the file" -- and the answer is we don't. This little goal leads to: * By having a generated postgresql.conf and an easy way to generate it, writing autoconfiguration scripts (as well as shortcuts like SET PERSISTENT) become vastly easier. And later: There needs to be a way to modify the underlying settings and save that into a new machine-generated postgresql.conf file. Is implementing SET PERSISTENT sufficient for that? I think that these parts, seemingly "snuck into" the GUC overhaul proposal, are what people like me are wary of. People like me don't want to have postgresql.conf be *only* a machine-generated file, which I am not allowed to edit anymore because the next DBA doing a "SET PERSISTENT" type of command is going to cause postgres to write out something else, over-writing my carefully documented reason for some particular setting. But the big issue I have (not that it really matters, because I'm not one of the ones working on it, so please don't take this as me telling anyone what they can or can't do) is that that goal doesn't solve any of the listed problems stated in the proposal: 1. Most people have no idea how to set these. 2. The current postgresql.conf file is a huge mess of 194 options, the vast majority of which most users will never touch. 3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf, and settings.sgml), which are only synched with each other manually. 4. We don't seem to be getting any closer to autotuning. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Thu, 5 Jun 2008, Aidan Van Dyk wrote: > People like me don't want to have postgresql.conf be *only* a > machine-generated file, which I am not allowed to edit anymore because > the next DBA doing a "SET PERSISTENT" type of command is going to cause > postgres to write out something else, over-writing my carefully > documented reason for some particular setting. This is why there's the emphasis on preserving comments as they pass into the GUC structure and back to an output file. This is one of the implementation details I haven't fully made up my mind on: how to clearly label user comments in the postgresql.conf to distinguish them from verbose ones added to the file. I have no intention of letting manual user edits go away; what I'm trying to do here (and this part is much more me than Josh) is make them more uniform such that they can co-exist with machine edits without either stomping on the other. Right now doing that is difficult, because it's impossible to tell the default comments from the ones the users added and the current comment structure bleeds onto the same lines as the settings. > But the big issue I have (not that it really matters, because I'm not > one of the ones working on it, so please don't take this as me telling > anyone what they can or can't do) is that that goal doesn't solve any of > the listed problems stated in the proposal > 1. Most people have no idea how to set these. Making it much easier to build recommendation tools is how this helps here. > 2. The current postgresql.conf file is a huge mess of 194 options, > the vast majority of which most users will never touch. The proposed pg_generate_conf tool includes options to spit out a basic configuration file instead of the complete one. > 3. GUCS lists are kept in 3 different places (guc.c, postgresql.conf, > and settings.sgml), which are only synched with each other manually. The proposal throws away having a separate postgresql.conf file, so that reduces it from 3 places to 2. That's moving in the right direction. > 4. We don't seem to be getting any closer to autotuning. If you try to build a tuning tool, these areas end up being the unnecessarily hard parts. Thanks for the comments on the proposal. I'm only bothering to respond to messages like yours now, and am deleting all of the continuing attempts to divert the discussion over to parameter tuning details or expanding the scope here. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Jun 5, 2008, at 14:47, Greg Smith wrote: > This is why there's the emphasis on preserving comments as they pass > into the GUC structure and back to an output file. This is one of > the implementation details I haven't fully made up my mind on: how > to clearly label user comments in the postgresql.conf to distinguish > them from verbose ones added to the file. I have no intention of > letting manual user edits go away; what I'm trying to do here (and > this part is much more me than Josh) is make them more uniform such > that they can co-exist with machine edits without either stomping on > the other. Right now doing that is difficult, because it's > impossible to tell the default comments from the ones the users > added and the current comment structure bleeds onto the same lines > as the settings. How about a simple rule, such as that machine-generated comments start with "##", while user comments start with just "#"? I think that I've seen such a rule used before. At any rate, I think that, unless you have some sort of line marker for machine-generated comments, there will be no way to tell them apart from user comments. Other possibilities for machine-comments: ## Machine comment ### Machine comment #! Machine comment #@ Machine comment #$ Machine comment #^ Machine comment # Machine comment I actually kinda like "#!". It's distinctive and unlikely to appear in a user comment. Anyway, just food for thought. Best, David
David E. Wheeler wrote: > How about a simple rule, such as that machine-generated comments start > with "##", while user comments start with just "#"? I think that I've > seen such a rule used before. At any rate, I think that, unless you have > some sort of line marker for machine-generated comments, there will be no > way to tell them apart from user comments. FWIW smb.conf uses ; for one purpose and # for the other. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, 5 Jun 2008, Alvaro Herrera wrote: > FWIW smb.conf uses ; for one purpose and # for the other. They're actually combining the way UNIX files use # with how Windows INI files use ; in a config file context, which I personally find a little weird. I was already considering keeping user comments as # while making all system-inserted ones #! ; many people are already used to #! having a special system-related meaning from its use in UNIX shell scripting which makes it easier to remember. I think the next step to this whole plan is to generate a next-gen postgresql.conf mock-up showing what each of the outputs from the pg_generate_conf tool might look like to get feedback on that; it will make what is planned here a bit easier to understand as well. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
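To illustrate why a fixed marker helps tools, here is a small sketch that separates the proposed #! system comments from user comments and live settings; the marker follows Greg's suggestion above, everything else is made up for the example:

# Illustrative classifier for the comment convention discussed above.
import re

SETTING = re.compile(r'^\s*(\w+)\s*=\s*(.+?)\s*(?:#.*)?$')

def classify(line):
    stripped = line.strip()
    if stripped.startswith('#!'):
        return 'machine-comment'
    if stripped.startswith('#'):
        return 'user-comment'
    if SETTING.match(stripped):
        return 'setting'
    return 'blank-or-other'

for line in ["#! Suggested range: 8MB to 1/4 of RAM",
             "# raised after load testing  -- DBA",
             "shared_buffers = 1024MB  # see note above",
             ""]:
    print("%-16s %s" % (classify(line), line))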
On Jun 5, 2008, at 17:53, Greg Smith wrote: > I was already considering keeping user comments as # while making > all system-inserted ones #! ; many people are already used to #! > having a special system-related meaning from its use in UNIX shell > scripting which makes it easier to remember. Oooh, yeah. I hadn't even thought of that! I was just looking at characters on my keyboard and typing them in to see which ones I thought were most distinctive. This may be part of the reason I thought that #! was distinctive. :-) Best, David
David E. Wheeler wrote: > On Jun 5, 2008, at 14:47, Greg Smith wrote: > >> This is why there's the emphasis on preserving comments as they pass >> into the GUC structure and back to an output file. This is one of the >> implementation details I haven't fully made up my mind on: how to >> clearly label user comments in the postgresql.conf to distinguish them >> from verbose ones added to the file. I have no intention of letting >> manual user edits go away; what I'm trying to do here (and this part >> is much more me than Josh) is make them more uniform such that they >> can co-exist with machine edits without either stomping on the other. >> Right now doing that is difficult, because it's impossible to tell the >> default comments from the ones the users added and the current comment >> structure bleeds onto the same lines as the settings. > > How about a simple rule, such as that machine-generated comments start > with "##", while user comments start with just "#"? I think that I've > seen such a rule used before. At any rate, I think that, unless you have > some sort of line marker for machine-generated comments, there will be > no way to tell them apart from user comments. What comments do we consider machine-generated? Just the ones used to comment out settings, like #shared_buffers = 32MB or something else? If the automatic tool leaves alone all other kinds of comments, I think we're fine. In fact, it wouldn't necessarily need to modify those comments either, it could simply add a new setting line below that: #shared_buffers = 32MB shared_buffers = 1024MB For extra safety, it could comment out old settings, perhaps with something like this: #shared_buffers = 32MB #shared_buffers = 1024MB # commented out by wizard on 2008-06-05 shared_buffers = 2048MB This would preserve a full change history in the file. It would become quite messy after a lot of changes, of course, but a user can trim the history by hand if he wants to. Or perhaps we should explicitly mark the settings the tool has generated, and comment out: #shared_buffers = 32MB # commented out by wizard on 2008-06-05 shared_buffers = 1024MB # automatically set by wizard on 2008-06-05 That way the tool could safely replace automatically set settings without touching manually set ones, while leaving a clear trace of what happened. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
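A minimal sketch of the "comment out the old line, append an annotated new one" edit Heikki describes; the wizard name and date format are placeholders, and real code would also have to cope with include files, quoting, and units:

# Sketch only: comment out any live line for the setting, then append a
# new line annotated with who changed it and when.
import datetime, re

def set_option(lines, name, value, actor="wizard"):
    today = datetime.date.today().isoformat()
    live = re.compile(r'^\s*%s\s*=' % re.escape(name))
    out = []
    for line in lines:
        if live.match(line):
            out.append("#%s # commented out by %s on %s" % (line.rstrip(), actor, today))
        else:
            out.append(line.rstrip())
    out.append("%s = %s # automatically set by %s on %s" % (name, value, actor, today))
    return out

conf = ["#shared_buffers = 32MB", "shared_buffers = 1024MB"]
print("\n".join(set_option(conf, "shared_buffers", "2048MB")))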
David E. Wheeler wrote: > > How about a simple rule, such as that machine-generated comments start > with "##", while user comments start with just "#"? I think that I've > seen such a rule used before. At any rate, I think that, unless you > have some sort of line marker for machine-generated comments, there > will be no way to tell them apart from user comments. Two heretical questions: Do we need user generated comments at all? I can't remember ever having used any comment in postgresql.conf. Why do so many people here insist on editing postgresql.conf as primary means of changing config params? Isn't a psql -c "SET foo=bar; MAKE PERSISTENT" just as good as sed'ing postgresql.conf or doing it manually? Looking around for different approaches, network appliances come to my mind, e.g. Cisco routers and PIX. You have 3 ways to configure a pix: - use a command line (using ssh or telnet, equivalent to psql); WRITE MEMORY to make the changes survive a reboot. - use a web interface (or similar tool) - use tftp to up/download the complete config in and out, editing the file. User comments will be lost, with the exception of those that have been applied with special comment commands (equivalent to "comment on"). Regards, Andreas
"Andreas Pflug" <pgadmin@pse-consulting.de> writes: > Why do so many people here insist on editing postgresql.conf as primary means > of changing config params? > Isn't a psql -c "SET foo=bar; MAKE PERSISTENT" just as good as sed'ing > postgresql.conf or doing it manually? no, it's awful. > Looking around for different approaches, network appliances come to my mind, > e.g. Cisco routers and PIX. You have 3 ways to configure a pix: > - use a command line (using ssh or telnet, eqivalent to psql); WRITE MEMORY to > make the changes survive a reboot. > - use a web interface (or similar tool) > - use tftp to up/download the complete config in and out, editing the file. > User comments will be lost, with the exception of those that have been applied > with special comment commands (equivalent to "comment on"). And in every major installation I've seen people use the last option. They treat the original text file which is kept elsewhere -- normally checked into some revision control system, tracked and managed like source code -- as the canonical and authoritative version. The fact that you have to go through an especially complicated and awkward process to load it onto the switches is just a technical detail to be worked around. And the fact that somebody could have altered the configuration directly on the switch and forgotten to write it to the configuration file is a danger to be worked around with policy, not a benefit. Essentially what good cisco network admins are doing here is working around the awkward interface using policy and replacing it with a static text configuration file kept elsewhere. Experience with other attempts at automatic tools to edit configuration files like webmin etc are that editing text files which are also under user control is just a terrible idea. You can make it almost work but it's always fragile and can't keep up with the flexibility of free-form text and the creativity of users. What I would suggest is going back to the plan when we added include files. Have an automatically generated config file which is freshly re-generated each time and doesn't try to preserve any user-formatting. Users then include that file at the top of postgresql.conf and override any of the settings by setting them in postgresql.conf after the include. The important thing here is to separate files which are "under postgres control" and those "under user control". That doesn't necessarily mean we have to ban users from touching the automatically generated config file or switch formats, but it relieves us of any responsibility for maintaining free-form text. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
Gregory Stark wrote: > "Andreas Pflug" <pgadmin@pse-consulting.de> writes: > > >> Why do so many people here insist on editing postgresql.conf as primary means >> of changing config params? >> Isn't a psql -c "SET foo=bar; MAKE PERSISTENT" just as good as sed'ing >> postgresql.conf or doing it manually? >> > > no, it's awful. > > So I wonder why you accept it when configuring schemas. What's the big difference between setting a config param, and creating a table? And ultimately, the config param file format may well look like an SQL command file, restricted to SET only. > And in every major installation I've seen people use the last option. They > treat the original text file which is kept elsewhere -- normally checked into > some revision control system, tracked and managed like source code -- as the > canonical and authoritative version. > That's how you'd have to manage the schema sources too, no? Your comments are lost as well after schema creation scripts are executed, and manual changes may interfere with that. Regards, Andreas
"Andreas Pflug" <pgadmin@pse-consulting.de> writes: > Gregory Stark wrote: >> "Andreas Pflug" <pgadmin@pse-consulting.de> writes: >> >>> Why do so many people here insist on editing postgresql.conf as primary means >>> of changing config params? >>> Isn't a psql -c "SET foo=bar; MAKE PERSISTENT" just as good as sed'ing >>> postgresql.conf or doing it manually? >> >> no, it's awful. > > So I wonder why you accept it when configuring schemas. What's the big > difference between setting a config param, and creating a table? Yes, this is a big impedence mismatch between the old-school DBMS (with emphasis on the MS part of the acronym) architecture and more modern approaches. You don't configure your web site with "ALTER PAGE home.html SET FORM contactus INPUT TYPE SUBMIT" either... This is partly out of necessity though as those database objects contain *data*. So merely replacing them with new data objects doesn't give the system enough information to understand what to do with that data. The DDL commands which modify the schema give that kind of data mutating instruction. And I'll note that DBAs go to *great* lengths to do exactly the same kind of thing I described the CISCO DBAs doing. Usually that means doing an ALTER command and separately editing a creation script and trying to keep the two in sync. It's a huge burden. > And ultimately, the config param file format may well look like an SQL command > file, restricted to SET only. So all you have is our existing file except with an additional layer of quoting to deal with, a useless SET keyword to annoy users, and a file that you need a bison parser to deal instead of a simple keyword-value syntax that sed can manipulate. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
Gregory Stark wrote: > So all you have is our existing file except with an additional layer of > quoting to deal with, a useless SET keyword to annoy users, and a file that > you need a bison parser Don't you think that's a little over the top, throwing bison at the simple task to extend postgresql.conf scanning so it accepts --, /**/ and SET? > to deal instead of a simple keyword-value syntax that > sed can manipulate. > > sed seems to be THE killer application... Apart from the fact that sed easily could identify SET, my answer would be to use psql to modify the config, not sed. Text config files are NOT friendly for beginner and mediocre users. IMHO the current restriction on GUC changes is a major obstacle towards pgsql tuning tools, e.g. written as a Google SoC project. Graphic tools aren't too popular at pgsql-hackers, but please contemplate a little how much pgadmin may have contributed to the pgsql usage boost, esp. on windows. Regards, Andreas
On Wednesday, June 4, 2008, Aidan Van Dyk wrote: > > When reading this thread, I'm wondering if anybody ever saw a config > > file for a complex software product that was easily editable and > > understandable. I don't know one. If there was one, it'd be nice to know > > it so we can learn from it. > > PostgreSQL, Apache, X.org And note that one of the major advances in X.org over XFree86 was that all the useless garbage was removed from the configuration file, so that the final and usable configuration fits on one screen, and you can even write it from memory if you dare to.
On Wednesday, June 4, 2008, Tom Lane wrote: > * Can we present the config options in a more helpful way (this is 99% > a documentation problem, not a code problem)? ack > * Can we build a "configuration wizard" to tell newbies what settings > they need to tweak? Some questions to clarify this: - What settings do "newbies" (or anyone else) typically need to change? Please post a list. - What values would you set those settings to? Please provide a description for arriving at a value, which can later be transformed into code. Note that in some cases, not even the documentation provides more than handwaving help. - If we know better values, why don't we set them by default? Another orthogonal stumbling block on the way to making all of this automatic is that the surely critical shared_buffers setting will in any useful configuration require messing around with kernel settings that no PostgreSQL tool can really help with.
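To make the kernel-settings point concrete: on Linux the server's shared memory request (roughly shared_buffers plus some overhead) has to fit under kernel.shmmax, so the most a tool can do is compute and print the change for the admin. A sketch, with the overhead fraction as a rough placeholder rather than a documented formula:

# Sketch only: estimate the SysV shared memory needed for a given
# shared_buffers setting and print the Linux sysctl an admin would run.
def required_shmmax(shared_buffers_mb, overhead_fraction=0.20):
    # 20% overhead is a rough placeholder, not a documented formula
    return int(shared_buffers_mb * (1 + overhead_fraction)) * 1024 * 1024

target_mb = 2048
print("# shared_buffers = %dMB needs roughly:" % target_mb)
print("sysctl -w kernel.shmmax=%d" % required_shmmax(target_mb))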
On Thursday, June 5, 2008, Tom Lane wrote: > How far could we get with the answers to just three questions: > > * How many concurrent queries do you expect to have? > > * How much RAM space are you willing to let Postgres use? > > * How much "overhead" disk space are you willing to let Postgres use? This is surely a good start. We could optimize this even more by saying, disk space is cheap, so let's just use a much higher default setting for checkpoint_segments. (If PostgreSQL is installed but not actually used, not all the space is actually going to be used anyway.) Then, increase max_connections a bit; that should be OK for most users. Then you are left with the memory settings, and those need kernel tuning in most cases, so any automation tool loses. Hmm.
* Andreas Pflug <pgadmin@pse-consulting.de> [080606 04:50]: > David E. Wheeler wrote: > > > >How about a simple rule, such as that machine-generated comments start > >with "##", while user comments start with just "#"? I think that I've > >seen such a rule used before. At any rate, I think that, unless you > >have some sort of line marker for machine-generated comments, there > >will be no way to tell them apart from user comments. > > Two heretical questions: > Do we need user generated comments at all? > I can't remember ever having used any comment in postgresql.conf. Well, I have, mainly to leave traces as to what and why I've changed something from a default/previous value, based on "change, hope, and test" style tuning. And the one that Greg brought up earlier: ## Don't make this too high, or linux OOM will kill the server!!! I'm guessing that comment was put in for a reason too. > Why do so many people here insist on editing postgresql.conf as primary > means of changing config params? > Isn't a psql -c "SET foo=bar; MAKE PERSISTENT" just as good as sed'ing > postgresql.conf or doing it manually? I would guess mainly for 3 reasons: 1) It's always been that way, it's the traditional "unix" way, and many admins are comfortable being able to comment/make quick test changes with simple text files. 2) Their postgresql.conf files are distributed/synced/generated from a central provisioning/SCM system 3) PostgreSQL the server isn't even running > Looking around for different approaches, network appliances come to my > mind, e.g. Cisco routers and PIX. You have 3 ways to configure a pix: > - use a command line (using ssh or telnet, equivalent to psql); WRITE > MEMORY to make the changes survive a reboot. > - use a web interface (or similar tool) > - use tftp to up/download the complete config in and out, editing the > file. User comments will be lost, with the exception of those that have > been applied with special comment commands (equivalent to "comment on"). And, of course, other server software comes to mind too: apache, bind, postfix, sendmail, dhcpd, sshd, cron, xinetd... ;-) a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
* Peter Eisentraut <peter_e@gmx.net> [080606 08:25]: > On Wednesday, June 4, 2008, Aidan Van Dyk wrote: > > > When reading this thread, I'm wondering if anybody ever saw a config > > > file for a complex software product that was easily editable and > > > understandable. I don't know one. If there was one, it'd be nice to know > > > it so we can learn from it. > > > > PostgreSQL, Apache, X.org > > And note that one of the major advances in X.org over XFree86 was that all the > useless garbage was removed from the configuration file, so that the final > and usable configuration fits on one screen, and you can even write it from > memory if you dare to. Ya, and unfortunately, I haven't got my dual-head to work with recent distro X.org either, probably because I haven't found out how/where/what to set in Xorg.conf for the new version (and my old one doesn't "just work" on new X.org either). So I've got an old x.org version pinned, even though I'm pretty sure that a newer X.org is probably better, despite my lack of ability to configure it. Go figure ;-) -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
"Andreas Pflug" <pgadmin@pse-consulting.de> writes: > Gregory Stark wrote: >> So all you have is our existing file except with an additional layer of >> quoting to deal with, a useless SET keyword to annoy users, and a file that >> you need a bison parser > > Don't you think that's a little over the top, throwing bison at the simple task > to extend postgresql.conf scanning so it accepts --, /**/ and SET? Well you're the one who proposed generalising the simple text file format to SET commands. The whole point of a machine-editable format would be to make it more restricted and predictable, not less so. We had a proposal to do a postgresql.conf.auto which was automatically generated included from a free-form postgresql.conf. That file could generated in a simple format without worrying about formatting, comments, ordering, etc. > Text config files are NOT friendly for beginner and mediocre users. IMHO the > current restriction on GUC changes is a major obstacle towards pgsql tuning > tools, e.g. written as a Google SoC project. Graphic tools aren't too popular > at pgsql-hackers, but please contemplate a little how much pgadmin may have > contributed to the pgsql usage boost, esp. on windows. Like it or not computers actually have to store state when you're done entering it via the GUI. If you design around the GUI you end up with system that can *only* be used via a GUI and spend years trying to work around that (witness Windows which is only now with a lot of effort recovering from that mistake). People got all focused on comments and sed, but in fact there are a host of reasons to want to be able to directly edit the state stored in a simple text file. You might like to group together related options, for example, perhaps grouping together the options that your site have to adjust between dev and production -- and probably have instructions on how to make the transition in a commented section. I happen to think being presented with 200 options is less confusing than being presented with an empty file and being told to go read the documentation to find out whether I might want to put anything in it. The documentation is a good reference but isn't good as a "here's what you'll want to check before you start" guide. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
"Peter Eisentraut" <peter_e@gmx.net> writes: > And note that one of the major advances in X.org over XFree86 was that all the > useless garbage was removed from the configuration file, so that the final > and usable configuration fits on one screen, and you can even write it from > memory if you dare to. This is a good point, though I think the big advance wasn't in actually removing the garbage so much as making it irrelevant. That is, the advances were in making the server actually work automatically with less manual configuration. That's always a noble goal. I would be interested in what parameters people thought we could eliminate from the initial config file because the server normally gets it right anyways and users shouldn't have to adjust it anyways. (I'm specifically *not* referring to sections like the kerberos parameters which even if most users can leave them alone those who do use that feature *do* need to adjust those parameters because the server does *not* get them right automatically) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
Andreas Pflug <pgadmin@pse-consulting.de> writes: > Text config files are NOT friendly for beginner and mediocre users. IMHO > the current restriction on GUC changes is a major obstacle towards pgsql > tuning tools, e.g. written as a Google SoC project. Graphic tools aren't > too popular at pgsql-hackers, but please contemplate a little how much > pgadmin may have contributed to the pgsql usage boost, esp. on windows. I grow weary of this thread. I will say it once more: I do not believe for one instant that the current formatting of postgresql.conf is the major impediment, or even a noticeable impediment, to producing a useful configuration wizard. If you wish to prove otherwise, provide a complete wizard except for the parts that touch the config file, and I will promise to finish it. I will not read or respond to any further discussion of changing the config file format. It's a waste of bandwidth. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > - If we know better values, why don't we set them by default? The problem is: better for what? In particular, I'm uncomfortable with any changes in the direction of trying to make Postgres take over the entire machine by default. I'd want some fairly explicit permission from the user for that ... regards, tom lane
Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> - If we know better values, why don't we set them by default? > > The problem is: better for what? In particular, I'm uncomfortable with > any changes in the direction of trying to make Postgres take over the > entire machine by default. I'd want some fairly explicit permission > from the user for that ... That is where some 80% solution sample config files come in. Joshua D. Drake > > regards, tom lane >
On Friday, June 6, 2008, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > - If we know better values, why don't we set them by default? > > The problem is: better for what? In particular, I'm uncomfortable with > any changes in the direction of trying to make Postgres take over the > entire machine by default. I'd want some fairly explicit permission > from the user for that ... Yes, those are decisions we are going to have to make, eventually. But recall the three step process: 1. What values need changing? 2. What to change them to? 3. Could that be the new default value? I haven't seen a proposal for item 1 yet, so the rest is idle discussion at this time.
Peter Eisentraut wrote: > On Friday, June 6, 2008, Tom Lane wrote: >> Peter Eisentraut <peter_e@gmx.net> writes: >>> - If we know better values, why don't we set them by default? >> The problem is: better for what? In particular, I'm uncomfortable with >> any changes in the direction of trying to make Postgres take over the >> entire machine by default. I'd want some fairly explicit permission >> from the user for that ... > > Yes, those are decisions we are going to have to make, eventually. But recall > the three step process: > > 1. What values need changing? shared_buffers work_mem maintenance_work_mem checkpoint_segments wal_sync_method effective_cache_size > I haven't seen a proposal for item 1 yet, so the rest is idle discussion at > this time. I think those cover the biggest low hanging fruit; async_commit is arguable. Sincerely, Joshua D. Drake
On Friday, June 6, 2008, Joshua D. Drake wrote: > That is where some 80% solution sample config files come in. Considering that writing a sample configuration file is trivial, yet I haven't seen a single one posted in the six or more years of GUC, I have no faith in this plan until I actually see it implemented.
Peter Eisentraut wrote: > Am Freitag, 6. Juni 2008 schrieb Joshua D. Drake: >> That is where some 80% solution sample config files come in. > > Considering that writing a sample configuration file is trivial, yet I haven't > seen a single one posted in the six or more years of GUC, I have no faith in > this plan until I actually see it implemented. I fail to see why anyone would bother with the amount of noise on this thread. If people start leaning toward an actual solution, I am sure the work will get done. Joshua D. Drake
On Jun 5, 2008, at 23:08, Heikki Linnakangas wrote: > What comments do we consider machine-generated? Just the ones used > to comment out settings, like > > #shared_buffers = 32MB > > or something else? Those and documentation comments. > If the automatic tool leaves alone all other kinds of comments, I think > we're fine. In fact, it wouldn't necessarily need to modify those > comments either, it could simply add a new setting line below that: > > #shared_buffers = 32MB > shared_buffers = 1024MB Well, we've been talking about having varying levels of documentation in the comments of the file based on the options passed to the configuration program. I think that these are the primary concern, though Greg, please do correct me if I'm mistaken. > For extra safety, it could comment out old settings, perhaps with > something like this: > > #shared_buffers = 32MB > #shared_buffers = 1024MB # commented out by wizard on 2008-06-05 > shared_buffers = 2048MB > > This would preserve a full change history in the file. It would > become quite messy after a lot of changes, of course, but a user can > trim the history by hand if he wants to. I guess that could be a feature. Personally, I use a vcs system for that. Best, David
On Jun 6, 2008, at 01:50, Andreas Pflug wrote: > Two heretical questions: > Do we need user generated comments at all? > I can't remember ever having used any comment in postgresql.conf. That's a valid point. I've used comments to note by whom and when a setting was changed. > Why do so many people here insist on editing postgresql.conf as > primary means of changing config params? > Isn't a psql -c "SET foo=bar; MAKE PERSISTENT" just as good as > sed'ing postgresql.conf or doing it manually? I think that there has been enough pushback against housing all the settings in the database, not to mention that it calls for an API, that just starting with something simpler to parse the file and rewrite it from the command-line might be a better first step. > Looking around for different approaches, network appliances come to > my mind, e.g. Cisco routers and PIX. You have 3 ways to configure a > pix: > - use a command line (using ssh or telnet, equivalent to psql); WRITE > MEMORY to make the changes survive a reboot. > - use a web interface (or similar tool) > - use tftp to up/download the complete config in and out, editing > the file. User comments will be lost, with the exception of those > that have been applied with special comment commands (equivalent to > "comment on"). I think the first option there is the one that's been getting the most support here. Best, David
On Wednesday 04 June 2008 15:48:47 Andrew Dunstan wrote: > simply remove all the comment lines from your > config file. +1. That would clear up a lot of confusion on its own. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Saturday 31 May 2008 17:34:27 David E. Wheeler wrote: > On May 31, 2008, at 12:36, Gregory Stark wrote: > > What this sounds like is a sly way to try to get rid of > > postgresql.conf > > entirely and replace it with parameters stored in the database so > > admins would > > adjust the parameters using an SQL syntax rather than a text file. > > > > There are pros and cons of such a system but I think for newbie > > admins that > > would be a thousand times *more* baffling. You would have to learn new > > commands and have no holistic view of what parameters had been set, > > what > > related parameters might exist. You also have no way to keep the > > file in a > > version control system or sync across servers etc. > > FWIW, this has not been a barrier to MySQL adoption. > Agreed. Using SQL-like syntax is often a more familiar operation for DBAs than editing a text file. Consider the number of DBAs that manage other DBs strictly through graphical interfaces (something we are rather lacking in the ability to do, comparatively). Consider also that if this is built atop the current pg_settings, all the normal sql tools can still help... i.e. pg_dump -t if you want to check something into svn. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
* David E. Wheeler <david@kineticode.com> [080606 12:22]: > I guess that could be a feature. Personally, I use a vcs system for > that. Bugger... Now we only need to make postgresql check postmaster.conf into git everytime it makes a change... ;-) -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On Fri, 2008-06-06 at 13:07 -0400, Aidan Van Dyk wrote: > * David E. Wheeler <david@kineticode.com> [080606 12:22]: > > > I guess that could be a feature. Personally, I use a vcs system for > > that. > > Bugger... Now we only need to make postgresql check postmaster.conf > into git everytime it makes a change... Just work with the FAM api to automate it :P Joshua D. Drake > > ;-) > >
> Bugger... Now we only need to make postgresql check postmaster.conf > into git everytime it makes a change... Been there, wrote that. (for postgresql.conf anyway). - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200806061351 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Gregory Stark wrote: > >> Text config files are NOT friendly for beginner and mediocre users. IMHO the >> current restriction on GUC changes is a major obstacle towards pgsql tuning >> tools, e.g. written as a Google SoC project. Graphic tools aren't too popular >> at pgsql-hackers, but please contemplate a little how much pgadmin may have >> contributed to the pgsql usage boost, esp. on windows. >> > > Like it or not computers actually have to store state when you're done > entering it via the GUI. If you design around the GUI you end up with system > that can *only* be used via a GUI and spend years trying to work around that > (witness Windows which is only now with a lot of effort recovering from that > mistake). > I never advocated a file format that isn't editable any more; au contraire. And the statement that a GUI configuration contradicts manual editing is plain wrong, even with most windows software (if written nicely to the registry, you could dump the key, edit and reload it, or use regedit). OTOH, you can't make most windows users happy with a text file version only. Regards, Andreas
Tom Lane wrote: > I grow weary of this thread. I will say it once more: I do not believe > for one instant that the current formatting of postgresql.conf is the > major impediment, or even a noticeable impediment, to producing a useful > configuration wizard. If you wish to prove otherwise, provide a > complete wizard except for the parts that touch the config file, and > I will promise to finish it. > > I will not read or respond to any further discussion of changing the > config file format. It's a waste of bandwidth. > Your statement doesn't really surprise me. Apparently you kind of celebrate misunderstanding my point, which isn't primarily about the file format, but about config param accessibility via api/SQL. I personally wouldn't even think about starting such a wizard, unless I have an idea how to push the result into the database. No, not a file, but via SQL! So your statement that you won't react unless a wizard is almost ready is prohibitive, apart from the fact that not only wizards (featuring AI) are interesting, but simple config tools as well. Regards, Andreas
On Monday 02 June 2008 10:12:06 Tom Lane wrote: > I have no objection to providing alternative ways to edit the > configuration data, but the primary source of the settings is > going to continue to be an editable text file. Any proposals for > alternatives-to-a-text-editor have to work within that reality. > I think there is some disagreement that using text files needs to be the "primary" way to edit configurations. Some people are of the opinion that we should focus on text files as a secondary method, generally to be used only when a more sql/gui oriented way won't work (i.e. your shared buffer restart scenario). I think most of the user base would like to approach administration from that point-of-view, and as of yet I haven't seen a technical reason why that world view is wrong, only philosophical ones. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Fri, 6 Jun 2008, Peter Eisentraut wrote: > - What settings do "newbies" (or anyone else) typically need to change? > Please post a list. > - What values would you set those settings to? Please provide a description > for arriving at a value, which can later be transformed into code. Note that > in some cases, not even the documentation provides more than handwaving help. Josh's spreadsheet at http://pgfoundry.org/docman/view.php/1000106/84/calcfactors.sxc provides five different models for setting the most critical parameters based on different types of workloads. Everyone can quibble over the fine tuning, but having a good starter set of reasonable settings for these parameters is a solved problem. It's just painful to build a tool to apply the available expert knowledge that is already around. > - If we know better values, why don't we set them by default? Because there's not enough information available; the large differences between how you tune for different workloads is one example. Another is that people tune for peak and projected activity rather than just what's happening right now. Every model suggested for a tuning wizard recognizes you need to ask some set of questions to nail things down. I continue to repeat in broken-record style, exactly what a tuning tool will ask about and what settings it will suggest is not important, and getting into that is an entirely different discussion (one that gets hashed out every single day on pgsql-performance). The fact that writing such a tool is harder than it should be is the issue here. > Another orthogonal stumbling block on the way to making all of this > automatic is that the surely criticial shared_buffers setting will in > any useful configuration require messing around with kernel settings > that no PostgreSQL tool can really help with. Yes. So? All you can do is point this out to users. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Wednesday 04 June 2008 22:04:54 Greg Smith wrote: > On Wed, 4 Jun 2008, Aidan Van Dyk wrote: > > * Are we always spilling small amounts of data to disk for sorting? A > > small work_mem increase might help... > > I was just talking to someone today about building a monitoring tool for > this. Not having a clear way to recommend people monitor use of work_mem > and its brother spilled to disk sorts is an issue right now; I'll whack > that one myself if someone doesn't beat me to it before I get time. > I remember *years* ago, someone wrote a perl script to poll pgsql_tmp and print out any time something showed up... you could probably find that in the archives if you look around. of course to me this sounds like an excellent idea for a dtrace probe ;-) -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Jun 6, 2008, at 12:22 PM, Greg Smith wrote: > On Fri, 6 Jun 2008, Peter Eisentraut wrote: > >> - What settings do "newbies" (or anyone else) typically need to >> change? >> Please post a list. >> - What values would you set those settings to? Please provide a >> description >> for arriving at a value, which can later be transformed into code. >> Note that >> in some cases, not even the documentation provides more than >> handwaving help. > > Josh's spreadsheet at http://pgfoundry.org/docman/view.php/1000106/84/calcfactors.sxc > provides five different models for setting the most critical > parameters based on different types of workloads. Everyone can > quibble over the fine tuning, but having a good starter set of > reasonable settings for these parameters is a solved problem. > It's just painful to build a tool to apply the available expert > knowledge that is already around. I hope to have something in a week or so that's a first cut at that tool. I'm aiming at a GUI tool at first, as I think that's the main need, though once the basics are done an interactive or non- interactive CLI version shouldn't be a big deal. Cheers, Steve
"Andreas Pflug" <pgadmin@pse-consulting.de> writes: > I personally wouldn't even think about starting such a wizard, unless I have an > idea how to push the result into the database. No, not a file, but via SQL! So > your statement you won't react unless a wizard is almost ready is prohibitive, > apart from the fact that not only wizards (featuring AI) are interesting, but > simple config tools as well. Well there's a perfectly good place to start today. Dump out a config file with the settings your wizard or config tool wants to change from their current values. We could come up with a canonical filename for such a file and include an include directive for it in the standard configuration file or you could just include instructions saying what include line to add to your config file in the installation instructions for your tool. Great, I'm glad we've resolved that issue. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
On Fri, 6 Jun 2008, Heikki Linnakangas wrote:

> Or perhaps we should explicitly mark the settings the tool has generated, and
> comment out:
>
> #shared_buffers = 32MB  # commented out by wizard on 2008-06-05
> shared_buffers = 1024MB # automatically set by wizard on 2008-06-05

What I would like to do is make the tool spit out a revision history in the same way I find all "big IT" shops handling this already: by putting a revision history style commentary above the current setting. Here's a sample:

# 2008-03-02 : 32MB   : postgres    : Database default
# 2008-05-02 : 512MB  : pg_autotune : Wizard update
# 2008-05-15 : 1024MB : gsmith      : Increased after benchmark tests
shared_buffers = 1024MB

If the first tuning tool that comes into existence used this format, and the format was reasonable, I think it would be possible to get people making manual edits to adopt it as well. The exact details of how this should look are off-topic for the main discussion here, though, so I'd prefer if this whole line of discussion died off. Anyone who wants to comment on this whole area, feel free to contact me off-list or edit the Wiki page (which has a section on this topic now) to hash out suggestions in this area; I'm trying to keep this thread somewhat focused now. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Thursday 05 June 2008 15:15:14 Greg Smith wrote: > (1) is in that proposal but is strictly optional as something to put in > the configuration file itself. The idea behind (2) is to enable tool > authors to have an easier way to suggest where to head for more > information. I'd like for it to be trivial for a tool to say "Suggested > value for <x> is <y>; see > http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html > for more information". I know what most of the settings I tinker with do, > but even I'd like it to be easier to find the right spot in the manual; > for newbies it's vital. You are correct that (2) isn't strictly necessary > here, but it's valuable and will be easier to wrap into this than to bolt > on later. > One idea I have been kicking around is having every guc have an anchor in the website (rather than anchoring on parameter family). It might be enough to just populate the search bot with every guc anchored to family though... -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Friday 06 June 2008 08:35:00 Peter Eisentraut wrote: > On Wednesday, 4 June 2008, Tom Lane wrote: > > * Can we present the config options in a more helpful way (this is 99% > > a documentation problem, not a code problem)? > > ack > > > * Can we build a "configuration wizard" to tell newbies what settings > > they need to tweak? > > Some questions to clarify this: > > - What settings do "newbies" (or anyone else) typically need to change? > Please post a list. > i do have a defined set (sort of a "top ten you need to know") that i tell people about in my postgres 101 tutorials... off the top of my head it looks like this:

shared_buffers
effective_cache_size
default_stats_target
work_mem
maintenance_work_mem
listen_addresses
max_connections
the fsm parameters
checkpoint_segments
random_page_cost

i think others close to the list are constraint_exclusion (unless it defaults to on now), max_prepared_transactions (set it to 0 unless you use it... i may stop recommending this after noticing its effects on max # system locks allowed), and then the logging parameters (which is to say, you need to set up logging that works, however you want to do it). There are a couple more that might go on the list, like synchronous_commit, but it's on the fence for now. > - What values would you set those settings to? Please provide a > description for arriving at a value, which can later be transformed into > code. Note that in some cases, not even the documentation provides more > than handwaving help. > I can provide this if you want, will need to look over my notes. one issue we've faced in the past with this is something like shared_buffers, where the setting is based on 1) dedicated server?, 2) available ram, 3) amount of read/write traffic, 4) disk subsystem. Those types of input are hard to quantify in code. > - If we know better values, why don't we set them by default? > There is a saying, something like "The accumulation of anecdotes is not data". Well, we seem to have a high bar on what proof we need to actually change a default GUC setting. default_statistics_target is a prime example, where almost no one i know has ever recommended 10 as a default, or suggests setting it to 10 as a way to improve performance, but the effort to get it changed to something more reasonable has been monumental. > Another orthogonal stumbling block on the way to making all of this > automatic is that the surely criticial shared_buffers setting will in any > useful configuration require messing around with kernel settings that no > PostgreSQL tool can really help with. yep. seems it might be possible to just compare the shared_buffers setting with the kernel parameters before making the change though... not sure in which way you would slant the output though. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Fri, 6 Jun 2008, Tom Lane wrote: > I grow weary of this thread. If we keep it up for, oh, another three years, then maybe you'll be as weary as I am of struggling with problems in this area. Striking a balance between the wants and needs of people who want a fancy GUI tool for configuring database settings with those who want to edit things manually is a difficult problem that is not going away. If this didn't keep coming back to haunt me all the time I'd like to forget about it myself. > I will say it once more: I do not believe for one instant that the > current formatting of postgresql.conf is the major impediment, or even a > noticeable impediment, to producing a useful configuration wizard. Arguments about formatting changes to postgresql.conf are a tangent to the central questions here, and having just closed some open comments on that I am with you on ignoring those as off-topic the same way I keep minimizing "what are the parameters to tune?" comments. Here are the relevant questions, around since the first message, that are not attracting discussion:

1) Is it worthwhile to expand the information stored in the GUC structure to make it better capable of supporting machine generation and to provide more information for tool authors via pg_settings? The exact fields that should or shouldn't be included remains controversial; consider "default value", "per-session/runtime/restart", and "enum lists" as the list of things that are most needed there.

2) Should the sample postgresql.conf file be replaced by a program that generates it using that beefed up structure instead, therefore removing one file that has to be manually kept in sync with the rest of the code base right now?

3) What now makes sense for a way to update database parameters for users whose primary (or only, in some cases) access to the server is over the database port, given the other changes have improved automatic config file generation?

> If you wish to prove otherwise, provide a complete wizard except for the > parts that touch the config file, and I will promise to finish it. You do realize that if I provided you with such a sample, the not-yet-implemented "config API" stubs it needs to work would be exactly what the proposal page suggests adding, right? I (and Josh) didn't just make them all up out of nowhere, you know. I wrote a message here already about the seemingly inevitable path the budding "wizard tool hacker" follows and why that leads into some of the changes suggested. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Robert Treat wrote: > One idea I have been kicking around is having every guc have a anchor in the > website (rahter than anchoring on parameter family). It might be enough to > just populate the search bot with every guc anchored to family though... +1 on the anchor per variable. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Joshua D. Drake wrote: > Tom Lane wrote: >> Peter Eisentraut <peter_e@gmx.net> writes: >>> - If we know better values, why don't we set them by default? >> >> The problem is: better for what? > > That is where some 80% solution sample config files come in. +1. At work I use 3 templates:

* One for salespeople's demo laptops.
* One for a developer's desktop.
* One for our bigger production servers.

The old, old default postgresql.conf used to be nice for the first group. The newer set of defaults is nicer for the second group. Emailing the lists is currently the best way of tuning for that last case. I wonder if the fastest way to generate the configurator would be to simply ask everyone to post their tuned postgresql.conf files along with a brief description of the use case for that file. Then we could group the use-cases into various classes and average the values of the submitted files. The configurator would then ask just one question: "choose which use case most closely matches yours from this list".
"Greg Smith" <gsmith@gregsmith.com> writes: > 1) Is it worthwhile to expand the information stored in the GUC structure to > make it better capable of supporting machine generation and to provide more > information for tool authors via pg_settings? The exact fields that should or > shouldn't be included remains controversial; consider "default value", > "per-session/runtime/restart", and "enum lists" as the list of things that are > most needed there. Isn't that a list of what's *already* there? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
On Thu, 05 Jun 2008 12:53:55 -0700 Ron Mayer wrote: > Steve Atkins wrote: > > ... cross-platform (Windows, Linux, Solaris, OS X as a bare > > minimum) > > I wonder how cross-platform the tuning algorithm itself is. > > I could also imagine that decisions like "do I let the OS page > cache, or postgres's buffer cache get most of the memory" are > extremely OS dependent. But you can hide most of the internal stuff from the user, either by generating the config file for the platform the tool is running on or by an option like "generate config for platform xyz". If you have cross-platform already in place, this should not be much overhead. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
Robert Treat wrote: > On Wednesday 04 June 2008 22:04:54 Greg Smith wrote: > >> I was just talking to someone today about building a monitoring tool for >> this. Not having a clear way to recommend people monitor use of work_mem >> and its brother spilled to disk sorts is an issue right now, I'll whack >> that one myself if someone doesn't beat me to it before I get time. >> > > I remember *years* ago, someone wrote a perl script to poll pgsql_tmp and > print out anytime something showed up... you could probably find that in the > archives if you look around. > > of course to me this sounds like an excellent idea for a dtrace probe ;-) > > Actually, you can find out from the sort-end probe now whether or not the sort spilled to disk and number of disk blocks used. This is one of the probes from Simon. TRACE_POSTGRESQL_SORT_END(state->tapeset, (state->tapeset ? LogicalTapeSetBlocks(state->tapeset) : (state->allowedMem - state->availMem + 1023) / 1024)); -Robert
On Fri, 6 Jun 2008, Gregory Stark wrote: > "Greg Smith" <gsmith@gregsmith.com> writes: > >> 1) Is it worthwhile to expand the information stored in the GUC structure to >> make it better capable of supporting machine generation and to provide more >> information for tool authors via pg_settings? The exact fields that should or >> shouldn't be included remains controversial; consider "default value", >> "per-session/runtime/restart", and "enum lists" as the list of things that are >> most needed there. > > Isn't that a list of what's *already* there? I should have been clearer there. Some of the items suggested are already in the structure, but aren't visible via pg_settings. In those cases it's just exporting information that's already there. In others (like the suggestion to add a URL to the documentation) it is actually a new field being added as well as its corresponding entry in the settings view. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
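[For reference, the sort of query a tool can already issue against the view as it stands (column names as in 8.3); the proposal under discussion is about widening what a query like this can return, for example adding a recommendation or documentation field alongside these:

    SELECT name, setting, unit, context, source, min_val, max_val, short_desc
      FROM pg_settings
     WHERE name IN ('shared_buffers', 'work_mem', 'wal_sync_method');
]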
gsmith@gregsmith.com (Greg Smith) writes:

> On Fri, 6 Jun 2008, Heikki Linnakangas wrote:
>
>> Or perhaps we should explicitly mark the settings the tool has
>> generated, and comment out:
>>
>> #shared_buffers = 32MB # commented out by wizard on 2008-06-05
>> shared_buffers = 1024MB # automatically set by wizard on 2008-06-05
>
> What I would like to do is make the tool spit out a revision history
> in the same way I find all "big IT" shops handling this already: by
> putting a revision history style commentary above the current setting.
> Here's a sample:
>
> # 2008-03-02 : 32MB : postgres : Database default
> # 2008-05-02 : 512MB : pg_autotune : Wizard update
> # 2008-05-15 : 1024MB : gsmith : Increased after benchmark tests
> shared_buffers = 1024MB
>
> If the first tuning tool that comes into existence used this format,
> and the format was reasonable, I think it would be possible to get
> people making manual edits to adopt it as well.

My first reaction to this is... "And why is it that you seem to imagine it a good idea to recreate RCS, badly?" While that may be an unkind reaction, I don't think it's a particularly wrong one! Getting into "comment formatting" certainly *does* seem like an attempt to do, poorly, what RCS does. And I am pointing at a tool from 1985 as opposed to "modern" ones to point out that there were better approaches to managing versioning of simple text documents *THIRTY YEARS AGO*. Indeed, CVS, which we are still using, doesn't fundamentally do anything, in this context, that RCS didn't. I kind of wish that there was some sort of library for "doing RCS"; even in the absence of that, it would be perfectly logical for a tool that manages this to DEMAND that there be a "diff" utility or some equivalent available. In the absence of that, let me suggest that it would be preferable to follow VMS's lead and, if you "automagically" make any changes, create a version called "postgresql.conf;[n]" where "[n]" is either: a) 1, if "postgresql.conf;1" does not exist, or b) 1 + the largest integer found. Replace ";" with some other character as need be. Add a line (or a few lines) at the top of the current version that indicates some bit of commented-out metadata that is obviously formatted and easily recognizable, such as:

### [pgauto] postgresql.conf-autotuner - ran 2008-07-02
### [pgauto] little bit of safely-turfable header
### [pgauto] further little bit of safely-turfable header

In effect, what it does is thus:
- Keep the old copy as postgresql.conf;24 (because we have some 24 elder versions)
- Strip off any leading lines that begin with "### [pgauto]"
- Put any interesting comments in a set of leading lines that begin with "### [pgauto]"

Merits, over some wacky rewrite-comment-formatting:
- This requires only the thinnest bit of "special handling" of comments.
- It doesn't force in a dependency on [pick-someone's-pet-SCM that you happen to hate]
- Even in the absence of an SCM, it keeps versioning information around
- If you have a "diff," it'll work perfectly well, and be reasonably informative
- It doesn't throw ANY old data away!

Sorry, that's more than enough rant for today! :-) -- "cbbrowne","@","linuxdatabases.info" http://cbbrowne.com/info/rdbms.html Rules of the Evil Overlord #220. "Whatever my one vulnerability is, I will fake a different one. For example, ordering all mirrors removed from the palace, screaming and flinching whenever someone accidentally holds up a mirror, etc. In the climax when the hero whips out a mirror and thrusts it at my face, my reaction will be ``Hmm...I think I need a shave.''" <http://www.eviloverlord.com/>
Gregory Stark wrote: > "Andreas Pflug" <pgadmin@pse-consulting.de> writes: > >> I personally wouldn't even think about starting such a wizard, unless I have an >> idea how to push the result into the database. No, not a file, but via SQL! So >> your statement you won't react unless a wizard is almost ready is prohibitive, >> apart from the fact that not only wizards (featuring AI) are interesting, but >> simple config tools as well. > > Well there's a perfectly good place to start today. Dump out a config file I think I made my point very clear when stating "not a file, but via SQL". Though I'm not a native English speaker, and I'm sure you understood. I must assume you're polluting this thread deliberately in order to sabotage the original intention of this thread. I find this disgusting. Ok, trying to contribute gui tools for pgsql is for masochists. We have vi, sed and grep, no more tools required, right? Regards, Andreas
On Sat, 2008-06-07 at 01:30 +0200, Andreas Pflug wrote: > Gregory Stark wrote: > > "Andreas Pflug" <pgadmin@pse-consulting.de> writes: > I think I made my point very clear when stating "not a file, but via > SQL". Though I'm not a native English speaker, and I'm sure you > understood. I must assume you're polluting this thread deliberately in > order to sabotage the original intention of this thread. I find this > disgusting. > > Ok, trying to contribute gui tools for pgsql is for masochists. We have > vi, sed and grep, no more tools required, right? Hold on guys, let's not get all out of whack. No one in their right mind is going to suggest that vi, sed and grep are the only tools we need to do this. However a GUI wizard isn't really a way to go either. SQL? Yes it would be very nice to be able to:

BEGIN;
SET shared_buffers to 65536
    COMMENT IS 'Since I have 4G of ram I want 512 megs of shared buffers';
SET effective_cache_size to 262144
    COMMENT IS 'Since I have 4G of ram I want 1 gig of effective_cache';
COMMIT;

When that is done, it would write out the postgresql.conf. Regardless of all of this, this thread is way out of hand. We went from overhauling the configuration file to wizards, sample postgresql.conf files and possibly even half of us migrating to MySQL :P. Can we refine the topic a bit? Joshua D. Drake
Robert Treat <xzilla@users.sourceforge.net> writes: > There is a saying, something like "The accumulation of annecdotes is not > data". Well, we seem to have a high bar on what proof we need to actually > change a default GUC settings. default_statistics_target is a prime example, > where almost no one i know has ever recommended 10 as a default, or suggests > setting it to 10 as an way to improve performance, but the effort to get it > changed to something more reasonable has been monumental. Actually, the reason it's still 10 is that the effort expended to get it changed has been *ZERO*. I keep asking for someone to make some measurements, do some benchmarking, anything to make a plausible case for a specific higher value as being a reasonable place to set it. The silence has been deafening. regards, tom lane
Greg Smith <gsmith@gregsmith.com> writes: > On Fri, 6 Jun 2008, Gregory Stark wrote: >> "Greg Smith" <gsmith@gregsmith.com> writes: >>> 1) Is it worthwhile to expand the information stored in the GUC structure to >>> make it better capable of supporting machine generation and to provide more >>> information for tool authors via pg_settings? The exact fields that should or >>> shouldn't be included remains controversial; consider "default value", >>> "per-session/runtime/restart", and "enum lists" as the list of things that are >>> most needed there. >> >> Isn't that a list of what's *already* there? > I should have been clearer there. Some of the items suggested are already > in the structure, but aren't visible via pg_settings. Well, you can't see the default or reset values in pg_settings, only the current value. However, I fail to see the use of either of those for a configure wizard. It'll presumably be attached to a fresh connection so the reset value is not different from the current; and any decent wizard is going to know perfectly well what the defaults are. > ... In others (like the > suggestion to add a URL to the documentation) it is actually a new field > being added as well as its corresponding entry in the settings view. Offhand I would argue that we should choose the URLs in such a way that they can be derived automatically if you know the PG version number and GUC variable name. (We're almost there already, except that you have to know which section of the chapter it's in.) So the need to have a pg_settings column seems questionable --- seems like it would mostly be clutter. Perhaps instead of a view column, it would be useful to encapsulate the algorithm as a function? pg_variable_help_url(name) regards, tom lane
On Fri, 6 Jun 2008, Tom Lane wrote: > Well, you can't see the default or reset values in pg_settings, only the > current value. However, I fail to see the use of either of those for > a configure wizard. I'm under the impression that the primary reason to put the default in there is to make it easier for a file generator program to be decoupled a bit from the internal representation. Regardless, these values should be exposed for tool writers. If you build a prototype interface for an interactive settings changing tool, you quickly discover that showing the default, range, and recommended setting are all valuable things people would like to see when deciding what to change a setting to. And there's no reason accumulating all that info should be the responsibility of a tool writer when it's easy to expose and keep up to date inside the database itself. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Fri, 2008-06-06 at 20:19 -0400, Tom Lane wrote: > Robert Treat <xzilla@users.sourceforge.net> writes: > Actually, the reason it's still 10 is that the effort expended to get it > changed has been *ZERO*. I keep asking for someone to make some > measurements, do some benchmarking, anything to make a plausible case > for a specific higher value as being a reasonable place to set it. > The silence has been deafening. Not surprising really. It is a simple adjustment to make and it also is easy to spot when it's a problem. However it is not trivial to test for (in terms of time and effort). I know 10 is wrong and so do you. If you don't I am curious why I see so many posts from you saying, "Your estimates are off, what is your default_statistics_target?" with yet even more responses saying, "Uhh 10." Sincerely, Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes: > Not surprising really. It is a simple adjustment to make and it also is > easy to spot when its a problem. However it is not trivial to test for > (in terms of time and effort). I know 10 is wrong and so do you. Sure. But what is right? I'm afraid to just push it to (say) 100 because of the possibility of O(N^2) behavior in eqjoinsel. Somebody needs to do some measurements on somewhat realistic scenarios. regards, tom lane
"Joshua D. Drake" <jd@commandprompt.com> writes: > On Fri, 2008-06-06 at 20:19 -0400, Tom Lane wrote: >> Robert Treat <xzilla@users.sourceforge.net> writes: > >> Actually, the reason it's still 10 is that the effort expended to get it >> changed has been *ZERO*. I keep asking for someone to make some >> measurements, do some benchmarking, anything to make a plausible case >> for a specific higher value as being a reasonable place to set it. > >> The silence has been deafening. > > Not surprising really. It is a simple adjustment to make and it also is > easy to spot when its a problem. However it is not trivial to test for > (in terms of time and effort). I know 10 is wrong and so do you. If you > don't I am curious why I see so many posts from you saying, "Your > estimates are off, what is your default_statistics_target?" with yet > even more responses saying, "Uhh 10." Ah, but we only ever hear about the cases where it's wrong of course. In other words even if we raised it to some optimal value we would still have precisely the same experience of seeing only posts on list about it being insufficient. What's needed is some speed benchmarks for complex queries with varying size statistics so we can see how badly large statistic tables hurt planning time. The flip side of seeing how much larger tables help planning accuracy is much harder to measure. Offhand I don't see any systematic way to go about it. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!
On Sunday 08 June 2008 19:07:21 Gregory Stark wrote: > "Joshua D. Drake" <jd@commandprompt.com> writes: > > On Fri, 2008-06-06 at 20:19 -0400, Tom Lane wrote: > >> Robert Treat <xzilla@users.sourceforge.net> writes: > >> > >> Actually, the reason it's still 10 is that the effort expended to get it > >> changed has been *ZERO*. I keep asking for someone to make some > >> measurements, do some benchmarking, anything to make a plausible case > >> for a specific higher value as being a reasonable place to set it. > >> > >> The silence has been deafening. > > > > Not surprising really. It is a simple adjustment to make and it also is > > easy to spot when it's a problem. However it is not trivial to test for > > (in terms of time and effort). I know 10 is wrong and so do you. If you > > don't I am curious why I see so many posts from you saying, "Your > > estimates are off, what is your default_statistics_target?" with yet > > even more responses saying, "Uhh 10." > > Ah, but we only ever hear about the cases where it's wrong of course. In > other words even if we raised it to some optimal value we would still have > precisely the same experience of seeing only posts on list about it being > insufficient. > The flip side to this is that we're not trying to find the perfect setting, we're just trying to determine a number that will cause more benefit than harm compared to the number we have now. While I am sure there are cases where 100 is too low as well, I cannot recall ever having seen someone suggest lowering the default_stats_target to something less than 100. (I now sit back and wait for someone to comb the archives, just to find that 1 time). -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Tom, >>> Actually, the reason it's still 10 is that the effort expended to get it >>> changed has been *ZERO*. I keep asking for someone to make some >>> measurements, do some benchmarking, anything to make a plausible case >>> for a specific higher value as being a reasonable place to set it. >>> The silence has been deafening. >> Not surprising really. It is a simple adjustment to make and it also is >> easy to spot when it's a problem. However it is not trivial to test for >> (in terms of time and effort). I know 10 is wrong and so do you. If you >> don't I am curious why I see so many posts from you saying, "Your >> estimates are off, what is your default_statistics_target?" with yet >> even more responses saying, "Uhh 10." I tried (back in 7.4) to do some systematic testing of this. The problem is that the cases where higher d_s_t is required are specifically ones with complex, unbalanced data distributions and/or very large databases. This makes test cases extremely difficult and time-consuming to generate; further, I found that the test cases I had from my clients' databases were not portable (in addition to being confidential). Also, I'd actually assert that "10" seems to be perfectly adequate for the majority of users. That is, the number of users where I've recommended increasing d_s_t for the whole database is smaller than the number where I don't, and of course we never hear from most users at all. So I'm pretty happy recommending "Leave the default. If you encounter problem queries, increase it to 100, and analyse the database. If you're running a data warehouse, increase it to 1000." Where analyze does systematically fall down is with databases over 500GB in size, but that's not a function of d_s_t but rather of our tiny sample size. --Josh
"Josh Berkus" <josh@agliodbs.com> writes: > Where analyze does systematically fall down is with databases over 500GB in > size, but that's not a function of d_s_t but rather of our tiny sample size. Speak to the statisticians. Our sample size is calculated using the same theory behind polls which sample 600 people to learn what 250 million people are going to do on election day. You do NOT need (significantly) larger samples for larger populations. In fact where those polls have difficulty is the same place we have some problems. For *smaller* populations like individual congressional races you need to have nearly the same 600 sample for each of those small races. That adds up to a lot more than 600 total. In our case it means when queries cover a range much less than a whole bucket then the confidence interval increases too. Also, our estimates for n_distinct are very unreliable. The math behind sampling for statistics just doesn't work the same way for properties like n_distinct. For that Josh is right, we *would* need a sample size proportional to the whole data set which would practically require us to scan the whole table (and have a technique for summarizing the results in a nearly constant sized data structure). -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!
On 6/9/08, Gregory Stark <stark@enterprisedb.com> wrote:
"Josh Berkus" <josh@agliodbs.com> writes:
> Where analyze does systematically fall down is with databases over 500GB in
> size, but that's not a function of d_s_t but rather of our tiny sample size.
n_distinct. For that Josh is right, we *would* need a sample size proportional
to the whole data set which would practically require us to scan the whole
table (and have a technique for summarizing the results in a nearly constant
sized data structure).
Hi,
Is this (summarizing results in a constant-sized data structure) something which could be achieved by Bloom filters?
http://archives.postgresql.org/pgsql-general/2008-06/msg00076.php
Kind regards
Hakan Kocaman
"Hakan Kocaman" <hkocam@googlemail.com> writes: > On 6/9/08, Gregory Stark <stark@enterprisedb.com> wrote: >> >> n_distinct. For that Josh is right, we *would* need a sample size >> proportional to the whole data set which would practically require us to >> scan the whole table (and have a technique for summarizing the results in a >> nearly constant sized data structure). > > is this (summarizing results in a constant sized data structure) something > which could be achived by Bloom-Filters ? Uhm, it would be a bit of a strange application of them but actually it seems to me that would be a possible approach. It would need a formula for estimating the number of distinct values given the number of bits set in the bloom filter. That should be a tractable combinatorics problem (in fact it's pretty similar to the combinatorics I posted a while back about getting all the drives in a raid array busy). And if you have a dynamic structure where the filter size grows then it would overestimate because extra copied bits would be set. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Greg, > Speak to the statisticians. Our sample size is calculated using the same > theory behind polls which sample 600 people to learn what 250 million > people are going to do on election day. You do NOT need (significantly) > larger samples for larger populations. Your analogy is bad. For elections, the voters have only a few choices. In a 300 million row table, there could be 300 million different values, and the histogram becomes less accurate for every order of magnitude smaller than 300 million it is. > Also, our estimates for n_distinct are very unreliable. The math behind > sampling for statistics just doesn't work the same way for properties > like n_distinct. For that Josh is right, we *would* need a sample size > proportional to the whole data set which would practically require us to > scan the whole table (and have a technique for summarizing the results > in a nearly constant sized data structure). Actually, a number of papers have shown block-based algorithms which can arrive at reasonably confident (between 50% and 250% of accurate) estimates based on scanning only 5% of *blocks*. Simon did some work on this a couple years ago, but he and I had difficulty convincing -hackers that a genuine problem existed. You're correct that we'd need to change pg_statistic, though. For one thing, we need to separate the sample size from the histogram size. Also, we seem to be getting pretty far away from the original GUC discussion. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
"Josh Berkus" <josh@agliodbs.com> writes: > Greg, > >> Speak to the statisticians. Our sample size is calculated using the same >> theory behind polls which sample 600 people to learn what 250 million >> people are going to do on election day. You do NOT need (significantly) >> larger samples for larger populations. > > Your analogy is bad. For elections, the voters have only a few choices. > In a 300 million row table, there could be 300 million different values, > and the histogram becomes less accurate for every order of magnitude > smaller than 300 million it is. I think you're right that you need, for example, 600 people *for each answer* to give good poll results. So for a two-way election which is about even that's about 1,200 people. If one of the candidates is much less popular you might have to sample many more people before you have 600 people in that bucket. The analogous case in our situation is not having 300 million distinct values, since we're not gathering info on specific values, only the buckets. We need, for example, 600 samples *for each bucket*. Each bucket is chosen to have the same number of samples in it. So that means that we always need the same number of samples for a given number of buckets. >> Also, our estimates for n_distinct are very unreliable. The math behind >> sampling for statistics just doesn't work the same way for properties >> like n_distinct. For that Josh is right, we *would* need a sample size >> proportional to the whole data set which would practically require us to >> scan the whole table (and have a technique for summarizing the results >> in a nearly constant sized data structure). > > Actually, a number of papers have shown block-based algorithms which can > arrive a reasonably confident (between 50% and 250% of accurate) estimates > based on scanning only 5% of *blocks*. Simon did some work on this a > couple years ago, but he and I had difficultly convincing -hackers that a > genuine problem existed. Really? Could you send references? The paper I read surveyed previous work and found that you needed to scan up to 50% of the table to get good results. 50-250% is considerably looser than what I recall it considering "good" results so these aren't entirely inconsistent but I thought previous results were much worse than that. > You're correct that we'd need to change pg_statistic, though. For one > thing, we need to separate the sample size from the histogram size. That amounts to giving users control over the sample size per bucket. Which allows them to get a more or less accurate estimate for a range covering a single bucket without changing the size of the bucket. I'm a bit puzzled which direction you want to go. > Also, we seem to be getting pretty far away from the original GUC > discussion. Thank heavens :) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB'sPostgreSQL training!
Greg, > The analogous case in our situation is not having 300 million distinct > values, since we're not gathering info on specific values, only the > buckets. We need, for example, 600 samples *for each bucket*. Each bucket > is chosen to have the same number of samples in it. So that means that we > always need the same number of samples for a given number of buckets. I think that's plausible. The issue is that in advance of the sampling we don't know how many buckets there *are*. So we first need a proportional sample to determine the number of buckets, then we need to retain a histogram sample proportional to the number of buckets. I'd like to see someone with a PhD in this weighing in, though. > Really? Could you send references? The paper I read surveyed previous work > and found that you needed to scan up to 50% of the table to get good > results. 50-250% is considerably looser than what I recall it considering > "good" results so these aren't entirely inconsistent but I thought previous > results were much worse than that. Actually, based on my several years selling performance tuning, I generally found that as long as estimates were correct within a factor of 3 (33% to 300%) the correct plan was generally chosen. There are papers on block-based sampling which were already cited on -hackers; I'll hunt through the archives later. -- Josh Berkus PostgreSQL @ Sun San Francisco
Ron, > I wonder if the fastest way to generate the configurator > would be to simply ask everyone to post their tuned > postgresql.conf files along with a brief description of > the use case for that file. Then we could group the > use-cases into various classes; and average the values > of the submitted files. Then the configurator's one > question "choose which use case most closely matches > yours from this list". This has been proposed a number of times, but is still infeasible for the simple reason that a lot of settings are dependent on how much RAM you have on the machine. "Sample configs" would just result in people trying to set their shared_buffers to 150% of system RAM if they have a lightweight system, or in having 500 different sample configurations, which would produce even *more* user confusion. A generator makes more sense. -- Josh Berkus PostgreSQL @ Sun San Francisco
Robert,

> shared_buffers
> effective_cache_size
> default_stats_target
> work_mem
> maintenance_work_mem
> listen_addresses
> max_connections
> the fsm parameters
> checkpoint_segments
> random_page_cost

My list is very similar, except that I drop random_page_cost and add synchronous_commit, autovacuum (on or off) and offer 4 "packages" of log settings. So I think that we can get consensus on what are the "newbie" parameters pretty easily. -- Josh Berkus PostgreSQL @ Sun San Francisco
On Tuesday 10 June 2008 09:37, Josh Berkus wrote:
> Robert,
>
> > shared_buffers
> > effective_cache_size
> > default_stats_target
> > work_mem
> > maintenance_work_mem
> > listen_addresses
> > max_connections
> > the fsm parameters
> > checkpoint_segments
> > random_page_cost
>
> My list is very similar, except that I drop random_page_cost and add
> synchronous_commit, autovacuum (on or off) and offer 4 "packages" of log
> settings.

Oh, and wal_buffers, the default for which we should just change if it weren't for SHMMAX. -- Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus <josh@agliodbs.com> writes: > Oh, and wal_buffers, the default for which we should just change if it > weren't for SHMMAX. Uh, why? On a workload of mostly small transactions, what value is there in lots of wal_buffers? regards, tom lane
Gregory Stark wrote: > "Joshua D. Drake" <jd@commandprompt.com> writes: >> ...default_statistics_target?"..."Uhh 10." > > Ah, but we only ever hear about the cases where it's wrong of course. In other > words even if we raised it to some optimal value we would still have precisely > the same experience of seeing only posts on list about it being insufficient. Yet if it were raised above the optimum, wouldn't you expect to hear about cases where it was too high? It seems a slow, painful way of finding a good setting would be to keep increasing it until the lists start seeing cases where people are told to lower the parameter.
Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Oh, and wal_buffers, the default for which we should just change if it >> weren't for SHMMAX. > > Uh, why? On a workload of mostly small transactions, what value is > there in lots of wal_buffers? None. But there's also little to no harm in having a higher setting; at worst you waste a few megabytes of memory. Besides, most databases are initialized from some outside source in the beginning, and data loading does benefit from a higher wal_buffers setting. Ideally, of course, there would be no wal_buffers setting, and WAL buffers would be allocated from shared_buffers pool on demand... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Greg Sabino Mullane wrote: > * The word 'paramters' is still misspelled. :) Corrected for 8.4. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Wed, 2008-06-11 at 11:45 -0400, Bruce Momjian wrote: > Greg Sabino Mullane wrote: > > * The word 'paramters' is still misspelled. :) > > Corrected for 8.4. Technically this is a bug fix... why not backpatch it too? > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + If your life is a hard drive, Christ can be your backup. + >
Heikki, > Ideally, of course, there would be no wal_buffers setting, and WAL > buffers would be allocated from shared_buffers pool on demand... > +1 --Josh
Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Oh, and wal_buffers, the default for which we should just change if it >> weren't for SHMMAX. > > Uh, why? On a workload of mostly small transactions, what value is > there in lots of wal_buffers? Actually, it's also useful for any workload with many connections. Any time you have high throughput, really. We've seen this on DBT2, SpecJAppserver and iGen. --Josh
Josh Berkus wrote: > Heikki, > >> Ideally, of course, there would be no wal_buffers setting, and WAL >> buffers would be allocated from shared_buffers pool on demand... Same for pg_subtrans, pg_clog, etc (as previously discussed) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Robert Lor wrote: > Robert Treat wrote: > > On Wednesday 04 June 2008 22:04:54 Greg Smith wrote: > > > >> I was just talking to someone today about building a monitoring tool for > >> this. Not having a clear way to recommend people monitor use of work_mem > >> and its brother spilled to disk sorts is an issue right now, I'll whack > >> that one myself if someone doesn't beat me to it before I get time. > >> > > > > I remember *years* ago, someone wrote a perl script to poll pgsql_tmp and > > print out anytime something showed up... you could probably find that in the > > archives if you look around. > > > > of course to me this sounds like an excellent idea for a dtrace probe ;-) > > > > > > Actually, you can find out from the sort-end probe now whether or not > the sort spilled to disk and number of disk blocks used. This is one of > the probes from Simon. > > TRACE_POSTGRESQL_SORT_END(state->tapeset, > (state->tapeset ? > LogicalTapeSetBlocks(state->tapeset) : > (state->allowedMem - state->availMem + 1023) / > 1024)); 8.3 has GUC log_temp_files. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
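[A usage note on the parameter Bruce mentions, since it is easy to miss: log_temp_files takes a threshold in kilobytes, with 0 logging every temporary file and -1 (the default) disabling the logging; it can be set in postgresql.conf or, by a superuser, per session:

    SET log_temp_files = 0;   -- log every sort/hash spill file when it is removed
]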
Joshua D. Drake wrote: > > > On Wed, 2008-06-11 at 11:45 -0400, Bruce Momjian wrote: > > Greg Sabino Mullane wrote: > > > * The word 'paramters' is still misspelled. :) > > > > Corrected for 8.4. > > Technically this is a bug fix... why not backpatch it too? That might show up as a diff for people doing upgrades where the minor version changed the spelling. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Joshua D. Drake wrote: >> >> On Wed, 2008-06-11 at 11:45 -0400, Bruce Momjian wrote: >>> Greg Sabino Mullane wrote: >>>> * The word 'paramters' is still misspelled. :) >>> Corrected for 8.4. >> Technically this is a bug fix... why not backpatch it too? > > That might show up as a diff for people doing upgrades where the minor > version changed the spelling. People upgrading won't see it, I think. You only see it when you do a new initdb.. //Magnus
Josh Berkus wrote: > Heikki, > >> Ideally, of course, there would be no wal_buffers setting, and WAL >> buffers would be allocated from shared_buffers pool on demand... >> > > +1 > > --Josh > +1 -Jignesh
Dave Page wrote: > On Mon, Jun 2, 2008 at 5:46 AM, Joshua D. Drake <jd@commandprompt.com> wrote: > > >> I've seen people not doing so more often > >> than you would think. Perhaps because they are DBAs and not sysadmins? I > >> also > >> meant a tool to do things like verify that the changes are valid, as > >> someone > >> else mentioned elsewhere in this thread. > > > > pg_ctl -D data check? > > > > I would +1 that. > > I would also really like to see that - though I'd also like to see an > SQL interface so we can check a config before saving when editing via > pgAdmin or similar. Should this be a TODO? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Wed, Jun 11, 2008 at 6:56 PM, Bruce Momjian <bruce@momjian.us> wrote: > Dave Page wrote: >> On Mon, Jun 2, 2008 at 5:46 AM, Joshua D. Drake <jd@commandprompt.com> wrote: >> >> > pg_ctl -D data check? >> > >> > I would +1 that. >> >> I would also really like to see that - though I'd also like to see an >> SQL interface so we can check a config before saving when editing via >> pgAdmin or similar. > > Should this be a TODO? Yes please. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
Dave Page wrote: > On Wed, Jun 11, 2008 at 6:56 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Dave Page wrote: > >> On Mon, Jun 2, 2008 at 5:46 AM, Joshua D. Drake <jd@commandprompt.com> wrote: > >> > >> > pg_ctl -D data check? > >> > > >> > I would +1 that. > >> > >> I would also really like to see that - though I'd also like to see an > >> SQL interface so we can check a config before saving when editing via > >> pgAdmin or similar. > > > > Should this be a TODO? > > Yes please. Added to TODO: * Add pg_ctl option to do a syntax check of postgresql.conf -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Alvaro Herrera <alvherre@commandprompt.com> writes: > Josh Berkus wrote: >> Ideally, of course, there would be no wal_buffers setting, and WAL >> buffers would be allocated from shared_buffers pool on demand... > Same for pg_subtrans, pg_clog, etc (as previously discussed) I agree with that for pg_clog and friends, but I'm much more leery of folding WAL into the same framework. Its access pattern is *totally* unlike standard caches, so the argument that this would be good for performance is resting on nothing but imagination. Also I'm concerned about possible deadlocks, because WAL is customarily accessed while holding one or more exclusive buffer locks. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > Tom Lane wrote: > >> * Can we build a "configuration wizard" to tell newbies what settings > >> they need to tweak? > > > That would trump all the other suggestions conclusively. Anyone good at > > expert systems? > > How far could we get with the answers to just three questions: > > * How many concurrent queries do you expect to have? > > * How much RAM space are you willing to let Postgres use? > > * How much "overhead" disk space are you willing to let Postgres use? > > concurrent queries drives max_connections, obviously, and RAM space > would drive shared_buffers and effective_cache_size, and both of them > would be needed to size work_mem. The third one is a bit weird but > I don't see any other good way to set the checkpoint parameters. > > If those aren't enough questions, what else must we ask? Or maybe they > aren't the right questions at all --- maybe we should ask "is this a > dedicated machine or not" and try to extrapolate everything else from > what we (hopefully) can find out about the hardware. Having returned from Japan, I read through this thread. It had lots of ideas (new format for postgresql.conf, more/less comments in postgresql.conf) but I didn't see any of the ideas getting a majority. I think we do a good job of making many settings automatic (meaning no one even sees them), but we don't do a great job of making the visible settings easy to set, both in the process (no GUI) and in knowing the proper value. There are two ideas I did think had merit. First, using ## for system-supplied comments, so user comments would be easier to identify. There might be value in doing that even if it were not helpful for scripts. The second idea is the idea of having one parameter depend on another. Not only could we do that for some of our existing parameters, but we could have pseudo-parameters like concurrent_queries, memory_usage, and extra_disk_space that could be at the top of postgresql.conf and then affect the other settings. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
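[As a back-of-the-envelope illustration of how far the three questions Tom quotes could already get a generator, using nothing more than the rules of thumb that have circulated in this thread; the percentages and the assumed machine (dedicated, 4GB of RAM, 100 concurrent queries, ~512MB of acceptable WAL overhead) are illustrative assumptions, not an official formula:

    -- purely illustrative arithmetic on the three answers
    SELECT 100                              AS max_connections,
           (4096 / 4)::text || 'MB'         AS shared_buffers,        -- ~25% of RAM
           (4096 * 3 / 4)::text || 'MB'     AS effective_cache_size,  -- ~75% of RAM
           (4096 / 4 / 100)::text || 'MB'   AS work_mem,              -- the buffer share split per connection
           (512 / (16 * 3))                 AS checkpoint_segments;   -- 16MB segments, roughly 3x on disk
]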
Bruce Momjian <bruce@momjian.us> writes: > The second idea is the idea of having one parameter depend on another. > Not only could we do that for some of our existing parameters, but we > could have pseudo-parameters like concurrent_queries, memory_usage, and > extra_disk_space that could be at the top of postgresql.conf and then > affect the other settings. We have tried to do that in the past, and it didn't work well *at all*. The idea has a fundamental logical flaw, which is that it's not clear which parameter wins if the user changes both. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > The second idea is the idea of having one parameter depend on another. > > Not only could we do that for some of our existing parameters, but we > > could have pseudo-parameters like concurrent_queries, memory_usage, and > > extra_disk_space that could be at the top of postgresql.conf and then > > affect the other settings. > > We have tried to do that in the past, and it didn't work well *at all*. We have? When? > The idea has a fundamental logical flaw, which is that it's not clear > which parameter wins if the user changes both. Yes, you could get into problems by having variable dependency loops, but I see no way to easily improve configuration without it. My idea was to have:

	memory_usage = 128MB	## pseudo-parameter
	shared_buffers = $memory_usage * 0.75

-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> Josh Berkus wrote: >>> Ideally, of course, there would be no wal_buffers setting, and WAL >>> buffers would be allocated from shared_buffers pool on demand... > >> Same for pg_subtrans, pg_clog, etc (as previously discussed) > > I agree with that for pg_clog and friends, but I'm much more leery of > folding WAL into the same framework. Its access pattern is *totally* > unlike standard caches, so the argument that this would be good for > performance is resting on nothing but imagination. Also I'm concerned > about possible deadlocks, because WAL is customarily accessed while > holding one or more exclusive buffer locks. Well it may still be worthwhile stealing buffers from shared_buffers even if we set a special flag marking them as owned by WAL and out of bounds for the normal buffer manager. At least that way we could always steal more if we want or return some, as long as we're careful about when we do it. That would open the door to having these parameters be dynamically adjustable. That alone would be worthwhile even if we bypass all bells and whistles of the buffer manager. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Greg, > At least that way we could always steal more if we want or return some, as > long as we're careful about when we do it. That would open the door to having > these parameters be dynamically adjustable. That alone would be worthwhile > even if we bypass all bells and whistles of the buffer manager. > One hitch, though, is that asynchronous commit could consume big chunks of shared_buffers. So we might still need a limit for people who are using async. --Josh
Magnus Hagander wrote: > Bruce Momjian wrote: > > Joshua D. Drake wrote: > >> > >> On Wed, 2008-06-11 at 11:45 -0400, Bruce Momjian wrote: > >>> Greg Sabino Mullane wrote: > >>>> * The word 'paramters' is still misspelled. :) > >>> Corrected for 8.4. > >> Technically this is a bug fix... why not backpatch it too? > > > > That might show up as a diff for people doing upgrades where the minor > > version changed the spelling. > > People upgrading won't see it, I think. You only see it when you do a > new initdb.. The problem is that people doing initdb with different minor versions will have different stock postgresql.conf files. That isn't a huge problem, but I don't see a need to create the problem just to fix a spelling mistake that few have observed. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> Bruce Momjian <bruce@momjian.us> writes: >>> The second idea is the idea of having one parameter depend on another. >> We have tried to do that in the past, and it didn't work well *at all*. > We have? When? Just a couple months ago we had to give up enforcing an interrelationship between NBuffers and MaxConnections, because it got too complicated and un-explainable. I seem to recall some other interactions in the distant past, but a quick look through the CVS logs didn't find any smoking guns. >> The idea has a fundamental logical flaw, which is that it's not clear >> which parameter wins if the user changes both. > Yes, you could get into problems by having variable dependency loops, Who said anything about loops? What I am talking about is what happens duringset memory_usage = X; // implicitly sets work_mem = X/100, sayset work_mem = Y;set memory_usage = Z; What is work_mem now, and what's your excuse for saying so, and how will you document the behavior so that users can understand it? (Just to make things interesting, assume that some of the above SETs happen via changing postgresql.conf rather than directly.) If the objective is to make configuration easier to understand, I don't believe that behind-the-scenes changes of configuration values will advance that goal. > but I see no way to easily improve configuration without it. The higher-level concepts should be things that a configuration wizard works with, and then tells you how to set the postmaster parameters. They should not end up in the configure file (unless maybe as comments?) regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> I agree with that for pg_clog and friends, but I'm much more leery of >> folding WAL into the same framework. > Well it may still be worthwhile stealing buffers from shared_buffers even if > we set a special flag marking them as owned by WAL and out of bounds for > the normal buffer manager. > At least that way we could always steal more if we want or return some, as > long as we're careful about when we do it. ... and as long as you can acquire the WAL per-buffer management space out of nowhere ... regards, tom lane
"Josh Berkus" <josh@agliodbs.com> writes: > Greg, > >> At least that way we could always steal more if we want or return some, as >> long as we're careful about when we do it. That would open the door to having >> these parameters be dynamically adjustable. That alone would be worthwhile >> even if we bypass all bells and whistles of the buffer manager. >> > > One hitch, though, is that asynchronous commit could consume big chunks of > shared_buffers. So we might still need a limit for people who are using async. Well currently we use a fixed number of fixed-sized buffers, no? I doubt we'll change that even if we take this tact of making wal_buffers resizable by stealing buffers from the buffer manager for precisely the reasons Tom was describing. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
On Wed, 11 Jun 2008, Tom Lane wrote: > Who said anything about loops? What I am talking about is what happens > during > set memory_usage = X; // implicitly sets work_mem = X/100, say > set work_mem = Y; > set memory_usage = Z; > What is work_mem now, and what's your excuse for saying so, and how > will you document the behavior so that users can understand it? > (Just to make things interesting, assume that some of the above SETs > happen via changing postgresql.conf rather than directly.) People are already exposed to issues in this area via things like the include file mechanism. You can think of that two ways. You can say, "there's already problems like this so who cares if there's another one". Or, you can say "let's not add even more confusion like that". Having a mini programming language for setting parameters is interesting and all, and it might be enough to do a good job of handling the basic newbie setup chores. But I don't think it's a complete solution and therefore I find moving in that direction a bit of a distraction; your concerns about ambiguity just amplify that feeling. It's unlikely that will get powerful enough to enable the "one true config file" that just works for everybody. There's too many things that depend a bit on both data access pattern and on overall database size/structure no matter what you do. [If only there were some technology that did workload profiling and set the server parameters based on that. Some sort of dynamic tuning tool; wouldn't that be great? Oh well, that's just a dream right now I guess.] I'm not sure if I've stated this explicitly yet, but I personally have no interest in just solving the newbie problem. I want a tool to help out tuning medium to large installs, and generating a simple config file is absolutely something that should come out of that as a bonus. Anything that just targets the simple installs, though, I'm not very motivated to chase after. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> The idea has a fundamental logical flaw, which is that it's not clear > >> which parameter wins if the user changes both. > > > Yes, you could get into problems by having variable dependency loops, > > Who said anything about loops? What I am talking about is what happens > during > set memory_usage = X; // implicitly sets work_mem = X/100, say > set work_mem = Y; > set memory_usage = Z; My initial thought was that this would behave like a shell script variable, meaning once you set something it would affect all references to it below in postgresql.conf. The problem with that is that we comment out all settings, so there isn't a logical order like you would have in a shell script. I was not thinking of memory_usage implicitly changing anything. I figured postgresql.conf would have:
    memory_usage = 100
    work_mem = $memory_usage * 0.75
If you change memory_usage via SET, it will not change work_mem at all because you are not re-initializing the variables. I am kind of lost how this would work logically and am willing to think about it some more, but I do think we aren't going to simplify postgresql.conf without such a facility. The big problem I see is that right now everything has a constant default. If we allowed memory_usage to change some of the defaults, how would we signal that we want the variables based on it to change their values? This is your behind-the-scenes problem you mentioned. > What is work_mem now, and what's your excuse for saying so, and how > will you document the behavior so that users can understand it? > (Just to make things interesting, assume that some of the above SETs > happen via changing postgresql.conf rather than directly.) > > If the objective is to make configuration easier to understand, > I don't believe that behind-the-scenes changes of configuration values > will advance that goal. > > > but I see no way to easily improve configuration without it. > > The higher-level concepts should be things that a configuration wizard > works with, and then tells you how to set the postmaster parameters. > They should not end up in the configure file (unless maybe as comments?) I am concerned that each wizard is going to have to duplicate the same logic each time, and adjust to release-based changes. I thought having the configuration file itself help with setting some values would be helpful. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Greg Smith wrote: > On Wed, 11 Jun 2008, Tom Lane wrote: > > > Who said anything about loops? What I am talking about is what happens > > during > > set memory_usage = X; // implicitly sets work_mem = X/100, say > > set work_mem = Y; > > set memory_usage = Z; > > What is work_mem now, and what's your excuse for saying so, and how > > will you document the behavior so that users can understand it? > > (Just to make things interesting, assume that some of the above SETs > > happen via changing postgresql.conf rather than directly.) > > People are already exposed to issues in this area via things like the > include file mechanism. You can think of that two ways. You can say, > "there's already problems like this so who cares if there's another one". > Or, you can say "let's not add even more confusion like that". > > Having a mini programming language for setting parameters is interesting > and all, and it might be enough to do a good job of handling the basic > newbie setup chores. But I don't think it's a complete solution and > therefore I find moving in that direction a bit of a distraction; your > concerns about ambiguity just amplify that feeling. It's unlikely that > will get powerful enough to enable the "one true config file" that just > works for everybody. There's too many things that depend a bit on both > data access pattern and on overall database size/structure no matter what > you do. You are right that the complete solution is going to have to ask users questions, and my idea of using variables is not going to get us that far. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
> Also, I'd actually assert that "10" seems to be perfectly adequate for > the majority of users. That is, the number of users where I've > recommended increasing d_s_t for the whole database is smaller than the > number where I don't, and of course we never hear from most users at > all. So I'm pretty happy recommending "Leave the default. If you > encounter problem queries, increase it to 100, and analyse the database." Really? I'm the opposite: I never leave a client's setting at 10, that's just asking for trouble. Making it 100 *after* you encounter problem queries is reactive; I prefer being proactive. Nor is a setting of 10 "perfectly adequate": I think you might be the last person on the lists who thinks so. That train has left the station; we've been trying to decide what a better default should be other than 10, and, more to the point, how to quantitatively measure it. The problem is, you really can't. Sure, you can graph a tiny increase in ANALYZE time and disk space, but there are no stock queries we can use to measure an increase in planning time. Frankly, I'd be shocked if there is any significant difference at all compared to the actual query run time. The orders of magnitude speed up of certain queries when the d_s_t goes above 98 is what spawned my original thread proposing a change to 100: http://markmail.org/message/tun3a3juxlsyjbsw While it's easy to get bogged down in theory about what things d_s_t should measure, the optimal size of buckets, etc., it's still a severe performance regression bug that should be fixed, IMO. Changing the subject line as well: this is only tangentially related to overhauling GUCS, although I'll point out again that this particular config is a good example of one that needs more comments. -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200806121213 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
"Greg Sabino Mullane" <greg@turnstep.com> writes: > The orders of magnitude speed up of certain queries when the d_s_t goes > above 98 is what spawned my original thread proposing a change to 100: > http://markmail.org/message/tun3a3juxlsyjbsw That was a pretty special case (LIKE/regex estimation), and we've since eliminated the threshold change in the LIKE/regex estimates anyway, so there's no longer any reason to pick 100 as opposed to any other number. So we're still back at "what's a good value and why?". > Frankly, I'd be shocked if there is any significant difference and all > compared to the actual query run time. I'm still concerned about the fact that eqjoinsel() is O(N^2). Show me some measurements demonstrating that a deep nest of equijoins doesn't get noticeably more expensive to plan --- preferably on a datatype with an expensive equality operator, eg numeric --- and I'm on board. regards, tom lane
"Greg Sabino Mullane" <greg@turnstep.com> writes: >> Also, I'd actually assert that "10" seems to be perfectly adequate for >> the majority of users. That is, the number of users where I've >> recommended increasing d_s_t for the whole database is smaller than the >> number where I don't, and of course we never hear from most users at >> all. So I'm pretty happy recommending "Leave the default. If you >> encounter problem queries, increase it to 100, and analyse the database. > > Really? I'm the opposite: I never leave a client's setting at 10, that's > just asking for trouble. Making it 100 *after* you encounter problem > queries is reactive; I prefer being proactive. Have you ever measured the system speed before and after? > Nor is a setting of 10 "perfectly adequate": What percentage of your plans actually change with the larger statistics? How many for the better? How many were massively improved? I suspect you're looking at some single-digit percentage slowdown for planning across the board. In exchange if you a) have simple queries you probably see none improving. If you b) have moderately complex queries you probably get some single-digit percentage of them with improvements. And if you c) have very complex queries you probably have a handful of them which see massive improvements. Across the internet there are a whole lot more applications of type (a) than the others... > Frankly, I'd be shocked if there is any significant difference and all > compared to the actual query run time. Well you might start preparing to be shocked. Note that retrieving the statistics is a query itself so it's not hard for it to be comparable to a similarly simple query. It's not hard for a simple query using multiple columns to be using more records of statistics than it is from the actual data. And things can look much worse if that data is TOASTed and requires further lookups and/or decompression... > The orders of magnitude speed up of certain queries when the d_s_t goes > above 98 is what spawned my original thread proposing a change to 100: > > http://markmail.org/message/tun3a3juxlsyjbsw > > While it's easy to get bogged down in theory about what things > d_s_t should measure, the optimal size of buckets, etc., it's still > a severe performance regression bug that should be fixed, IMO. It was, three months ago: http://archives.postgresql.org/pgsql-committers/2008-03/msg00129.php 100 is no longer a magic threshold for LIKE queries (in CVS HEAD) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
Bruce, > I am concerned that each wizard is going to have to duplicate the same > logic each time, and adjust to release-based changes. I think that's a feature, not a bug. Right now, I'm not at all convinced that my algorithms for setting the various major dials are great (I just think that nobody yet has better). So I think we should *encourage* people to write their own wizards until we find one that works reasonably well. -- Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus wrote: > Bruce, > > > I am concerned that each wizard is going to have to duplicate the same > > logic each time, and adjust to release-based changes. > > I think that's a feature, not a bug. Right now, I'm not at all convinced that > my algorithms for setting the various major dials are great (I just think > that nobody yet has better). So I think we should *encourage* people to > write their own wizards until we find one that works reasonably well. I am thinking a web-based wizard would make the most sense. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce, > I am thinking a web-based wizard would make the most sense. I'd prefer command-line, so that people could run it on their own servers. For one thing, we need to generate at least two files on many platforms; a postgresql.conf and a sysctl. -- Josh Berkus PostgreSQL @ Sun San Francisco
Josh Berkus wrote: > Bruce, > > > I am thinking a web-based wizard would make the most sense. > > I'd prefer command-line, so that people could run it on their own servers. > For one thing, we need to generate at least two files on many platforms; a > postgresql.conf and a sysctl. They can just download the files they need from the web page, no? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Jun 12, 2008, at 11:21 AM, Bruce Momjian wrote: > Josh Berkus wrote: >> Bruce, >> >>> I am concerned that each wizard is going to have to duplicate the >>> same >>> logic each time, and adjust to release-based changes. >> >> I think that's a feature, not a bug. Right now, I'm not at all >> convinced that >> my algorithms for setting the various major dials are great (I just >> think >> that nobody yet has better). So I think we should *encourage* >> people to >> write their own wizards until we find one that works reasonably well. > > I am thinking a web-based wizard would make the most sense. There's a definite need for an interactive GUI wizard (bundle with the Windows and OS X installers, at least). And a commandline wizard would certainly be nice, both interactive and non-interactive. Mostly for including in install scripts on unix platforms. And a web-based wizard would be useful too. And all of them would benefit from being able to both modify an existing configuration file, and to generate one from scratch. It looks like it's going to be reasonably easy to abstract away the interface to the user such that the first two (and likely the third) can be built from the same codebase, meaning that the smarts about how to set the various GUC settings (based on RAM available, estimates of database size and usage) can be maintained in one place. Cheers, Steve
On Thu, 12 Jun 2008, Bruce Momjian wrote: > I am thinking a web-based wizard would make the most sense. I have not a single customer I work with who could use an external web-based wizard. Way too many companies have privacy policy restrictions that nobody dare cross by giving out any info about their server, or sometimes that they're even using PostgreSQL inside the firewall. If it's not a tool that you can run on the same server you're running PostgreSQL on, I'd consider that another diversion that's not worth pursuing. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Jun 11, 2008, at 9:34 PM, Bruce Momjian wrote: > Tom Lane wrote: >> Bruce Momjian <bruce@momjian.us> writes: >>> Tom Lane wrote: >>>> The idea has a fundamental logical flaw, which is that it's not >>>> clear >>>> which parameter wins if the user changes both. >> >>> Yes, you could get into problems by having variable dependency >>> loops, >> >> Who said anything about loops? What I am talking about is what >> happens >> during >> set memory_usage = X; // implicitly sets work_mem = X/100, say >> set work_mem = Y; >> set memory_usage = Z; > > My initial thought was that this would behave like a shell script > variable, meaning once you set something it would affect all > references > to it below in postgresql.conf. The problem with that is that we > comment out all settings, so there isn't a logical order like you > would > have in a shell script. > > I was not thinking of memory_usage implicitly changing anything. I > figured postgresql.conf would have: > > memory_usage = 100 > work_mem = $memory_usage * 0.75 > > If you change memory_usage via SET, it will not change work_mem at all > because you are not re-initializing the variables. Why? That's the exact opposite of what I'd expect. If I want a setting's value to be $memory_usage * .75, that's what I want it to be. Not some value based on whatever $memory_usage was set to when work_mem happened to be changed. Of course, if you set something to a hard value with no variables, then that's what it's set to. > I am kind of lost how this would work logically and am willing to > think > about it some more, but I do think we aren't going to simplify > postgresql.conf without such a facility. Agreed. And I think it's a lot more elegant for dealing with day-to-day tuning than some kind of external wizard. > The big problem I see is that right now everything has a constant > default. If we allowed memory_usage to change some of the > defaults, how > would we signal that we want the variables based on it to change their > values? This is your behind-the-scenes problem you mentioned. I would suggest that we just re-evaluate everything whenever any setting is changed (this assumes that we store postgresql.conf internally in some fashion, so that we're not actually hitting the file all the time and possibly picking up random edits). -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Decibel! <decibel@decibel.org> writes: > On Jun 11, 2008, at 9:34 PM, Bruce Momjian wrote: >> I am kind of lost how this would work logically and am willing to >> think >> about it some more, but I do think we aren't going to simplify >> postgresql.conf without such a facility. > Agreed. And I think it's a lot more elegant for dealing with day-to- > day tuning than some kind of external wizard. You guys call this "simplification"? You're out of your minds. This proposal is ridiculously complicated, and yet it still fails even to consider adjusting non-numeric parameters. And what about things that require more than a trivial arithmetic expression to compute? It's not hard at all to imagine wanting log, sqrt, etc. We do not need to put computational capability into GUC. Any computations needed to determine a parameter setting should be done by a wizard. regards, tom lane
> That was a pretty special case (LIKE/regex estimation), and we've since > eliminated the threshold change in the LIKE/regex estimates anyway, so > there's no longer any reason to pick 100 as opposed to any other number. > So we're still back at "what's a good value and why?". Glad to hear that, although I think this is only in HEAD, not backpatched, right? Well at any rate, I withdraw my strong support for 100 and join in the quest for a good number. The "anything but 10" campaign. > I'm still concerned about the fact that eqjoinsel() is O(N^2). Show me > some measurements demonstrating that a deep nest of equijoins doesn't > get noticeably more expensive to plan --- preferably on a datatype with > an expensive equality operator, eg numeric --- and I'm on board. I hope someone else on the list can do this, because I can't. :) -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200806122054 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
>> Really? I'm the opposite: I never leave a client's setting at 10, that's >> just asking for trouble. Making it 100 *after* you encounter problem >> queries is reactive; I prefer being proactive. > Have you ever measured the system speed before and after? Yes. No change (see below on caching) or faster (better plans). >> Nor is a setting of 10 "perfectly adequate": > What percentage of your plans actually change with the larger statistics? How > many for the better? How many were massively improved? It matters not if there is a slight increase in planning time: the penalty of choosing a Bad Plan far outweighs any increased analyze or planning cost, period. Are you arguing that 10 is a good default, or just against larger values in general? > I suspect you're looking at some single-digit percentage slowdown for planning > across the board. In exchange if you a) have simple queries you probably see > none improving. If you b) have moderately complex queries you probably get > some single-digit percentage of them with improvements. And if you c) have > very complex queries you probably have a handful of them which see massive > improvements. Across the internet there are a whole lot more applications of > type (a) than the others... I'm still skeptical that it's the case, but I wouldn't mind seeing some figures about how slowed down a "simple" database gets going from 10 to 100 (or larger). Robert, any chance we can use Pagila for some sort of test for that? >> Frankly, I'd be shocked if there is any significant difference at all >> compared to the actual query run time. > Well you might start preparing to be shocked. Note that retrieving the > statistics is a query itself so it's not hard for it to be comparable to a > similarly simple query. It's not hard for a simple query using multiple > columns to be using more records of statistics than it is from the actual > data. And things can look much worse if that data is TOASTed and requires > further lookups and/or decompression... Even if all you say above is true, and I think we'll have to agree to disagree on that, there's an important point to remember: query plans can be (and very often are) cached. Queries and query results cannot (although I'm working on that... :) Plans to queries is a 1-N relationship, where N can be very, very large, and further boosts the query time vs planning time ratio. ... > 100 is no longer a magic threshold for LIKE queries (in CVS HEAD) That's great, but doesn't help existing releases (unless that was backpatched, but I don't recall it if so) But that's a battle I'm going to stop fighting, and concentrate on helping to find a replacement for 10 that may or may not be 100. -- Greg Sabino Mullane greg@turnstep.com End Point Corporation PGP Key: 0x14964AC8 200806122100 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
> You guys call this "simplification"? You're out of your minds. > This proposal is ridiculously complicated, and yet it still fails > even to consider adjusting non-numeric parameters. And what about > things that require more than a trivial arithmetic expression to > compute? It's not hard at all to imagine wanting log, sqrt, etc. > > We do not need to put computational capability into GUC. Any > computations needed to determine a parameter setting should be done > by a wizard. +1 (save the comment speculating about anyone's location relative to their mind ;) Additionally, obvious as it may be, there's nothing stopping anyone from developing a tool to generate the configuration file from a more "interesting" source. Whether that's XML or some DSL that supports computations, doesn't matter. I would think if such a tool showed dramatic merit it would provoke another discussion about core integration, but ISTM that leaving it dead simple is best. [mm, hook into the postgres startup script, shouldn't be that hard to administer..]
On Wednesday 11 June 2008 16:54:23 Bruce Momjian wrote: > Dave Page wrote: > > On Wed, Jun 11, 2008 at 6:56 PM, Bruce Momjian <bruce@momjian.us> wrote: > > > Dave Page wrote: > > >> On Mon, Jun 2, 2008 at 5:46 AM, Joshua D. Drake <jd@commandprompt.com> wrote: > > >> > pg_ctl -D data check? > > >> > > > >> > I would +1 that. > > >> > > >> I would also really like to see that - though I'd also like to see an > > >> SQL interface so we can check a config before saving when editing via > > >> pgAdmin or similar. > > > > > > Should this be a TODO? > > > > Yes please. > > Added to TODO: > > * Add pg_ctl option to do a syntax check of postgresql.conf > ISTM we need something that can run inside the db as well, i'm thinking something like pg_check_conf() to go with pg_reload_conf(). Also, these should probably check validity of the pg_hba.conf as well. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat wrote: > On Wednesday 11 June 2008 16:54:23 Bruce Momjian wrote: > > Dave Page wrote: > > > On Wed, Jun 11, 2008 at 6:56 PM, Bruce Momjian <bruce@momjian.us> wrote: > > > > Dave Page wrote: > > > >> On Mon, Jun 2, 2008 at 5:46 AM, Joshua D. Drake <jd@commandprompt.com> > wrote: > > > >> > pg_ctl -D data check? > > > >> > > > > >> > I would +1 that. > > > >> > > > >> I would also really like to see that - though I'd also like to see an > > > >> SQL interface so we can check a config before saving when editing via > > > >> pgAdmin or similar. > > > > > > > > Should this be a TODO? > > > > > > Yes please. > > > > Added to TODO: > > > > * Add pg_ctl option to do a syntax check of postgresql.conf > > > > ISTM we need something that can run inside the db as well, i'm thinking > something like pg_check_conf() to go with pg_reload_conf(). Also, these > should probably check validity of the pg_hba.conf as well. Agreed, TODO updated: o Add functions to syntax check configuration files -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
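Purely as an illustration of the in-database half of that TODO item, this is what it could look like from a client; pg_check_conf() is only the name suggested above and does not exist yet, while pg_reload_conf() already does:

    SELECT pg_check_conf();   -- hypothetical: true only if postgresql.conf (and pg_hba.conf) parse cleanly
    SELECT pg_reload_conf();  -- existing: signal the postmaster to re-read its configuration files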
On Jun 12, 2008, at 17:55, Greg Sabino Mullane wrote: > Glad to hear that, although I think this is only in HEAD, not > backpatched, > right? Well at any rate, I withdraw my strong support for 100 and > join in > the quest for a good number. The "anything but 10" campaign I vote for 11. That's one louda, in'it? Best, David
Greg Smith wrote: > On Thu, 5 Jun 2008, Magnus Hagander wrote: > >> We really need a "proper API" for it, and the stuff in pgAdmin isn't >> even enough to base one on. > > I would be curious to hear your opinion on whether the GUC overhaul > discussed in this thread is a useful precursor to building such a proper > API. I'm getting back to this one now that the discussion has died down a bit. As usual, the discussion spread out into these huge changes that would need to be done to satisfy everyone from day one. I don't think that's the way to go - we have to do it piece by piece if that's ever going to be done. IMHO the first thing to do is to create a stable API for modifying config variables remotely. *at this time* it doesn't matter how this API modifies the configuration, if it's in a file or in the db or whatever. We can always change that later... Which is why I will be proceeding with this one - make an API that requires the least possible change for now, per discussions earlier this year (not in this thread) :-) Now, this in itself is unrelated to the stuff Josh was talking about, which is collecting the information in one place and being able to generate config files that don't contain everything, and being able to auto-tune things. It would be nice for such a tool to be able to get the full descriptions and such from the pg_settings view or such, which means it needs to go in the code and not in a standalone file, but it's not *as necessary*. I still think the config file we ship today is horrible, I get lots of questions around it, and I see lots of examples of people who tweak settings they have no idea what they do, just because it's there. That's a separate issue that could *either* be solved by shipping more than one default config file, or it could be solved by the config file generator Josh proposed. //Magnus
Added to TODO: o Add external tool to auto-tune some postgresql.conf parameters http://archives.postgresql.org/pgsql-hackers/2008-06/msg00000.php --------------------------------------------------------------------------- Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > Tom Lane wrote: > >> * Can we build a "configuration wizard" to tell newbies what settings > >> they need to tweak? > > > That would trump all the other suggestions conclusively. Anyone good at > > expert systems? > > How far could we get with the answers to just three questions: > > * How many concurrent queries do you expect to have? > > * How much RAM space are you willing to let Postgres use? > > * How much "overhead" disk space are you willing to let Postgres use? > > concurrent queries drives max_connections, obviously, and RAM space > would drive shared_buffers and effective_cache_size, and both of them > would be needed to size work_mem. The third one is a bit weird but > I don't see any other good way to set the checkpoint parameters. > > If those aren't enough questions, what else must we ask? Or maybe they > aren't the right questions at all --- maybe we should ask "is this a > dedicated machine or not" and try to extrapolate everything else from > what we (hopefully) can find out about the hardware. > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
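As a toy illustration of how answers to those questions might be turned into starting values -- the 25%/75% ratios below are common rules of thumb assumed only for this example, not anything agreed in this thread, and the disk-space answer (checkpoint sizing) is left out:

    -- answers(max_connections, ram_mb) stands in for the wizard's inputs
    SELECT max_connections,
           (ram_mb * 0.25)::int AS shared_buffers_mb,
           (ram_mb * 0.75)::int AS effective_cache_size_mb,
           greatest(1, (ram_mb * 0.25 / max_connections)::int) AS work_mem_mb
      FROM (VALUES (100, 4096)) AS answers(max_connections, ram_mb);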
<div dir="ltr">Hi there... Configuration autotuning is something I am really interested in.<br />I have seen this page: <ahref="http://wiki.postgresql.org/wiki/GUCS_Overhaul">http://wiki.postgresql.org/wiki/GUCS_Overhaul</a> and<br /> a coupleof emails mentioning this, so I wanted to ask is someone already<br />on it? If yes, I'd like to contribute.<br /><br/>Ideally, an external little app should also provide recommendations based<br />on current database usage statistics-- wouldn't this constitute something<br /> akin to application-specific advice? In this regard, the tool couldalso tell<br />you things like how much RAM you should order if you were to replace your<br />existing database server:-)<br /><br />Michael<br /><br /></div>
On Wed, 13 Aug 2008, Michael Nacos wrote: > Hi there... Configuration autotuning is something I am really interested in. > I have seen this page: http://wiki.postgresql.org/wiki/GUCS_Overhaul and > a couple of emails mentioning this, so I wanted to ask is someone already > on it? If yes, I'd like to contribute. Good time to give a status report on what's been going on with all this. With some help I just finished off an answer to problem #1 there recently, "Most people have no idea how to set these". There was some concern here that work was being done on config tools without a clear vision of what was going to be tuned. See http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for an intro on how to set the 18 most important parameters (+7 logging parameters) based on the best information I'm aware of. Circa June, Steve Atkins was looking into writing a C++/Qt GUI tuning interface application, with the idea that somebody else would figure out the actual smarts to the tuning effort. Don't know where that's at. Josh Berkus and I have been exchanging some ideas for the GUC internals overhaul and had a quick discussion about that in person last month. We've been gravitating toward putting all the extra information we'd like to push into there in an extra catalog table (pg_settings_info or something). The stuff the server needs to start can stay right where it is right now, all the other decoration can move to the table. > Ideally, an external little app should also provide recommendations based > on current database usage statistics -- wouldn't this constitute something > akin to application-specific advice? Yes, there's a grand plan for a super-wizard that queries the database for size, index, and statistics information to figure out what to do; I've been beating that drum for a while now. Unfortunately, the actual implementation is blocked behind the dreadfully boring work of sorting out how to organize and manage the GUC information a bit better, and the moderately boring work of building a UI for modifying things. If you were hoping to work on the sexy autotuning parts without doing some of the grunt work, let me know if you can figure out how so I can follow you. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Greg Smith <gsmith@gregsmith.com> writes: > Josh Berkus and I have been exchanging some ideas for the GUC internals > overhaul and had a quick discussion about that in person last month. > We've been gravitating toward putting all the extra information we'd like > to push into there in an extra catalog table (pg_settings_info or > something). The stuff the server needs to start can stay right where it > is right now, all the other decoration can move to the table. Somehow, the attraction of that idea escapes me. What we have now was named Grand Unified Configuration for a reason: it centralized the handling of what had been a mess of different things configured in different ways. I'm not eager to go backwards on that. I'm also interested to know exactly what such a table would provide that isn't already available in the form of the pg_settings view. regards, tom lane
On Sun, 17 Aug 2008, Tom Lane wrote: > What we have now was named Grand Unified Configuration for a reason: > it centralized the handling of what had been a mess of different things > configured in different ways. I'm not eager to go backwards on that. No need to change anything related to how the configuration is done. There's really only two things wrong with what's there right now IMHO and they don't require any changes to the internals, just what's shown: 1) The view should show both how the user defined the setting and how it's represented internally. Basically something that looks like this: select name,current_setting(name) as input_setting,setting from pg_settings; 2) Expose the default value. > I'm also interested to know exactly what such a table would provide > that isn't already available in the form of the pg_settings view. Links to the relevant documentation and a place to save both default and user comments about the setting were two things being considered that seemed a really bad fit to tack onto the GUC structure. There's some others. The main point is that that nobody wants to have to tinker with the core GUC itself just to decorate it with more information, that is complicated enough as it is. One might make a case that the stuff the GUC must handle (settings, units, type, defaults, etc.) could be usefully separated from all the more documentation-oriented bits stored there right now (category, descriptions), and that the existing documentation bits could move over to the table along with the hyperlinks and such. Doing that adds another place to have to edit, but I think there's an even exchange available there because it enables easy auto-generation of the postgresql.conf file at initdb time from that table + pg_settings. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
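A sketch of roughly what the expanded view output could look like, assuming pg_settings grows a default-value column (spelled boot_val here only for illustration); current_setting() supplies the value the way an admin would write it, while setting stays in internal units:

    SELECT name,
           current_setting(name) AS input_setting,  -- value with units, as the admin would type it
           setting,                                 -- internal representation
           boot_val AS default_value                -- assumed default-value column
      FROM pg_settings
     WHERE setting IS DISTINCT FROM boot_val;       -- list only settings that were changed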
On Aug 17, 2008, at 1:48 PM, Greg Smith wrote: > On Wed, 13 Aug 2008, Michael Nacos wrote: > >> Hi there... Configuration autotuning is something I am really >> interested in. >> I have seen this page: http://wiki.postgresql.org/wiki/ >> GUCS_Overhaul and >> a couple of emails mentioning this, so I wanted to ask is someone >> already >> on it? If yes, I'd like to contribute. > > Good time to give a status report on what's been going on with all > this. > > With some help I just finished off an answer to problem #1 there > recently, "Most people have no idea how to set these". There was > some concern here that work was being done on config tools without a > clear vision of what was going to be tuned. See http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server > for an intro on how to set the 18 most important parameters (+7 > logging parameters) based on the best information I'm aware of. > > Circa June, Steve Atkins was looking into writing a C++/Qt GUI > tuning interface application, with the idea that somebody else would > figure out the actual smarts to the tuning effort. Don't know where > that's at. First pass is done. Needs a little cleanup before sharing. I spent a fair while down OS-specific-hardware-queries rathole, but I'm better now. Cheers, Steve
Steve, > First pass is done. Needs a little cleanup before sharing. I spent a > fair while down OS-specific-hardware-queries rathole, but I'm better now. Gods, I hope you gave up on that. You want to use SIGAR or something. --Josh
Josh Berkus wrote: > Steve, > >> First pass is done. Needs a little cleanup before sharing. I spent a >> fair while down OS-specific-hardware-queries rathole, but I'm better now. > > Gods, I hope you gave up on that. You want to use SIGAR or something. If it's going to be C++, and reasonably cross platform, and a pg tool, why not try to build something as a module in pgAdmin? Certainly going to get you a larger exposure... And I'm sure the pgAdmin team would be happy to have it! //Magnus
<div dir="ltr">What I'm interested in is auto-tuning, not necessarily overhauling GUCS, which happens to be the subject ofthis thread :-)<br />Having done a SELECT * FROM pg_settings, all the information you need seems to be there...<br /> MaybeI'm being over-simplistic here, but the important bit is knowing how you should tune stuff - and this is what I'm hopingto learn through this process.<br />Now, you could probably sidestep UI and GUCS concerns by moving the auto-tuningprocess inside the database. You don't need fancy GUIs for guessing configuration parameters, and if you can alreadydo that, coming up with a GUI should be pretty straightforward.<br /> For example, I see no reason why you couldn'tcapture the logic of tuning in a couple of PL/Python functions to look up usage stats, size of indices etc. PL/Pythonbeing an "untrusted" language, you could even write a new postgresql.conf file to disk, with the suggested alterations.Cheap, quick and cheerful!<br /><br />Perhaps the auto-tuning conversation should take place in a separate thread,how do you feel about changing the subject line? The most insteresting bit is discussing and testing tuning strategies.This, of course, is related to the [admin] and [perform] side of things, but there is also a development dimension.As soon as there is a tuning strategy published, a number of tools will certainly follow.<br /><br />Michael<br/><br /><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204,204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Yes, there's a grand plan for a super-wizard that queries the databasefor size, index, and statistics information for figure out what to do; I've been beating that drum for a while now. Unfortunately, the actual implementation is blocked behind the dreadfully boring work of sorting out how to organizeand manage the GUC information a bit better, and the moderately boring work of building a UI for modifying things. If you were hoping to work on the sexy autotuning parts without doing some of the grunt work, let me know if youcan figure out how so I can follow you.<br /><font color="#888888"><br /></font></blockquote></div><br /></div>
On Aug 18, 2008, at 1:05 AM, Magnus Hagander wrote: > Josh Berkus wrote: >> Steve, >> >>> First pass is done. Needs a little cleanup before sharing. I spent a >>> fair while down OS-specific-hardware-queries rathole, but I'm >>> better now. >> >> Gods, I hope you gave up on that. You want to use SIGAR or >> something. > > If it's going to be C++, and reasonably cross platform, and a pg tool, > why not try to build something as a module in pgAdmin? Certainly going > to get you a larger exposure... And I'm sure the pgAdmin team would be > happy to have it! I'm attempting to build it as something that can be used in several places. Where there's most need for it is as an install time option in installers, particularly on Windows. There's no reason the same underlying code couldn't also go into pgAdmin, of course. At the moment the code is a bit Qt specific, reducing that is part of the cleanup. Cheers, Steve
Steve Atkins wrote: > > On Aug 18, 2008, at 1:05 AM, Magnus Hagander wrote: > >> Josh Berkus wrote: >>> Steve, >>> >>>> First pass is done. Needs a little cleanup before sharing. I spent a >>>> fair while down OS-specific-hardware-queries rathole, but I'm better >>>> now. >>> >>> Gods, I hope you gave up on that. You want to use SIGAR or something. >> >> If it's going to be C++, and reasonably cross platform, and a pg tool, >> why not try to build something as a module in pgAdmin? Certainly going >> to get you a larger exposure... And I'm sure the pgAdmin team would be >> happy to have it! > > I'm attempting to build it as something that can be used in several > places. Where there's most need for it is as an install time option > in installers, particularly on Windows. Well, if it was in pgadmin, it would be there more or less by default on Windows. And very easy to get in on other platforms, since pgadmin is already packaged there. Plus, all the dependencies are already there on said platforms. //Magnus
On Mon, 18 Aug 2008, Michael Nacos wrote: > Having done a SELECT * FROM pg_settings, all the information you need > seems to be there... See http://archives.postgresql.org/pgsql-hackers/2008-06/msg00209.php You sound like you're at rung 2 on the tool author ladder I describe there, still thinking about the fun part of tuning but not yet aware of the annoying postgresql.conf management issues that show up in the field that motivate many of the GUCS changes suggested. Coping with user and system-generated comments is one difficult part that people normally don't consider; dealing with bad settings the server won't start with is another. I did make one mistake in that message, which is that the "context" field of pg_settings already exposes when a setting can be changed. And it is possible to get the value for a setting as entered by the admin by joining pg_settings against what current_setting returns, which is one part of handling the import/change/export cycle while keeping useful units intact. > Maybe I'm being over-simplistic here, but the important bit is knowing > how you should tune stuff - and this is what I'm hoping to learn through > this process. The tuning references at the bottom of http://wiki.postgresql.org/wiki/GUCS_Overhaul provide more detail here than anyone has been able to automate so far. There's certainly room to improve on the suggestions there with some introspection of the database, but I'm trying to stay focused on something to help newbies whose settings are completely wrong first. > As soon as there is a tuning strategy published, a number of tools will > certainly follow. Josh Berkus published one in 2005 and zero such tools have been produced since then, even though it looked to him then (like it does to you now and like it did to me once) that such a tool would easily follow: http://pgfoundry.org/docman/?group_id=1000106 The bright side here is that you don't have to waste time tinkering in this area to find out where the dead ends are like Josh and I independently did. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
"Greg Smith" <gsmith@gregsmith.com> writes: > On Mon, 18 Aug 2008, Michael Nacos wrote: > >> Having done a SELECT * FROM pg_settings, all the information you need seems >> to be there... > > See http://archives.postgresql.org/pgsql-hackers/2008-06/msg00209.php You sound > like you're at rung 2 on the tool author ladder I describe there, still > thinking about the fun part of tuning but not yet aware of the annoying > postgresql.conf management issues that show up in the field that motivate many > of the GUCS changes suggested. Coping with user and system-generated comments > is one difficult part that people normally don't consider, Because coping with free-form user-edited text is a losing game. People don't consider it because it's a dead-end. Instead you have one file for user-edited configuration and a separate file for computer generated configuration. You never try to automatically edit a user-edited file -- that way lies madness. > dealing with bad settings the server won't start with is another. A tuning interface can't be turing complete and detect all possible misconfigurations. To do that it would have to be as complex as the server. In any case worrying about things like this before you have a tuning interface that can do the basics is putting the cart before the horse. >> As soon as there is a tuning strategy published, a number of tools will >> certainly follow. > > Josh Berkus published one in 2005 and zero such tools have been produced since > then, even though it looked to him then (like it does to you now and like it > did to me once) that such a tool would easily follow: > http://pgfoundry.org/docman/?group_id=1000106 The entire target market for such a thing is DBAs stuck on hosted databases which don't have shell access to their machines. Perhaps the overlap between that and the people who can write a server-side module which dumps out a config file according to some rules is just too small? I do think you and others make it less likely every time you throw up big insoluble problems like above though. As a consequence every proposal has started with big overly-complex solutions trying to solve all these incidental issues which never go anywhere instead of simple solutions which directly tackle the main problem. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
Greg, > The entire target market for such a thing is DBAs stuck on hosted > databases which don't have shell access to their machines. That's incorrect. The main reason for having a port-based API (such as the SQL command line) for managing your server is that it makes it much easier to manage a large number of servers. Right now, if you want to survey your databases, tables, approx disk space, query activity, etc., you can do that all through port 5432. You can't manage most of your server settings that way, and definitely can't manage the *persistent* settings. When you're trying to manage 1000 PostgreSQL servers, this is not a minor issue. With the growing "cloud" sector, the lack of easy server parameter management is hurting PostgreSQL's adoption for hosted applications. This isn't a new complaint, and is a big part of the reason why 90% of web hosts still don't offer PostgreSQL. I've heard complaints about our manageability problems from more vendors than I can count. HOWEVER, it's completely possible to get a 1st-generation config tool out there without first implementing port-based config access. For one thing, there's Puppet. So that's what I'm intending to do. > I do think you and others make it less likely every time you throw up > big insoluble problems like above though. It's not an insoluble problem. It's a political problem; several people don't want to add this functionality to the project. > As a consequence every > proposal has started with big overly-complex solutions trying to solve > all these incidental issues which never go anywhere instead of simple > solutions which directly tackle the main problem. What, in your opinion, is "the main problem"? I'm not sure we agree on that. -- --Josh Josh Berkus PostgreSQL San Francisco
Gregory Stark <stark@enterprisedb.com> writes: > The entire target market for such a thing is DBAs stuck on hosted databases > which don't have shell access to their machines. Perhaps the overlap between > that and the people who can write a server-side module which dumps out a > config file according to some rules is just too small? There's a veritable boatload of stuff we do that assumes shell access (how many times have you seen cron jobs recommended, for instance?). So I'm unconvinced that "modify the config without shell access" is really a goal that is worth lots of effort. In any case, there's already adequate support for sucking postgresql.conf out of the machine and putting it back: pg_read_file(), pg_file_write(), pg_reload_conf(). So at the end of the day remote access isn't a factor in this at all. > I do think you and others make it less likely every time you throw up big > insoluble problems like above though. As a consequence every proposal has > started with big overly-complex solutions trying to solve all these incidental > issues which never go anywhere instead of simple solutions which directly > tackle the main problem. The impression I get every time this comes up is that various people have different problems they want to solve that (they think) require redesign of the way GUC works. Those complicated solutions arise from attempting to satisfy N different demands simultaneously. The fact that many of these goals aren't subscribed to by the whole community to begin with doesn't help to ease resolution of the issues. regards, tom lane
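The round trip Tom is referring to, roughly; pg_read_file() and pg_reload_conf() are built in, pg_file_write() comes from the adminpack contrib module, all three are superuser-only, and the file name and contents below are placeholders only:

    SELECT pg_read_file('postgresql.conf', 0, 1000000);   -- pull the current file out
    SELECT pg_file_write('postgresql.conf.new',           -- push an edited copy back into the data directory
                         'work_mem = 32MB' || E'\n', false);
    SELECT pg_reload_conf();                               -- ask the postmaster to re-read its files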
"Josh Berkus" <josh@agliodbs.com> writes: > Greg, > >> The entire target market for such a thing is DBAs stuck on hosted >> databases which don't have shell access to their machines. > > That's incorrect. The main reason for having a port-based API (such as the > SQL command line) for managing your server is that it makes it much easier > to manage a large number of servers. Right now, if you want to survey > your databases, tables, approx disk space, query activity, etc., you can > do that all through port 5432. You can't manage most of your server > settings that way, and definitely can't manage the *persistent* settings. > When you're trying to manage 1000 PostgreSQL servers, this is not a minor > issue. This I don't understand. If you're managing lots of servers running lots of software the last thing you want to have to do is write a custom method for updating the configuration of each service. In that use case you would prefer to just use rsync/svn/git to push the new config file to all the machines anyways. > With the growing "cloud" sector, the lack of easy server parameter > management is hurting PostgreSQL's adoption for hosted applications. This > isn't a new complaint, and is a big part of the reason why 90% of web > hosts still don't offer PostgreSQL. I've heard complaints about our > manageability problems from more vendors than I can count. These are both use cases which fall in the category I described where you want to allow users to configure the system through an automated interface. We can do that today by generating the automatically generated section and including that in postgresql.conf as an include file. >> As a consequence every proposal has started with big overly-complex >> solutions trying to solve all these incidental issues which never go >> anywhere instead of simple solutions which directly tackle the main >> problem. > > What, in your opinion, is "the main problem"? I'm not sure we agree on > that. The main problem that I've seen described is what I mentioned before: allowing adjusting the postgresql.conf GUC settings by remote users who don't have shell access. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB'sPostgreSQL training!
Greg, > The main problem that I've seen described is what I mentioned before: > allowing adjusting the postgresql.conf GUC settings by remote users who > don't have shell access. Oh, ok. I think we're in agreement, though. I don't think that's the *1st* problem to be solved, but it's definitely important. -- --Josh Josh Berkus PostgreSQL San Francisco
On Mon, Aug 18, 2008 at 8:51 PM, Gregory Stark <stark@enterprisedb.com> wrote: > The main problem that I've seen described is what I mentioned before: allowing > adjusting the postgresql.conf GUC settings by remote users who don't have > shell access. Which pgAdmin has done perfectly well for years, as long as the config is all in one file. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
On Mon, 18 Aug 2008, Gregory Stark wrote: > Because coping with free-form user-edited text is a losing game. People > don't consider it because it's a dead-end. Right, that's impossible technology to build, which is why I had to plan all these screen shots showing tools that handle that easily for Apache's very similar configuration file: http://www.apache-gui.com/apache-windows.html http://kochizz.sourceforge.net/quelques-captures-decran/ > Instead you have one file for user-edited configuration and a separate > file for computer generated configuration. It wouldn't be so difficult if the system generated postgresql.conf didn't have all this extra junk in it, which is part of what an overhaul plans to simplify. The way the file gets spit out right now encourages some of the awful practices people develop in the field. > A tuning interface can't be turing complete and detect all possible > misconfigurations. To do that it would have to be as complex as the server. Thank you for supporting the case for why changes need to be to the server code itself, to handle things like validating new postgresql.conf files before they get loaded. I try not to bring that up lest it complicate things further. > The entire target market for such a thing is DBAs stuck on hosted > databases which don't have shell access to their machines. I've never setup a hosted database on a system I don't have shell access to, so I have no idea where you get the impression that was a primary goal of anything I've said. It just so happens that improving what tuning you can do over port 5432 helps that crowd out too, that's a bonus as I see it. > Ask me about EnterpriseDB's On-Demand Production Tuning ...nah, too easy, I'll just let that go. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Dave Page wrote: > On Mon, Aug 18, 2008 at 8:51 PM, Gregory Stark <stark@enterprisedb.com> wrote: >> The main problem that I've seen described is what I mentioned before: allowing >> adjusting the postgresql.conf GUC settings by remote users who don't have >> shell access. > > Which pgAdmin has done perfectly well for years, as long as the config > is all in one file. I'll argue it's not done it perfectly well (it's not particularly user-friendly), but it has certainly *done* it... //Magnus
On 8/18/08, Magnus Hagander <magnus@hagander.net> wrote: > Dave Page wrote: >> On Mon, Aug 18, 2008 at 8:51 PM, Gregory Stark <stark@enterprisedb.com> >> wrote: >>> The main problem that I've seen described is what I mentioned before: >>> allowing >>> adjusting the postgresql.conf GUC settings by remote users who don't have >>> shell access. >> >> Which pgAdmin has done perfectly well for years, as long as the config >> is all in one file. > > I'll argue it's not done it perfectly well (it's not particularly > user-friendly), but it has certainly *done* it... I mean it's able to read & write the config file correctly. I agree the ui is, umm, sub-optimal. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
Tom Lane wrote: > Gregory Stark <stark@enterprisedb.com> writes: > > The entire target market for such a thing is DBAs stuck on hosted databases > > which don't have shell access to their machines. Perhaps the overlap between > > that and the people who can write a server-side module which dumps out a > > config file according to some rules is just too small? > > There's a veritable boatload of stuff we do that assumes shell access > (how many times have you seen cron jobs recommended, for instance?). > So I'm unconvinced that "modify the config without shell access" > is really a goal that is worth lots of effort. Actually, lots of people are discouraged by suggestions of using cron to do anything. The only reason cron is suggested is because we don't have any other answer, and for many people it's a half-solution. An integrated task scheduler in Pg would be more than welcome. Also, remember that pgAdmin already comes with a pgAgent thing. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote: > Tom Lane wrote: >> Gregory Stark <stark@enterprisedb.com> writes: >>> The entire target market for such a thing is DBAs stuck on hosted databases >>> which don't have shell access to their machines. Perhaps the overlap between >>> that and the people who can write a server-side module which dumps out a >>> config file according to some rules is just too small? >> There's a veritable boatload of stuff we do that assumes shell access >> (how many times have you seen cron jobs recommended, for instance?). >> So I'm unconvinced that "modify the config without shell access" >> is really a goal that is worth lots of effort. > > Actually, lots of people are discouraged by suggestions of using cron to > do anything. The only reason cron is suggested is because we don't have > any other answer, and for many people it's a half-solution. An > integrated task scheduler in Pg would be more than welcome. Yes, I hear a lot of people complaining about that too. Now that we have a working autovacuum, some of it goes away though - no need to cron your VACUUMs in most cases anymore. But there are still backups - though they are often managed by the scheduler of enterprise backup software. > Also, remember that pgAdmin already comes with a pgAgent thing. Yeah, it's a real life-saver on Windows where the builtin task-scheduler isn't as readily accessible or easy to use.. //Magnus
On Monday, 18 August 2008, Josh Berkus wrote: > Right now, if you want to survey > your databases, tables, approx disk space, query activity, etc., you can > do that all through port 5432. You can't manage most of your server > settings that way, and definitely can't manage the *persistent* settings. > When you're trying to manage 1000 PostgreSQL servers, this is not a minor > issue. Some of that effort could go into making fewer settings persistent.
On Monday, 18 August 2008, Tom Lane wrote: > The impression I get every time this comes up is that various people > have different problems they want to solve that (they think) require > redesign of the way GUC works. Those complicated solutions arise from > attempting to satisfy N different demands simultaneously. Which may be the reason that I have been getting the impression that the "Problems" and the proposed resolutions on http://wiki.postgresql.org/wiki/GUCS_Overhaul are not really closely related. I can agree with the Problems, but then I am lost.