Thread: postgresql.conf
I have been thinking about postgresql.conf and I have made a few posts about it already. My concern is that Postgres, as shipped, is not very well tuned; one could even say the default is pretty much a bad configuration. My idea of a postgresql.conf "cookbook" seemed not such a good idea: some liked it, some did not, and the discussion degenerated into an argument about the values.

How about this: we ship just two or three default configuration files, "Compact", "Workstation", and "Server"?

"Compact" would be the current postgresql.conf.

"Workstation" would boost the number of buffers and the sort and vacuum memory; in essence, Postgres would be configured to use about 64M~128M efficiently, maybe limited to 16 backends. Say 4096 buffers and a sort memory setting of 4096.

"Server" would have a huge number of buffers, large numbers for sort, and boosted vacuum memory. A server would be assumed to have lots of memory and be limited to 128 backends. Say 65536 buffers and a sort memory setting of 32768.

We could also tune some of the optimizer parameters. For instance, on "Server" random_page_cost would be lower because of concurrent disk operations. We could even try to tune some of the WAL settings accordingly.

I know these things are all documented, and shame on the DBA for not reading the documentation, but all the help we can give to someone new to PostgreSQL makes it that much more likely that they will be able to use it successfully. Besides, if we put out a small number of specific versions of postgresql.conf, more focused feedback about performance issues and optimization can be obtained.
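Sketched as an actual postgresql.conf fragment, the "Workstation" profile described above might look like this (the numbers are the proposals in this post, not vetted recommendations; the vacuum_mem value is illustrative, since the post only says "boosted"):

```
# "Workstation" profile -- proposed, not vetted
shared_buffers = 4096        # 4096 x 8KB blocks = 32 MB
sort_mem = 4096              # KB per sort step
vacuum_mem = 16384           # "boosted" vacuum memory (illustrative value)
max_connections = 16         # "maybe limited to 16 backends"
```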
mlw <markw@mohawksoft.com> writes:
> "Server" would have a huge number of buffers,

Do you have any evidence whatsoever that that's actually a good idea?

Certainly the existing default configuration is ridiculously cramped for modern machines. But I've always felt that NBuffers somewhere in the low thousands should be plenty. If you have lots of main memory then the kernel can be expected to use it for kernel-level disk buffering, which should be nearly as good as buffering inside Postgres. (Maybe better, considering that we have some routines that scan all our buffers linearly ...) Moreover, if you request a huge chunk of shared memory then you run a significant risk that the kernel will decide to start swapping parts of it, at which point it definitely becomes a loser. Swapping a dirty buffer out and back in before it finally gets written to disk is counterproductive. You want to keep the number of buffers small enough that they all stay pretty "hot" in the swapper's eyes.

Basically, I think that it's best to give the kernel plenty of elbow room to deal with memory pressures on its own terms. Even on a machine that's nominally dedicated to running Postgres.

Awhile back I suggested raising the default configuration to 1000 or so buffers, which would be slightly less silly than the current default even if it's not optimal. Didn't get much feedback about the idea though.

			regards, tom lane
Tom Lane wrote:
> mlw <markw@mohawksoft.com> writes:
> > "Server" would have a huge number of buffers,
>
> Do you have any evidence whatsoever that that's actually a good idea?
>
> Certainly the existing default configuration is ridiculously cramped
> for modern machines. But I've always felt that NBuffers somewhere in
> the low thousands should be plenty. If you have lots of main memory
> then the kernel can be expected to use it for kernel-level disk
> buffering, which should be nearly as good as buffering inside Postgres.
> (Maybe better, considering that we have some routines that scan all our
> buffers linearly ...) Moreover, if you request a huge chunk of shared
> memory then you run a significant risk that the kernel will decide to
> start swapping parts of it, at which point it definitely becomes a
> loser. Swapping a dirty buffer out and back in before it finally gets
> written to disk is counterproductive. You want to keep the number of
> buffers small enough that they all stay pretty "hot" in the swapper's
> eyes.

I can't speak about routines that scan buffers linearly, but I have noticed HUGE performance gains from increasing the number of buffers. Queries that normally hit a few thousand blocks can have a cache hit rate of about 80%. (Very query specific, I know.) Sort memory is also a huge gain for large queries.

I understand that there is a point of diminishing returns: cache management vs. disk access. A cache that is too large and poorly managed costs more than disk. I'm not sure I've hit that point yet. On an SMP machine, it seems that a CPU bottleneck is better than an I/O bottleneck. The CPU bottleneck is scalable, whereas an I/O bottleneck is not. Perhaps on a single-processor machine, fewer buffers would be more appropriate.

> Basically, I think that it's best to give the kernel plenty of elbow
> room to deal with memory pressures on its own terms. Even on a machine
> that's nominally dedicated to running Postgres.
In our database systems we have 1G of RAM. Postgres is configured to use about 1/4~1/2 of it (our number of buffers is 32768; our sort memory is 32768).

> Awhile back I suggested raising the default configuration to 1000 or
> so buffers, which would be slightly less silly than the current default
> even if it's not optimal. Didn't get much feedback about the idea
> though.

(I am really glad we are talking about this.) Surely your recommendation of 1000 buffers is a great step. 8M of shared memory on a modern system is trivial, and would make a huge impact. Sort memory also seems exceedingly small: when machines ship with a practical minimum of 256M RAM, and a probable 512M~1G, 512K seems like a very small sort size.

Regardless of the actual numbers, I still think that more than one "default" needs to be defined. I would bet that Postgres runs as a stand-alone server as often as it runs as a workstation database à la Access.
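For reference, the memory footprint implied by those numbers can be worked out from PostgreSQL's default 8 KB disk block size (a back-of-the-envelope sketch; BLCKSZ = 8192 is the stock build default):

```python
BLCKSZ = 8192                 # default PostgreSQL disk block size, in bytes

shared_buffers = 32768        # buffer count from the post
sort_mem_kb = 32768           # sort memory is expressed in kilobytes

buffer_bytes = shared_buffers * BLCKSZ
print(buffer_bytes // 2**20, "MB of shared buffers")   # 256 MB, i.e. 1/4 of 1 GB
print(sort_mem_kb // 1024, "MB per sort step")         # 32 MB
```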
mlw writes:
> I have been thinking about postgresql.conf and I have made a few posts already
> about it. My concern is that Postgres, as default, is not very well tuned. One
> can even say the default is pretty much a bad configuration.

The default configuration is mainly guided by three sometimes contradictory aspects: it should be reasonably secure, "unusual" or non-standard features are turned off, and resources are regulated so that it is easy to "try out" PostgreSQL without having to do major kernel tuning first or bringing other applications to their knees. I think the default settings for most parameters are not really disputed; it's only the performance- and resource-related settings that you want to work on.

> How about this: we have just two or three default configuration files?
> Compact, Workstation, and Server.

Trying to eliminate the one-size-does-not-fit-all problem with N-sizes-fit-all cannot be an optimal idea, considering the dimensionality of the space of possible configurations. If all you're concerned about is buffers and sort memory, it's much easier to say "configure buffers to use 1/4 of available memory" than to make arbitrary guesses about the available memory and attach arbitrary labels to them.

Theoretically, it should be possible to determine optimal values for all performance-related settings from a combination of benchmarks, a few questions asked of the user about the system configuration and the expected workload, and a dynamic analysis of the nature of the data. But a system of formulas describing these relationships is incredibly difficult to figure out and solve. If it weren't, we could get rid of all these settings and allocate resources dynamically at run time.

What we ought to do, however, is to collect and document empirical methods for tuning, such as the above "1/4 of available memory" rule (which does not claim to be correct, btw.).

-- 
Peter Eisentraut   peter_e@gmx.net
> What we ought to do, however, is to collect and document empirical methods
> for tuning, such as the above "1/4 of available memory" rule (which does
> not claim to be correct, btw.).

What would be interesting is to have a program that prompted the user, checked some system values, and modified postgresql.conf accordingly. The major problem with this idea is that I don't know of a good way to determine the proper values programmatically, let alone portably.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Mon, 19 Nov 2001 16:04:10 -0500 (EST), you wrote:
>> What we ought to do, however, is to collect and document empirical methods
>> for tuning, such as the above "1/4 of available memory" rule (which does
>> not claim to be correct, btw.).
>
> What would be interesting is to have a program that prompted the user,
> checked some system values, and modified postgresql.conf accordingly.

I vaguely remember Oracle had an out-of-the-box choice of a small, medium, or large installation, with small being the default. This changed a bunch of parameters, including the RDBMS block buffers, shared global area, max open cursors, and such. This could be implemented as an (installation-time) config utility on top of the current fundamental parameters.

Regards,
René Pijlman <rene@lab.applinet.nl>
Since I proposed three postgresql.conf configuration files, I will start by suggesting some settings different from the default. (Any additions or corrections would be greatly appreciated.)

Compact: the current postgresql.conf

Workstation:
    tcpip_socket = true
    max_connections = 32
    shared_buffers = 1024
    sort_mem = 8192
    random_page_cost = 2

Server:
    tcpip_socket = true
    max_connections = 128
    shared_buffers = 8192
    sort_mem = 16384
    random_page_cost = 1

random_page_cost is changed on the assumption that the bigger systems will be busier. The busier a machine is doing I/O, the smaller the differential between sequential and random access ("sequential" to the application is less likely to be sequential on the physical disk).

I'd like to open a debate about the benefit/cost of shared_buffers. The questions are: will Postgres' management of shared buffers outperform the O/S cache? Is there a point of diminishing returns in the number of buffers, and if so, where?

Sort memory makes a huge impact on queries. If you've got the memory, use it.

These are just ballpark settings; I don't even know how good they are. The problem is that server environments differ so greatly that there is no right answer. I am just really concerned that the newbie PostgreSQL user will judge PostgreSQL by the performance they see with the default settings.
mlw writes:
> These are just ballpark settings; I don't even know how good they are. The problem
> is that server environments differ so greatly that there is no right answer.

Which is why this is clearly not a solution.

> I am just really concerned that the newbie PostgreSQL user will judge
> PostgreSQL by the performance they see with the default settings.

For this kind of "newbie", the kind that doesn't read the documentation, this would only make it worse, because they'd assume that by making the choice between three default configurations they've done an adequate amount of tuning. Basically, you'd exchange "I can't find any tuning information, but it's slow" for "I did all the tuning and it's still slow". Not a good choice.

The bottom line is that you *must* edit postgresql.conf in order to tune your server. If that editing is simplified, it doesn't matter what the default is.

-- 
Peter Eisentraut   peter_e@gmx.net
> > I am just really concerned that the newbie PostgreSQL user will judge
> > PostgreSQL by the performance they see with the default settings.
>
> For this kind of "newbie", the kind that doesn't read the documentation,
> this would only make it worse, because they'd assume that by making the
> choice between three default configurations they've done an adequate
> amount of tuning. Basically, you'd exchange "I can't find any tuning
> information, but it's slow" for "I did all the tuning and it's still
> slow". Not a good choice.
>
> The bottom line is that you *must* edit postgresql.conf in order to tune
> your server. If that editing is simplified, it doesn't matter what the
> default is.

Is it possible to probe the machine and update postgresql.conf automatically?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
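As a sketch of what such a probe might look like (assuming a Unix-style sysconf interface for reading physical RAM; the 1/4-of-RAM rule is the earlier empirical example, and the sort-memory rule here is purely illustrative, not a vetted recommendation):

```python
import os

def suggest_settings():
    """Guess tuning values from physical RAM, using the 1/4-of-RAM rule."""
    page_size = os.sysconf("SC_PAGE_SIZE")      # bytes per memory page
    phys_pages = os.sysconf("SC_PHYS_PAGES")    # total physical pages
    total_ram = page_size * phys_pages          # total physical RAM, bytes

    BLCKSZ = 8192                               # default PostgreSQL block size
    shared_buffers = (total_ram // 4) // BLCKSZ  # 1/4 of RAM, as a buffer count
    sort_mem_kb = max(512, total_ram // 64 // 1024)  # illustrative rule only

    return {"shared_buffers": shared_buffers, "sort_mem": sort_mem_kb}

# The output could then be spliced into postgresql.conf by an installer.
for name, value in suggest_settings().items():
    print(f"{name} = {value}")
```

This only covers the hardware-probe part; as noted above, the hard part is that the proper values also depend on workload, which no probe can see at install time.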
Peter Eisentraut wrote:
>
> mlw writes:
>
> > These are just ballpark settings; I don't even know how good they are. The problem
> > is that server environments differ so greatly that there is no right answer.
>
> Which is why this is clearly not a solution.

Sometimes an incomplete solution, or even a grossly poor one, that addresses a problem is better than no solution whatsoever.

> > I am just really concerned that the newbie PostgreSQL user will judge
> > PostgreSQL by the performance they see with the default settings.
>
> For this kind of "newbie", the kind that doesn't read the documentation,
> this would only make it worse, because they'd assume that by making the
> choice between three default configurations they've done an adequate
> amount of tuning. Basically, you'd exchange "I can't find any tuning
> information, but it's slow" for "I did all the tuning and it's still
> slow". Not a good choice.

I think this sort of thinking will not help the end user at all. Offering a choice of three less badly tuned configuration files will probably produce a better user experience than one very badly tuned file.

> The bottom line is that you *must* edit postgresql.conf in order to tune
> your server. If that editing is simplified, it doesn't matter what the
> default is.

I don't think this is true at all. Making the buffer and sort settings larger will improve performance dramatically. I would also bet that most users NEVER see the postgresql.conf file, and just blame poor performance on bad design and start using MySQL.
On Thursday 22 November 2001 06:52, mlw wrote:
> > For this kind of "newbie", the kind that doesn't read the
> > documentation, this would only make it worse, because they'd assume
> > that by making the choice between three default configurations they've
> > done an adequate amount of tuning. Basically, you'd exchange "I
> > can't find any tuning information, but it's slow" for "I did all the
> > tuning and it's still slow". Not a good choice.

I am the other kind of newbie, the one that reads documentation. However, I fail to find much regarding tuning within the documentation delivered with the original version 7.1.2 tarball. Even when searching the PostgreSQL website or the mailing list archives, the information is still sporadic. In the whole "Administrator's Guide" section there is no chapter on tuning, and even if you grep your way through it, you won't find much. And yes, I have read Bruce's book, from first to last page. And I have read Stinson's PostgreSQL Essential Reference the same way. Still, I am no wiser.

I am not complaining. PostgreSQL, after all, is free. But it is kind of strange to blame the users for not reading documentation if this documentation is so hard to find (does it exist?) that "Joe Average" can't find it. The other domain where I could hardly find any information at all is the Postgres log settings. I would be more than happy to write the documentation if I could get hold of the necessary information, as I badly need it myself.

The other thing I am thinking about is that tuning can't be any "magic". If I can tune it, why shouldn't a configuration script be able to do the same? After all, even I would follow some algorithm to do it; it would probably involve a few educated trial-and-error experiments - but hey, a script can do that too, can't it? I know too little about Postgres to do that part myself, but I don't think it is valid just to shove the idea of "autotuning" aside like this.

Horst
Horst Herb writes:
> I am the other kind of newbie, the one that reads documentation.
> However, I fail to find much regarding tuning within the documentation
> delivered with the original version 7.1.2 tarball.

Yes, there is clearly a deficit in this area. But it's obviously not going to get better with more buffers by default.

> The other thing I am thinking about is that tuning can't be any "magic".
> If I can tune it, why shouldn't a configuration script be able to do the
> same?

Theoretically, you're right of course. In practice this could be quite complicated to set up. Let's say the optimal configuration depends primarily on four groups of parameters:

1. hardware setup
2. load/desired load on the server ("dedicated" vs. something else)
3. nature of the data
4. nature of the clients/query workload

#1 is easy to figure out by asking a few questions or perhaps a few nonportable peeks into /proc. #2 is more difficult; it's not simply yes or no or loadavg = X, because it depends on the nature of the other applications. #3 is also not quite that easy. It'd require a hypersmart version of ANALYZE, but in reality you would want to configure your server before the data starts arriving, so it's not really feasible to run an automatic "benchmark" or something. The same with #4: the queries really arrive only after the tuning is done. You'd hardly want to tune while the application is live and ask the users "how fast was it?".

So what would be required is to parametrize these four factors (and others we come up with) accurately into questions the user can answer easily. This would be an enormous task, but I'm not saying it can't be done.

-- 
Peter Eisentraut   peter_e@gmx.net
mlw writes:
> I don't think this is true at all. Making the buffer and sort settings larger
> will improve performance dramatically.

I think we want to make the buffers bigger by default, but we need to put some thought into the numbers. Currently, the shared buffers are set so that they presumably fit under the default shared memory limit on most systems. We already know this isn't actually true anymore. Tom Lane thinks that "a few thousand" buffers is enough, so let's say 2048 = 16MB. This is certainly a lot less than the 64MB that were proposed or the 512MB that some people use. However, I feel we should have *some* data points before we commit to a number that, as you say, most users will implicitly be stuck with. Even more so when some users claim "the more the better" and the leading developer in the field disagrees.

(Perhaps we could arrange it so that by default the system attempts to allocate X amount of shared memory and, if that fails, tries smaller sizes until it succeeds (down to a reasonable minimum)? This could combine convenience and an optimal default.)

As for sort memory, I have no idea why this isn't much larger by default.

> I would also bet that most users NEVER see the postgresql.conf file,
> and just blame poor performance on bad design and start using MySQL.

This could be an information deficit more than anything else. I don't know.

-- 
Peter Eisentraut   peter_e@gmx.net
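The fallback idea can be sketched like this (using an anonymous mmap as a stand-in for the SysV shared memory segment the postmaster actually allocates; the function name and sizes are illustrative, not from any PostgreSQL source):

```python
import mmap

def allocate_with_fallback(target_bytes, minimum_bytes):
    """Try to map target_bytes; halve the request on failure, down to a minimum."""
    size = target_bytes
    while size >= minimum_bytes:
        try:
            seg = mmap.mmap(-1, size)   # anonymous mapping stands in for shmget()
            return seg, size
        except (OSError, ValueError, OverflowError):
            size //= 2                  # back off and retry with half the size
    raise RuntimeError("could not allocate even the minimum segment")

# Ask for 64 MB but accept anything down to 1 MB.
seg, got = allocate_with_fallback(64 * 2**20, 1 * 2**20)
print(got // 2**20, "MB allocated")
seg.close()
```

The real postmaster would additionally have to recompute its buffer count from whatever segment size it ended up with.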
Peter Eisentraut <peter_e@gmx.net> writes:
> ... However, I feel we should have *some*
> data points before we commit to a number that, as you say, most users will
> implicitly be stuck with.

Data points would be a good thing. I will freely admit that I have made no measurements to back up my opinion :-(

> As for sort memory, I have no idea why this isn't much larger by default.

The problem with sort memory is that you don't know what the multiplier is for it. SortMem is per sort/hash/whatever plan step, which means that not only might one backend be consuming several times SortMem on a complex query, but potentially all MaxBackend backends might be doing the same. In practice that seems like a pretty unlikely scenario, but surely you should figure *some* function of SortMem * MaxBackends as the number you need to compare to available RAM.

The present 512K default is on the small side for current hardware, no doubt, but that doesn't mean we should crank it up without thought. We just recently saw a trouble report from someone who had pushed it to the moon and found out the hard way not to do that.

			regards, tom lane
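The worst-case arithmetic above can be made concrete (a sketch; the number of sort/hash steps per query is exactly the unknown multiplier being pointed at, so the value here is purely illustrative):

```python
sort_mem_kb = 512        # the current default, per sort/hash plan step
max_backends = 128
steps_per_query = 4      # illustrative: a complex query may run several such steps

# Worst case: every backend runs a query using steps_per_query steps at once.
worst_case_mb = sort_mem_kb * steps_per_query * max_backends // 1024
print(worst_case_mb, "MB in the (unlikely) worst case")   # 256 MB
```

Even at the "small" 512K default, the theoretical ceiling is substantial, which is why the setting cannot be cranked up without thought.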
What! I don't remember ever seeing docs pointing me to this configuration file. When did this file appear in Postgres?