Thread: Re: [HACKERS] PostgreSQL Benchmarks
I've tested all the win32 versions of postgres I can get my hands on (cygwin and not), and my general feeling is that they have problems with insert performance with fsync() turned on, probably the fault of the OS. Select performance is not so much affected.

This is easily solved with transactions and other such things. Also Postgres benefits from pl just like oracle.

May I make a suggestion that maybe it is time to start thinking about tuning the default config file? IMHO it's just a little bit too conservative, and it's hurting you in benchmarks being run by idiots. Any real database admin would know their tests are synthetic and not meaningful without having to look at the #s, but it's still bad publicity.

This is irritating me so much that I am going to put together a benchmark of my own, a real-world one, on (publicly available) real-world data. Mysql is a real dog in a lot of situations. The FCC publishes a database of wireless transmitters that has tables with 10 million records in it. I'll pump that into pg, run some benchmarks with real-world queries, and we'll see who the faster database *really* is. This is just a publicity issue, that's all. It's still annoying though.

I'll even run an open challenge to database admins to beat the query performance of postgres on such datasets: complex multi-table joins, etc. I'll even throw out the whole table-locking issue and analyze single-user performance.

Merlin

_____________

How much of the performance difference is from the RDBMS, how much from the middleware, and how much from the quality of implementation in the middleware?

While I'm not surprised that the cygwin version of PostgreSQL is slow, those results don't tell me anything about the quality of the middleware interface between PHP and PostgreSQL. Does anyone know if we can rule out some of the performance loss by pinning it to a bad middleware implementation for PostgreSQL?

Regards,

--
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting
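To illustrate Merlin's transactions point: with fsync on, each standalone INSERT forces its own WAL flush, while a batch wrapped in a transaction pays that cost once at COMMIT. A minimal sketch -- the transmitters table and its columns are placeholders, not the actual FCC schema:

    BEGIN;
    INSERT INTO transmitters (callsign, freq_mhz) VALUES ('KABC123', 460.125);
    INSERT INTO transmitters (callsign, freq_mhz) VALUES ('KXYZ789', 461.500);
    -- ...thousands more rows; only the COMMIT below forces the log to disk,
    -- instead of one flush per statement
    COMMIT;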
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes: > May I make a suggestion that maybe it is time to start thinking about > tuning the default config file, IMHO its just a little bit too > conservative, It's a lot too conservative. I've been thinking for awhile that we should adjust the defaults. The original motivation for setting shared_buffers = 64 was so that Postgres would start out-of-the-box on machines where SHMMAX is 1 meg (64 buffers = 1/2 meg, leaving 1/2 meg for our other shared data structures). At one time SHMMAX=1M was a pretty common stock kernel setting. But our other data structures blew past the 1/2 meg mark some time ago; at default settings the shmem request is now close to 1.5 meg. So people with SHMMAX=1M have already got to twiddle their postgresql.conf settings, or preferably learn how to increase SHMMAX. That means there is *no* defensible reason anymore for defaulting to 64 buffers. We could retarget to try to stay under SHMMAX=4M, which I think is the next boundary that's significant in terms of real-world platforms (isn't that the default SHMMAX on some BSDen?). That would allow us 350 or so shared_buffers, which is better, but still not really a serious choice for production work. What I would really like to do is set the default shared_buffers to 1000. That would be 8 meg worth of shared buffer space. Coupled with more-realistic settings for FSM size, we'd probably be talking a shared memory request approaching 16 meg. This is not enough RAM to bother any modern machine from a performance standpoint, but there are probably quite a few platforms out there that would need an increase in their stock SHMMAX kernel setting before they'd take it. So what this comes down to is making it harder for people to get Postgres running for the first time, versus making it more likely that they'll see decent performance when they do get it running. It's worth noting that increasing SHMMAX is not nearly as painful as it was back when these decisions were taken. Most people have moved to platforms where it doesn't even take a kernel rebuild, and we've acquired documentation that tells how to do it on all(?) our supported platforms. So I think it might be okay to expect people to do it. The alternative approach is to leave the settings where they are, and to try to put more emphasis in the documentation on the fact that the factory-default settings produce a toy configuration that you *must* adjust upward for decent performance. But we've not had a lot of success spreading that word, I think. With SHMMMAX too small, you do at least get a pretty specific error message telling you so. Comments? regards, tom lane
On Tue, 2003-02-11 at 10:20, Tom Lane wrote:
> It's a lot too conservative.  I've been thinking for awhile that we
> should adjust the defaults.
<snip>
> Comments?

I'd personally rather have people stumble trying to get PostgreSQL running, up front, rather than allowing the lowest common denominator more easily run PostgreSQL only to be disappointed with it and move on.

After it's all said and done, I would rather someone simply say, "it's beyond my skill set", and attempt to get help or walk away. That seems better than them being able to run it and say, "it's a dog", spreading word-of-mouth as such after they left PostgreSQL behind. Worse yet, those that do walk away and claim it performs horribly are probably doing more harm to the PostgreSQL community than expecting someone to be able to install software ever can.

Nutshell:

"Easy to install but is horribly slow."

or

"Took a couple of minutes to configure and it rocks!"

Seems fairly cut-n-dry to me. ;)

Regards,

--
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting
Re: Changing the default configuration (was Re: [HACKERS] PostgreSQL Benchmarks)
From: Patrick Welche
On Tue, Feb 11, 2003 at 11:20:14AM -0500, Tom Lane wrote: ... > We could retarget to try to stay under SHMMAX=4M, which I think is > the next boundary that's significant in terms of real-world platforms > (isn't that the default SHMMAX on some BSDen?). ... Assuming 1 page = 4k, and number of pages is correct in GENERIC kernel configs, SHMMAX=4M for NetBSD (8M for i386, x86_64) Cheers, Patrick
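For reference, SHMMAX can be raised at runtime with sysctl on most of these platforms; the parameter names vary by OS, and the 32 MB below is only an example value:

    # NetBSD / FreeBSD
    sysctl -w kern.ipc.shmmax=33554432
    # Linux
    sysctl -w kernel.shmmax=33554432
    # Mac OS X
    sysctl -w kern.sysv.shmmax=33554432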
> Nutshell:
>   "Easy to install but is horribly slow."
> or
>   "Took a couple of minutes to configure and it rocks!"

Since when is it easy to install on win32? The easiest way I know of is through Cygwin, and then you have to worry about installing the IPC service (and getting the right version too!). I've installed versions 6.1 to 7.1, but I almost gave up on the windows install. At least in 6.x you had a very comprehensive installation guide with a TOC.

Versus the competition, which are you going to choose if you're a wanna-be DBA? The one with all the hoops to jump through, or the one that comes with a setup.exe?

Now I actually am in support of making it more aggressive, but it should wait until we too have a setup.exe for the native windows port. (Changing it on *n*x platforms is of little benefit because most benchmarks seem to run it on w32 anyway :-( )

Just my $.02. I reserve the right to be wrong.

-J
Tom Lane wrote:
> "Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
>> May I make a suggestion that maybe it is time to start thinking about
>> tuning the default config file? IMHO it's just a little bit too
>> conservative,
>
> It's a lot too conservative.  I've been thinking for awhile that we
> should adjust the defaults.

One of the things I did on my Windows install was to have a number of default configuration files: postgresql.conf.small, postgresql.conf.medium, postgresql.conf.large.
Rather than choose one, the "initdb" script could ask for, or determine, the amount of shared memory, memory, etc.
Another pet peeve I have is forcing the configuration files to be in the database directory. We had this argument in 7.1 days, and I submitted a patch that allowed a configuration file to be specified as a command line parameter. One of the things that Oracle does better is separating the "configuration" from the data.
It is an easy patch to allow PostgreSQL to use a separate configuration directory, and specify the data directory within the configuration file (The way any logical application works), and, NO, symlinks are not a solution, they are a kludge.
Tom Lane wrote:
<snip>
> What I would really like to do is set the default shared_buffers to
> 1000.  That would be 8 meg worth of shared buffer space.  Coupled with
> more-realistic settings for FSM size, we'd probably be talking a shared
> memory request approaching 16 meg.  This is not enough RAM to bother
> any modern machine from a performance standpoint, but there are probably
> quite a few platforms out there that would need an increase in their
> stock SHMMAX kernel setting before they'd take it.
<snip>

Totally agree with this. We really, really, really, really need to get the default to a point where we have _decent_ default performance.

> The alternative approach is to leave the settings where they are, and
> to try to put more emphasis in the documentation on the fact that the
> factory-default settings produce a toy configuration that you *must*
> adjust upward for decent performance.  But we've not had a lot of
> success spreading that word, I think.  With SHMMAX too small, you
> do at least get a pretty specific error message telling you so.
>
> Comments?

Yep. Here's an *unfortunately very common* scenario that, again unfortunately, a _seemingly large_ number of people fall for:

a) Someone decides to "benchmark" database XYZ vs PostgreSQL vs other databases.

b) Said benchmarking person knows very little about PostgreSQL, so they install the RPMs, packages, or whatever, and "it works". Then they run whatever benchmark they've downloaded, or designed, or whatever.

c) PostgreSQL, being practically unconfigured, runs at the pace of a slow, mostly-disabled snail.

d) Said benchmarking person gets better performance from the other databases (also set to their default settings) and thinks "PostgreSQL has lots of features, and it's free, but it's Too Slow". Yes, this kind of testing shouldn't even _pretend_ to have any real world credibility.

e) Said benchmarking person tells everyone they know, _and_ everyone they meet, about their results. Some of them even create nice looking or professional looking web pages about it.

f) People who know even _less_ than the benchmarking person hear about the test, or read the results, and don't know any better than to believe it at face value. So, they install whatever system was recommended.

g) Over time, the benchmarking person gets the hang of their chosen database and writes further articles about it, and doesn't generally look any further afield for say... a couple of years. By this time, they've already influenced a couple of thousand people in the non-optimal direction.

h) Arrgh.

With better defaults, our next release would _appear_ to be a lot faster to quite a few people, just because they have no idea about tuning. So, as sad as this scenario is, better defaults will probably encourage a lot more newbies to get involved, and that'll eventually translate into a lot more experienced users, and a few more coders to assist. ;-)

Personally I'd be a bunch happier if we set the buffers so high that we definitely have decent performance, and the people that want to run PostgreSQL are forced to make the choice of either:

1) Adjust their system settings to allow PostgreSQL to run properly, or
2) Manually adjust the PostgreSQL settings to run memory-constrained

This way, PostgreSQL either runs decently, or they are _aware_ that they're limiting it. That should cut down on the false benchmarks (hopefully). :-)

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi
Re: Changing the default configuration (was Re: [HACKERS] PostgreSQL Benchmarks)
From: Steve Crawford
A quick-'n'-dirty first step would be more comments in postgresql.conf. Most of the lines are commented out, which would imply "use the default", but the default is not shown. (I realize this has the difficulty of defaults that change depending upon how PostgreSQL was configured/compiled, but perhaps postgresql.conf could be built by the make process based on the configuration options.)

If postgresql.conf were commented with recommendations it would probably be all I need, though perhaps a recommendation to edit that file should be displayed at the conclusion of "make install".

Cheers,
Steve

On Tuesday 11 February 2003 8:20 am, Tom Lane wrote:
> It's a lot too conservative.  I've been thinking for awhile that we
> should adjust the defaults.
<snip>
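Steve's suggestion might look something like this, with each commented-out entry carrying its default and a hint; the numbers here are purely illustrative:

    #shared_buffers = 64     # default: 64 buffers of 8 kB each; consider
    #                        # 1000+ on a dedicated server (needs SHMMAX room)
    #sort_mem = 1024         # default: 1024 kB per sort; raise for large
    #                        # reporting queries, keep low for many users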
Greg Copeland wrote:
> I'd personally rather have people stumble trying to get PostgreSQL
> running, up front, rather than allowing the lowest common denominator
> more easily run PostgreSQL only to be disappointed with it and move on.
>
> After it's all said and done, I would rather someone simply say, "it's
> beyond my skill set", and attempt to get help or walk away.  That seems
> better than them being able to run it and say, "it's a dog", spreading
> word-of-mouth as such after they left PostgreSQL behind.  Worse yet,
> those that do walk away and claim it performs horribly are probably
> doing more harm to the PostgreSQL community than expecting someone to be
> able to install software ever can.

<RANT>

And that my friends is why PostgreSQL is still relatively obscure.

This attitude sucks. If you want a product to be used, you must put the effort into making it usable.

It is a no-brainer to make the default configuration file suitable for the majority of users. It is lunacy to create a default configuration which provides poor performance for over 90% of the users, but which allows the lowest common denominator to work.

A product must not perform poorly out of the box, period. A good product manager would choose one of two possible configurations: (a) a high speed, fairly optimized system from the get-go, or (b) it does not run unless you create the configuration file. Option (c), out of the box it works like crap, is not an option.

This is why open source gets such a bad reputation. Outright contempt for the user who may not know the product as well as those developing it. This attitude really sucks and it turns people off. We want people to use PostgreSQL; to do that we must make PostgreSQL usable. Usability IS important.

</RANT>
Re: Changing the default configuration (was Re: [HACKERS] PostgreSQL Benchmarks)
From: Tom Lane
Justin Clift <justin@postgresql.org> writes:
> Personally I'd be a bunch happier if we set the buffers so high that we
> definitely have decent performance, and the people that want to run
> PostgreSQL are forced to make the choice of either:
> 1) Adjust their system settings to allow PostgreSQL to run properly, or
> 2) Manually adjust the PostgreSQL settings to run memory-constrained
> This way, PostgreSQL either runs decently, or they are _aware_ that
> they're limiting it.

Yeah, that is the subtext here. If you can't increase SHMMAX then you can always trim the postgresql.conf parameters --- but theoretically, at least, you should then have a clue that you're running a badly-configured setup ...

regards, tom lane
Tom, Justin,

> > What I would really like to do is set the default shared_buffers to
> > 1000.  That would be 8 meg worth of shared buffer space.  Coupled with
> > more-realistic settings for FSM size, we'd probably be talking a shared
> > memory request approaching 16 meg.  This is not enough RAM to bother
> > any modern machine from a performance standpoint, but there are probably
> > quite a few platforms out there that would need an increase in their
> > stock SHMMAX kernel setting before they'd take it.

What if we supplied several sample .conf files, and let the user choose which to copy into the database directory? We could have a "high read performance" profile, and a "transaction database" profile, and a "workstation" profile, and a "low impact" profile. We could even supply a Perl script that would adjust SHMMAX and SHMALL on platforms where this can be done from the command line.

--
Josh Berkus
Aglio Database Solutions
San Francisco
On Tue, 2003-02-11 at 12:10, Steve Crawford wrote:
> A quick-'n'-dirty first step would be more comments in postgresql.conf.

This will not solve the issue with the large number of users who have no interest in looking at the config file -- but are interested in publishing their results.

--
Rod Taylor <rbt@rbt.ca>
PGP Key: http://www.rbt.ca/rbtpub.asc
Josh Berkus <josh@agliodbs.com> writes:
> What if we supplied several sample .conf files, and let the user choose
> which to copy into the database directory?  We could have a "high read
> performance" profile, and a "transaction database" profile, and a
> "workstation" profile, and a "low impact" profile.

Uh ... do we have a basis for recommending any particular sets of parameters for these different scenarios? This could be a good idea in the abstract, but I'm not sure I know enough to fill in the details.

A lower-tech way to accomplish the same result is to document these alternatives in postgresql.conf comments and encourage people to review that file, as Steve Crawford just suggested. But first we need the raw knowledge.

regards, tom lane
Josh Berkus wrote:
<snip>
> What if we supplied several sample .conf files, and let the user choose
> which to copy into the database directory?  We could have a "high read
> performance" profile, and a "transaction database" profile, and a
> "workstation" profile, and a "low impact" profile.  We could even supply
> a Perl script that would adjust SHMMAX and SHMALL on platforms where
> this can be done from the command line.

This might have value as the next step in the process of:

a) Are we going to have better defaults? or
b) Let's stick with the current approach.

If we decide to go with better (changed) defaults, we may also be able to figure out a way of having profiles that could optionally be chosen from.

As a longer term thought, it would be nice if the profiles weren't just hard-coded example files, but more of:

pg_autotune --setprofile=xxx

Or a similar utility, and it did all the work. Named profiles being one capability, and other tuning measurements (i.e. cpu costings, disk performance profiles, etc) being the others.

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi
Tom Lane wrote:
<snip>
> Uh ... do we have a basis for recommending any particular sets of
> parameters for these different scenarios?  This could be a good idea
> in the abstract, but I'm not sure I know enough to fill in the details.
>
> A lower-tech way to accomplish the same result is to document these
> alternatives in postgresql.conf comments and encourage people to review
> that file, as Steve Crawford just suggested.  But first we need the raw
> knowledge.

Without too much hacking around, you could pretty easily adapt the pg_autotune code to do proper profiles of a system with different settings. i.e. increment one setting at a time, run pgbench on it with some decent amount of transactions and users, stuff the results into a different database. Aggregate data over time kind of thing. Let it run for a week, etc.

If it's helpful, there's a 100% spare Athlon 1.6GHz box around with (choose your OS) + Adaptec 29160 + 512MB RAM + 2 x 9GB Seagate Cheetah 10k rpm drives hanging around. No stress to set that up and let it run any long term tests you'd like, plus send back results.

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi
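A rough sketch of such a driver loop, under the assumptions that the test database is named bench and that when a setting appears more than once in postgresql.conf the last occurrence wins; paths, scale factor, and client counts are guesses:

    #!/bin/sh
    PGDATA=/usr/local/pgsql/data
    createdb bench
    pgbench -i -s 10 bench                  # build the test tables once
    for buf in 64 256 1024 4096; do
        # append the new value; the last setting in the file takes effect
        echo "shared_buffers = $buf" >> $PGDATA/postgresql.conf
        pg_ctl restart -D $PGDATA
        sleep 5                             # let the postmaster settle
        echo "== shared_buffers = $buf =="
        pgbench -c 10 -t 1000 bench         # record the tps lines per run
    done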
On Tue, 2003-02-11 at 11:23, mlw wrote:
> <RANT>
> And that my friends is why PostgreSQL is still relatively obscure.
> This attitude sucks. If you want a product to be used, you must put the
> effort into making it usable.

Ah..okay....

> It is a no-brainer to make the default configuration file suitable for
> the majority of users. It is lunacy to create a default configuration
> which provides poor performance for over 90% of the users, but which
> allows the lowest common denominator to work.

I think you read something into my email which I did not imply. I'm certainly not advocating a default configuration file assuming 512M of shared memory or some such insane value.

Basically, you're arguing that they should keep doing exactly what they are doing. It's currently known to be causing problems and propagating the misconception that PostgreSQL is unable to perform under any circumstance. I'm arguing that who cares if 5% of the potential user base has to learn to properly install software. Either they'll read and learn, ask for assistance, or walk away. All of which are better than Jonny-come-lately offering up a meaningless benchmark which others are happy to eat with rather large spoons.

> A product must not perform poorly out of the box, period. A good product
> manager would choose one of two possible configurations, (a) a high
> speed fairly optimized system from the get-go, or (b) it does not run
> unless you create the configuration file. Option (c) out of the box it
> works like crap, is not an option.

That's the problem. Option (c) is what we currently have. I'm amazed that you even have a problem with option (a), as that's what I'm suggesting. The problem is, potentially for some minority of users, it may not run out of the box. As such, I'm more than happy with this situation than 90% of the user base being stuck with a crappy default configuration. Oddly enough, your option (b) is even worse than what you are ranting at me about. Go figure.

> This is why open source gets such a bad reputation. Outright contempt
> for the user who may not know the product as well as those developing
> it. This attitude really sucks and it turns people off. We want people
> to use PostgreSQL, to do that we must make PostgreSQL usable. Usability
> IS important.
> </RANT>

There is no contempt here. Clearly you've read your own bias into this thread. If you go back and re-read my posting, I think it's VERY clear that it's entirely about usability.

Regards,

--
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting
Tom, Justin,

> > Uh ... do we have a basis for recommending any particular sets of
> > parameters for these different scenarios?  This could be a good idea
> > in the abstract, but I'm not sure I know enough to fill in the details.

Sure.

Mostly-read database, few users, good hardware, complex queries: high shared buffers and sort mem, high GEQO and join collapse thresholds, moderate FSM settings, defaults for WAL.

Same as above with many users and simple queries (webserver): same as above, except lower sort mem and a higher connection limit.

High-transaction database: moderate shared buffers and sort mem, high FSM settings, increased WAL files and buffers.

Workstation: moderate to low shared buffers and sort mem, moderate FSM, defaults for WAL, etc.

Low-impact server: current defaults, more or less.

While none of these settings will be *perfect* for anyone, they will be considerably better than what's shipping with postgresql. And, based on my "Learning Perl" knowledge, I'm pretty sure I could write the program. All we'd need to do is argue out, on the PERFORMANCE list, what's a good value for each profile. That's the tough part. The Perl script is easy.

> > A lower-tech way to accomplish the same result is to document these
> > alternatives in postgresql.conf comments and encourage people to review
> > that file, as Steve Crawford just suggested.  But first we need the raw
> > knowledge.

That's also not a bad approach ... the CONF file should be more heavily commented, period, regardless of what approach we take. I volunteer to work on this with other participants.

> Without too much hacking around, you could pretty easily adapt the
> pg_autotune code to do proper profiles of a system with different settings.

No offense, Justin, but I don't know anyone else who's gotten your pg_autotune script to run other than you. And pgbench has not been a useful performance measure for any real database server I have worked on so far.

I'd be glad to help improve pg_autotune, with two caveats:

1) We will still need to figure out the "profiles" above so that we have decent starting values.

2) I suggest that we do pg_autotune in Perl or Python or another higher-level language. This would enable several performance buffs who don't do C to contribute to it, and a performance-tuning script is a higher-level-language sort of function, anyway.

--
Josh Berkus
Aglio Database Solutions
San Francisco
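As a concrete (and entirely illustrative, unbenchmarked) example, the webserver profile above might boil down to a handful of postgresql.conf lines like:

    max_connections = 256
    shared_buffers = 4096          # ~32 MB; high for the era's defaults
    sort_mem = 1024                # kept low: many concurrent small sorts
    effective_cache_size = 50000   # ~400 MB of expected OS disk cache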
Justin Clift <justin@postgresql.org> writes:
> Without too much hacking around, you could pretty easily adapt the
> pg_autotune code to do proper profiles of a system with different settings.
> i.e. increment one setting at a time, run pgbench on it with some decent
> amount of transactions and users, stuff the results into a different
> database.

If I thought that pgbench was representative of anything, or even capable of reliably producing repeatable numbers, then I might subscribe to results derived this way. But I have little or no confidence in pgbench. Certainly I don't see how you'd use it to produce recommendations for a range of application scenarios, when it's only one very narrow scenario itself.

regards, tom lane
mlw <pgsql@mohawksoft.com> writes: > This attitude sucks. If you want a product to be used, you must put the > effort into making it usable. > [snip] AFAICT, you are flaming Greg for recommending the exact same thing you are recommending. Please calm down and read again. regards, tom lane
Apology
After Mark calms down and, in fact, sees that Greg was saying the right thing after all, chagrin is the only word.
I'm sorry.
Greg Copeland wrote:
<snip>
Tom Lane writes: > We could retarget to try to stay under SHMMAX=4M, which I think is > the next boundary that's significant in terms of real-world platforms > (isn't that the default SHMMAX on some BSDen?). That would allow us > 350 or so shared_buffers, which is better, but still not really a > serious choice for production work. What is a serious choice for production work? And what is the ideal choice? The answer probably involves some variables, but maybe we should get values for those variables in each case and work from there. -- Peter Eisentraut peter_e@gmx.net
> On Tue, 2003-02-11 at 10:20, Tom Lane wrote:
> > It's a lot too conservative.  I've been thinking for awhile that we
> > should adjust the defaults.
> <snip>
>
> I'd personally rather have people stumble trying to get PostgreSQL
> running, up front, rather than allowing the lowest common denominator
> more easily run PostgreSQL only to be disappointed with it and move on.
<snip>
> Nutshell:
> "Easy to install but is horribly slow."
>
> or
>
> "Took a couple of minutes to configure and it rocks!"
>
> Seems fairly cut-n-dry to me. ;)

The type of person who can't configure it or doesn't think to try is probably not doing a project that requires any serious performance. As long as you are running it on decent hardware, postgres will run fantastic for anything but a very heavy load.

I think there may be many people out there who have little experience but want an RDBMS to manage their data. Those people need something very, very easy. Look at the following that mysql gets despite how poor of a product it is. It's very, very easy. Mysql works great for many people's needs, but then when they need to do something real they need to move to a different database entirely.

I think there is a huge advantage to having a product that can be set up very quickly out of the box. Those who need serious performance, hopefully for their employers' sake, will be more likely to take a few minutes to do some quick performance tuning.

Rick Gigger
On Tue, 11 Feb 2003, Tom Lane wrote:
> It's a lot too conservative.  I've been thinking for awhile that we
> should adjust the defaults.

Some of these issues could be made to Just Go Away with some code changes. For example, using mmap rather than SysV shared memory would automatically optimize your memory usage, and get rid of the double-buffering problem as well. If we could find a way to avoid using semaphores proportional to the number of connections we have, then you wouldn't have to worry about that configuration parameter, either.

In fact, some of this stuff might well improve our portability, too. For example, mmap is a POSIX standard, whereas shmget is only an X/Open standard. That makes me suspect that mmap is more widely available on non-Unix platforms. (But I could be wrong.)

cjs
--
Curt Sampson <cjs@cynic.net>  +81 90 7737 2974  http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC
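A minimal sketch of the mmap idea: an anonymous shared mapping created before fork(), which needs no SHMMAX headroom at all. This is only an illustration of the mechanism, not the actual buffer-manager code:

    #include <sys/mman.h>
    #include <stdio.h>

    int main(void)
    {
        size_t size = 8 * 1024 * 1024;      /* 8 MB of buffer space */

        /* Anonymous shared mapping: inherited across fork(), and not
         * subject to the SysV SHMMAX limit.  (The flag is spelled
         * MAP_ANON on the BSDs, MAP_ANONYMOUS on some other systems.) */
        void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_ANON | MAP_SHARED, -1, 0);
        if (buf == MAP_FAILED)
        {
            perror("mmap");
            return 1;
        }
        /* any children forked from here on share 'buf' directly */
        return 0;
    }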
On Tue, 11 Feb 2003, Rick Gigger wrote:
> The type of person who can't configure it or doesn't think to try is
> probably not doing a project that requires any serious performance.
<snip>

Very good point. I'm pushing for changes that will NOT negatively impact joe beginner on the major platforms (Linux, BSD, Windows) in terms of install. I figure anyone installing on big iron already knows enough about their OS that we don't have to worry about shared buffers being too big for that machine. So, a compromise of faster performance out of the box, with little or no negative user impact, seems the sweet spot here. I'm thinking a good knee setting for each one, where not too much memory / semaphores / file handles get gobbled up, but the database isn't pokey.

The poor performance of Postgresql in its current default configuration HAS cost us users, trust me, I know a few we've almost lost where I work that I converted after some quick tweaking of their database. In its stock form Postgresql is very slow at large simple queries, like 'select * from table1 t1 natural join table2 t2 where t1.field='a';' where you get back something like 10,000 rows. The real bottleneck here is sort_mem. A simple bump up to 8192 or so makes the database much more responsive.

If we're looking at changing default settings for 7.4, then we should look at changing ALL of them that matter, since we'll have the most time to shake out problems if we do them early, and we won't have four or five rounds of setting different defaults over time and finding the limitations of the host OSes one at a time.
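That tweak is a one-liner, settable per session or in postgresql.conf; 8192 is just the example value from above:

    -- per session:
    SET sort_mem = 8192;            -- kilobytes, i.e. 8 MB per sort step
    -- or, equivalently, in postgresql.conf:
    -- sort_mem = 8192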
> If I thought that pgbench was representative of anything, or even
> capable of reliably producing repeatable numbers, then I might subscribe
> to results derived this way.  But I have little or no confidence in
> pgbench.  Certainly I don't see how you'd use it to produce
> recommendations for a range of application scenarios, when it's only
> one very narrow scenario itself.

Sigh. People always complain "pgbench does not reliably produce repeatable numbers" or something, then say "that's because pgbench's transaction has too much contention on the branches table". So I added the -N option to pgbench, which makes pgbench not do any UPDATEs to the branches table. But still people continue to complain...

There are many factors that could produce non-repeatable results, for instance the kernel buffer cache, PostgreSQL's buffer manager, pgbench itself, etc. etc... So far it seems no one has ever made a clean explanation of why non-repeatable results happen...
--
Tatsuo Ishii
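For concreteness, a -N run looks like this; the database name and sizes are arbitrary:

    pgbench -i -s 10 bench           # initialize with scale factor 10
    pgbench -N -c 10 -t 1000 bench   # skip the UPDATEs to the branches table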
On Wed, 12 Feb 2003, Curt Sampson wrote:
> Some of these issues could be made to Just Go Away with some code
> changes.  For example, using mmap rather than SysV shared memory
> would automatically optimize your memory usage, and get rid of the
> double-buffering problem as well.
<snip>

I'll vote for mmap. I use the mm libs with apache/openldap/authldap and it is very fast and pretty common nowadays. It seems quite stable as well.
On Tue, Feb 11, 2003 at 17:42:06 -0700, "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> The poor performance of Postgresql in its current default configuration
> HAS cost us users, trust me, I know a few we've almost lost where I work
> that I converted after some quick tweaking of their database.

About two years ago I talked some people into trying it at work to use with IMP/Horde, which had been having some corruption problems while using MySQL (though it wasn't necessarily a problem with MySQL). I told them to be sure to use 7.1. When they tried it out it couldn't keep up with the load. I asked the guys what they tried and found out they couldn't find 7.1 rpms and didn't want to compile from source and so ended up using 7.0.?. Also, as far as I could tell from talking to them, they didn't do any tuning at all. They weren't interested in taking another look at it after that. We are still using MySQL with that system today.

One of our DBAs is using it for some trial projects (including one for me) even though we have a site license for Oracle.
Why don't we include a postgresql.conf.recommended along with our postgresql.conf.sample? That shouldn't be too hard. We can just jack up the shared buffers and wal buffers and everything - it doesn't matter if it's not perfect, but it will at least give people an idea of what needs to be increased, etc to get good results.

I'm currently benchmarking our new DB server before we put it into production. I plan to publish the results from that shortly.

Regards,
Chris

> -----Original Message-----
> From: Merlin Moncure
> Sent: Tuesday, 11 February 2003 11:44 PM
> Subject: Re: [pgsql-advocacy] [HACKERS] PostgreSQL Benchmarks
<snip>
> > After it's all said and done, I would rather someone simply say, "it's
> > beyond my skill set", and attempt to get help or walk away.
<snip>
> <RANT>
> And that my friends is why PostgreSQL is still relatively obscure.

Dude - I hang out on PHPBuilder's database forums and you wouldn't believe how often the "oh, don't use Postgres, it has a history of database corruption problems" thing is mentioned.

Chris
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> We could retarget to try to stay under SHMMAX=4M, which I think is
>> the next boundary that's significant in terms of real-world platforms
>> (isn't that the default SHMMAX on some BSDen?).  That would allow us
>> 350 or so shared_buffers, which is better, but still not really a
>> serious choice for production work.
>
> What is a serious choice for production work?

Well, as I commented later in that mail, I feel that 1000 buffers is a reasonable choice --- but I have to admit that I have no hard data to back up that feeling. Perhaps we should take this to the pgsql-perform list and argue about reasonable choices.

A separate line of investigation is "what is the lowest common denominator nowadays?" I think we've established that SHMMAX=1M is obsolete, but what replaces it as the next LCD? 4M seems to be correct for some BSD flavors, and I can confirm that that's the current default for Mac OS X --- any other comments?

regards, tom lane
> >> We could retarget to try to stay under SHMMAX=4M, which I think is
> >> the next boundary that's significant in terms of real-world platforms
> >> (isn't that the default SHMMAX on some BSDen?).  That would allow us
> >> 350 or so shared_buffers, which is better, but still not really a
> >> serious choice for production work.
>
> > What is a serious choice for production work?
>
> Well, as I commented later in that mail, I feel that 1000 buffers is
> a reasonable choice --- but I have to admit that I have no hard data
> to back up that feeling.  Perhaps we should take this to the
> pgsql-perform list and argue about reasonable choices.

Damn. Another list I have to subscribe to!

The results I just posted indicate that 1000 buffers gives really quite bad performance compared to 4000, perhaps up to 100 TPS for selects and 30 TPS for TPC-B. Still, that 1000 is in itself vastly better than 64!!

Chris
> A separate line of investigation is "what is the lowest common
> denominator nowadays?"  I think we've established that SHMMAX=1M
> is obsolete, but what replaces it as the next LCD?  4M seems to be
> correct for some BSD flavors, and I can confirm that that's the
> current default for Mac OS X --- any other comments?

It's 1025 * 4k pages on FreeBSD = 4MB.

Chris
On Tue, Feb 11, 2003 at 05:25:29PM -0700, Rick Gigger wrote:
> The type of person who can't configure it or doesn't think to try is
> probably not doing a project that requires any serious performance.

I have piles of email, have fielded thousands of phone calls, and have had many conversations which prove that claim false. People think that computers are magic. That they don't think the machines require a little bit of attention is nowise an indication that they don't need the system to come with reasonable defaults.

A

----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M2P 2A8
                                        +1 416 646 3304 x110
On Wed, 2003-02-12 at 11:39, Andrew Sullivan wrote:
> I have piles of email, have fielded thousands of phone calls, and
> have had many conversations which prove that claim false.

But IBM told me computers are self healing, so if there is a performance problem should it just fix itself?

--
Rod Taylor <rbt@rbt.ca>
PGP Key: http://www.rbt.ca/rbtpub.asc
Tom Lane writes:
> Well, as I commented later in that mail, I feel that 1000 buffers is
> a reasonable choice --- but I have to admit that I have no hard data
> to back up that feeling.

I know you like it in that range, and 4 or 8 MB of buffers by default should not be a problem. But personally I think if the optimal buffer size does not depend on both the physical RAM you want to dedicate to PostgreSQL and the nature and size of the database, then we have achieved a medium revolution in computer science. ;-)

--
Peter Eisentraut
peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes:
> I know you like it in that range, and 4 or 8 MB of buffers by default
> should not be a problem.  But personally I think if the optimal buffer
> size does not depend on both the physical RAM you want to dedicate to
> PostgreSQL and the nature and size of the database, then we have achieved
> a medium revolution in computer science. ;-)

But this is not about "optimal" settings. This is about "pretty good" settings. As long as we can get past the knee of the performance curve, I think we've done what should be expected of a default parameter set.

I believe that 1000 buffers is enough to get past the knee in most scenarios. Again, I haven't got hard evidence, but that's my best guess.

regards, tom lane
Peter Eisentraut wrote:
> Tom Lane writes:
> > Well, as I commented later in that mail, I feel that 1000 buffers is
> > a reasonable choice --- but I have to admit that I have no hard data
> > to back up that feeling.
>
> I know you like it in that range, and 4 or 8 MB of buffers by default
> should not be a problem.  But personally I think if the optimal buffer
> size does not depend on both the physical RAM you want to dedicate to
> PostgreSQL and the nature and size of the database, then we have achieved
> a medium revolution in computer science. ;-)

I have thought about this and I have an idea. Basically, increasing the default values may get us closer, but it will discourage some to tweak, and it will cause problems with some OS's that have small SysV params.

So, my idea is to add a message at the end of initdb that states people should run the pgtune script before running a production server.

The pgtune script will basically allow us to query the user, test the OS version and perhaps parameters, and modify postgresql.conf with reasonable values. I think this is the only way to cleanly get folks close to where they should be.

For example, we can ask them how many rows and tables they will be changing, on average, between VACUUM runs. That will allow us to set the FSM params. We can ask them about using 25% of their RAM for shared buffers. If they have other major apps running on the server or have small tables, we can make no changes. We can basically ask them questions and use that info to set values.

We can even ask about sort usage maybe and set sort memory. We can even control checkpoint_segments this way if they say they will have high database write activity and don't worry about disk space usage. We may even be able to compute some random page cost estimate.

Seems a script is going to be the best way to test values and assist folks in making reasonable decisions about each parameter. Of course, they can still edit the file, and we can ask them if they want assistance to set each parameter or leave it alone.

I would restrict the script to only deal with tuning values, and tell people they still need to review that file for other useful parameters.

Another option would be to make a big checklist or web page that asks such questions and computes proper values, but it seems a script would be easiest. We can even support '?' which would explain why the question is being asked and how it affects the value.

--
Bruce Momjian  |  http://candle.pha.pa.us
pgman@candle.pha.pa.us  |  (610) 359-1001
+ If your life is a hard drive,  |  13 Roberts Road
+ Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
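A sketch of what the core of such a script might look like -- entirely hypothetical, since no pgtune exists yet; it assumes PGDATA is set and applies the 25%-of-RAM rule mentioned above:

    #!/bin/sh
    # hypothetical "pgtune" sketch: derive shared_buffers from user input
    printf "MB of RAM to dedicate to PostgreSQL: "
    read ram
    # 25% of that RAM, expressed as a count of 8 kB shared buffers
    buffers=`expr $ram \* 1024 / 4 / 8`
    echo "shared_buffers = $buffers" >> $PGDATA/postgresql.conf
    echo "shared_buffers set to $buffers (about 25% of ${ram} MB)"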
>>>Bruce Momjian said: [...] > For example, we can ask them how many rows and tables they will be > changing, on average, between VACUUM runs. That will allow us to set the > FSM params. We can ask them about using 25% of their RAM for shared > buffers. If they have other major apps running on the server or have > small tables, we can make no changes. We can basically ask them > questions and use that info to set values. Bruce, this is a very good idea and such a tool would simplify setup for the me-too type of DBA - we should definitely try to attract them. However, how could one possibly answer the above question if they are setting up their database for the first time? What is more, these settings are per-installation, not per-database - which means that if you have several small but active databases and one large database, the requirements will be very different. Nobody likes answering such questions when installing new software. You might enjoy it the first few times, but then you learn the 'answers' and don't even think what the question is. (we all know the answer :) Perhaps indeed a better idea is to have PostgreSQL itself collect usage statistics, and from time to time print 'suggestions' to the log file (best in my opinion), or have these available via some query. These suggestions should best reflect actual usage, and of course require minimal intervention to the database system, such as restarts etc. Daniel
Bruce Momjian <pgman@candle.pha.pa.us> writes: > So, my idea is to add a message at the end of initdb that states people > should run the pgtune script before running a production server. Do people read what initdb has to say? IIRC, the RPM install scripts hide initdb's output from the user entirely. I wouldn't put much faith in such a message as having any real effect on people... regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > So, my idea is to add a message at the end of initdb that states people > > should run the pgtune script before running a production server. > > Do people read what initdb has to say? > > IIRC, the RPM install scripts hide initdb's output from the user > entirely. I wouldn't put much faith in such a message as having any > real effect on people... Yes, that is a problem. We could show something in the server logs if pgtune hasn't been run. Not sure what else we can do, but it would give folks a one-stop thing to run to deal with performance configuration. We could prevent the postmaster from starting unless they run pgtune or if they have modified postgresql.conf from the default. Of course, that's pretty drastic. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
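For what the softer "warn if untouched" variant might look like, here is a rough C sketch that compares postgresql.conf byte-for-byte against the shipped sample and logs a reminder if they match. The file names, the comparison method, and the warning text are all assumptions made for illustration; a real postmaster check would use its own paths and logging machinery.

#include <stdio.h>

/* Return 1 if the two files have identical contents, 0 otherwise
 * (unreadable files are treated as "not identical"). */
static int files_identical(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int same = (fa != NULL && fb != NULL);

    if (same)
    {
        int ca, cb;
        do
        {
            ca = fgetc(fa);
            cb = fgetc(fb);
            if (ca != cb)
                same = 0;           /* differ, or one file ended early */
        } while (same && ca != EOF);
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return same;
}

int main(void)
{
    /* hypothetical locations of the live config and the shipped sample */
    if (files_identical("postgresql.conf", "postgresql.conf.sample"))
        fprintf(stderr,
                "WARNING: postgresql.conf appears untouched; "
                "consider running pgtune before production use\n");
    return 0;
}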
I imagined they could run pgtune anytime after install to update those performance parameters. It gives them a one-stop location to at least do minimal tuning, and as their load changes, they can run it again. --------------------------------------------------------------------------- Daniel Kalchev wrote: > Bruce, this is a very good idea and such a tool would simplify setup for the me-too type of DBA - we should definitely try to attract them. > [...] -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Pardon my ignorance, but there's no way to auto-tune? Ship it with a thread that gathers statistics and periodically re-tunes the database parameters. Of course, be able to turn it off. People who actually take the time to tune manually will turn it off so as not to have the overhead or interruption. Those that don't care about pgtune shouldn't care about having a thread around retuning. Those that will care will tune manually. -----Original Message----- From: pgsql-advocacy-owner@postgresql.org [mailto:pgsql-advocacy-owner@postgresql.org] On Behalf Of Bruce Momjian Sent: Thursday, February 13, 2003 2:22 PM To: Daniel Kalchev Cc: PostgreSQL Hackers Mailing List; pgsql-advocacy@postgresql.org Subject: Re: [pgsql-advocacy] [HACKERS] Changing the default configuration > I imagined they could run pgtune anytime after install to update those performance parameters. It gives them a one-stop location to at least do minimal tuning, and as their load changes, they can run it again. > [...]
To auto-tune, you would need to monitor swap usage and other stuff that may vary too much based on load from other systems. Only the admin knows how to answer some of those questions. --------------------------------------------------------------------------- Jason Hihn wrote: > Pardon my ignorance, but there's no way to auto-tune? Ship it with a thread that gathers statistics and periodically re-tunes the database parameters. Of course, be able to turn it off. > [...] -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Josh Berkus wrote: > > > Uh ... do we have a basis for recommending any particular sets of > > > parameters for these different scenarios? This could be a good idea > > > in the abstract, but I'm not sure I know enough to fill in the details. > > Sure. > Mostly-Read database, few users, good hardware, complex queries: > = High shared buffers and sort mem, high geqo and join collapse thresholds, > moderate fsm settings, defaults for WAL. > Same as above with many users and simple queries (webserver) = > same as above, except lower sort mem and higher connection limit > High-Transaction Database = > Moderate shared buffers and sort mem, high FSM settings, increase WAL files > and buffers. > Workstation = > Moderate to low shared buffers and sort mem, moderate FSM, defaults for WAL, > etc. > Low-Impact server = current defaults, more or less. Okay, but there should probably be one more, called "Benchmark". The real problem is what values to use for it. :-) -- Kevin Brown kevin@sysexperts.com
Tatsuo, > Sigh. People always complain "pgbench does not reliably produce > repeatable numbers" or something, then say "that's because pgbench's > transaction has too much contention on the branches table". So I added > the -N option to pgbench, which makes pgbench not do any UPDATE to > the branches table. But still people continue to complain... Hey, pgbench is a good start on a Postgres performance tester, and it's much, much better than what there was before you came along ... which was nothing. Thank you again for contributing it. pgbench is, however, only a start on a performance tester, and we'd need to build it up before we could use it as the basis of a PG tuner. -- Josh Berkus Aglio Database Solutions San Francisco
HEY PEOPLE! How about we take this discussion to the Performance List, where it belongs? -- Josh Berkus Aglio Database Solutions San Francisco
Bruce Momjian wrote: > We could prevent the postmaster from starting unless they run pgtune or > if they have modified postgresql.conf from the default. Of course, > that's pretty drastic. If you're going to do that, then you may as well make the defaults something that will perform reasonably well under the widest circumstances possible, and let the postmaster fail when it can't acquire the resources those defaults demand. What I'd do is go ahead and make the defaults something reasonable. If the postmaster then can't allocate, say, enough shared memory pages, it should issue an error message saying not only that it wasn't able to allocate enough shared memory, but also which parameter to change and (if it's not too much trouble to implement) what it can be changed to in order to get past that part of the initialization. This means the postmaster has to figure out how much shared memory it can actually allocate, via a binary search allocate/free method. It should also warn that by lowering the value, the resulting performance may be much less than satisfactory, and that the alternative (to increase SHMMAX, in this example) should be used if good performance is desired. That way, someone whose only concern is to make it work will be able to do so without a lot of experimentation, and will get plenty of warning that the result isn't likely to work very well. And we end up getting better benchmarks in the cases where people don't have to touch the default config. :-) -- Kevin Brown kevin@sysexperts.com
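The "binary search allocate/free method" Kevin describes could be sketched in C along these lines: repeatedly request a SysV segment of a candidate size with shmget(), release it immediately, and narrow the range until the largest grantable size is known. The search bounds and the 1 MB granularity are arbitrary choices for the sketch.

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Try to allocate-and-free a SysV shared memory segment of the given size. */
static int try_alloc(size_t size)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id < 0)
        return 0;                       /* refused: most likely over SHMMAX */
    shmctl(id, IPC_RMID, NULL);         /* release the probe immediately */
    return 1;
}

int main(void)
{
    size_t lo = 1024 * 1024;            /* assume 1 MB is always grantable */
    size_t hi = 512 * 1024 * 1024;      /* arbitrary upper bound for the search */

    /* binary search down to 1 MB granularity */
    while (hi - lo > 1024 * 1024)
    {
        size_t mid = lo + (hi - lo) / 2;
        if (try_alloc(mid))
            lo = mid;
        else
            hi = mid;
    }
    printf("largest grantable segment: about %zu MB\n", lo / (1024 * 1024));
    return 0;
}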
>>>Bruce Momjian said: > > I imagined they could run pgtune anytime after install to update those > performance parameters. It gives them a one-stop location to at least > do minimal tuning, and as their load changes, they can run it again. True, but to make a reasonably good choice, they will need to base the parameter selection on some statistics data. Not many people do thorough testing before tuning their system, and in many cases the tests one does rarely resemble the real-world usage of their database(s). I agree that pgtune would be a wonderful tool in this case, but they first need to get some idea what parameters should be given to it. This process is further complicated by the fact that we can tune PostgreSQL on a per-installation basis, instead of on a per-database basis - many of the parameters, for example FSM and sort memory, are database related. We usually split data into databases to put related data together, or data with similar usage patterns etc. Daniel
>>>Jason Hihn said: > Pardon my ignorance, but there's no way to auto-tune? Ship it with a thread > that gathers statistics and periodically re-tunes the database parameters. > Of course, be able to turn it off. People who actually take the time to > tune manually will turn it off so as not to have the overhead or interruption. > Those that don't care about pgtune shouldn't care about having a thread > around retuning. Those that will care will tune manually. This is related to my proposition, but the trouble is, there is no such thing as a 'well tuned database' that will suit all queries. You can tune the database to the hardware, for example (still remember that old argument about random access and fast disks). It seems the system could 'self-tune' itself on minor choices. I believe it does this today for a number of things already. More significant changes require the DBA's consent and choice - but they need to be well informed of the current usage statistics when making the choice. Daniel
>>>Josh Berkus said: > How about we take this discussion to the Performance List, where it belongs? I believe the design and addition of code that collects and outputs the usage patterns of the database (statistics) belongs here. If we take the approach of providing information to tune PostgreSQL based on real-world usage, I guess we need at least the following:
- Usage statistics on a per-database or even per-table level. I believe we already collect some;
- A statistics analysis tool/routine/process to produce suggestions;
- Of course, lots of real-world data to justify the suggestions;
- Can we provide more knobs for tunable parameters that can be applied on a per-database or even per-table level? One first candidate might be the FSM;
- Can some of these parameters (when available) be made to auto-tune?
Of course, this could move out of pgsql-hackers :) Daniel
On Fri, 2003-02-14 at 03:00, Daniel Kalchev wrote: > >>>Jason Hihn said: > > Pardon my ignorance, but there's no way to auto-tune? Ship it with a thread > > that gathers statistics and periodically re-tunes the database parameters. > > Of course, be able to turn it off. People who actually take the time to > > tune manually will turn it off so as not to have the overhead or interruption. > > Those that don't care about pgtune shouldn't care about having a thread > > around retuning. Those that will care will tune manually. > > This is related to my proposition, but the trouble is, there is no such thing > as a 'well tuned database' that will suit all queries. You can tune the > database to the hardware, for example (still remember that old argument > about random access and fast disks). > > It seems the system could 'self-tune' itself on minor choices. I believe it > does this today for a number of things already. More significant changes > require the DBA's consent and choice - but they need to be well informed of > the current usage statistics when making the choice. > I agree. Given that we don't have solid guidance on how to tune the different parameters, nor enough mechanisms for actually giving people the information they need to determine the changes required, a complete auto-tune seems premature. Robert Treat
On Thu, 13 Feb 2003, Bruce Momjian wrote: > To auto-tune, you would need to monitor swap usage and other stuff that > may vary too much based on load from other systems. Only the admin > knows how to answer some of those questions. No, to "auto-tune" many parameters that currently require manual configuration, you merely need to change things so that you work with the OS, rather than beside it. There are a lot of things we could mmap that would remove the need for tuning certain things altogether. The buffer cache is the biggest one; mmaping that would let the OS take care of it all, and a few hundred KB of shared memory would be all postgres would need. (Or none at all, if you used mmap'd memory regions for that part of your IPC as well.) You could probably also make sort_mem need a lot less tuning if you sorted in mmap'd memory and let the OS deal with paging only when and if it needed it (as well as asking the OS not to page memory you're randomly accessing, since it really destroys your performance when you do that). I'm not sure if you could get rid of semaphores or not, but if you could somehow do that, that would be another limited resource that you wouldn't have to deal with. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
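As a toy illustration of the mmap approach Curt describes: map a data file MAP_SHARED so that the OS page cache is the only cache, and use madvise() to tell the kernel the access pattern will be random. This shows the OS facilities involved, not anything PostgreSQL actually does; the file path is hypothetical.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    const char *path = "/tmp/datafile";     /* hypothetical data file */
    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return 1; }

    /* MAP_SHARED: the OS page cache holds the data; there is no
     * separately sized private buffer pool to tune */
    char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* hint that access will be random, so the kernel skips read-ahead */
    madvise(p, st.st_size, MADV_RANDOM);

    p[0] ^= 0;                              /* touch a page: faulted in on demand */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}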
People seemed to like the idea: Add a script to ask system configuration questions and tune postgresql.conf. --------------------------------------------------------------------------- Bruce Momjian wrote: > So, my idea is to add a message at the end of initdb that states people > should run the pgtune script before running a production server. > [...] -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, 2003-02-11 at 20:10, Tatsuo Ishii wrote: > Sigh. People always complain "pgbench does not reliably produce > repeatable numbers" or something, then say "that's because pgbench's > transaction has too much contention on the branches table". So I added > the -N option to pgbench, which makes pgbench not do any UPDATE to > the branches table. But still people continue to complain... What exactly does the -N option do? I see no mention of it in the README.pgbench, which might be part of the reason people "continue to complain".