Thread: AW: AW: [HACKERS] Some notes on optimizer cost estimates
> > > > Couldn't we test some of these parameters inside configure and set
> > > > them there?
> > >
> > > If we could figure out a reasonably cheap way of estimating these
> > > numbers, it'd be worth setting up custom values at installation time.
> >
> > Imho this whole idea is not so good. (Sorry)
> >
> > My points are:
> > 1. even if it is good for an optimizer to be smart,
> >    it is even more important that it is predictable
>
> ISTM that by the nature of things the most important capability of an
> optimizer is to yield optimal results.

Yes, but believe me, it cannot do that. There are various reasons for
that, e.g.:
1. bad/old statistics
2. concurrency issues that are not considered
3. interference from other applications

It will only yield a high percentage of optimal plans. The rest ranges
from not so good to far off. It is the responsibility of the SQL
programmer to avoid these, but he can only do that if the optimizer is
predictable.

> This, however, does not have to be mutually exclusive with
> predictability. If you estimate some CPU and disk parameters and write
> them into a config file, then you can always give this config file to a
> bug fixer. It will still work on his machine, just less than optimally.

That is an idea, yes. But I doubt that it is really worth it. Imho it
would be more important to consider concurrency issues first, and that
would be very hard to do. E.g.:

Two sessions do a seq scan on a huge table x. They start their queries
with a time offset that does not allow session 2 to use pages from
session 1's buffer cache (not enough memory). It would be optimal if
session 1 waited long enough that session 2 could read from the cache.
They would both benefit from this strategy.

Andreas
At 09:51 26/01/00 +0100, Zeugswetter Andreas SB wrote:
>
>> > > > Couldn't we test some of these parameters inside configure and set
>> > > > them there?
>> > >
>> > > If we could figure out a reasonably cheap way of estimating these
>> > > numbers, it'd be worth setting up custom values at installation time.
>> >
>> > Imho this whole idea is not so good. (Sorry)
>> >
>> > My points are:
>> > 1. even if it is good for an optimizer to be smart,
>> >    it is even more important that it is predictable
>>
>> ISTM that by the nature of things the most important capability of an
>> optimizer is to yield optimal results.
>
>Yes, but believe me, it cannot do that.
>There are various reasons for that, e.g.:
>1. bad/old statistics
>2. concurrency issues that are not considered
>3. interference from other applications
>It will only yield a high percentage of optimal plans.
>The rest ranges from not so good to far off. It is the responsibility of
>the SQL programmer to avoid these, but he can only do that if the
>optimizer is predictable.

Based on experience with optimizer improvements across releases of DB
products (not PostgreSQL, I hastily add), I would be inclined to say
(from bitter experience) that no optimizer is ever truly predictable.
The SQL programmer has to be given the tools to ensure that a 'bad'
query can be forced to run the same way with each release, and release
notes should indicate what extra strategies are now available, in case
the 'bad' query can be made better.

It gets my goat (a bit) when commercial DB manufacturers believe that
they can solve intractable optimization problems - it would be a pity
for PGSQL to go the same way. I'd love to have the opportunity to prove
my point with PGSQL, but since I can't affect the optimizer's choices in
any way, I am left with rhetoric, and examples from commercial DBs,
which aren't really relevant.

Sorry about the chestnut.

>> This, however, does not have to be mutually exclusive with
>> predictability. If you estimate some CPU and disk parameters and write
>> them into a config file, then you can always give this config file to a
>> bug fixer. It will still work on his machine, just less than optimally.
>
>That is an idea, yes. But I doubt that it is really worth it.
>Imho it would be more important to consider concurrency issues first,
>and that would be very hard to do.
>
>E.g.: two sessions do a seq scan on a huge table x. They start their
>queries with a time offset that does not allow session 2 to use pages
>from session 1's buffer cache (not enough memory). It would be optimal
>if session 1 waited long enough that session 2 could read from the
>cache. They would both benefit from this strategy.
>
>Andreas

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/
On Wed, Jan 26, 2000 at 08:07:17PM +1100, Philip Warner wrote:
>
> Based on experience with optimizer improvements across releases of DB
> products (not PostgreSQL, I hastily add), I would be inclined to say
> (from bitter experience) that no optimizer is ever truly predictable.
> The SQL programmer has to be given the tools to ensure that a 'bad'
> query can be forced to run the same way with each release, and release
> notes should indicate what extra strategies are now available, in case
> the 'bad' query can be made better.
>
> It gets my goat (a bit) when commercial DB manufacturers believe that
> they can solve intractable optimization problems - it would be a pity
> for PGSQL to go the same way. I'd love to have the opportunity to prove
> my point with PGSQL, but since I can't affect the optimizer's choices
> in any way, I am left with rhetoric, and examples from commercial DBs,
> which aren't really relevant.
>
> Sorry about the chestnut.

Ah, but you _can_ affect how the plans are chosen, which in turn can
affect the optimizer. Not as part of a running, production system, I
grant you, but for debugging performance problems (and in particular,
changes from one release to the next) it can be useful. What I'm talking
about are the switches to the backend that tell pgsql not to use
particular kinds of joins/scans in planning a query.

From postgres(1):

       -f     Forbids the use of particular scan and join methods:
              s and i disable sequential and index scans respectively,
              while n, m and h disable nested-loop, merge and hash
              joins respectively. (Neither sequential scans nor
              nested-loop joins can be disabled completely; the -fs
              and -fn options simply discourage the optimizer from
              using those plan types if it has any other alternative.)

While not the whole ball of wax in terms of controlling the
planner/optimizer stages, it does give you one more knob to tweak,
beyond saying "This query took 2 sec. on release X, now it takes 2 min.
on release X+1".

Perhaps someone (Phil?) could collect 'bad' queries, run them against
each release, and donate that part of the release notes to Bruce.

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
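The debugging workflow Ross describes might look like the following
invocation sketch. The -f switches are the ones from the postgres(1)
excerpt above; "mydb" and the implied running server are placeholders,
and this assumes the switches are passed through the PGOPTIONS
environment variable, as discussed later in the thread:

```
# Run the problem query normally, then with certain plan types
# discouraged, and compare plans and timings across releases:
psql mydb                      # plans as the optimizer chooses
PGOPTIONS='-fs' psql mydb      # -fs: discourage sequential scans
PGOPTIONS='-fs -fh' psql mydb  # additionally disable hash joins
```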
"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes:
> Ah, but you _can_ affect how the plans are chosen, which in turn can
> affect the optimizer. Not as part of a running, production system, I
> grant you, but for debugging performance problems (and in particular,
> changes from one release to the next) it can be useful. What I'm
> talking about are the switches to the backend that tell pgsql not to
> use particular kinds of joins/scans in planning a query.

BTW, I have been thinking that it'd be a lot better if these flags could
be twiddled via SET/SHOW commands, instead of having to restart psql.
Nothing done about it yet, but it's an idea...

Also, you can already twiddle the basic cost parameters (cpu_page_weight
and cpu_index_page_weight) via SET variables whose names I forget at the
moment. There will probably be at least one more such variable before
7.0 comes out, to control the cost of a random page fetch vs. a
sequential one.

			regards, tom lane
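A session using the SET interface Tom describes might look like the
sketch below. The variable names follow the cost parameters he mentions,
but since he notes he doesn't recall the exact SET spellings, treat both
the names and the values here as illustrative, not as a reference:

```sql
-- Illustrative only: per-backend cost tuning via SET/SHOW
-- (exact variable names and defaults are uncertain).
SET cpu_page_weight = 0.033;        -- CPU cost to process one page,
                                    -- relative to one page fetch
SET cpu_index_page_weight = 0.017;  -- likewise for index pages
SHOW cpu_page_weight;               -- inspect the current value
RESET cpu_page_weight;              -- back to the compiled-in default
```

Since (as noted below in the thread) SET affects only the issuing
backend, such settings could tune one problem query without disturbing
concurrent sessions.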
At 12:06 PM 1/26/00 -0500, Tom Lane wrote:
>"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes:
>> Ah, but you _can_ affect how the plans are chosen, which in turn can
>> affect the optimizer. Not as part of a running, production system, I
>> grant you, but for debugging performance problems (and in particular,
>> changes from one release to the next) it can be useful. What I'm
>> talking about are the switches to the backend that tell pgsql not to
>> use particular kinds of joins/scans in planning a query.
>
>BTW, I have been thinking that it'd be a lot better if these flags could
>be twiddled via SET/SHOW commands, instead of having to restart psql.
>Nothing done about it yet, but it's an idea...

If something like this could be done for only the current transaction or
only the current backend, you'd have a sledgehammer-quality tool for
tuning individual queries, no? Obviously not an ideal tool, but maybe
helpful to some folks? I've not looked at the code; maybe doing a SET
would only affect the current backend?

>Also, you can already twiddle the basic cost parameters (cpu_page_weight
>and cpu_index_page_weight) via SET variables whose names I forget at the
>moment. There will probably be at least one more such variable before
>7.0 comes out, to control the cost of a random page fetch vs. a
>sequential one.

These should be helpful, too... again, are they system-wide or limited
to the current backend? It would probably be nice to be able to futz
them for a particular query without impacting all other queries going on
at the same time.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
Don Baccus <dhogaza@pacifier.com> writes:
> I've not looked at the code; maybe doing a SET would only affect
> the current backend?

Right, just the current backend.

			regards, tom lane
At 10:05 26/01/00 -0600, Ross J. Reedstrom wrote:
>On Wed, Jan 26, 2000 at 08:07:17PM +1100, Philip Warner wrote:
>>
>> Based on experience with optimizer improvements across releases of DB
>> products (not PostgreSQL, I hastily add), I would be inclined to say
>> (from bitter experience) that no optimizer is ever truly predictable.
>> The SQL programmer has to be given the tools to ensure that a 'bad'
>> query can be forced to run the same way with each release, and release
>> notes should indicate what extra strategies are now available, in case
>> the 'bad' query can be made better.
>>
>> It gets my goat (a bit) when commercial DB manufacturers believe that
>> they can solve intractable optimization problems - it would be a pity
>> for PGSQL to go the same way. I'd love to have the opportunity to
>> prove my point with PGSQL, but since I can't affect the optimizer's
>> choices in any way, I am left with rhetoric, and examples from
>> commercial DBs, which aren't really relevant.
>>
>> Sorry about the chestnut.
>
>Ah, but you _can_ affect how the plans are chosen, which in turn can
>affect the optimizer. Not as part of a running, production system, I
>grant you, but for debugging performance problems (and in particular,
>changes from one release to the next) it can be useful. What I'm talking
>about are the switches to the backend that tell pgsql not to use
>particular kinds of joins/scans in planning a query.
>
>From postgres(1):
>
>       -f     Forbids the use of particular scan and join methods:
>              s and i disable sequential and index scans respectively,
>              while n, m and h disable nested-loop, merge and hash
>              joins respectively. (Neither sequential scans nor
>              nested-loop joins can be disabled completely; the -fs
>              and -fn options simply discourage the optimizer from
>              using those plan types if it has any other alternative.)

I think what I would like to see is the flip-side to this: the ability
to force it to do, eg, nested-loop joins. But this is certainly a good
start, and it would be *great* if (as is suggested later in the thread)
these could be made runtime settings, and even better if they could be
set for an individual query.

>While not the whole ball of wax in terms of controlling the
>planner/optimizer stages, it does give you one more knob to tweak,
>beyond saying "This query took 2 sec. on release X, now it takes 2
>min. on release X+1".
>
>Perhaps someone (Phil?) could collect 'bad' queries, run them against
>each release, and donate that part of the release notes to Bruce.

I would have thought they might be better placed in the regression
tests, with a warning issued when a query is more than, say, 50% slower.
The problem with this is that you really only see performance problems
on large tables, so such tests would take a long time to construct and
run. Nevertheless, I like the idea of maintaining a collection of known
'bad' queries - I'll try to track down the ones that caused me problems.
On 2000-01-26, Tom Lane mentioned:

> BTW, I have been thinking that it'd be a lot better if these flags could
> be twiddled via SET/SHOW commands, instead of having to restart psql.
> Nothing done about it yet, but it's an idea...
>
> Also, you can already twiddle the basic cost parameters (cpu_page_weight
> and cpu_index_page_weight) via SET variables whose names I forget at the
> moment. There will probably be at least one more such variable before
> 7.0 comes out, to control the cost of a random page fetch vs. a
> sequential one.

Independent of this, I have thought numerous times (when similar
"tuning" issues came up) that it's time for a real unified configuration
file, which includes pg_options, the geqo what-not, an option for each
of these backend tuning options (join methods, fsync), heck maybe even
the port number and an alternative location for temp/sort files. Kind of
put all the administration in one place. Something to think about maybe.

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
> On 2000-01-26, Tom Lane mentioned:
>
> > BTW, I have been thinking that it'd be a lot better if these flags could
> > be twiddled via SET/SHOW commands, instead of having to restart psql.
> > Nothing done about it yet, but it's an idea...
> >
> > Also, you can already twiddle the basic cost parameters (cpu_page_weight
> > and cpu_index_page_weight) via SET variables whose names I forget at the
> > moment. There will probably be at least one more such variable before
> > 7.0 comes out, to control the cost of a random page fetch vs. a
> > sequential one.
>
> Independent of this, I have thought numerous times (when similar
> "tuning" issues came up) that it's time for a real unified configuration
> file, which includes pg_options, the geqo what-not, an option for each
> of these backend tuning options (join methods, fsync), heck maybe even
> the port number and an alternative location for temp/sort files. Kind of
> put all the administration in one place. Something to think about maybe.

Added to TODO:

* Unify configuration into one configuration file

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
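The unified file Peter proposes might look something like the sketch
below. Every option name, value, and path here is an illustrative guess
at how pg_options, the GEQO settings, and the backend tuning switches
could be gathered in one place; it does not describe an existing format:

```
# hypothetical unified configuration file -- all names and values
# are illustrative guesses
port = 5432                    # postmaster listen port
fsync = on                     # the -F backend switch, inverted
sort_mem = 512                 # kB per sort before spilling to disk
temp_dir = '/var/tmp/pgsql'    # alternative location for temp/sort files
geqo_threshold = 11            # use GEQO for joins of this many tables
enable_hashjoin = on           # per-plan-type switch, as with -fh
```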
> > Independent of this, I have thought numerous times (when similar
> > "tuning" issues came up) that it's time for a real unified
> > configuration file, which includes pg_options, the geqo what-not, an
> > option for each of these backend tuning options (join methods,
> > fsync), heck maybe even the port number and an alternative location
> > for temp/sort files. Kind of put all the administration in one place.
> > Something to think about maybe.
>
> Added to TODO:
>
> * Unify configuration into one configuration file

... and that is a good example of database design because?? ;)

It may be the right thing to do, but I can see why an RDBMS project
would have trouble with the concept...

                   - Thomas

--
Thomas Lockhart			lockhart@alumni.caltech.edu
South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Added to TODO:
>> * Unify configuration into one configuration file

> ... and that is a good example of database design because?? ;)

Good point ;-). OTOH, the existing mishmash of config files and
option-setting methods isn't a good example of any kind of design. It
"just grew".

One thing to consider while contemplating a grand unified config file
(GUC?) is that much of this stuff needs to be settable per-client. It
would be wrong to rip out whatever dynamic option-setting code there is.
Cleaning it up and making a more uniform interface to the various
options does sound like a good project though.

I'd want to see a paper design for how things should work before any
coding starts --- the existing methods do have some non-obvious
advantages. For example, even something as grotty as the PGOPTIONS
environment variable has its uses: you can pass options through to a
backend without needing explicit cooperation from your client
application.

			regards, tom lane
On 2000-01-28, Tom Lane mentioned:

> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> >> Added to TODO:
> >> * Unify configuration into one configuration file
>
> > ... and that is a good example of database design because?? ;)

This has more to do with software administration than with database
design, or with databases in the first place. IMHO, of course.

> One thing to consider while contemplating a grand unified config file
> (GUC?)

Darn, I was gonna suggest that name.

> is that much of this stuff needs to be settable per-client.
> It would be wrong to rip out whatever dynamic option-setting code
> there is. Cleaning it up and making a more uniform interface to the
> various options does sound like a good project though.

Nobody said anything about ripping out existing code. There just need to
be defaults settable somewhere without entering -o -F -f -B -q -R -n all
the time. I'm sure we can come up with something.

> I'd want to see a paper design for how things should work before any
> coding starts --- the existing methods do have some non-obvious
> advantages. For example, even something as grotty as the PGOPTIONS
> environment variable has its uses: you can pass options through to
> a backend without needing explicit cooperation from your client
> application.

So PGOPTIONS can be tied into the scheme.

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
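One way the mechanisms discussed in this thread could layer, sketched
lowest-precedence first. This is a reading of the proposal, not a
description of an implemented scheme, and "mydb" is a placeholder:

```
#  1. compiled-in defaults
#  2. unified configuration file, read at startup
#  3. command-line switches / PGOPTIONS, at backend start
#  4. per-backend SET commands, at runtime
PGOPTIONS='-F' psql mydb    # e.g. override a config-file fsync
                            # default for one session
```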