Thread: AW: AW: [HACKERS] Some notes on optimizer cost estimates

AW: AW: [HACKERS] Some notes on optimizer cost estimates

From
Zeugswetter Andreas SB
Date:
 
> > > > Couldn't we test some of these parameters inside 
> configure and set
> > > > them there?
> > > 
> > > If we could figure out a reasonably cheap way of estimating these
> > > numbers, it'd be worth setting up custom values at 
> installation time.
> > 
> > Imho this whole idea is not so good. (Sorry)
> > 
> > My points are:
> > 1. even if it is good for an optimizer to be smart,
> >     it is even more important, that it is predictable
> 
> ISTM that by the nature of things the most important capability of an
> optimizer is to yield optimal results. 

Yes, but believe me, it cannot do that.
There are various reasons for that, e.g.:
1. bad/old statistics
2. concurrency issues that are not considered
3. interference from other applications
It will only yield a high percentage of optimal plans.
The rest ranges from not so good to far off. It is the responsibility of the
SQL programmer to avoid these, but he can only do that if the
optimizer is predictable.

> This, however, does 
> not have to be
> mutually exclusive with predictability. If you estimate some 
> CPU and disk
> parameters and write them into a config file, then you can always give
> this config file to a bug fixer. It will still work on his machine, just
> less than optimally.

That is an idea, yes. But I doubt that it is really worth it.
Imho it would be more important to consider concurrency
issues first, and that would be very hard to do.

e.g.

Two sessions are doing a seq scan on a huge table x.
They start their queries with a time offset that
does not allow session 2 to use pages from session 1's
buffer cache (not enough memory).
It would be optimal if session 1 waited long enough that
session 2 could read from the cache.
They would both benefit from this strategy.

Andreas


Re: AW: AW: [HACKERS] Some notes on optimizer cost estimates

From
Philip Warner
Date:
At 09:51 26/01/00 +0100, Zeugswetter Andreas SB wrote:
> 
>> > > > Couldn't we test some of these parameters inside 
>> configure and set
>> > > > them there?
>> > > 
>> > > If we could figure out a reasonably cheap way of estimating these
>> > > numbers, it'd be worth setting up custom values at 
>> installation time.
>> > 
>> > Imho this whole idea is not so good. (Sorry)
>> > 
>> > My points are:
>> > 1. even if it is good for an optimizer to be smart,
>> >     it is even more important, that it is predictable
>> 
>> ISTM that by the nature of things the most important capability of an
>> optimizer is to yield optimal results. 
>
>Yes, but believe me, it cannot do that.
>There are various reasons for that, e.g.:
>1. bad/old statistics
>2. concurrency issues that are not considered
>3. interference from other applications
>It will only yield a high percentage of optimal plans.
>The rest ranges from not so good to far off. It is the responsibility of the
>SQL programmer to avoid these, but he can only do that if the
>optimizer is predictable.

Based on experience with optimizer improvements across releases of DB
products (not PostgreSQL, I hastily add), I would be inclined to say (from
bitter experience) that no optimizer is ever truly predictable. The SQL
programmer has to be given the tools to ensure that a 'bad' query can be
forced to run the same way with each release, and release notes should
indicate what extra strategies are now available, in case the 'bad' query
can be made better.

It gets my goat (a bit) when commercial DB manufacturers believe that they
can solve intractable optimization problems - it would be a pity for PGSQL
to go the same way. I'd love to have the opportunity to prove my point with
PGSQL, but since I can't affect the optimizer's choices in any way, I am
left with rhetoric, and examples from commercial DBs, which aren't, really,
relevant.

Sorry about the chestnut.


>> This, however, does 
>> not have to be
>> mutually exclusive with predictability. If you estimate some 
>> CPU and disk
>> parameters and write them into a config file, then you can always give
>> this config file to a bug fixer. It will still work on his machine, just
>> less than optimally.
>
>That is an idea, yes. But I doubt that it is really worth it.
>Imho it would be more important to consider concurrency
>issues first, and that would be very hard to do.
>
>e.g.
>
>Two sessions are doing a seq scan on a huge table x.
>They start their queries with a time offset that
>does not allow session 2 to use pages from session 1's
>buffer cache (not enough memory).
>It would be optimal if session 1 waited long enough that
>session 2 could read from the cache.
>They would both benefit from this strategy.
>
>Andreas
----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: AW: AW: [HACKERS] Some notes on optimizer cost estimates

From
"Ross J. Reedstrom"
Date:
On Wed, Jan 26, 2000 at 08:07:17PM +1100, Philip Warner wrote:
> 
> Based on experience with optimizer improvements across releases of DB
> products (not PostgreSQL, I hastily add), I would be inclined to say (from
> bitter experience) that no optimizer is ever truly predictable. The SQL
> programmer has to be given the tools to ensure that a 'bad' query can be
> forced to run the same way with each release, and release notes should
> indicate what extra strategies are now available, in case the 'bad' query
> can be made better.
> 
> It gets my goat (a bit) when commercial DB manufacturers believe that they
> can solve intractable optimization problems - it would be a pity for PGSQL
> to go the same way. I'd love to have the opportunity to prove my point with
> PGSQL, but since I can't affect the optimizer's choices in any way, I am
> left with rhetoric, and examples from commercial DBs, which aren't, really,
> relevant.
> 
> Sorry about the chestnut.

Ah, but you _can_ affect how the plans are chosen, which in turn can affect
the optimizer. Not as part of a running, production system, I grant you,
but for debugging performance problems (and in particular, changes from
one release to the next) it can be useful. What I'm talking about are
the switches to the backend that tell pgsql not to use particular kinds
of joins/scans in planning a query.

From postgres(1):

       -f     Forbids the use of particular scan and join meth-
              ods: s and i disable sequential and index scans
              respectively, while n, m and h disable nested-loop,
              merge and hash joins respectively. (Neither
              sequential scans nor nested-loop joins can be dis-
              abled completely; the -fs and -fn options simply
              discourage the optimizer from using those plan
              types if it has any other alternative.)
 

While not the whole ball of wax in terms of controlling the
planner/optimizer stages, it does give you one more knob to tweak,
beyond saying "This query took 2 sec. on release X, now it takes 2
min. on release X+1".
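As a concrete sketch of that knob-tweaking (the database name, table, and query here are hypothetical, and this assumes the standalone-backend invocation of that era), one could compare a query's plan with and without index scans discouraged:

```
$ postgres -fi mydb
backend> EXPLAIN SELECT * FROM bigtable WHERE id = 42;
```

Rerunning the same EXPLAIN without -fi would show whether the plan the optimizer prefers actually relies on an index scan.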

Perhaps someone (Phil?) could collect 'bad' queries, and run them against
each release, and donate that part of the release notes to Bruce.

Ross

-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


Re: AW: AW: [HACKERS] Some notes on optimizer cost estimates

From
Tom Lane
Date:
"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes:
> Ah, but you _can_ affect how the plans are chosen, which in turn can affect
> the optimizer. Not as part of a running, production system, I grant you,
> but for debugging performance problems (and in particular, changes from
> one release to the next) it can be useful. What I'm talking about are
> the switches to the backend that tell pgsql not to use particular kinds
> of joins/scans in planning a query.

BTW, I have been thinking that it'd be a lot better if these flags could
be twiddled via SET/SHOW commands, instead of having to restart psql.
Nothing done about it yet, but it's an idea...

Also, you already can twiddle the basic cost parameters (cpu_page_weight
and cpu_index_page_weight) via SET variables whose names I forget at the
moment.  There will be probably be at least one more such variable
before 7.0 comes out, to control cost of random page fetch vs. sequential.
        regards, tom lane
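If the SET variables follow the parameter names mentioned above (an assumption: Tom notes he does not recall the exact spellings, so these names are illustrative), a per-backend tweak might look like:

```sql
-- Hypothetical variable names, modeled on the cost parameters named above:
SET cpu_page_weight TO 0.05;
SHOW cpu_page_weight;
```

Being ordinary SET variables, these would affect only the issuing backend rather than the whole installation.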


Re: AW: AW: [HACKERS] Some notes on optimizer cost estimates

From
Don Baccus
Date:
At 12:06 PM 1/26/00 -0500, Tom Lane wrote:
>"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes:
>> Ah, but you _can_ affect how the plans are chosen, which in turn can affect
>> the optimizer. Not as part of a running, production system, I grant you,
>> but for debugging performance problems (and in particular, changes from
>> one release to the next) it can be useful. What I'm talking about are
>> the switches to the backend that tell pgsql not to use particular kinds
>> of joins/scans in planning a query.
>
>BTW, I have been thinking that it'd be a lot better if these flags could
>be twiddled via SET/SHOW commands, instead of having to restart psql.
>Nothing done about it yet, but it's an idea...

If something like this could be done for only the current transaction
or only the current backend, you'd have a sledgehammer-quality tool
for tuning individual queries, no?  Obviously not an ideal tool but
maybe helpful to some folks?

I've not looked at the code, maybe doing a SET would only affect
the current backend?

>Also, you already can twiddle the basic cost parameters (cpu_page_weight
>and cpu_index_page_weight) via SET variables whose names I forget at the
>moment.  There will be probably be at least one more such variable
>before 7.0 comes out, to control cost of random page fetch vs. sequential.

These should be helpful, too...again, are they system-wide or
limited to the current backend?  It would probably be nice to be
able to futz them for a particular query without impacting all other
queries going on at the same time.



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service
  and other goodies at http://donb.photo.net.


Re: AW: AW: [HACKERS] Some notes on optimizer cost estimates

From
Tom Lane
Date:
Don Baccus <dhogaza@pacifier.com> writes:
> I've not looked at the code, maybe doing a SET would only affect
> the current backend?

Right, just the current backend.
        regards, tom lane


Re: AW: AW: [HACKERS] Some notes on optimizer cost estimates

From
Philip Warner
Date:
At 10:05 26/01/00 -0600, Ross J. Reedstrom wrote:
>On Wed, Jan 26, 2000 at 08:07:17PM +1100, Philip Warner wrote:
>> 
>> Based on experience with optimizer improvements across releases of DB
>> products (not PostgreSQL, I hastily add), I would be inclined to say (from
>> bitter experience) that no optimizer is ever truly predictable. The SQL
>> programmer has to be given the tools to ensure that a 'bad' query can be
>> forced to run the same way with each release, and release notes should
>> indicate what extra strategies are now available, in case the 'bad' query
>> can be made better.
>> 
>> It gets my goat (a bit) when commercial DB manufacturers believe that they
>> can solve intractable optimization problems - it would be a pity for PGSQL
>> to go the same way. I'd love to have the opportunity to prove my point with
>> PGSQL, but since I can't affect the optimizer's choices in any way, I am
>> left with rhetoric, and examples from commercial DBs, which aren't, really,
>> relevant.
>> 
>> Sorry about the chestnut.
>
>Ah, but you _can_ affect how the plans are chosen, which in turn can affect
>the optimizer. Not as part of a running, production system, I grant you,
>but for debugging performance problems (and in particular, changes from
>one release to the next) it can be useful. What I'm talking about are
>the switches to the backend that tell pgsql not to use particular kinds
>of joins/scans in planning a query.
>
>>From postgres(1):
>
>       -f     Forbids the use of particular scan and  join  meth-
>              ods:  s  and  i  disable sequential and index scans
>              respectively, while n, m and h disable nested-loop,
>              merge   and   hash  joins  respectively.   (Neither
>              sequential scans nor nested-loop joins can be  dis-
>              abled  completely;  the  -fs and -fn options simply
>              discourage the  optimizer  from  using  those  plan
>              types if it has any other alternative.)

I think what I would like to see is the flip-side to this: the ability to
force it to do, e.g., nested-loop joins. But this is certainly a good start,
and it would be *great* if (as is suggested later in the thread) these
could be made runtime settings, and even better if they could be set for an
individual query.
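In the meantime, the quoted -f switches can approximate forcing a nested-loop join by disabling the alternatives (a sketch; the database name is hypothetical, and per the man page merge and hash joins can be disabled outright, which leaves nested-loop as the remaining join method):

```
$ postgres -fm -fh mydb    # disable merge and hash joins, steering the planner toward nested-loop
```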


>While not the whole ball of wax in terms of controlling the
>planner/optimizer stages, it does give you one more knob to tweak,
>beyond saying "This query took 2 sec. on release X, now it takes 2
>min. on release X+1"
>
>Perhaps someone (Phil?) could collect 'bad' queries, and run them against
>each release, and donate that part of the release notes to Bruce.

I would have thought they might be better placed in the regression tests,
which could issue a warning when a query is more than, say, 50% slower. The problem
with this is that you really only see performance problems on large tables,
so such tests would take a long time to construct and run. Nevertheless, I
like the idea of maintaining a collection of known 'bad' queries - I'll try
to track down the ones that caused me problems.




Re: AW: AW: [HACKERS] Some notes on optimizer cost estimates

From
Peter Eisentraut
Date:
On 2000-01-26, Tom Lane mentioned:

> BTW, I have been thinking that it'd be a lot better if these flags could
> be twiddled via SET/SHOW commands, instead of having to restart psql.
> Nothing done about it yet, but it's an idea...
> 
> Also, you already can twiddle the basic cost parameters (cpu_page_weight
> and cpu_index_page_weight) via SET variables whose names I forget at the
> moment.  There will be probably be at least one more such variable
> before 7.0 comes out, to control cost of random page fetch vs. sequential.

Independent of this, I thought numerous times (when similar "tuning"
issues came up) that it's time for a real unified configuration file,
which includes pg_options, the geqo what-not, an option for each of these
backend tuning options (join methods, fsync), heck maybe even the port
number and an alternative location for temp/sort files. Kind of put all
the administration in one place. Something to think about maybe.
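To make the idea concrete, such a unified file might look like this (purely illustrative: the option names below are invented for the sketch and are not actual PostgreSQL settings):

```
# unified configuration sketch (hypothetical option names)
port = 5432
fsync = off
geqo = on
sort_file_dir = /var/tmp/pgsort
disable_hashjoin = off
```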


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden





Re: AW: AW: [HACKERS] Some notes on optimizer cost estimates

From
Bruce Momjian
Date:
> On 2000-01-26, Tom Lane mentioned:
> 
> > BTW, I have been thinking that it'd be a lot better if these flags could
> > be twiddled via SET/SHOW commands, instead of having to restart psql.
> > Nothing done about it yet, but it's an idea...
> > 
> > Also, you already can twiddle the basic cost parameters (cpu_page_weight
> > and cpu_index_page_weight) via SET variables whose names I forget at the
> > moment.  There will be probably be at least one more such variable
> > before 7.0 comes out, to control cost of random page fetch vs. sequential.
> 
> Independent of this, I thought numerous times (when similar "tuning"
> issues came up) that it's time for a real unified configuration file,
> which includes pg_options, the geqo what-not, an option for each of these
> backend tuning options (join methods, fsync), heck maybe even the port
> number and an alternative location for temp/sort files. Kind of put all
> the administration in one place. Something to think about maybe.

Added to TODO:
* Unify configuration into one configuration file


--
Bruce Momjian                        |  http://www.op.net/~candle
pgman@candle.pha.pa.us               |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: AW: AW: [HACKERS] Some notes on optimizer cost estimates

From
Thomas Lockhart
Date:
> > Independent of this, I thought numerous times (when similar "tuning"
> > issues came up) that it's time for a real unified configuration file,
> > which includes pg_options, the geqo what-not, an option for each of these
> > backend tuning options (join methods, fsync), heck maybe even the port
> > number and an alternative location for temp/sort files. Kind of put all
> > the administration in one place. Something to think about maybe.
> Added to TODO:
>         * Unify configuration into one configuration file

... and that is a good example of database design because?? ;)

It may be the right thing to do, but I can see why an RDBMS project
would have trouble with the concept...
                    - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: AW: AW: [HACKERS] Some notes on optimizer cost estimates

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Added to TODO:
>> * Unify configuration into one configuration file

> ... and that is a good example of database design because?? ;)

Good point ;-).  OTOH, the existing mishmash of config files and
option-setting methods isn't a good example of any kind of design.
It "just grew".

One thing to consider while contemplating a grand unified config file
(GUC?) is that much of this stuff needs to be settable per-client.
It would be wrong to rip out whatever dynamic option-setting code
there is.  Cleaning it up and making a more uniform interface to the
various options does sound like a good project though.

I'd want to see a paper design for how things should work before any
coding starts --- the existing methods do have some non-obvious
advantages.  For example, even something as grotty as the PGOPTIONS
environment variable has its uses: you can pass options through to
a backend without needing explicit cooperation from your client
application.
        regards, tom lane


GUC (Re: AW: AW: [HACKERS] Some notes on optimizer cost estimates)

From
Peter Eisentraut
Date:
On 2000-01-28, Tom Lane mentioned:

> Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> >> Added to TODO:
> >> * Unify configuration into one configuration file
> 
> > ... and that is a good example of database design because?? ;)

This has more to do with software administration than database design or
databases in the first place. IMHO, of course.

> One thing to consider while contemplating a grand unified config file
> (GUC?)

Darn, I was gonna suggest that name.

> is that much of this stuff needs to be settable per-client.
> It would be wrong to rip out whatever dynamic option-setting code
> there is.  Cleaning it up and making a more uniform interface to the
> various options does sound like a good project though.

Nobody said anything about ripping out existing code. There just need to
be defaults settable somewhere without entering -o -F -f -B -q -R -n all
the time. I'm sure we can come up with something.

> 
> I'd want to see a paper design for how things should work before any
> coding starts --- the existing methods do have some non-obvious
> advantages.  For example, even something as grotty as the PGOPTIONS
> environment variable has its uses: you can pass options through to
> a backend without needing explicit cooperation from your client
> application.

So PGOPTIONS can be tied into the scheme.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden