Thread: Index Tuning Features

Index Tuning Features

From
Simon Riggs
Date:
For 8.3, I'd like to add the following two related features to assist
with Index Tuning and usability:

- Virtual Indexes

An index which only exists in the catalog, so is visible to the planner
but not the executor. This is useful where a specific SQL query is being
hand-tuned, allowing very specific options to be selected. 

Virtual indexes would only be seen by the planner when performing an
EXPLAIN and when enable_virtual_index = on (default: off, Userset).
Normal SQL statements would ignore them completely, whatever
enable_virtual_index is set to.

It would not be possible to have both a virtual and a real index defined
identically at the same time. (If facilities existed to make temporary
tables exist only for a single backend, rather than requiring catalog
access then that implementation route would also work here, but until
that does, simple updates seem fine).

SQL: CREATE [VIRTUAL] [UNIQUE] INDEX ...


- RECOMMEND command

Similar in usage to an EXPLAIN, the RECOMMEND command would return a
list of indexes that need to be added to get the cheapest plan for a
particular query (no explain plan result though).

At planning time, all possible single column indexes would be assumed to
exist, plus all groups of cols that make up a multi-col Foreign Key
would be assumed to make a multi-col index. (PKs always exist,
remember). We track whether hypothetical indexes exist on the plan, so
once the cheapest plan has been decided we can report what they are (if
any). Hypothetical indexes last only for the duration of planning - no
catalog changes are made.

Command will return 1 row per selected index (can be more than one for a
complex query), first col gives list of indexed cols, second col shows
the SQL required to create that index. Virtual indexes will be noted,
though treated identically to hypothetical indexes.


The changes to do this would not be very invasive to the planner and
mainly involve adding additional fields to the planner data structures,
some additional branching code and command changes/additions.


Overall we need both of these new features: RECOMMEND covers many cases
in an easy to use form, with VIRTUAL indexes covers the rest of the
search space for possible new indexes for specific cases.

There's a host of other little tweaky bits we might imagine to enhance
this capability further, but this seems to cover the basic requirements.
Specifically, multi-column indexes are not considered very heavily in
RECOMMEND. This is deliberate because 
a) we don't have good multi-col interaction stats (though we might have
for 8.3?)
b) it greatly increases the run-time of exhaustive searching and 
c) because we have bitmap index interaction the usefulness of
multi-column indexes is much reduced anyhow, so cost/benefit not good.

Comments? (I'll do a summary of feedback tomorrow.)

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



Re: Index Tuning Features

From
Peter Eisentraut
Date:
Simon Riggs wrote:
> For 8.3, I'd like to add the following two related features to assist
> with Index Tuning and usability:
>
> - Virtual Indexes

This seems useful, but I'm not sure we need a catalog object for that.  
It might be sufficient to declare these hypothetical indexes within the 
EXPLAIN command.  That is after all the only place where they are 
applied.

> - RECOMMEND command
>
> Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> list of indexes that need to be added to get the cheapest plan for a
> particular query (no explain plan result though).

This functionality also seems useful, but maybe it should be the job of 
a user-space tool?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Index Tuning Features

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> - Virtual Indexes

> An index which only exists in the catalog, so is visible to the planner
> but not the executor.

Say what?  What would that possibly be useful for, other than crashing
any bit of code that failed to know about it?

> - RECOMMEND command

> Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> list of indexes that need to be added to get the cheapest plan for a
> particular query (no explain plan result though).

Both of these seem to assume that EXPLAIN results, without EXPLAIN
ANALYZE results to back them up, are sufficient for tuning.  I find
this idea a bit dubious, particularly for cases of "marginal" indexes.

> Specifically, multi-column indexes are not considered very heavily in
> RECOMMEND.

That seems like a bad idea as well --- multicol indexes are exactly the
sort of thing a novice DBA might fail to consider.  If you're going to
do this then you should consider all cases.
        regards, tom lane


Re: Index Tuning Features

From
"Joshua D. Drake"
Date:
Peter Eisentraut wrote:
> Simon Riggs wrote:
>> For 8.3, I'd like to add the following two related features to assist
>> with Index Tuning and usability:
>>
>> - Virtual Indexes
> 
> This seems useful, but I'm not sure we need a catalog object for that.  
> It might be sufficient to declare these hypothetical indexes within the 
> EXPLAIN command.  That is after all the only place where they are 
> applied.
> 
>> - RECOMMEND command
>>
>> Similar in usage to an EXPLAIN, the RECOMMEND command would return a
>> list of indexes that need to be added to get the cheapest plan for a
>> particular query (no explain plan result though).
> 
> This functionality also seems useful, but maybe it should be the job of 
> a user-space tool?

On this same vein I thought it would be interesting if we added a
suggestion to explain analyze... Something like:


Your estimated number of rows appears to be off. Have you ran analyze
lately?

Sincerely,

Joshua D. Drake




-- 
  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240  Providing the most comprehensive  PostgreSQL
solutionssince 1997            http://www.commandprompt.com/
 




Re: Index Tuning Features

From
"Jim C. Nasby"
Date:
On Tue, Oct 10, 2006 at 06:06:09PM +0200, Peter Eisentraut wrote:
> Simon Riggs wrote:
> > For 8.3, I'd like to add the following two related features to assist
> > with Index Tuning and usability:
> >
> > - Virtual Indexes
> 
> This seems useful, but I'm not sure we need a catalog object for that.  
> It might be sufficient to declare these hypothetical indexes within the 
> EXPLAIN command.  That is after all the only place where they are 
> applied.
If you wanted to try multiple scenarios, that might become a pain. I
guess it depends on how verbose the syntax was...

> > - RECOMMEND command
> >
> > Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> > list of indexes that need to be added to get the cheapest plan for a
> > particular query (no explain plan result though).
> 
> This functionality also seems useful, but maybe it should be the job of 
> a user-space tool?

I think it makes the most sense to have this in core, though I guess an
argument could be made for having it be seperate from the backend. But
it'd have to be easy to call from an external tool, such as pgAdmin,
which means in probably needs to speak libpq.
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Index Tuning Features

From
Robert Treat
Date:
On Tuesday 10 October 2006 12:06, Tom Lane wrote:
> > Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> > list of indexes that need to be added to get the cheapest plan for a
> > particular query (no explain plan result though).
>
> Both of these seem to assume that EXPLAIN results, without EXPLAIN
> ANALYZE results to back them up, are sufficient for tuning.  I find
> this idea a bit dubious, particularly for cases of "marginal" indexes.
>

While I agree with Tom that generally EXPLAIN is not enough for tuning, I also 
know that when your dealing with queries that have run times in multiples of 
hours (and the corresponding hour long index builds) EXPLAIN ANALYZE just 
isn't an option.  Anything that can be done to wheedle down your choices 
before you have to run EXPLAIN ANALYZE is a bonus. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: Index Tuning Features

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> Anything that can be done to wheedle down your choices 
> before you have to run EXPLAIN ANALYZE is a bonus. 

Fair enough, but I prefer Peter's suggestion of attaching the
hypothetical index definitions to EXPLAIN itself, rather than making
bogus catalog entries.  Something along the line of

EXPLAIN <statement>
ASSUMING INDEX fooi ON foo ....
[ ASSUMING INDEX ... ]

although this exact syntax probably doesn't work unless we're willing
to make ASSUMING a fully reserved word :-(

I have some vague recollection that this idea has been discussed
before...
        regards, tom lane


Re: Index Tuning Features

From
"Mark Woodward"
Date:
> Simon Riggs <simon@2ndquadrant.com> writes:

>> - RECOMMEND command
>
>> Similar in usage to an EXPLAIN, the RECOMMEND command would return a
>> list of indexes that need to be added to get the cheapest plan for a
>> particular query (no explain plan result though).
>
> Both of these seem to assume that EXPLAIN results, without EXPLAIN
> ANALYZE results to back them up, are sufficient for tuning.  I find
> this idea a bit dubious, particularly for cases of "marginal" indexes.


I think the idea of "virtual indexes" is pretty interesting, but
ultimately a lesser solution to a more fundimental issue, and that would
be "hands on" control over the planner. Estimating the effect of an index
on a query "prior" to creating the index is a great idea, how that is done
is something different than building concensus that it should be done.

Another thing that this brings up is "hints" to a query. Over the years, I
have run into situation where the planner wasn't great.  It would be nice
to try forcing different strategies on the planner and see if performance
caan be improved.


Re: Index Tuning Features

From
Josh Berkus
Date:
Mark,

> Another thing that this brings up is "hints" to a query. Over the years,
> I have run into situation where the planner wasn't great.  It would be
> nice to try forcing different strategies on the planner and see if
> performance caan be improved.

See discussion on -performance.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: Index Tuning Features

From
"Jaime Casanova"
Date:
On 10/10/06, Mark Woodward <pgsql@mohawksoft.com> wrote:
> I think the idea of "virtual indexes" is pretty interesting, but
> ultimately a lesser solution to a more fundimental issue, and that would
> be "hands on" control over the planner. Estimating the effect of an index
> on a query "prior" to creating the index is a great idea, how that is done
> is something different than building concensus that it should be done.
>
> Another thing that this brings up is "hints" to a query. Over the years, I
> have run into situation where the planner wasn't great.  It would be nice
> to try forcing different strategies on the planner and see if performance
> caan be improved.
>

you can do this by setting enable_"access_method" type parameters.

-- 
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning."                                      Richard Cook


Re: Index Tuning Features

From
Simon Riggs
Date:
On Tue, 2006-10-10 at 20:17 -0400, Mark Woodward wrote:
> Another thing that this brings up is "hints" to a query. Over the
> years, I
> have run into situation where the planner wasn't great.  It would be
> nice
> to try forcing different strategies on the planner and see if
> performance
> caan be improved.

/*+ Not on this thread, p-l-e-a-s-e */

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



Re: Index Tuning Features

From
"Zeugswetter Andreas ADI SD"
Date:
> > Another thing that this brings up is "hints" to a query. Over the
> > years, I have run into situation where the planner wasn't
> great.  It
> > would be nice to try forcing different strategies on the
> planner and
> > see if performance caan be improved.
> >
>
> you can do this by setting enable_"access_method" type parameters.

No, not generally. Usual problems include join order and wrong index,
not only wrong access method.

Andreas


Re: Index Tuning Features

From
Simon Riggs
Date:
Thanks everybody for comments so far; this will be a useful discussion.

On Tue, 2006-10-10 at 18:56 -0400, Robert Treat wrote: 
> On Tuesday 10 October 2006 12:06, Tom Lane wrote:
> > > Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> > > list of indexes that need to be added to get the cheapest plan for a
> > > particular query (no explain plan result though).
> >
> > Both of these seem to assume that EXPLAIN results, without EXPLAIN
> > ANALYZE results to back them up, are sufficient for tuning.  I find
> > this idea a bit dubious, particularly for cases of "marginal" indexes.
> >
> 
> While I agree with Tom that generally EXPLAIN is not enough for tuning, I also 
> know that when your dealing with queries that have run times in multiples of 
> hours (and the corresponding hour long index builds) EXPLAIN ANALYZE just 
> isn't an option.  Anything that can be done to wheedle down your choices 
> before you have to run EXPLAIN ANALYZE is a bonus. 

IMHO you need EXPLAIN, EXPLAIN ANALYZE and RECOMMEND

As Robert points out, using EA can make tuning take a long time and that
is the critical factor when you have a whole database/app to tune. 

This discussion helps me to make explicit what my thoughts had been on
what an ideal index tuning process is:

1. Recommendation: Use RECOMMEND to get an 80/20 setting for a
statement. As Peter suggests a "user-space" tool, I also imagine a tool
that would automatically run RECOMMEND on all SQL statements in a
workload and come up with proposals for additional indexes. We would
have a first cut index design in minutes rather than days.

2. Evaluation: We can then create the potential indexes as Virtual ones
and then re-run EXPLAINs to model how a whole workload would behave. We
can begin to prune low-impact indexes out of the mix at this stage.
Again, this can be done automatically.

3. Implementation: We re-create the new indexes as real indexes (perhaps
concurrently)

4. Correction: We then run the workload and then use existing tools to
spot the statements causing the most problems and manually assess them
using EXPLAIN ANALYZE. Manually postulate new Virtual indexes and
re-model the workload again as (2)

Steps (3) and (4) have both been improved for 8.2. Steps (1) and (2) are
completely new steps for 8.3

The above process can be performed without tool support, but its clear
that further automation will help greatly here. I foresee that the
development of both server-side and tools will take more than one
release. Discussion of tool support can begin once we have agreed
server-side capability.


With that as a backdrop, further comments are:

On Tue, 2006-10-10 at 19:15 -0400, Tom Lane wrote: 
> Robert Treat <xzilla@users.sourceforge.net> writes:
> > Anything that can be done to wheedle down your choices 
> > before you have to run EXPLAIN ANALYZE is a bonus. 
> 
> Fair enough, but I prefer Peter's suggestion of attaching the
> hypothetical index definitions to EXPLAIN itself, rather than making
> bogus catalog entries.  Something along the line of
> 
> EXPLAIN <statement>
> ASSUMING INDEX fooi ON foo ....
> [ ASSUMING INDEX ... ]

I do like this, though for step (2) above we would need to attach the
appropriate indexes to each of the SQL statements prior to execution.
Doing this for a single SQL statement is fine, but doing that for a
whole workload of 1000s of statements is not very practical, hence an
externally declarative approach seems better.

I can imagine many other declarative approaches other than the one I
proposed; it just seems pretty neat to me to use almost exactly the same
syntax for a virtual index as for a real index. As I mentioned, ideally
this would not be a full-strength catalog object, but I was thinking
towards implementation also. Another possibility would be to use a local
pg_virtual_indexes table.



On Tue, 2006-10-10 at 18:06 +0200, Peter Eisentraut wrote: 
> Simon Riggs wrote:
> 
> > - RECOMMEND command
> >
> > Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> > list of indexes that need to be added to get the cheapest plan for a
> > particular query (no explain plan result though).
> 
> This functionality also seems useful, but maybe it should be the job of 
> a user-space tool?

So from above, Yes, I see a user-space tool also, but not instead.

The RECOMMEND command is the minimal server functionality required to
enable an (external) automated tuning support tool to be developed.

Possible architectures for this functionality include both user-space
and server-space options. Much thinking has been done on this in the DB
research community, with the general consensus being its easier to
extend the planner to cope with postulation that it is to create an
external postulation tool that acts (accurately) like the planner.

"DB2 advisor: An optimizer smart enough to recommend its own indexes."
Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohnman, and
Alan Skelley. 
In The 16th International Conference on Data Engineering (ICDE'00), San
Diego, CA. IEEE Computer Society, February 2000.

A wonderful summary of which is available here, but not sure if the full
paper is publicly available for free.
http://www.andrew.cmu.edu/user/ngm/15-823/summaries/08.pdf

DB2 Design Advisor: Integrated Automatic Physical Database Design

"DB2 Design Advisor: Integrated Automatic Physical Database Design"
Zilio et al
which is available at 
http://www.vldb.org/conf/2004/IND4P1.PDF#search=%22db2%20design%
20advisor%22


On Tue, 2006-10-10 at 19:15 -0400, Tom Lane wrote: 
> > Specifically, multi-column indexes are not considered very heavily in
> > RECOMMEND.
> 
> That seems like a bad idea as well --- multicol indexes are exactly the
> sort of thing a novice DBA might fail to consider.  If you're going to
> do this then you should consider all cases.

Calculating all index cases would follow the combinatorial explosion of
sum(N!/(r!(N-r)!)) though we can argue about exactly what N is in this
case. So we have the same problem as the main optimiser: exhaustive
search is not practical, so we must find a heuristic that allows us to
limit the search space so RECOMMEND doesn't run for too long.

The "no multi-col indexes except FKs" is just a proposed heuristic, so
happy to debate exactly what that heuristic should be. (There are
various research papers available with proposed heuristics). 

Multi-col indexes are also subject to over-fitting, since RECOMMEND
would be liable to return (for example) 7-column indexes as the best
choice for a single query, which would be bad overall. 

I'd been thinking about this for some time: the virtual index concept
fills in the gaps so that taken together, RECOMMEND and virtual indexes
provide a reasonable toolset for both limiting search space and yet
allowing more complex ideas to be tested.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



Re: Index Tuning Features

From
Csaba Nagy
Date:
> The above process can be performed without tool support, but its clear
> that further automation will help greatly here. I foresee that the
> development of both server-side and tools will take more than one
> release. Discussion of tool support can begin once we have agreed
> server-side capability.

If it came to automated tools, wouldn't fit in this discussion to give
some performance requirement limits to the RECOMMEND tool ? In a
workload not all queries are real time or high priority, and such a
lesser impact index can help enough sometimes to meet the requirements,
compared to a high impact index which would make the query fly.

Example: inserting in a table must be real time, reporting can be taken
offline...

So it would be nice to have a recommendation tool which can take into
account the performance requirements of the individual queries, possibly
making the right compromises to meat all requirements for all queries.

Cheers,
Csaba.




Re: Index Tuning Features

From
"Mark Woodward"
Date:
> On 10/10/06, Mark Woodward <pgsql@mohawksoft.com> wrote:
>> I think the idea of "virtual indexes" is pretty interesting, but
>> ultimately a lesser solution to a more fundimental issue, and that would
>> be "hands on" control over the planner. Estimating the effect of an
>> index
>> on a query "prior" to creating the index is a great idea, how that is
>> done
>> is something different than building concensus that it should be done.
>>
>> Another thing that this brings up is "hints" to a query. Over the years,
>> I
>> have run into situation where the planner wasn't great.  It would be
>> nice
>> to try forcing different strategies on the planner and see if
>> performance
>> caan be improved.
>>
>
> you can do this by setting enable_"access_method" type parameters.

Here's your hammer, all your problems are now nails.

The enable_xxx setting are OK for simple queries gone wrong, but if you
have a more complex query, any one of those settins may help or hinder
different parts of a query, then you would be left with choosing which of
them helps more than hurts the over-all query.

being able to alter the query plan would help in areas where there are
data patterns in a database that the ANALYZE command can't pick up because
it is not designed too.

Imagine you have a street map database ordered by zip, street, number. The
primary order is zipcode, the secondary order is street. There is a
relationship of number to street, and zip to street. The analyzer, at
least the last time I checked, does not recognize these relationships. So,
a search by street and number would probably use a sequential scan rather
than the street index.





Re: Index Tuning Features

From
Gregory Stark
Date:
"Mark Woodward" <pgsql@mohawksoft.com> writes:

> The analyzer, at least the last time I checked, does not recognize these
> relationships. 

The analyzer is imperfect but arguing from any particular imperfection is weak
because someone will just come back and say we should work on that problem --
though I note nobody's actually volunteering to do so whereas they appear to
be for hints.

I think the stronger argument is to say that there are some statistical
properties that the analyzer _cannot_ be expected to figure out. Either
because 

a) they're simply too complex to ever expect to be able to find automatically,

b) too expensive to make it worthwhile in the general case, or 

c) because of some operational issue such as the data changing frequently  enough that the analyzes that would be
necessaryto keep the statistics up  to date would become excessively expensive or even be impossible to perform
rapidlyenough.
 

The people arguing that hints themselves are of negative benefit are taking
the argument far too far. I've never heard an Oracle DBA gripe about having to
fix hints on an upgrade; they're usually the first ones to suggest hinting a
poorly written query. In fact Oracle is going in the opposite direction of
even relying on hints internally. Its plan stability feature depends on
generating and storing hints internally associated with every query.

The argument against hints is usually that the effort would be better spent
elsewhere, not that hints are inherently a bad idea. We already have enable_*
parameters and they are absolutely necessary for testing and experimenting to
understand whether the planner is incorrect and where it has gone wrong. Hints
are just a more precisely targeted version of these. There have been plenty of
instances on this list where people posted 20-30 line query plans with several
joins of each type where the enable_* parameters were too coarse grained to
use effectively.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Index Tuning Features

From
"Mark Woodward"
Date:
>
> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>
>> The analyzer, at least the last time I checked, does not recognize these
>> relationships.
>
> The analyzer is imperfect but arguing from any particular imperfection is
> weak
> because someone will just come back and say we should work on that problem
> --
> though I note nobody's actually volunteering to do so whereas they appear
> to
> be for hints.
>
> I think the stronger argument is to say that there are some statistical
> properties that the analyzer _cannot_ be expected to figure out. Either
> because
>
> a) they're simply too complex to ever expect to be able to find
> automatically,
>
> b) too expensive to make it worthwhile in the general case, or
>
> c) because of some operational issue such as the data changing frequently
>    enough that the analyzes that would be necessary to keep the statistics
> up
>    to date would become excessively expensive or even be impossible to
> perform rapidly enough.

Well, from a purely data domain standpoint, it is impossible to charactize
the exact nature of a data set without enough information to recreate it.
Anything less must be designed for a fixed set of assumptions. There is no
way that every specific trend can be covered by a fixed number of
assumptions.

The argument that all we need is better statistics completely misses the
point. There will *always* be a number cases where the planner will not
work optimally. I would say that a "simpler" planner with better hints
will always be capable of creating a better query plan.



Re: Index Tuning Features

From
Tom Lane
Date:
"Mark Woodward" <pgsql@mohawksoft.com> writes:
> I would say that a "simpler" planner with better hints
> will always be capable of creating a better query plan.

This is demonstrably false: all you need is an out-of-date hint, and
you can have a worse plan.

The argument against hints is not about whether someone could knock
together a crappy hint facility and be able to get some use out of it.
It is about how much work it would take to design a *good* hint facility
that makes it easy to maintain hints that are robust in the face of data
and query changes.  If someone were to sit down and design and build
such a thing, it'd very likely get accepted into core Postgres --- but
personally, I think the equivalent amount of effort would be better
spent on improving the planner and the statistics.

As Josh already noted, Oracle-like hints are pretty likely to get
rejected ... not only because of doubts about their true usefulness,
but out of fear of falling foul of some Oracle patent or other.
        regards, tom lane


Re: Index Tuning Features

From
"Mark Woodward"
Date:
> "Mark Woodward" <pgsql@mohawksoft.com> writes:
>> I would say that a "simpler" planner with better hints
>> will always be capable of creating a better query plan.
>
> This is demonstrably false: all you need is an out-of-date hint, and
> you can have a worse plan.

That doesn't make it false, it makes it higher maintenance. Hints are
understood to require maintenance.

>
> The argument against hints is not about whether someone could knock
> together a crappy hint facility and be able to get some use out of it.
> It is about how much work it would take to design a *good* hint facility
> that makes it easy to maintain hints that are robust in the face of data
> and query changes.  If someone were to sit down and design and build
> such a thing, it'd very likely get accepted into core Postgres --- but
> personally, I think the equivalent amount of effort would be better
> spent on improving the planner and the statistics.

While it is always true that something can be improved, there comes a
point where work outweighs benefits. I can't say that the planner is at
that point, but I think that isn't even an issue.

The notion of hints would probably one of the biggest steps toward
improving the planner. Like I said, it is inarguable that there will
always be queries that the planner can not execute efficiently based on
the statistics gathered by analze. Since that number must be greater than
zero, some methodology to deal with it should be created.


>
> As Josh already noted, Oracle-like hints are pretty likely to get
> rejected ... not only because of doubts about their true usefulness,
> but out of fear of falling foul of some Oracle patent or other.

Well, if it would get rejected if it looked like Oracle, assuming you
would probably be one of the people rejecting it, what do you envision as
not being rejected?


Re: Index Tuning Features

From
Andrew Sullivan
Date:
On Wed, Oct 11, 2006 at 12:40:42PM -0400, Gregory Stark wrote:
> poorly written query. In fact Oracle is going in the opposite direction of
> even relying on hints internally. Its plan stability feature depends on
> generating and storing hints internally associated with every query.

But IBM, whose DB2 planner and optimiser is generally regarded as way
better than Oracle's (at least by anyone I know who's used both),
doesn't like hints.  The IBM people all say the same thing Tom has
said before: that the work to design the thing correctly is better
spent making the planner and optimiser parts smarter and cheaper,
because out of that work you also manage not to have the DBA
accidentally mess things up by simple-minded rule-based hints.  (Note
that I'm not trying to wade into the actual argument; I'm just
pointing out that even the biggest industry people don't agree on
this point.)

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.    --Alexander Hamilton


Re: Index Tuning Features

From
Andrew Sullivan
Date:
On Wed, Oct 11, 2006 at 03:27:19PM -0400, Mark Woodward wrote:

> improving the planner. Like I said, it is inarguable that there will
> always be queries that the planner can not execute efficiently based on
> the statistics gathered by analze. Since that number must be greater than
> zero, some methodology to deal with it should be created.

Just because I'm one of those statistics true believers, what sort of
information do you think it is possible for the DBA to take into
consideration, when building a hint, that could not in principle be
gathered efficiently by a statistics system?  It seems to me that
you're claiming that DBAs can have magic knowledge.

While I would be delighted to learn that my thumb in the air guesses
in the past had turned out to be due to my deep knowledge of my data,
I'm instead unhappily confessing that what I really, really wanted
when I made those guesses was better knowledge, based on some
analysis of the data.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.    --George Orwell


Re: Index Tuning Features

From
Theo Schlossnagle
Date:
On Oct 11, 2006, at 3:00 PM, Andrew Sullivan wrote:

> On Wed, Oct 11, 2006 at 12:40:42PM -0400, Gregory Stark wrote:
>> poorly written query. In fact Oracle is going in the opposite  
>> direction of
>> even relying on hints internally. Its plan stability feature  
>> depends on
>> generating and storing hints internally associated with every query.
>
> But IBM, whose DB2 planner and optimiser is generally regarded as way
> better than Oracle's (at least by anyone I know who's used both),
> doesn't like hints.  The IBM people all say the same thing Tom has
> said before: that the work to design the thing correctly is better
> spent making the planner and optimiser parts smarter and cheaper,
> because out of that work you also manage not to have the DBA
> accidentally mess things up by simple-minded rule-based hints.  (Note
> that I'm not trying to wade into the actual argument; I'm just
> pointing out that even the biggest industry people don't agree on
> this point.)

DBAs can mess things up already if they misuse the tools they are  
provided.  Like 'rm'.  Which is there, but should _RARELY_ be used on  
database datafiles.  The argument that people _could_ use them in a  
bad way is silly.  Of course, they could use them in a bad way,  
that's not an _argument_.  Everyone agrees people can be stupid.

However, the planner will never be perfect.  I would like to see 1  
out of every 500,000 queries actually benefit from a hint system  
(which means that 499,999 of the queries were planned perfectly fine  
by the planner).  To fix my one query, that is crucially important to  
my business, it is a much more sane approach to hint the system to  
change its plan than it is to have to upgrade my binaries.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/




Re: Index Tuning Features

From
Josh Berkus
Date:
Simon,

The University of North Carolina (I think?) did some nice work on not only 
hypothetical indexes, but hypothetical materialized views (as well as 
really materialized view planner selection).   Have you looked at that 
work?  I think I forwarded the paper & code to Jonah at one point ...

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: Hints WAS: Index Tuning Features

From
Josh Berkus
Date:
Mark,

First off, I'm going to request that you (and other people) stop hijacking 
Simon's thread on hypothetical indexes.   Hijacking threads is an 
effective way to get your ideas rejected out of hand, just because the 
people whose thread you hijacked are angry with you.  

So please observe the thread split, thanks.

> Well, if it would get rejected if it looked like Oracle, assuming you
> would probably be one of the people rejecting it, what do you envision
> as not being rejected?

Something "better than Oracle".

Since you're the one who wants hints, that's kind of up to you to define. 
Write a specification and make a proposal.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: Index Tuning Features

From
Ron Mayer
Date:
Andrew Sullivan wrote:
> Just because I'm one of those statistics true believers, what sort of
> information do you think it is possible for the DBA to take into
> consideration, when building a hint, that could not in principle be
> gathered efficiently by a statistics system?  It seems to me that
> you're claiming that DBAs can have magic knowledge.

Is one example is the table of addresses clustered by zip-code
and indexes on State, City, County, etc?

The current statistics systems at least see no correlation between
these fields (since the alphabetical ordering of cities and
numbering of postal codes is quite different).   This makes the
planner under-use the indexes because it sees no correlation and
overestimates the number of pages read and the random accesses
needed.

However since San Francisco, CA data happens to be tightly packed
on a few pages (since it shares the same few zip codes), few
pages are needed and mostly sequential access could be used
when querying SF data -- though the optimizer guesses most pages
in the table may be hit, so often ignores the indexes.


Now I'm not saying that a more advanced statistics system
couldn't one-day be written that sees these patterns in the
data -- but it doesn't seem likely in the near term.  DBA-based
hints could be a useful interim work-around.


Re: Index Tuning Features

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Robert Treat <xzilla@users.sourceforge.net> writes:
> > Anything that can be done to wheedle down your choices 
> > before you have to run EXPLAIN ANALYZE is a bonus. 
> 
> Fair enough, but I prefer Peter's suggestion of attaching the
> hypothetical index definitions to EXPLAIN itself, rather than making
> bogus catalog entries.  Something along the line of

While I do like avoiding the bogus catalog entries and attaching the
declarations to the explain plan. One advantage of that is that I can see
extending it to handling "IGNORING INDEX foo" as well which may be just as
important.

One disadvantage is that it doesn't let you gather any statistics related to
the new index to see what the plan would really be. "But indexes don't
influence statistics" I can hear already from the chorus. But the reason we
have indexes not affecting planning is precisely because we don't want to
require an analyze after creating an index before it's used. Which these bogus
entries would resolve.

If we had the ability to create bogus indexes it would kill two birds with one
stone. You could use that as the facility for noting which multi-column
combinations are interesting.

You would create your proposed index, then run ANALYZE and EXPLAIN to your
heart's content. When you have it set up just so then you REINDEX your index
and you're set.

We already have these "bogus" indexes incidentally, we just create the index
with indisvalid=f.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Hints WAS: Index Tuning Features

From
"Mark Woodward"
Date:
> Mark,
>
> First off, I'm going to request that you (and other people) stop hijacking
> Simon's thread on hypothetical indexes.   Hijacking threads is an
> effective way to get your ideas rejected out of hand, just because the
> people whose thread you hijacked are angry with you.
>
> So please observe the thread split, thanks.
>
>> Well, if it would get rejected if it looked like Oracle, assuming you
>> would probably be one of the people rejecting it, what do you envision
>> as not being rejected?
>
> Something "better than Oracle".
>
> Since you're the one who wants hints, that's kind of up to you to define.
> Write a specification and make a proposal.
>

What is the point of writing a proposal if there is a threat of "will be
rejected" if one of the people who would do the rejection doesn't at least
outline what would be acceptable?




Re: Hints WAS: Index Tuning Features

From
"Joshua D. Drake"
Date:
>>
>> Since you're the one who wants hints, that's kind of up to you to define.
>> Write a specification and make a proposal.
>>
> 
> What is the point of writing a proposal if there is a threat of "will be
> rejected" if one of the people who would do the rejection doesn't at least
> outline what would be acceptable?

Oh come on Mark, you have been here long enough to know how this works.
You define what you would like to see and submit it for feedback.
-Hackers submit feedback, you refine and the cyle continues till either
-hackers determine it just isn't going to happen (packages), the would
be hacker gives up, or a workable plan comes out of the discussion.

Joshua D. Drake


> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 


-- 
  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240  Providing the most comprehensive  PostgreSQL
solutionssince 1997            http://www.commandprompt.com/
 




Re: Index Tuning Features

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> You would create your proposed index, then run ANALYZE and EXPLAIN to your
> heart's content. When you have it set up just so then you REINDEX your index
> and you're set.

And when you realize you don't want it after all ... you need an exclusive
lock on the table to drop it.  (Yes, you would, see relcache load.)
The advantage of keeping this idea all inside EXPLAIN is that there's
guaranteed to be no interference with anything else.

> We already have these "bogus" indexes incidentally, we just create the index
> with indisvalid=f.

Au contraire, that is something completely different.  indisvalid=f is
really the exact opposite: it's not there to the planner and it is there
to the executor.

As for the statistics business: really, we use the presence of an index
as a hint to gather certain kinds of stats about its underlying table.
If we had (ahem) statistical hints then we could gather appropriate data
with or without a real associated index.  That sort of feature would
have additional uses, ie, being able to estimate selectivities more
accurately for expressions that might not have anything to do with any
of the indexes on a table.
        regards, tom lane


Re: Hints WAS: Index Tuning Features

From
"Mark Woodward"
Date:
>>>
>>> Since you're the one who wants hints, that's kind of up to you to
>>> define.
>>> Write a specification and make a proposal.
>>>
>>
>> What is the point of writing a proposal if there is a threat of "will be
>> rejected" if one of the people who would do the rejection doesn't at
>> least
>> outline what would be acceptable?
>
> Oh come on Mark, you have been here long enough to know how this works.

Exactly. IMHO, it is a frustrating environment. PostgreSQL is a great
system, and while I completely respect the individuals involved, I think
the "management" for lack of a better term, is difficult.



Re: Hints WAS: Index Tuning Features

From
"Joshua D. Drake"
Date:
Mark Woodward wrote:
>>>> Since you're the one who wants hints, that's kind of up to you to
>>>> define.
>>>> Write a specification and make a proposal.
>>>>
>>> What is the point of writing a proposal if there is a threat of "will be
>>> rejected" if one of the people who would do the rejection doesn't at
>>> least
>>> outline what would be acceptable?
>> Oh come on Mark, you have been here long enough to know how this works.
> 
> Exactly. IMHO, it is a frustrating environment. PostgreSQL is a great
> system, and while I completely respect the individuals involved, I think
> the "management" for lack of a better term, is difficult.

Well that is the nature of FOSS development. If you think we are bad.. I
could easily list half a dozen that are worse ;) A couple of the much
larger then us.

Joshua D. Drake


> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 


-- 
  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240  Providing the most comprehensive  PostgreSQL
solutionssince 1997            http://www.commandprompt.com/
 

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



Re: Hints WAS: Index Tuning Features

From
Christopher Browne
Date:
Clinging to sanity, pgsql@mohawksoft.com ("Mark Woodward") mumbled into her beard:
>> Mark,
>>
>> First off, I'm going to request that you (and other people) stop
>> hijacking Simon's thread on hypothetical indexes.  Hijacking
>> threads is an effective way to get your ideas rejected out of hand,
>> just because the people whose thread you hijacked are angry with
>> you.
>>
>> So please observe the thread split, thanks.
>>
>>> Well, if it would get rejected if it looked like Oracle, assuming
>>> you would probably be one of the people rejecting it, what do you
>>> envision as not being rejected?
>>
>> Something "better than Oracle".
>>
>> Since you're the one who wants hints, that's kind of up to you to
>> define.  Write a specification and make a proposal.
>
> What is the point of writing a proposal if there is a threat of
> "will be rejected" if one of the people who would do the rejection
> doesn't at least outline what would be acceptable?

If your proposal is merely "let's do something like Oracle," it should
be obvious why that would be rejected.  There is considerable legal
danger to slavish emulation.  Further, since PostgreSQL isn't Oracle,
slavish emulation wouldn't work anyways.

If a proposal is too fuzzy to be considered a source of a
specification, it should be obvious that that would be rejected.

If you have an idea clear enough to turn into a meaningful proposal,
put it in for the usual "to and fro"; that generally leads to enormous
improvements.

I'm not sure what a good hinting system ought to look like; what I
*do* know is that a fuzzy proposal won't be much good.
-- 
(format nil "~S@~S" "cbbrowne" "acm.org")
http://linuxfinances.info/info/postgresql.html
The quickest way to a man's heart is through his chest, with an axe. 


Re: Hints WAS: Index Tuning Features

From
Tom Lane
Date:
"Mark Woodward" <pgsql@mohawksoft.com> writes:
> What is the point of writing a proposal if there is a threat of "will be
> rejected" if one of the people who would do the rejection doesn't at least
> outline what would be acceptable?

FWIW, I said some things about what I'd consider a good design in that
other hints thread on pgsql-performance.
        regards, tom lane


Re: Index Tuning Features

From
Florian Weimer
Date:
* Andrew Sullivan:

> Just because I'm one of those statistics true believers, what sort of
> information do you think it is possible for the DBA to take into
> consideration, when building a hint, that could not in principle be
> gathered efficiently by a statistics system?

Some statistics are very hard to gather from a sample, e.g. the number
of distinct values in a column.

-- 
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Durlacher Allee 47            tel: +49-721-96201-1
D-76131 Karlsruhe             fax: +49-721-96201-99


Re: Hints WAS: Index Tuning Features

From
Simon Riggs
Date:
On Wed, 2006-10-11 at 19:18 -0400, Mark Woodward wrote:
> >
> > Since you're the one who wants hints, that's kind of up to you to define.
> > Write a specification and make a proposal.
> >
> 
> What is the point of writing a proposal if there is a threat of "will be
> rejected" if one of the people who would do the rejection doesn't at least
> outline what would be acceptable?

The general theme of other discussions has been that the best approach
is to provide additional information in a general declarative form.
Further details on that have not yet been proposed.

A hint touches a single SQL statement, so decorating 1000s of statements
with exact tips about what to do is both time consuming and eventually
inaccurate. Yet after all that work, the planner still doesn't know why
you thought the hint was the right thing to do and so the 1001st query
will perform poorly.

AFAICS hints are a legacy code compatibility issue, not something truly
desirable in the long run. Once you introduce them you must honour them
across 10+ years of releases and then you remove any chance of improved
optimisations speeding up applications in the future. Support for such
tricks is possibly a different issue from encouraging their use; if we
did support them I would welcome the day when enable_hints = off is the
default and would discourage their general use where possible.

We may be following other products in some ways, so that gives us an
opportunity to learn from both the useful lessons and the mistakes.
Deciding which is which is the hard part, IMHO.

The *right* place, IMHO, for planner information is to decorate the
tables, columns and relationships so that *every* SQL statement can pick
that up. If the world changes, you make one change and all your SQL
benefits. As the analyzers improve, you may be able to just remove those
declarations entirely but generally I imagine the DB designer will for
many years know things that cannot be determined by an analyzer.

Some might say this is a EndUserDeveloper v DBA v InternalsHacker issue
and I might agree, but would side with the DBAs on this.

I'm not aware of any research specifically in that area - though I know
many proposals have been made for various kinds of learning optimizer.
Thats dandy, but you'll still need an infrastructure to support what has
been learned and use it to override the more general analyzer info. So a
manual declarative approach seems like the first step in that direction.

So, I'm interested to hear various possible declarative approaches and
will assist where I can with that. /*+ we might be able to use some
special functions to do this, rather than more SQL */

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



Re: Hints WAS: Index Tuning Features

From
"Mark Woodward"
Date:
> Clinging to sanity, pgsql@mohawksoft.com ("Mark Woodward") mumbled into
> her beard:
>> What is the point of writing a proposal if there is a threat of
>> "will be rejected" if one of the people who would do the rejection
>> doesn't at least outline what would be acceptable?
>
> If your proposal is merely "let's do something like Oracle," it should
> be obvious why that would be rejected.  There is considerable legal
> danger to slavish emulation.  Further, since PostgreSQL isn't Oracle,
> slavish emulation wouldn't work anyways.

I don't actually like Oracle's hinting system.
>
> If a proposal is too fuzzy to be considered a source of a
> specification, it should be obvious that that would be rejected.

Well, "fuzzy" isn't a bad starting place to start gathering information
for an eventual  proposal.

>
> If you have an idea clear enough to turn into a meaningful proposal,
> put it in for the usual "to and fro"; that generally leads to enormous
> improvements.

Absolutely.

>
> I'm not sure what a good hinting system ought to look like; what I
> *do* know is that a fuzzy proposal won't be much good.

That is sort of the stopping block. None of us "know" what it should look
like, but leaving the topic as "if you want it, go do the work and submit
a patch." Isn't going to get it done.

First we should decide if it is, in fact, something that ought to happen,
then if that happens, we should think about what it should be.

Again, what would be the point of writing a proposal if there is *no*
concensus on what would be acceptible?


Hints (was: Index Tuning Features)

From
Andrew Sullivan
Date:
On Wed, Oct 11, 2006 at 03:08:42PM -0700, Ron Mayer wrote:
> Is one example is the table of addresses clustered by zip-code
> and indexes on State, City, County, etc?

No.

> Now I'm not saying that a more advanced statistics system
> couldn't one-day be written that sees these patterns in the
> data -- but it doesn't seem likely in the near term.  DBA-based
> hints could be a useful interim work-around.

Some others in the hints thread seem to be suggesting additional ways
of teaching the optimiser what to do.  _That_ seems to me to be a
good idea (but I don't think that qualifies as what people usually
think of as hints).  A sufficiently general system of hints sprinkled
on the SQL is a lot of work, and doesn't seem to me to be a whole lot
easier than working out how to make second-order relationship
discovery (of the sort you're talking about) cheaper and automatic. 
Certainly, there's plenty of statistics math kicking around that
allows one to discover such relationships, and they have the benefit
of not being by definition a way to work around the optimiser.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
Users never remark, "Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath."    --Damien Katz


Hints (Was: Index Tuning Features)

From
Andrew Sullivan
Date:
On Thu, Oct 12, 2006 at 08:34:45AM +0200, Florian Weimer wrote:
> 
> Some statistics are very hard to gather from a sample, e.g. the number
> of distinct values in a column.

Then how can the DBA know it, either?  The problem with this sort of
argument is always that people are claiming some special knowledge is
available to the DBA.  If it's true that the DBA really _can_ know
this stuff, then there must be some way to learn it.  Which means
that you can, in principle, figure out ways to communicate that to
the optimizer.

I like the suggestion, though, that there be ways to codify known
relationships in the system in such a way that the optimizer can
learn to use that information.  _That_ seems to me to be a big
improvement, because it can be taken into consideration along with
relationships that emerge from the statistics, that the DBA may not
know about.


A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.                --Brad Holland


Re: Hints WAS: Index Tuning Features

From
Greg Stark
Date:
Simon Riggs <simon@2ndquadrant.com> writes:

> The *right* place, IMHO, for planner information is to decorate the
> tables, columns and relationships so that *every* SQL statement can pick
> that up. If the world changes, you make one change and all your SQL
> benefits. As the analyzers improve, you may be able to just remove those
> declarations entirely but generally I imagine the DB designer will for
> many years know things that cannot be determined by an analyzer.

Not to say this isn't a good idea -- i think it's a great idea. But note that
it doesn't solve some of the use cases of hints. Consider something like:
 WHERE NOT radius_authenticate(suspected_hacker)

or
 WHERE NOT verify_pk_signature(document_we_have_no_reason_to_doubt)

There's no way you can decorate the radius_authenticate or verify_pk_signature
functions with any hint that would know when you're using it on a value you
expect it to fail or succeed on. In some cases you pass data you expect to
succeed 99.9% of the time and in others data you expect to fail. Only the
author of the query knows what kind of value he's passing and how selective
the resulting expression is.

And while people seem to be worried about OLTP queries this is one area where
I actually think of DSS queries first. OLTP queries run usually relatively
simple and get optimized well. Also OLTP queries only have to be "fast
enough", not optimal. So the planner usually does anm adequate job.

DSS queries are often dozens of lines of plan -- this is where enable_* is
insufficient to test the query and it's where the planner often goes wrong.
And it's where an incremental speed difference can make a big difference with
a report that takes 8 hours or 4 hours. Often these queries are ad-hoc queries
that never will be run again anyways.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: Hints WAS: Index Tuning Features

From
Martijn van Oosterhout
Date:
On Thu, Oct 12, 2006 at 08:50:04AM -0400, Greg Stark wrote:
> Not to say this isn't a good idea -- i think it's a great idea. But note that
> it doesn't solve some of the use cases of hints. Consider something like:
>
>   WHERE NOT radius_authenticate(suspected_hacker)
>
> or
>
>   WHERE NOT verify_pk_signature(document_we_have_no_reason_to_doubt)

We currently construct histograms for data in columns, there's no
particular reason why we can't do the same for functions. In a similar
vein, I don't see a reason why you couldn't enable a stats-gathering
mode where function calls would be instrumented to collect information
about:

- time of execution
- distribution of outputs

Which could then be used by the planner. Or more directly:

CREATE HISTOGRAM FOR FUNCTION verify_pk_signature(documenent)
AS ( true = 99, false = 1 );

(Perhaps DECLARE is the better phrase?).

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Hints WAS: Index Tuning Features

From
Ron Mayer
Date:
Mark Woodward wrote:
> 
> Exactly. IMHO, it is a frustrating environment. PostgreSQL is a great
> system, and while I completely respect the individuals involved, I think
> the "management" for lack of a better term, is difficult.

'course you're welcome to fork the project as well if your style
and/or priorities are different than the postgresql core team's.

If your approach is that much less frustrating, your project
would gain that much more momentum from developers joining you.

If more developers like your style and/or priorities, they'll
migrate to your project.  I think Bizgres, Mammoth, EnterpriseDB
and RedHat DB and Gentoo's-occasional-bizzaro-patches are both proofs
that it can work as well as proofs that it's difficult.


Re: Hints WAS: Index Tuning Features

From
Simon Riggs
Date:
On Thu, 2006-10-12 at 15:06 +0200, Martijn van Oosterhout wrote:
> On Thu, Oct 12, 2006 at 08:50:04AM -0400, Greg Stark wrote:
> > Not to say this isn't a good idea -- i think it's a great idea. But note that
> > it doesn't solve some of the use cases of hints. Consider something like:
> > 
> >   WHERE NOT radius_authenticate(suspected_hacker)
> > 
> > or
> > 
> >   WHERE NOT verify_pk_signature(document_we_have_no_reason_to_doubt)
> 
> We currently construct histograms for data in columns, there's no
> particular reason why we can't do the same for functions. In a similar
> vein, I don't see a reason why you couldn't enable a stats-gathering
> mode where function calls would be instrumented to collect information
> about:
> 
> - time of execution
> - distribution of outputs
> 
> Which could then be used by the planner. Or more directly:
> 
> CREATE HISTOGRAM FOR FUNCTION verify_pk_signature(documenent) 
> AS ( true = 99, false = 1 );
> 
> (Perhaps DECLARE is the better phrase?).

The CREATE OPERATOR command already has a RESTRICT=res_proc clause which
provides the ability to attach selectivity functions onto an operator.

So this is already possible if you turn radius_authenticate() into an
operator. The function parameters are passed to the selectivity
function, so you can use that to steer the selectivity.

Perhaps this should be allowed on the CREATE FUNCTION command when a
procedure returns boolean.

Greg is right though, there are some times when the default selectivity
won't match what we know to be the case. His example of a function which
might normally be expected to return 99.9% true being used to evaluate a
list of suspected attempts where the return might well be 20% true is a
good one.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



Re: Hints WAS: Index Tuning Features

From
Martijn van Oosterhout
Date:
On Thu, Oct 12, 2006 at 02:25:29PM +0100, Simon Riggs wrote:
> The CREATE OPERATOR command already has a RESTRICT=res_proc clause which
> provides the ability to attach selectivity functions onto an operator.
>
> So this is already possible if you turn radius_authenticate() into an
> operator. The function parameters are passed to the selectivity
> function, so you can use that to steer the selectivity.
>
> Perhaps this should be allowed on the CREATE FUNCTION command when a
> procedure returns boolean.

Why limit it to booleans? For many functions you can get a reasonable
estimate of the resulting data by feeding the keys of the histogram
through the function.

If you know how the data in "field" is distributed, you can take a good
guess at the distribution of upper(field).

> Greg is right though, there are some times when the default selectivity
> won't match what we know to be the case. His example of a function which
> might normally be expected to return 99.9% true being used to evaluate a
> list of suspected attempts where the return might well be 20% true is a
> good one.

In the extreme case you could drop the histogram in a transaction, but
I can see use-case for declaring a histogram for the current session
only, or even having profile to select from. I don't think annotating
the query itself is a particularly good idea.

The hard part is stoing the histograms and getting the planner to use
them, once that happens the really is trivial.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Hints WAS: Index Tuning Features

From
Josh Berkus
Date:
Mark,

> That is sort of the stopping block. None of us "know" what it should look
> like, but leaving the topic as "if you want it, go do the work and submit
> a patch." Isn't going to get it done.
> 
> First we should decide if it is, in fact, something that ought to happen,
> then if that happens, we should think about what it should be.

Well, that's what the *rest* of us are doing on the two threads ... 
here, and "Simple Join Optimized Badly" on performance.  You're the only 
one who seems to want others to do the specification work for him. 
Start making suggestions, and stop criticizing the process.

And, to give you a starting point: the discussion has morphed into: 
"What manual ways can we come up with for the DBA to influence the 
planner and fix planner "bugs" which won't have the fragility of 
query-based hints ala Oracle?"


--Josh Berkus




Re: Hints WAS: Index Tuning Features

From
Greg Stark
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:

> Which could then be used by the planner. Or more directly:
> 
> CREATE HISTOGRAM FOR FUNCTION verify_pk_signature(documenent) 
> AS ( true = 99, false = 1 );
> 
> (Perhaps DECLARE is the better phrase?).

Except that the distribution is a property of the values you're passing it,
not the function itself. In theory verify_pk_signature() returns false for
99.999...% of its inputs. But of course unless you have a black hat or
hardware problems you're not going to ever pass it any input that makes it
return false. 

The query may be a routine session cookie check where it will virtually always
return true, or it may be a DBA running an ad-hoc query to check suspicious
records for invalid data. It may even be the same query from the same object
method being called from different call sites in the application.

I'm not saying the above isn't a good idea though. I rather like it actually.
But the point of my example originally was specifically to show how at least
sometimes the *only* place the knowledge of the data distribution lies is in
the programmer's head.

-- 
greg



Re: Hints (Was: Index Tuning Features)

From
Casey Duncan
Date:
On Oct 12, 2006, at 4:26 AM, Andrew Sullivan wrote:

> On Thu, Oct 12, 2006 at 08:34:45AM +0200, Florian Weimer wrote:
>>
>> Some statistics are very hard to gather from a sample, e.g. the  
>> number
>> of distinct values in a column.
>
> Then how can the DBA know it, either?  The problem with this sort of
> argument is always that people are claiming some special knowledge is
> available to the DBA.  If it's true that the DBA really _can_ know
> this stuff, then there must be some way to learn it.  Which means
> that you can, in principle, figure out ways to communicate that to
> the optimizer.

Yes, but it may be much more efficient for the human to tell the  
computer than for the computer to introspect things. Take, for  
example, ndisinct as data grows large. I, the database designer, may  
know (or simply see) that a certain foreign key column will have  
roughly a certain cardinality regardless of how big the table gets.  
It's a lot more efficient for me to tell the system that up front  
then have it need to do a full table scan or tens of millions of rows  
periodically to figure it out, or worse--as it is currently--to come  
up with an estimate that is multiple orders of magnitude off, even  
with the stats target turned all the way up.

I realize that this is a case that is possible to do manually now,  
sort of. I can tweak the stats table myself. But it would be nice if  
you could do it in such a way that it would override what analyze  
comes up with on a case-by-case basis.

We could have a perfect query planner, but feed it bad stats and it  
will still make poor decisions.

I'm of the strong opinion that hinting the data is much better than  
hinting the queries. There tends to be many fewer places you need to  
do that, and new queries can automatically take advantage.

> I like the suggestion, though, that there be ways to codify known
> relationships in the system in such a way that the optimizer can
> learn to use that information.  _That_ seems to me to be a big
> improvement, because it can be taken into consideration along with
> relationships that emerge from the statistics, that the DBA may not
> know about.

I'm all for things the computer can do for me automagically. It's  
just good to have the ability to tell the computer about things you  
know about the data that it either can't efficiently figure out or  
can't figure out at all.

-Casey



Re: Hints (Was: Index Tuning Features)

From
Tom Lane
Date:
Casey Duncan <casey@pandora.com> writes:
> Yes, but it may be much more efficient for the human to tell the  
> computer than for the computer to introspect things. Take, for  
> example, ndisinct as data grows large.

Yeah, an override estimate for a column's ndistinct seems a perfect
example of the sort of statistical hint that I'd be in favor of having.
We have also talked about solving the multi-column statistics problem
(which, at its core, is "which combinations of columns are worth
accumulating stats for?" --- you can't possibly store stats for every
combination!) by having what would amount to hints from the DBA saying
"keep stats for these combinations".
        regards, tom lane


Re: Hints (Was: Index Tuning Features)

From
"Andrew Dunstan"
Date:
Tom Lane wrote:

> We have also talked about solving the multi-column statistics problem
> (which, at its core, is "which combinations of columns are worth
> accumulating stats for?" --- you can't possibly store stats for every
> combination!) by having what would amount to hints from the DBA saying
> "keep stats for these combinations".
>

This strikes me intuitively as the most likely candidate so far for
improvement. I'm much more interested in schemes that will improve the
stats system, rather than providing a way around it.

cheers

andrew



Re: Hints WAS: Index Tuning Features

From
"Zeugswetter Andreas ADI SD"
Date:
> And, to give you a starting point: the discussion has morphed into:
> "What manual ways can we come up with for the DBA to
> influence the planner and fix planner "bugs" which won't have
> the fragility of query-based hints ala Oracle?"

I see a few downsides though:

1. it lacks a way to try out different plans in one sessionwithout actually influencing other sessions.
Maybe the way to do this is:begin work;add statistics hintsexplain analyze your selectrollback work;
this is not enough to try a whole program with a new manual correction
though.

2. To try out sensible plans would only work if there where ways that
allow all
nodetypes (on specific tables/where clauses) to be made more/less
expensive individually.

3. the manual entry may negatively influence other queries that
previously were fastthus a typical tuning measure for one bad query plan imposes
high risk

4. sometimes I know as a developer that certain queries only run in case
of
abnormal skew in data (e.g. I expect to get a lot of rows even if
usually i would get few)

Andreas


Re: Hints (Was: Index Tuning Features)

From
Arturo Perez
Date:
In article <20061012112647.GD28443@phlogiston.dyndns.org>,ajs@crankycanuck.ca (Andrew Sullivan) wrote:

> On Thu, Oct 12, 2006 at 08:34:45AM +0200, Florian Weimer wrote:
> > 
> > Some statistics are very hard to gather from a sample, e.g. the number
> > of distinct values in a column.
> 

> I like the suggestion, though, that there be ways to codify known
> relationships in the system in such a way that the optimizer can
> learn to use that information.  

Since there is already a genetic-algorithm based optimizer, is there any 
way to use that to gather information to improve statistics?

For example, put the GA optimizer into a mode where it tries some of the 
plans it comes up with and collects data on how they perform?

-arturo


Re: Hints (Was: Index Tuning Features)

From
Josh Berkus
Date:
Tom,

> We have also talked about solving the multi-column statistics problem
> (which, at its core, is "which combinations of columns are worth
> accumulating stats for?" --- you can't possibly store stats for every
> combination!) by having what would amount to hints from the DBA saying
> "keep stats for these combinations".

We could start just by keeping stats for multiple columns which are indexed 
together.  That doesnt' cover everything (thanks to bitmapping) but would 
cover many cases.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco