Re: Index Tuning Features - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Index Tuning Features
Date
Msg-id 1160559701.2659.1114.camel@holly
Whole thread Raw
In response to Index Tuning Features  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Index Tuning Features  (Csaba Nagy <nagy@ecircle-ag.com>)
List pgsql-hackers
Thanks everybody for comments so far; this will be a useful discussion.

On Tue, 2006-10-10 at 18:56 -0400, Robert Treat wrote: 
> On Tuesday 10 October 2006 12:06, Tom Lane wrote:
> > > Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> > > list of indexes that need to be added to get the cheapest plan for a
> > > particular query (no explain plan result though).
> >
> > Both of these seem to assume that EXPLAIN results, without EXPLAIN
> > ANALYZE results to back them up, are sufficient for tuning.  I find
> > this idea a bit dubious, particularly for cases of "marginal" indexes.
> >
> 
> While I agree with Tom that generally EXPLAIN is not enough for tuning, I also 
> know that when your dealing with queries that have run times in multiples of 
> hours (and the corresponding hour long index builds) EXPLAIN ANALYZE just 
> isn't an option.  Anything that can be done to wheedle down your choices 
> before you have to run EXPLAIN ANALYZE is a bonus. 

IMHO you need EXPLAIN, EXPLAIN ANALYZE and RECOMMEND

As Robert points out, using EA can make tuning take a long time and that
is the critical factor when you have a whole database/app to tune. 

This discussion helps me to make explicit what my thoughts had been on
what an ideal index tuning process is:

1. Recommendation: Use RECOMMEND to get an 80/20 setting for a
statement. As Peter suggests a "user-space" tool, I also imagine a tool
that would automatically run RECOMMEND on all SQL statements in a
workload and come up with proposals for additional indexes. We would
have a first cut index design in minutes rather than days.

2. Evaluation: We can then create the potential indexes as Virtual ones
and then re-run EXPLAINs to model how a whole workload would behave. We
can begin to prune low-impact indexes out of the mix at this stage.
Again, this can be done automatically.

3. Implementation: We re-create the new indexes as real indexes (perhaps
concurrently)

4. Correction: We then run the workload and then use existing tools to
spot the statements causing the most problems and manually assess them
using EXPLAIN ANALYZE. Manually postulate new Virtual indexes and
re-model the workload again as (2)

Steps (3) and (4) have both been improved for 8.2. Steps (1) and (2) are
completely new steps for 8.3

The above process can be performed without tool support, but its clear
that further automation will help greatly here. I foresee that the
development of both server-side and tools will take more than one
release. Discussion of tool support can begin once we have agreed
server-side capability.


With that as a backdrop, further comments are:

On Tue, 2006-10-10 at 19:15 -0400, Tom Lane wrote: 
> Robert Treat <xzilla@users.sourceforge.net> writes:
> > Anything that can be done to wheedle down your choices 
> > before you have to run EXPLAIN ANALYZE is a bonus. 
> 
> Fair enough, but I prefer Peter's suggestion of attaching the
> hypothetical index definitions to EXPLAIN itself, rather than making
> bogus catalog entries.  Something along the line of
> 
> EXPLAIN <statement>
> ASSUMING INDEX fooi ON foo ....
> [ ASSUMING INDEX ... ]

I do like this, though for step (2) above we would need to attach the
appropriate indexes to each of the SQL statements prior to execution.
Doing this for a single SQL statement is fine, but doing that for a
whole workload of 1000s of statements is not very practical, hence an
externally declarative approach seems better.

I can imagine many other declarative approaches other than the one I
proposed; it just seems pretty neat to me to use almost exactly the same
syntax for a virtual index as for a real index. As I mentioned, ideally
this would not be a full-strength catalog object, but I was thinking
towards implementation also. Another possibility would be to use a local
pg_virtual_indexes table.



On Tue, 2006-10-10 at 18:06 +0200, Peter Eisentraut wrote: 
> Simon Riggs wrote:
> 
> > - RECOMMEND command
> >
> > Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> > list of indexes that need to be added to get the cheapest plan for a
> > particular query (no explain plan result though).
> 
> This functionality also seems useful, but maybe it should be the job of 
> a user-space tool?

So from above, Yes, I see a user-space tool also, but not instead.

The RECOMMEND command is the minimal server functionality required to
enable an (external) automated tuning support tool to be developed.

Possible architectures for this functionality include both user-space
and server-space options. Much thinking has been done on this in the DB
research community, with the general consensus being its easier to
extend the planner to cope with postulation that it is to create an
external postulation tool that acts (accurately) like the planner.

"DB2 advisor: An optimizer smart enough to recommend its own indexes."
Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohnman, and
Alan Skelley. 
In The 16th International Conference on Data Engineering (ICDE'00), San
Diego, CA. IEEE Computer Society, February 2000.

A wonderful summary of which is available here, but not sure if the full
paper is publicly available for free.
http://www.andrew.cmu.edu/user/ngm/15-823/summaries/08.pdf

DB2 Design Advisor: Integrated Automatic Physical Database Design

"DB2 Design Advisor: Integrated Automatic Physical Database Design"
Zilio et al
which is available at 
http://www.vldb.org/conf/2004/IND4P1.PDF#search=%22db2%20design%
20advisor%22


On Tue, 2006-10-10 at 19:15 -0400, Tom Lane wrote: 
> > Specifically, multi-column indexes are not considered very heavily in
> > RECOMMEND.
> 
> That seems like a bad idea as well --- multicol indexes are exactly the
> sort of thing a novice DBA might fail to consider.  If you're going to
> do this then you should consider all cases.

Calculating all index cases would follow the combinatorial explosion of
sum(N!/(r!(N-r)!)) though we can argue about exactly what N is in this
case. So we have the same problem as the main optimiser: exhaustive
search is not practical, so we must find a heuristic that allows us to
limit the search space so RECOMMEND doesn't run for too long.

The "no multi-col indexes except FKs" is just a proposed heuristic, so
happy to debate exactly what that heuristic should be. (There are
various research papers available with proposed heuristics). 

Multi-col indexes are also subject to over-fitting, since RECOMMEND
would be liable to return (for example) 7-column indexes as the best
choice for a single query, which would be bad overall. 

I'd been thinking about this for some time: the virtual index concept
fills in the gaps so that taken together, RECOMMEND and virtual indexes
provide a reasonable toolset for both limiting search space and yet
allowing more complex ideas to be tested.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Zeugswetter Andreas ADI SD"
Date:
Subject: Re: Index Tuning Features
Next
From: Csaba Nagy
Date:
Subject: Re: Index Tuning Features