Re: Materialized views WIP patch - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Materialized views WIP patch
Date
Msg-id CA+U5nMKSrF=j54fAns4pYQsrszK7L=Vkbqt2xxVT5pc738qegw@mail.gmail.com
Whole thread Raw
In response to Re: Materialized views WIP patch  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: Materialized views WIP patch  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
On 5 March 2013 12:15, Kevin Grittner <kgrittn@ymail.com> wrote:

> I don't think I disagree with any of what Simon says other than his
> feelings about the planning cost.  Imagine that there are ten MVs
> that might apply to a complex query, but some of them are mutually
> exclusive, so there are a large number of permutations of MVs which
> could be used to replace parts of the original query.  And maybe
> some of base tables have indexes which could reduce execution cost
> which aren't present in some or all of the MVs.  And each MV has a
> number of indexes.  The combinatorial explosion of possible plans
> would make it hard to constrain plan time without resorting to some
> crude rules about what to choose.  That's not an unsolvable
> problem, but I see it as a pretty big problem.

If we are proposing that we shouldn't try to optimise because its not
usefully solvable, then I would disagree.

If we are saying that more plans are possible with MVs, then I'd say,
yes there *could* be - that's the one of the purposes. That represents
more options for optimisation and we should be happy, not sad about
that. Yes, we would need some thought to how those potential
optimisations can be applied without additional planning cost, but I
see that as a long term task, not a problem. The question is will the
potential for additional planning time actually materialise into a
planning problem? (See below).

> I have no doubt that someone could take a big data warehouse and
> add one or two MVs and show a dramatic improvement in the run time
> of a query.  Almost as big as if the query were rewritten to usee
> the MV directly.  It would make for a very nice presentation, and
> as long as they are used sparingly this could be a useful tool for
> a data warehouse environment which is playing with alternative ways
> to optimize slow queries which pass a lot of data.  In other
> environments, I feel that it gets a lot harder to show a big win.

Are there realistically going to be more options to consider? In
practice, no, because in most cases we won't be considering both MVs
and indexes.

Splatting MVs on randomly won't show any more improvement than
splatting indexes on randomly helps. Specific optimisations help in
specific cases only. And of course, adding extra data structures that
have no value certainly does increase planning time. Presumably we'd
need some way of seeing how frequently MVs were picked, so we could
drop unused MVs just like we can indexes.

* Indexes are great at speeding up various kinds of search queries. If
you don't have any queries like that, they help very little.

* MVs help in specific and restricted use cases, but can otherwise be
thought of as a new kind of index structure. MVs help with joins and
aggregations, so if you don't do much of that, you'll see no benefit.

That knowledge also allows us to develop heuristics for sane
optimisation. If the MV has a GROUP BY in it, and a query doesn't,
then it probably won't help much to improve query times. If it
involves a join you aren't using, then that won't help either. My
first guess would be that we don't even bother looking for MV plans
unless it has an aggregated result, or a large scale join. We do
something similar when we look for plans that use indexes when we have
appropriate quals - no quals, no indexes.

As a result, I don't see MVs increasing planning times for requests
that would make little or no use of them. There will be more planning
time on queries that could make use of them and that is good because
we really care about that.

Sure, a badly written planner might cost more time than it saves. All
of this work requires investment from someone with the time and/or
experience to make a good go at it. I'm not pushing Tom towards it,
nor anyone else, but I do want to see the door kept open for someone
to do this when possible (i.e. not GSoC).

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: [GENERAL] Floating point error
Next
From: Simon Riggs
Date:
Subject: Re: Suggested new CF status: "Pending Discussion"